Various Tricks for XSLT

1 Introduction

The idea is fairly simple, I would say. xsltproc is available on the majority of popular platforms.

2 Use XSLT to convert EPUB books to TeX

A common pitfall is that the XHTML used in EPUB is in a namespace, so that namespace must be declared in the stylesheet header. Elements can then be referenced with the prefix, for example xhtml:html.

  |<xsl:stylesheet version="1.0"
  |                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  |                xmlns:xhtml="http://www.w3.org/1999/xhtml">
  |  <!-- TeX is plain text, so use the text output method. -->
  |  <xsl:output method="text" encoding="UTF-8"/>
  |  <xsl:strip-space elements="*"/>
  |  <!-- templates go here -->
  |</xsl:stylesheet>
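
To make the prefix usage concrete, here is a minimal sketch of a complete stylesheet that matches namespaced elements. The choice of xhtml:p and the blank-line paragraph separator are assumptions for illustration, and TeX special characters are not escaped.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <!-- Start at the namespaced root and descend into the body only,
       so that the head (title, metadata) is skipped. -->
  <xsl:template match="/xhtml:html">
    <xsl:apply-templates select="xhtml:body"/>
  </xsl:template>

  <!-- Each XHTML paragraph becomes a TeX paragraph: its content
       followed by an empty line. -->
  <xsl:template match="xhtml:p">
    <xsl:apply-templates/>
    <xsl:text>&#xA;&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

An unprefixed pattern such as match="p" would match nothing here, because in XSLT 1.0 an unprefixed name in a pattern always refers to an element in no namespace.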

This snippet is what I use to handle ruby in EPUB files.

   |<xsl:template match="xhtml:ruby">
   |  <!-- Base text: the text nodes directly inside ruby, without rt. -->
   |  <xsl:variable name="kanji">
   |    <xsl:for-each select="text()">
   |      <xsl:value-of select="."/>
   |    </xsl:for-each>
   |  </xsl:variable>
   |  <xsl:text>\ruby[</xsl:text>
   |  <!-- Option: m for a single base character, g for one reading over
   |       the whole base, j for several readings. -->
   |  <xsl:choose>
   |    <xsl:when test="string-length($kanji) = 1">
   |      <xsl:text>m</xsl:text>
   |    </xsl:when>
   |    <xsl:when test="count(xhtml:rt) = 1">
   |      <xsl:text>g</xsl:text>
   |    </xsl:when>
   |    <xsl:otherwise><xsl:text>j</xsl:text></xsl:otherwise>
   |  </xsl:choose>
   |  <xsl:text>]{</xsl:text><xsl:value-of select="$kanji"/>
   |  <xsl:text>}{</xsl:text>
   |  <!-- Readings, separated by | -->
   |  <xsl:for-each select="xhtml:rt">
   |    <xsl:value-of select="."/>
   |    <xsl:if test="position() != last()"><xsl:text>|</xsl:text></xsl:if>
   |  </xsl:for-each>
   |  <xsl:text>}</xsl:text>
   |</xsl:template>

For example, <ruby>霊<rt>れい</rt>前<rt>ぜん</rt></ruby>, rendered as 霊前 with the readings れい and ぜん above the characters, is converted to \ruby[j]{霊前}{れい|ぜん}.

3 Add ruby to DocBook

The DocBook RELAX NG grammar can be extended so that HTML ruby is allowed as an inline text element.

   |# DocBook 5 with ruby tag
   |namespace db = "http://docbook.org/ns/docbook"
   |namespace h = "http://www.w3.org/1999/xhtml"
   |default namespace = "http://docbook.org/ns/docbook"
   |
   |# Redefine the inline extension pattern so that it accepts h:ruby.
   |include "docbook5.rnc" {
   |    db.extension.inlines = h.ruby
   |}
   |
   |# Base content (text, db._any, or a nested ruby), each optionally
   |# followed by an rt, or by an rt wrapped in a pair of rp elements.
   |h.ruby = element h:ruby {
   |    db._any.attribute*,
   |    ((text | db._any | h.ruby), (h.ruby.rt | (h.ruby.rp, h.ruby.rt, h.ruby.rp))?)+
   |}
   |h.ruby.rp = element h:rp { (db._any.attribute | text)* }
   |h.ruby.rt = element h:rt { (db._any.attribute | text)* }

This definition should cover most of the examples given in the HTML specification on ruby.

<h:ruby><h:ruby>東<h:rt>とう</h:rt>南<h:rt>なん</h:rt></h:ruby><h:rt>たつみ</h:rt></h:ruby>の方角

This renders as 東南の方角, with とう and なん above the individual characters and たつみ above the pair.

Since the ruby elements are in the XHTML namespace, no additional XSLT customization is needed.

4 Validating

If you only have xmllint:

$ xmllint --noout --relaxng docbook-5.0.1/rng/docbook.rng doc.xml

xmllint only reads the XML syntax of RELAX NG, so a schema written in the compact syntax must first be converted with trang. I prefer writing RELAX NG in its compact syntax.

5 DocBook xslTNG

I have recently switched to this version to get better code listing support. The setup is confusing, but it is still worth it: it follows the DocBook semantics better and comes with CSS optimized for paged media, among other things.

It is written in XSLT 3.0, so more features are available, but that also means it requires Saxon. The JAR is self-contained, so there is little to worry about regarding portability.

6 Set up Emacs for DocBook 5 and XSLT 3.0

The nxml-mode bundled with Emacs ships only the DocBook 4 grammar. To enable editing support for DocBook 5, download the RELAX NG schema in compact syntax, then either replace /etc/schema/docbook.rnc with it or edit schemas.xml in the same directory to point to it.

The same can be done to update the XSLT grammar to 3.0.
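
As a rough sketch of the schemas.xml approach, the locating rules below map root-element namespaces to schema files. The file names docbook5.rnc and xslt30.rnc are placeholders, not files that ship with Emacs; they stand for whatever schemas you have downloaded. This sketch also assumes the file is saved as schemas.xml next to the documents being edited, which nxml-mode consults before its bundled rules.

```xml
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <!-- Documents whose root element is in the DocBook 5 namespace use the
       downloaded schema. docbook5.rnc is an assumed file name. -->
  <namespace ns="http://docbook.org/ns/docbook" uri="docbook5.rnc"/>
  <!-- Stylesheets use an XSLT 3.0 grammar. xslt30.rnc is likewise a
       placeholder for a schema obtained separately. -->
  <namespace ns="http://www.w3.org/1999/XSL/Transform" uri="xslt30.rnc"/>
</locatingRules>
```

Relative uri values are resolved against the location of schemas.xml itself, so the schema files can sit in the same directory.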

7 Content MathML

These lines convert Content MathML to Presentation MathML when processing DocBook.

  |<xsl:import href="ctop.xsl"/>
  |<xsl:template match="mml:math" mode="m:docbook">
  |  <xsl:apply-templates select="." mode="c2p"/>
  |</xsl:template>

The ctop.xsl is available from David Carlisle's web stylesheets.

A few examples:

  |<math xmlns="http://www.w3.org/1998/Math/MathML">
  |  <apply><eq/>
  |    <apply><diff/>
  |      <bvar><ci>x</ci></bvar>
  |      <apply><sin/><ci>x</ci></apply>
  |    </apply>
  |    <apply><cos/><ci>x</ci></apply>
  |  </apply>
  |</math>

$\frac{d \sin x}{d x} = \cos x$
Example 1. Rendered by XSLT

$\frac{d \sin x}{d x} = \cos x$
Example 2. Rendered by Temml with \frac{d \sin x}{d x} = \cos x

   |<math xmlns="http://www.w3.org/1998/Math/MathML">
   |  <apply>
   |    <times/>
   |    <apply>
   |      <plus/>
   |      <cn>12</cn>
   |      <ci>x</ci>
   |    </apply>
   |    <apply>
   |      <power/>
   |      <ci>y</ci>
   |      <cn>2</cn>
   |    </apply>
   |  </apply>
   |</math>

$(12 + x) y^{2}$
Example 3. Notice the parenthesis insertion

8 MusicXML to PMX

PMX is a preprocessor for MusiXTeX. For some reason xml2pmx cannot parse the MusicXML produced by Opusmodus, so I need this stylesheet to do part of the conversion.

   |<?xml version="1.0" encoding="UTF-8"?>
   |<xsl:stylesheet version="1.0"
   |                xmlns:data="_local.uri"
   |                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   |  <xsl:output method="text"/>
   |  <!-- Lookup table: MusicXML note type to PMX duration digit. -->
   |  <data:typemap>
   |    <entry key="breve">9</entry>
   |    <entry key="whole">0</entry>
   |    <entry key="half">2</entry>
   |    <entry key="quarter">4</entry>
   |    <entry key="eighth">8</entry>
   |    <entry key="16th">1</entry>
   |    <entry key="32nd">3</entry>
   |    <entry key="64th">6</entry>
   |  </data:typemap>
   |  <xsl:template match="/">
   |    <xsl:apply-templates select="score-partwise/part"/>
   |  </xsl:template>
   |  <xsl:template match="part">
   |    <xsl:apply-templates select="measure"/>
   |  </xsl:template>
   |  <!-- A comment line with the measure number, the notes, then a bar. -->
   |  <xsl:template match="measure">
   |    <xsl:text>% measure </xsl:text>
   |    <xsl:value-of select="@number"/>
   |    <xsl:text>&#xA;</xsl:text>
   |    <xsl:apply-templates select="note"/>
   |    <xsl:text>/&#xA;</xsl:text>
   |  </xsl:template>
   |  <xsl:template match="note">
   |    <xsl:variable name="_ntype" select="type"/>
   |    <!-- document('') reads this stylesheet itself to reach the table. -->
   |    <xsl:variable name="ntype" select="document('')/xsl:stylesheet/data:typemap/entry[@key=$_ntype]"/>
   |    <!-- A starting tie opens a parenthesis before the note. -->
   |    <xsl:if test="tie[@type='start']">
   |      <xsl:text>( </xsl:text>
   |    </xsl:if>
   |    <xsl:choose>
   |      <!-- Rest: r followed by the duration digit. -->
   |      <xsl:when test="rest">
   |        <xsl:text>r</xsl:text>
   |        <xsl:value-of select="$ntype"/>
   |      </xsl:when>
   |      <!-- Further chord note: z, pitch name, octave, no duration. -->
   |      <xsl:when test="chord">
   |        <xsl:text>z</xsl:text>
   |        <xsl:value-of select='translate(pitch/step,"ABCDEFG", "abcdefg")'/>
   |        <xsl:value-of select="pitch/octave"/>
   |      </xsl:when>
   |      <!-- Ordinary note: pitch name, duration, octave, then the first
   |           letter of the accidental and d for a dot, if present. -->
   |      <xsl:otherwise>
   |        <xsl:value-of select='translate(pitch/step,"ABCDEFG", "abcdefg")'/>
   |        <xsl:value-of select="$ntype"/>
   |        <xsl:value-of select="pitch/octave"/>
   |        <xsl:if test="accidental">
   |          <xsl:value-of select="substring(accidental, 1, 1)"/>
   |        </xsl:if>
   |        <xsl:if test="dot">
   |          <xsl:text>d</xsl:text>
   |        </xsl:if>
   |      </xsl:otherwise>
   |    </xsl:choose>
   |    <xsl:text> </xsl:text>
   |    <!-- An ending tie closes the parenthesis after the note. -->
   |    <xsl:if test="tie[@type='stop']">
   |      <xsl:text>) </xsl:text>
   |    </xsl:if>
   |  </xsl:template>
   |</xsl:stylesheet>

I'm working on converting the stylesheet to XSLT 3.0, which makes defining maps easier.

9 Validation

9.1 Schematron
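
As a sketch of what that could look like (not the finished conversion), the lookup table can become an XPath 3.1 map held in a global variable. Only the duration lookup is shown; the variable name typemap and the reduced note template are assumptions for illustration, and an XSLT 3.0 processor such as Saxon is required.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <!-- MusicXML note type mapped to the PMX duration digit. -->
  <xsl:variable name="typemap" as="map(xs:string, xs:string)"
                select="map {
                  'breve': '9', 'whole': '0', 'half': '2', 'quarter': '4',
                  'eighth': '8', '16th': '1', '32nd': '3', '64th': '6'
                }"/>

  <!-- Visit only the notes, so built-in rules do not copy other text. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//note"/>
  </xsl:template>

  <!-- A map can be called like a function; an unknown or missing
       type yields the empty sequence, so nothing is written. -->
  <xsl:template match="note">
    <xsl:value-of select="$typemap(string(type))"/>
    <xsl:text> </xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Compared with the XSLT 1.0 version, this removes the need for the data namespace and the document('') self-reference.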

This is another XML schema language. It is designed to be transformed by XSLT into another XSLT stylesheet, which then analyzes the target XML file. Using XPath expressions, it can impose more complex constraints that RELAX NG cannot specify.

Note that although it is possible to embed Schematron in RELAX NG, in most cases the rules still have to be extracted from the RELAX NG schema before they can be used for validation.
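
To illustrate the kind of constraint meant here, the following hypothetical schema checks that no table row has more entries than the cols attribute of its tgroup declares. The rule is an example made up for this note, not taken from the DocBook distribution; comparing a count against an attribute value is something a grammar cannot express.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <ns prefix="db" uri="http://docbook.org/ns/docbook"/>
  <pattern>
    <!-- Context: every row directly inside the body of a tgroup. -->
    <rule context="db:tgroup/db:tbody/db:row">
      <!-- ../../@cols is the column count declared on the tgroup. -->
      <assert test="count(db:entry | db:entrytbl) &lt;= ../../@cols">
        A row has more entries than the cols attribute of its tgroup allows.
      </assert>
    </rule>
  </pattern>
</schema>
```

The test is an ordinary XPath expression evaluated with the matched row as the context node, which is why it translates directly into XSLT.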

9.2 NVDL

NVDL is supported by Jing, but Jing's NVDL features are relatively undocumented. By default, it strips out elements that are in a different namespace, so the <attach/> rule is needed to validate them against the original schema.

Below is an example of NVDL I used to validate articles I wrote in DocBook.

   |<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="init">
   |  <!-- DocBook content is checked by both the RELAX NG and the
   |       Schematron schema, each with its own mode for foreign elements. -->
   |  <mode name="init">
   |    <namespace ns="http://docbook.org/ns/docbook">
   |      <validate schema="rng/docbookrb.rng" useMode="verymml"/>
   |      <validate schema="sch/docbookxi.sch" useMode="allowmml"/>
   |    </namespace>
   |  </mode>
   |  <!-- For RELAX NG: attach foreign elements back to the DocBook
   |       content, and also validate MathML against its own schema. -->
   |  <mode name="verymml">
   |    <namespace ns="http://www.w3.org/1998/Math/MathML">
   |      <attach/>
   |      <validate schema="mathml3/mathml3.rng"/>
   |    </namespace>
   |    <namespace ns="http://www.w3.org/1999/xhtml">
   |      <attach/>
   |    </namespace>
   |    <namespace ns="http://www.w3.org/2001/XInclude">
   |      <attach/>
   |    </namespace>
   |  </mode>
   |  <!-- For Schematron: accept foreign elements without checking them. -->
   |  <mode name="allowmml">
   |    <namespace ns="http://www.w3.org/1998/Math/MathML">
   |      <allow/>
   |    </namespace>
   |    <namespace ns="http://www.w3.org/1999/xhtml">
   |      <allow/>
   |    </namespace>
   |    <namespace ns="http://www.w3.org/2001/XInclude">
   |      <allow/>
   |    </namespace>
   |  </mode>
   |</rules>