Various Tricks for XSLT

1 Introduction

The idea is fairly simple, I would say: xsltproc is available on most of the popular platforms.

2 Use XSLT to convert EPUB books to TeX

A common issue is that the XHTML used in EPUB is in a namespace, so that namespace must be declared in the stylesheet header. Elements can then be referenced with the prefix, for example xhtml:html.

  |<xsl:stylesheet version="1.0"
  |                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  |                xmlns:xhtml="http://www.w3.org/1999/xhtml">
  |  <xsl:output method="text"
  |              indent="yes" omit-xml-declaration="yes"
  |              encoding="UTF-8"/>
  |  <xsl:strip-space elements="*"/>
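To show how the prefix is used once the namespace is declared, here is a minimal sketch of a complete stylesheet. The mappings (paragraphs to blank-line-separated text, em to \emph, head dropped) are illustrative assumptions, not rules taken from the original stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <!-- Assumed mapping: each XHTML paragraph ends with a blank line in TeX. -->
  <xsl:template match="xhtml:p">
    <xsl:apply-templates/>
    <xsl:text>&#xA;&#xA;</xsl:text>
  </xsl:template>

  <!-- Assumed mapping: em becomes \emph{...}. -->
  <xsl:template match="xhtml:em">
    <xsl:text>\emph{</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>}</xsl:text>
  </xsl:template>

  <!-- Drop the head so the title text does not leak into the output. -->
  <xsl:template match="xhtml:head"/>
</xsl:stylesheet>
```

Without the xhtml: prefix, match="p" would only match p elements in no namespace, so the templates would never fire on EPUB content and only the built-in rules would run.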

This snippet is what I use to handle ruby in EPUB files.

   |<xsl:template match="xhtml:ruby">
   |  <!-- Base text: the text nodes that are direct children of ruby -->
   |  <xsl:variable name="kanji">
   |    <xsl:for-each select="text()">
   |      <xsl:value-of select="."/>
   |    </xsl:for-each>
   |  </xsl:variable>
   |  <xsl:text>\ruby[</xsl:text>
   |  <!-- m for a single base character, g for a single rt, j otherwise -->
   |  <xsl:choose>
   |    <xsl:when test="string-length($kanji) = 1">
   |      <xsl:text>m</xsl:text>
   |    </xsl:when>
   |    <xsl:when test="count(xhtml:rt) = 1">
   |      <xsl:text>g</xsl:text>
   |    </xsl:when>
   |    <xsl:otherwise><xsl:text>j</xsl:text></xsl:otherwise>
   |  </xsl:choose>
   |  <xsl:text>]{</xsl:text><xsl:value-of select="$kanji"/>
   |  <xsl:text>}{</xsl:text>
   |  <!-- Readings, separated by | -->
   |  <xsl:for-each select="xhtml:rt">
   |    <xsl:value-of select="." />
   |    <xsl:if test="position() != last()"><xsl:text>|</xsl:text></xsl:if>
   |  </xsl:for-each>
   |  <xsl:text>}</xsl:text>
   |</xsl:template>

For example, <ruby>霊<rt>れい</rt>前<rt>ぜん</rt></ruby>, rendered as 霊前 with the reading れいぜん above it, is converted to \ruby[j]{霊前}{れい|ぜん}.

3 Add ruby to DocBook

The DocBook RELAX NG grammar can be extended so HTML ruby is allowed as an inline text element.

   |# DocBook 5 with ruby tag
   |namespace db = "http://docbook.org/ns/docbook"
   |namespace h = "http://www.w3.org/1999/xhtml"
   |default namespace = "http://docbook.org/ns/docbook"
   |
   |include "docbook5.rnc" {
   |       db.extension.inlines = h.ruby
   |}
   |
   |h.ruby = element h:ruby {
   |    db._any.attribute*,
   |    ((text | db._any | h.ruby), (h.ruby.rt | (h.ruby.rp, h.ruby.rt, h.ruby.rp))?)+
   |}
   |h.ruby.rp = element h:rp { (db._any.attribute | text)* }
   |h.ruby.rt = element h:rt { (db._any.attribute | text)* }

This definition should cover most of the examples given in the HTML specification's section on ruby.

<h:ruby><h:ruby>東<h:rt>とう</h:rt>南<h:rt>なん</h:rt></h:ruby><h:rt>たつみ</h:rt></h:ruby>の方角

This renders as 東南の方角, with とう and なん above 東 and 南 and たつみ as a second annotation over 東南.

Since the ruby elements are in the HTML namespace, no additional XSLT customization is needed.
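As a quick check of the extended grammar, a minimal document along these lines should be accepted. The article content is made up for illustration, and the h prefix is simply bound to the XHTML namespace as in the schema above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<article xmlns="http://docbook.org/ns/docbook"
         xmlns:h="http://www.w3.org/1999/xhtml"
         version="5.0">
  <title>Ruby test</title>
  <!-- h:ruby is allowed here because db.extension.inlines
       is part of the inline content of para. -->
  <para><h:ruby>霊<h:rt>れい</h:rt>前<h:rt>ぜん</h:rt></h:ruby>に供える。</para>
</article>
```

Each base character followed by its h:rt is one repetition of the (text, rt) group in the h.ruby pattern, so this document uses two repetitions.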

4 Validating

If you only have xmllint:

$ xmllint --noout --relaxng docbook-5.0.1/rng/docbook.rng doc.xml

5 DocBook xslTNG

I have recently switched to this version for its better code listing support. The setup is confusing but still worth it: it follows DocBook semantics more closely and comes with CSS optimized for paged media, among other things.

It is written in XSLT 3.0, so more features are available, but that also means it requires Saxon. The JAR is self-contained, so portability is not much of a worry.

5.1 mediaobject path

This function fixes mediaobject paths when the final HTML pages do not end up in the same directory in which they were produced.

   |<xsl:param name="v:mediaobject-input-base-uri" as="xs:string"
   |           select="'media/'" />
   |
   |<xsl:param name="v:mediaobject-output-base-uri" as="xs:string"
   |           select="'./'"/>
   |
   |<xsl:function name="f:resolve-object-uri" as="xs:string">
   |  <xsl:param name="uri" as="xs:string"/>
   |
   |  <xsl:variable name="input-uri"
   |                select="substring-after($uri, $v:mediaobject-input-base-uri)"/>
   |  <xsl:variable name="output-uri"
   |                select="if (exists($v:mediaobject-output-base-uri))
   |                        then $v:mediaobject-output-base-uri || $input-uri
   |                        else $input-uri"/>
   |
   |  <xsl:sequence select="$output-uri"/>
   |</xsl:function>
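To see what the function computes without running a full DocBook build, the logic can be tried in a standalone stylesheet. This is only a sketch: the f and v namespace URIs below are placeholders rather than the ones xslTNG binds, the function body is a condensed copy of the one above, and the sample paths are invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:f="urn:example:functions"
                xmlns:v="urn:example:variables">
  <xsl:output method="text"/>

  <xsl:param name="v:mediaobject-input-base-uri" as="xs:string"
             select="'media/'"/>
  <xsl:param name="v:mediaobject-output-base-uri" as="xs:string"
             select="'./'"/>

  <!-- Condensed copy: keep what follows the input base,
       then prepend the output base. -->
  <xsl:function name="f:resolve-object-uri" as="xs:string">
    <xsl:param name="uri" as="xs:string"/>
    <xsl:sequence select="$v:mediaobject-output-base-uri
                          || substring-after($uri, $v:mediaobject-input-base-uri)"/>
  </xsl:function>

  <!-- Entry point that needs no source document. -->
  <xsl:template name="xsl:initial-template">
    <xsl:for-each select="('media/fig1.png', 'media/img/fig2.svg')">
      <xsl:value-of select="f:resolve-object-uri(.)"/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Started at the initial template (with Saxon, the -it option), this prints ./fig1.png and ./img/fig2.svg on separate lines. A URI that does not contain the input base makes substring-after return an empty string, so the result would be just the output base.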

6 Set up Emacs for DocBook 5 and XSLT 3.0

The nxml-mode bundled with Emacs ships only the DocBook 4 grammar. To enable editing support for DocBook 5, download the RELAX NG schema in compact syntax and either replace /etc/schema/docbook.rnc with it or edit schemas.xml in the same directory.

The same can be done to update the XSLT grammar to 3.0.
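For the schemas.xml route, the locating rules might look like the following sketch. The file names docbook5.rnc and xslt30.rnc are placeholders for whatever the downloaded schemas are called, and relative URIs are resolved against the location of schemas.xml.

```xml
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <!-- Documents whose root element is in the DocBook 5 namespace -->
  <namespace ns="http://docbook.org/ns/docbook" uri="docbook5.rnc"/>
  <!-- Stylesheets; xslt30.rnc stands for the downloaded XSLT 3.0 schema -->
  <namespace ns="http://www.w3.org/1999/XSL/Transform" uri="xslt30.rnc"/>
</locatingRules>
```

The namespace rules select a schema by the namespace of the document's root element, so DocBook 4 documents, which use no namespace, are not affected by the first rule.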

7 Content MathML

These lines convert the content markup to presentation markup for MathML in DocBook.

  |<xsl:import href="ctop.xsl"/>
  |<xsl:template match="mml:math" mode="m:docbook">
  |  <xsl:apply-templates select="." mode="c2p"/>
  |</xsl:template>

ctop.xsl is available from David Carlisle's web stylesheets.

A few examples:

  |<math xmlns="http://www.w3.org/1998/Math/MathML">
  |  <apply><eq/>
  |    <apply><diff/>
  |      <bvar><ci>x</ci></bvar>
  |      <apply><sin/><ci>x</ci></apply>
  |    </apply>
  |    <apply><cos/><ci>x</ci></apply>
  |  </apply>
  |</math>

\frac{d \sin x}{d x} = \cos x
Example 7.1 Rendered by XSLT

\frac{d \sin x}{d x} = \cos x
Example 7.2 Rendered by Temml with \frac{d \sin x}{d x} = \cos x

   |<math xmlns="http://www.w3.org/1998/Math/MathML">
   |  <apply>
   |    <times/>
   |    <apply>
   |      <plus/>
   |      <cn>12</cn>
   |      <ci>x</ci>
   |    </apply>
   |    <apply>
   |      <power/>
   |      <ci>y</ci>
   |      <cn>2</cn>
   |    </apply>
   |  </apply>
   |</math>

(12 + x) y^2
Example 7.3 Notice the parenthesis insertion

8 MusicXML to PMX

PMX is a preprocessor for MusiXTeX. For some reason xml2pmx cannot parse the MusicXML produced by Opusmodus, so I need this stylesheet to do part of the conversion.

   |<?xml version="1.0" encoding="UTF-8"?>
   |<xsl:stylesheet version="1.0"
   |                xmlns:data="_local.uri"
   |                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   |  <xsl:output method="text"/>
   |  <!-- Lookup table: MusicXML note type to PMX duration digit -->
   |  <data:typemap>
   |        <entry key="breve">9</entry>
   |        <entry key="whole">0</entry>
   |        <entry key="half">2</entry>
   |        <entry key="quarter">4</entry>
   |        <entry key="eighth">8</entry>
   |        <entry key="16th">1</entry>
   |        <entry key="32nd">3</entry>
   |        <entry key="64th">6</entry>
   |  </data:typemap>
   |  <xsl:template match="/">
   |    <xsl:apply-templates select="score-partwise/part"/>
   |  </xsl:template>
   |  <xsl:template match="part">
   |    <xsl:apply-templates select="measure"/>
   |  </xsl:template>
   |  <!-- One comment line per measure, then its notes, then the bar terminator -->
   |  <xsl:template match="measure">
   |    <xsl:text>% measure </xsl:text>
   |    <xsl:value-of select="@number" />
   |    <xsl:text>&#xA;</xsl:text>
   |    <xsl:apply-templates select="note"/>
   |    <xsl:text>/&#xA;</xsl:text>
   |  </xsl:template>
   |  <xsl:template match="note">
   |    <xsl:variable name="_ntype" select="type"/>
   |    <!-- document('') reads the table from this stylesheet itself -->
   |    <xsl:variable name="ntype" select="document('')/xsl:stylesheet/data:typemap/entry[@key=$_ntype]"/>
   |    <xsl:if test="tie[@type='start']">
   |      <xsl:text>( </xsl:text>
   |    </xsl:if>
   |    <xsl:choose>
   |      <xsl:when test="rest">
   |        <xsl:text>r</xsl:text>
   |        <xsl:value-of select="$ntype"/>
   |      </xsl:when>
   |      <!-- Chord notes: z prefix, then pitch and octave only -->
   |      <xsl:when test="chord">
   |        <xsl:text>z</xsl:text>
   |        <xsl:value-of select='translate(pitch/step,"ABCDEFG", "abcdefg")'/>
   |        <xsl:value-of select="pitch/octave"/>
   |      </xsl:when>
   |      <xsl:otherwise>
   |        <xsl:value-of select='translate(pitch/step,"ABCDEFG", "abcdefg")'/>
   |        <xsl:value-of select="$ntype"/>
   |        <xsl:value-of select="pitch/octave"/>
   |        <!-- First letter of the accidental name, e.g. s for sharp -->
   |        <xsl:if test="accidental">
   |          <xsl:value-of select="substring(accidental, 1, 1)"/>
   |        </xsl:if>
   |        <xsl:if test="dot">
   |          <xsl:text>d</xsl:text>
   |        </xsl:if>
   |      </xsl:otherwise>
   |    </xsl:choose>
   |    <xsl:text> </xsl:text>
   |    <xsl:if test="tie[@type='stop']">
   |      <xsl:text>) </xsl:text>
   |    </xsl:if>
   |  </xsl:template>
   |</xsl:stylesheet>
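To make the mapping concrete, here is a small hand-written input. It is an assumption for illustration, not Opusmodus output, and it omits the part-list and DOCTYPE that a real MusicXML file would carry because the stylesheet never looks at them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<score-partwise version="3.1">
  <part id="P1">
    <measure number="1">
      <!-- A quarter-note C in octave 4 -->
      <note>
        <pitch><step>C</step><octave>4</octave></pitch>
        <duration>1</duration>
        <type>quarter</type>
      </note>
      <!-- A quarter rest -->
      <note>
        <rest/>
        <duration>1</duration>
        <type>quarter</type>
      </note>
    </measure>
  </part>
</score-partwise>
```

Running the stylesheet over this file, for example with xsltproc, should print the comment line "% measure 1" followed by the line "c44 r4 /". The note becomes pitch letter, duration digit and octave, the rest becomes r plus the duration digit, and the measure ends with a slash.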