1. Introduction
The idea is fairly simple, I would say. xsltproc is available on most popular platforms.
2. Use XSLT to convert epub books to TeX
A common issue is that the XHTML used in EPUB is in a namespace, so that namespace has to be declared in the stylesheet header. After that, elements can be referenced with the prefix, for example xhtml:html.
|<xsl:stylesheet version="1.0"
|    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
|    xmlns:xhtml="http://www.w3.org/1999/xhtml">
|  <xsl:output method="text"
|      indent="yes" omit-xml-declaration="yes"
|      encoding="UTF-8"/>
|  <xsl:strip-space elements="*"/>
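As a minimal sketch of how the prefix is then used in match patterns, here is a complete stylesheet built on that header. The two templates and the blank-line paragraph break are illustrative assumptions, not part of the actual conversion.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <!-- Start at the namespaced root; an unprefixed "html" would match nothing. -->
  <xsl:template match="/xhtml:html">
    <xsl:apply-templates select="xhtml:body"/>
  </xsl:template>

  <!-- Illustrative only: emit each paragraph followed by a blank line. -->
  <xsl:template match="xhtml:p">
    <xsl:apply-templates/>
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Saved under a hypothetical name such as epub2tex.xsl, it could be run as xsltproc epub2tex.xsl chapter.xhtml.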
This snippet is what I use to handle ruby in EPUB files.
|<xsl:template match="xhtml:ruby">
|  <!-- Base text: the text nodes directly under ruby, skipping rt and rp. -->
|  <xsl:variable name="kanji">
|    <xsl:for-each select="text()">
|      <xsl:value-of select="."/>
|    </xsl:for-each>
|  </xsl:variable>
|  <xsl:text>\ruby[</xsl:text>
|  <!-- Option: m for a single base character, g for a single reading, j otherwise. -->
|  <xsl:choose>
|    <xsl:when test="string-length($kanji) = 1">
|      <xsl:text>m</xsl:text>
|    </xsl:when>
|    <xsl:when test="count(xhtml:rt) = 1">
|      <xsl:text>g</xsl:text>
|    </xsl:when>
|    <xsl:otherwise><xsl:text>j</xsl:text></xsl:otherwise>
|  </xsl:choose>
|  <xsl:text>]{</xsl:text><xsl:value-of select="$kanji"/>
|  <xsl:text>}{</xsl:text>
|  <!-- Readings: one per rt, separated by "|". -->
|  <xsl:for-each select="xhtml:rt">
|    <xsl:value-of select="."/>
|    <xsl:if test="position() != last()"><xsl:text>|</xsl:text></xsl:if>
|  </xsl:for-each>
|  <xsl:text>}</xsl:text>
|</xsl:template>
For example, <ruby>霊<rt>れい</rt>前<rt>ぜん</rt></ruby>, rendered as 霊前, can be converted to \ruby[j]{霊前}{れい|ぜん}.
3. Add ruby to DocBook
The DocBook RELAX NG grammar can be extended so that the HTML ruby element is allowed as an inline element.
|# DocBook 5 with ruby tag
|namespace db = "http://docbook.org/ns/docbook"
|namespace h = "http://www.w3.org/1999/xhtml"
|default namespace = "http://docbook.org/ns/docbook"
|
|include "docbook5.rnc" {
|  db.extension.inlines = h.ruby
|}
|
|h.ruby = element h:ruby {
|  db._any.attribute*,
|  ((text | db._any | h.ruby), (h.ruby.rt | (h.ruby.rp, h.ruby.rt, h.ruby.rp))?)+
|}
|h.ruby.rp = element h:rp { (db._any.attribute | text)* }
|h.ruby.rt = element h:rt { (db._any.attribute | text)* }
This definition should cover most of the examples given in the HTML specification on ruby. For example,
<h:ruby><h:ruby>東<h:rt>とう</h:rt>南<h:rt>なん</h:rt></h:ruby><h:rt>たつみ</h:rt></h:ruby>の方角
is rendered as 東南の方角.
Since the ruby element is in the HTML namespace, no additional XSLT customization is needed.
4. Validating
If you only have xmllint:
$ xmllint --noout --relaxng docbook-5.0.1/rng/docbook.rng doc.xml
xmllint reads only the XML syntax of RELAX NG. I prefer writing RELAX NG in its compact syntax, so I need trang to convert it to the XML syntax.
5. DocBook xslTNG
I have recently switched to this version for its better code listing support. The setup is confusing, but it is still worth it: it follows DocBook semantics more closely, comes with CSS optimized for paged media, and so on.
It is written in XSLT 3.0, so more features are available, but that also means it requires Saxon. The JAR is self-contained, so portability is not much of a worry.
6. Setup Emacs for DocBook 5 and XSLT 3.0
The nxml-mode included in Emacs ships only the DocBook 4 grammar. To enable editing support for DocBook 5 in Emacs, download the RELAX NG schema in compact syntax and either replace /etc/schema/docbook.rnc with it or edit schemas.xml in the same directory.
The same can be done to update the XSLT grammar to 3.0.
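For the schemas.xml route, the locating rules might look roughly like the sketch below. The file names docbook5.rnc and xslt30.rnc are assumptions (whatever the downloaded grammars are called); each uri is resolved relative to the schemas.xml file itself.

```xml
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <!-- Documents whose root element is in the DocBook 5 namespace. -->
  <namespace ns="http://docbook.org/ns/docbook" uri="docbook5.rnc"/>
  <!-- Stylesheets; xslt30.rnc is a hypothetical name for an XSLT 3.0 grammar. -->
  <namespace ns="http://www.w3.org/1999/XSL/Transform" uri="xslt30.rnc"/>
</locatingRules>
```

nxml-mode also looks for a schemas.xml next to the document being edited, so the same rules can be kept per project instead of changing the copy that ships with Emacs.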
7. Content MathML
The following lines convert MathML content markup in DocBook to presentation markup.
|<xsl:import href="ctop.xsl"/>
|<xsl:template match="mml:math" mode="m:docbook">
|<xsl:apply-templates select="." mode="c2p"/>
|</xsl:template>
ctop.xsl is available from David Carlisle's web stylesheets.
A few examples:
|<math xmlns="http://www.w3.org/1998/Math/MathML">
|  <apply><eq/>
|    <apply><diff/>
|      <bvar><ci>x</ci></bvar>
|      <apply><sin/><ci>x</ci></apply>
|    </apply>
|    <apply><cos/><ci>x</ci></apply>
|  </apply>
|</math>
\frac{d \sin x}{d x} = \cos x
|<math xmlns="http://www.w3.org/1998/Math/MathML">
|  <apply>
|    <times/>
|    <apply>
|      <plus/>
|      <cn>12</cn>
|      <ci>x</ci>
|    </apply>
|    <apply>
|      <power/>
|      <ci>y</ci>
|      <cn>2</cn>
|    </apply>
|  </apply>
|</math>
8. MusicXML to PMX
PMX is a preprocessor for MusiXTeX. For some reason, xml2pmx cannot parse the MusicXML produced by Opusmodus, so I need this stylesheet to do part of the conversion.
|<?xml version="1.0" encoding="UTF-8"?>
|<xsl:stylesheet version="1.0"
|    xmlns:data="_local.uri"
|    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
|  <xsl:output method="text"/>
|  <!-- Lookup table: MusicXML note type to PMX duration digit. -->
|  <data:typemap>
|    <entry key="breve">9</entry>
|    <entry key="whole">0</entry>
|    <entry key="half">2</entry>
|    <entry key="quarter">4</entry>
|    <entry key="eighth">8</entry>
|    <entry key="16th">1</entry>
|    <entry key="32nd">3</entry>
|    <entry key="64th">6</entry>
|  </data:typemap>
|  <xsl:template match="/">
|    <xsl:apply-templates select="score-partwise/part"/>
|  </xsl:template>
|  <xsl:template match="part">
|    <xsl:for-each select="measure">
|      <xsl:apply-templates select="."/>
|    </xsl:for-each>
|  </xsl:template>
|  <!-- One line of notes per measure, preceded by a comment and closed by "/". -->
|  <xsl:template match="measure">
|    <xsl:text>% measure </xsl:text>
|    <xsl:value-of select="@number"/>
|    <xsl:text>&#10;</xsl:text>
|    <xsl:for-each select="note">
|      <xsl:apply-templates select="."/>
|    </xsl:for-each>
|    <xsl:text>/&#10;</xsl:text>
|  </xsl:template>
|  <xsl:template match="note">
|    <xsl:variable name="_ntype" select="type"/>
|    <!-- document('') is this stylesheet itself, so the table above can be queried. -->
|    <xsl:variable name="ntype" select="document('')/xsl:stylesheet/data:typemap/entry[@key=$_ntype]"/>
|    <xsl:if test="tie[@type='start']">
|      <xsl:text>( </xsl:text>
|    </xsl:if>
|    <xsl:choose>
|      <xsl:when test="rest">
|        <xsl:text>r</xsl:text>
|        <xsl:value-of select="$ntype"/>
|      </xsl:when>
|      <!-- Additional chord notes: "z", pitch name and octave, no duration. -->
|      <xsl:when test="chord">
|        <xsl:text>z</xsl:text>
|        <xsl:value-of select='translate(pitch/step,"ABCDEFG", "abcdefg")'/>
|        <xsl:value-of select="pitch/octave"/>
|      </xsl:when>
|      <!-- Ordinary notes: pitch name, duration, octave, accidental initial, dot. -->
|      <xsl:otherwise>
|        <xsl:value-of select='translate(pitch/step,"ABCDEFG", "abcdefg")'/>
|        <xsl:value-of select="$ntype"/>
|        <xsl:value-of select="pitch/octave"/>
|        <xsl:if test="accidental">
|          <xsl:value-of select="substring(accidental, 1, 1)"/>
|        </xsl:if>
|        <xsl:if test="dot">
|          <xsl:text>d</xsl:text>
|        </xsl:if>
|      </xsl:otherwise>
|    </xsl:choose>
|    <xsl:text> </xsl:text>
|    <xsl:if test="tie[@type='stop']">
|      <xsl:text>) </xsl:text>
|    </xsl:if>
|  </xsl:template>
|</xsl:stylesheet>
I'm working on converting the stylesheet to XSLT 3.0, so that defining maps becomes easier.
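As a rough sketch of what that could look like (not the finished conversion), the lookup table can become an XPath 3.1 map held in a variable. The variable name typemap and the trimmed-down note template are assumptions for illustration, and the sketch needs an XSLT 3.0 processor such as Saxon.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <!-- Same table as the data:typemap element, written as a map. -->
  <xsl:variable name="typemap" as="map(xs:string, xs:string)"
      select="map {
        'breve': '9', 'whole': '0', 'half': '2', 'quarter': '4',
        'eighth': '8', '16th': '1', '32nd': '3', '64th': '6'
      }"/>

  <!-- Illustrative only: print the duration digit of every note. -->
  <xsl:template match="note">
    <!-- A map is a function of its key; an unknown type yields nothing. -->
    <xsl:value-of select="$typemap(string(type))"/>
    <xsl:text> </xsl:text>
  </xsl:template>

  <!-- Suppress all other text so only the digits are printed. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Compared with the XSLT 1.0 version, this drops the extra data namespace and the document('') lookup into the stylesheet itself.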
9. Validation
9.1. Schematron
This is another XML schema language, designed to be transformed by XSLT into another XSLT stylesheet that analyzes the target XML file. Using XPath expressions, it can impose more complex constraints than RELAX NG can specify.
Note that although it is possible to embed Schematron in RELAX NG, in most cases the rules still have to be extracted from the RELAX NG schema before they can be used for validation.
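To give an idea of the kind of constraint meant here, the following is a minimal standalone sketch. The rule is a hypothetical example written for this note, not taken from the DocBook Schematron rules: it checks that every linkend on a link matches some xml:id in the document, a cross-reference that plain RELAX NG patterns do not express.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <ns prefix="db" uri="http://docbook.org/ns/docbook"/>
  <pattern>
    <!-- Every internal link must point at an xml:id that exists in the document. -->
    <rule context="db:link[@linkend]">
      <assert test="@linkend = //@xml:id">
        linkend "<value-of select="@linkend"/>" does not match any xml:id.
      </assert>
    </rule>
  </pattern>
</schema>
```

The test is an ordinary XPath 1.0 comparison between node sets, so it holds as soon as any xml:id in the document equals the linkend value.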
9.2. NVDL
This is supported by Jing; however, Jing's features are relatively undocumented. By default, it strips out elements that are in a different namespace. Thus the <attach/> rule is needed so that they are validated against the original schema.
Below is an example of the NVDL script I use to validate the articles I write in DocBook.
|<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="init">
|  <mode name="init">
|    <namespace ns="http://docbook.org/ns/docbook">
|      <validate schema="rng/docbookrb.rng" useMode="verymml"/>
|      <validate schema="sch/docbookxi.sch" useMode="allowmml"/>
|    </namespace>
|  </mode>
|  <mode name="verymml">
|    <namespace ns="http://www.w3.org/1998/Math/MathML">
|      <attach/>
|      <validate schema="mathml3/mathml3.rng"/>
|    </namespace>
|    <namespace ns="http://www.w3.org/1999/xhtml">
|      <attach/>
|    </namespace>
|    <namespace ns="http://www.w3.org/2001/XInclude">
|      <attach/>
|    </namespace>
|  </mode>
|  <mode name="allowmml">
|    <namespace ns="http://www.w3.org/1998/Math/MathML">
|      <allow/>
|    </namespace>
|    <namespace ns="http://www.w3.org/1999/xhtml">
|      <allow/>
|    </namespace>
|    <namespace ns="http://www.w3.org/2001/XInclude">
|      <allow/>
|    </namespace>
|  </mode>
|</rules>