TXL

1Introduction

It took me a little while to start understanding what TXL can do and how to use it. After success in applying XSLT for typesetting, and a failed attempt at an optimizing C compiler written in Common Lisp because the parser combinator library is bad at handling ANSI C, I rekindled my interest in this programming language.

I have a negative opinion on compiler optimization as I rooted to be an APL programmer. Why would someone even try to program in a inefficient way so compiler can optimize their program? If the programming language does not let you specify the program in the most direct way with the intended efficiency, it is probably a badly designed programming language.

However, although the motivation to me behind using TXL is for program optimization, TXL originated from a lack of compiler features back in the day of early programming language design ages. Not to speak of OOP extension, even the elsif and case statement can be a luxury.

Again, this writing is not about teaching TXL, but a general guide on how to understand the various design decisions made in it and how it is a programming language of its own kind.

2A Perlis Language

A language that doesn't affect the way you think about programming is not worth knowing.

Alan Perlis

TXL is a preprocessor. It parses input using a grammar description and applies a transform program to the parsed result.

The transform language used by TXL is closely related to two programming languages: Refal and XSLT.

Let's talk about XSLT first. It transforms XML documents, and is a pure functional language. An XSLT program is defined by a collection of transforms called templates. They either have targets matched using XPath specifiers in the current context of a node, or have a name to be explicitly referenced by apply template XSLT calls. By default, templates are only applied once unless otherwise explicitly programmed to be iterative.

The basic elements of the transform language of TXL, on the other hand, are rules and functions. Rules are matched in the entire document/program context using unification and are applied iteratively by default, while functions are more like the default XSLT template, matched only once in the current node context. The unification and iterative application closely resembles the evaluation model of Refal, at least in my opinion.

Unlike XSLT, which is optionally typed using XML Schema, TXL is strongly typed against the grammar. A replacement rule needs to be of the same type before and after the replacement. One way to work around that when the result needs to be a different type of node than the original is to construct against an empty node. Another more hacky method is using the ability to override grammar to blend the original and target types together, which is a common technique when TXL is used to transform a program in one programming language to another. This makes TXL less find grained compared to the nanopass framework, where the result of each pass can be a derived but different language.

3Grammar as the First-Class Citizen

In most language transformation tools, the grammar is an implementation detail — you feed it a description and it gives you a parser. In TXL, the grammar is the program. You do not write a grammar and then separately write transformation code that happens to be aware of it; the grammar definitions and the rules that rewrite parsed terms are written in the same file, interleaved, and the type system of the rules is derived directly from the grammar nonterminals.

This has a practical consequence that becomes obvious once you try to extend an existing TXL program for a language you did not define yourself. TXL allows grammar rules to be overridden: a later definition of a nonterminal can replace or augment an earlier one. This is the mechanism behind the common idiom of borrowing a base grammar for a language and adding new nonterminals for the source and target constructs you care about, without touching the original grammar file. The approach is less surgical than nanopass but considerably simpler to set up for a single transformation pass over real-world source code.

Consider a minimal illustration. Suppose the base grammar defines:

  |define statement
  |    [assignment_statement]
  |  | [if_statement]
  |  | [while_statement]
5 |end define

To add a new construct without modifying this definition, one writes:

  |redefine statement
  |    [statement]
  |  | [new_construct]
  |end redefine

The original alternatives are preserved by the [statement] reference; the new alternative is appended. The rule that rewrites new_construct back into standard statements can then be written separately, keeping the transformation logic close to the grammar extension that motivates it.

4Backtracking and Determinism

A rule in TXL succeeds or fails. When a rule fails to match, TXL does not raise an error by default; it simply leaves the current node unchanged and moves on. This silent non-match behaviour is what makes the iterative application of rules safe: a rule will fire everywhere it can and do nothing elsewhere. The consequence is that TXL programs tend to be written in a style closer to logic programming than to procedural rewriting systems — you describe what a valid match looks like and what it should become, and the engine handles traversal.

Functions, by contrast, are expected to match exactly once in their calling context. A function that fails to match is a program error. This distinction between the try everywhere, silently skip semantics of rules and the match exactly here, fail loudly semantics of functions maps roughly onto the distinction between a schema-directed rewrite and a named template call in XSLT, which should be familiar territory for anyone who has written non-trivial stylesheets.

One subtlety that surprises newcomers is that the iterative application of a rule is not a simple fixed-point iteration over the original tree. TXL applies rules in a depth-first, outside-in traversal, and the result of a successful rewrite is itself subject to further rewriting in the same pass. This means a poorly written rule that always matches its own output will loop indefinitely. The discipline required is similar to writing terminating rewrite systems in term rewriting theory, and TXL provides no automatic termination check.