The XSLT Processing Model

This tutorial continues the discussion of how XSLT stylesheets process their input information to create a new output. It also covers namespaces and languages, which are important to keep in mind when transforming from one XML language into another.

The XSLT processing model: overview

This tutorial will cover in more detail how an XSLT stylesheet runs. What we're talking about here is something called the XSLT processing model: essentially the set of rules that direct how the stylesheet will run and what it will do, in what order. You can see the set of rules in the top-left corner of the slide. These rules are actually fairly simple once you're familiar with them, although they're not perfectly intuitive at the outset—so we're going to step through them in detail with an actual example and follow how they work.

Basically, we are starting with the root of the input document, and working our way through that tree, based on the templates we find in the stylesheet.

The XSLT processing model: matching the root

So what's the first rule? We remember this from our earlier, more informal look: start with the root of the input document (what is it?) and then ask: is there a template that matches the element I'm considering? (Is there?)

The first thing the processor does when it runs an XSLT stylesheet is look at the root element of the input document. In the case of TEI documents, this will be the TEI element. Next we want to see if there is a template that matches the root element. In the case of this stylesheet, there is a template that matches TEI. The next slide will cover how this template is applied to produce the output document.

The XSLT processing model: applying a template

If I do find a template that matches the element I'm considering, I apply that template: in this case, what is it doing? (writing out the first few layers of the output tree, and including a little bit of literal text)

If there is a template that matches the element you're considering, apply it. In the case above, you can see that the template writes out three output elements: html, head, and title. Within title there is literal text, which is written into the output document.

For HTML output, it is very common to have a template like this matched to the TEI root element, setting up the structure of the HTML document.
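The slide's stylesheet is not reproduced in this text, so here is a minimal sketch of what such a root template might look like. It assumes XSLT 2.0 (needed for xpath-default-namespace, which is discussed later), the standard TEI and XHTML namespace URIs, and the title text Test Document mentioned in this tutorial. The exact position of xsl:apply-templates inside html is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- Matches the TEI root element of the input document -->
  <xsl:template match="TEI">
    <!-- Output elements: written to the result tree as they stand -->
    <html>
      <head>
        <!-- Literal text inside title is copied to the output -->
        <title>Test Document</title>
      </head>
      <!-- Assumed placement: carry on with the children of TEI -->
      <xsl:apply-templates/>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Everything outside the xsl: namespace in the template body is output; the one xsl: element is an instruction to the processor.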

The XSLT processing model: processing children

So what did we just do? We applied a template, which entails putting out the output elements, writing out any literal text, and one thing further: if there are instructions to apply templates, then processing the children of the matched element.

So what is the element we just matched? (TEI)

And it does include instructions to apply templates, which means that we... (process the children of the matched element, i.e. the children of TEI).

So what are these children? text: what do we do with this? What rule applies in this case? (No template matches it, so we apply the built-in processing rules, which say that we... spit out any text, and process any children.)

So what are the children of text?...front: what do we do with this? What rule applies in this case?

There's a template that matches front, but what does it tell us to do? What rule applies in this case?... there are no instructions to apply templates, so this part of the process stops there. There's no output from front.

Does it stop altogether? Or are there other loose ends that will keep the process going?

So what did we just do? We applied a template, which entails putting out the output elements and writing out any literal text (in this case, Test Document). The next step is to process the children of the matched element, but only if an xsl:apply-templates element is present!

Since xsl:apply-templates is present, take a minute to think about: What element is matched in this scenario? What are its children? How would we process those children? We will discuss the answers to these questions below.

In this case, the TEI element is matched. And since the xsl:apply-templates element is present we do process the children.

The only child of TEI is text in this example. Note that there is no template that matches text in our stylesheet. However, the process doesn't stop there. When no template matches an element, the processor falls back on built-in processing rules, which tell it to spit out any text and process any children.

The first child of text in this case is front. So what happens here? You will notice that the xsl:template that matches front has no content (and therefore no xsl:apply-templates). This tells the processor that the front element should be ignored and that none of its children should be processed. Therefore, nothing corresponding to front appears in the output document.

The processor will then move to the next child of text, which will be discussed in the next slide.
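The built-in rules and the empty front template described above can be sketched as ordinary templates. The first two templates below spell out what a processor does implicitly in the default mode, so a real stylesheet would not need to include them; they are written out here only for illustration. The version number and the TEI namespace URI are assumptions, since the slide's stylesheet is not reproduced in this text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- Built-in rule for elements and the document node:
       produce nothing for the node itself, but process its children -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Built-in rule for text: copy it to the output -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Empty template: front is matched, nothing is written,
       and its children are never visited -->
  <xsl:template match="front"/>

</xsl:stylesheet>
```

The front template wins over the generic element rule because a match on a specific element name has a higher default priority than a match on the wildcard.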

The XSLT processing model: chugging along

There's another child of text, namely body, so our built-in rule of "process the children" applies here and allows us to proceed to the body element.

So what happens here? (another output element is generated, and inside it, additional templates will be applied)

The next child of text is body. Take a minute to get a sense of what happens with the template matching body.

This template instructs the processor to transform the TEI element body into the HTML element body. The xsl:apply-templates instruction indicates that the processor should process the children of body as we saw earlier with TEI.
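A template of the kind described here might look like the following sketch. The namespace URIs and version number are assumptions, as the slide's stylesheet is not reproduced in this text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- match="body" refers to the TEI body in the input tree -->
  <xsl:template match="body">
    <!-- this body is the HTML element written to the output tree -->
    <body>
      <!-- process the children of the TEI body inside the HTML body -->
      <xsl:apply-templates/>
    </body>
  </xsl:template>

</xsl:stylesheet>
```

Whatever the children of the TEI body produce ends up nested inside the HTML body, because xsl:apply-templates sits between its start and end tags.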

The XSLT processing model: processing more children

Next we start processing the children of body, and we have two templates here that do somewhat similar things; what are we matching here?

What if we had wanted to just match any head in the input document?

Why do it this way? Why distinguish between two different locations for head?

Next we start processing the children of body. Note that there are two templates here that look somewhat similar. This example is a little bit more complicated than the ones we've dealt with before, but the same principle we've been discussing applies here. In both cases, the first part of the value for match (before the slash) indicates a context for the element. In the first case the context is body. So the matched element is the head that is the child of body. In the second example, the match value indicates the head that is the child of div. The syntax here is XPath (a way of navigating the document tree which will be introduced in the next tutorial in this primer).

What if we had wanted to just match any head in the input document? In that case, we would have simply used a template that matched head on its own. Instead of providing context (i.e. the specifications for body and div), we would have simply made the value on match for xsl:template equal to head.

However, it may be useful to think about why we would distinguish between two different locations for head. As you can see from the output, the two different TEI head elements were converted into the HTML elements h1 and h2. If you are familiar with HTML, you probably know that h1 and h2 usually display differently. The numbered h elements in HTML typically indicate a hierarchy of headings, i.e. h2 is a sub-heading of h1. In this case, we are allowing our document heading (the TEI head that is the child of body) to be displayed differently from our chapter headings (the TEI head that is the child of div). There are many instances in which we would want elements to function differently depending on their context. For example, we may want the persNames in our structured personography to display differently from the persNames in the body of our text. Specifying the parent element in the match value is what lets the stylesheet tell these cases apart.

Since xsl:apply-templates is present in both head templates, and neither head element has any child elements, the processor simply spits out the text of those elements in the output document.
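The two context-sensitive templates discussed in this section could be sketched as follows. The match values body/head and div/head and the output elements h1 and h2 come from the discussion above; the namespace URIs and version number are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- A head whose parent is body: the document heading -->
  <xsl:template match="body/head">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>

  <!-- A head whose parent is div: a chapter heading -->
  <xsl:template match="div/head">
    <h2>
      <xsl:apply-templates/>
    </h2>
  </xsl:template>

</xsl:stylesheet>
```

Replacing both templates with a single one using match="head" would treat every head in the input the same way, which is the simpler alternative described earlier.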

The XSLT processing model: a final round

Finally we're getting to the last of the children in the input document... What is happening here? What rules apply when we get to emph? (There's no template that matches, so we apply the built-in rules, which say...if what we're processing is text, spit out the text)

Finally we're getting to the last of the children in the input document... What is happening here? Just as with TEI and body, the TEI p element is matched and translated to the HTML p element in the output document. Since the xsl:apply-templates instruction is present, the children are processed.

What rules apply when we get to the emph element in our input document? As you can see, there's no template that matches, so we apply the built-in rules, which say: process the children. However, emph has no child elements, just text. As we saw earlier, if what we're processing is text, we simply spit out the text.
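As a sketch of this last step, the stylesheet below contains only a template for p and deliberately has none for emph. The namespace URIs and version number are assumptions, and the sample paragraph in the comment is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- Hypothetical input fragment (in the TEI namespace):
       <p>This is <emph>very</emph> simple.</p> -->

  <!-- TEI p becomes HTML p; its children are then processed -->
  <xsl:template match="p">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

  <!-- No template for emph: the built-in rules process its children,
       and the built-in text rule copies the text through -->

</xsl:stylesheet>
```

For an input paragraph like the one in the comment, the emph tags disappear from the output while the text inside them is kept.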

The XSLT processing model: a last look

Now we can back up again and study the whole thing as a finished product: the input, the stylesheet, and the output. Any questions?

Now here we see the finished product. If you find it useful, take a few minutes to review the preceding slides to get a sense of the process as a whole.
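Because the slides are not reproduced in this text, the following is a reconstruction of what the complete stylesheet walked through above might look like, with a hypothetical input document shown in a comment. The template set (TEI, front, body, body/head, div/head, p) follows the steps described in this tutorial; the namespace URIs, version number, sample text, and placement of xsl:apply-templates are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input document:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <front><p>Front matter, suppressed by the empty template.</p></front>
    <body>
      <head>Document heading</head>
      <div>
        <head>Chapter heading</head>
        <p>A paragraph with <emph>emphasized</emph> text.</p>
      </div>
    </body>
  </text>
</TEI>
-->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- Root element: set up the HTML skeleton, then process children -->
  <xsl:template match="TEI">
    <html>
      <head>
        <title>Test Document</title>
      </head>
      <xsl:apply-templates/>
    </html>
  </xsl:template>

  <!-- text has no template: built-in rules process its children -->

  <!-- front: matched, but nothing is output and no children are processed -->
  <xsl:template match="front"/>

  <!-- TEI body becomes HTML body -->
  <xsl:template match="body">
    <body>
      <xsl:apply-templates/>
    </body>
  </xsl:template>

  <!-- Document heading -->
  <xsl:template match="body/head">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <!-- Chapter heading -->
  <xsl:template match="div/head">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <!-- Paragraphs; emph inside them falls through to the built-in rules -->
  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

With this input, the result would be an html element containing the head and title, followed by a body holding an h1, an h2, and a p. The whitespace between input elements is also copied through by the built-in text rule, so the output is not neatly indented unless the stylesheet asks for that.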

Namespaces and languages

You've probably already noticed that we're dealing here with three languages: the language of the XSLT stylesheet itself (which contains elements like template and apply-templates), the language of the input document (in this case, TEI), and the language of the output document (in this case, HTML).

Within the stylesheet itself, we need to keep these three different languages distinct from one another, so that the processor always knows what piece of what tree it is dealing with. We do this with something called namespaces. (Does everyone understand namespaces? Quick review on the next slide if necessary...)

Each of these languages plays a specific role in the stylesheet ecology and gets referenced in a distinctive way.

Let's take the simplest first: the output tree, which is being treated transparently in our examples. It doesn't use a namespace prefix, and this is because we have declared HTML as the default namespace for the entire stylesheet (we did this with the namespace declaration, xmlns, which looks like an attribute).

The next fairly simple case is the input tree, which also looks as if it's not getting a namespace. How are we keeping this separate from the output tree? The trick here is that the input tree is always accessed via the match and select (and similar) attributes. These attributes all access the input tree via XPath, and up at the top, we provided a default namespace for all XPaths (via xpath-default-namespace).

And finally, the stylesheet's own elements are in the XSL namespace. The default namespace for the stylesheet is already set to HTML, so all of the stylesheet elements have a namespace prefix.

As you've probably noticed, we are dealing with three different languages during this process: the input language (in this case, TEI), the language of the output document (in this case, HTML), and the language of the XSLT stylesheet itself (which contains elements like template and apply-templates).

Within the stylesheet itself, it is important that we keep these languages distinct from each other, so that the processor knows what piece of the tree it is dealing with. We differentiate between the languages using namespaces.

If you do not know about namespaces (or if you feel you could use a refresher), take this time to continue to the next slide for an overview. You can come back to this slide for any additional information that you need.

Each of the languages used (input, output, and XSL) plays a specific role in the stylesheet ecology and gets referenced in a distinctive way.

The output tree, as you can see, doesn't use a namespace prefix on each element. This is because we have already declared HTML as the default namespace, using the xmlns attribute on xsl:stylesheet (see the blue section on the stylesheet).

The next fairly simple case is the input tree, which also looks as if it's not getting a namespace in the stylesheet. How are we keeping this separate from the output tree? The trick here is that the input tree is always accessed via the match and select (and similar) attributes. These attributes all access the input tree via XPath, and up at the top, we provided a default namespace for all XPaths (via xpath-default-namespace).

And finally, the stylesheet's own elements are in the XSL namespace (see the bit that says xmlns:xsl=). Since the default namespace of the document is HTML, we must use the xsl prefix for all the elements we want to use that are in the XSL namespace.
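The three declarations described above might be sketched as follows. The namespace URIs are the standard ones for XSLT, XHTML, and TEI, but the tutorial's own stylesheet is not reproduced in this text, so treat the exact values as assumptions. The div template is a hypothetical illustration, not part of the tutorial's stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">
  <!-- xmlns:xsl              the stylesheet language: elements carry the xsl: prefix
       xmlns                  the output language: unprefixed elements are HTML
       xpath-default-namespace  the input language: unprefixed names in
                              match and select refer to TEI elements -->

  <!-- Hypothetical example: both names are spelled div, but
       match="div" means the TEI div (it is an XPath expression) ... -->
  <xsl:template match="div">
    <!-- ... while this unprefixed element is the HTML div -->
    <div>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

The same three letters, div, name two different elements here, and the processor tells them apart purely by where they appear: inside an XPath attribute or as an element in the template body.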

Namespaces review

Without the genus, we don't know what organism these species names refer to: glauca: a spruce tree (Picea glauca) or a small yellow flower (Agoseris glauca)? leucocephalus: a cactus (Pilosocereus leucocephalus) or a bald eagle (Haliaeetus leucocephalus)?

Without knowing the language, we don't know what these words mean: the (English definite article or a French hot drink?) bad (English adjective or German noun for bath?)

Without a namespace designation, we don't know what these elements mean: p (TEI paragraph or HTML block element?) div (TEI textual division or HTML grouping element?) fileDesc (TEI or EAD?)

With the namespace, all is clear: tei:p html:div ead:fileDesc

The namespace prefix is somewhat like a genus or language name: it tells us more precisely what language we are speaking (and hence what the semantics of the element are).

Namespaces function like genus names in taxonomy or like languages. They allow us to understand the context for given words. So for example, glauca can refer to different species, depending upon the genus name, and the can be either an English article or (written thé) the French word for tea. Similarly, p means something different in HTML than it does in the TEI. Namespaces allow us to clear up this confusion by identifying the language in which a given element is being used.
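As a small illustration of prefixed names, the document below puts a TEI p and an HTML p side by side. The examples wrapper element is hypothetical and exists only to hold the two; the prefixes tei and html are arbitrary labels bound to the standard TEI and XHTML namespace URIs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical wrapper element, in no namespace -->
<examples xmlns:tei="http://www.tei-c.org/ns/1.0"
          xmlns:html="http://www.w3.org/1999/xhtml">
  <!-- p in the TEI namespace: a TEI paragraph -->
  <tei:p>A TEI paragraph.</tei:p>
  <!-- p in the XHTML namespace: an HTML block element -->
  <html:p>An HTML paragraph.</html:p>
</examples>
```

What identifies the language is the namespace URI, not the prefix itself: a different prefix bound to the same URI would name the same element.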

