At 05:00 AM 3/19/2004, you wrote:
I am trying to do a sort of mail merge for creating wills and have been
advised that an XSL transform is the best route to go down.
The data is in XML format and I am just starting to convert the massive
(200+ pages) html template into an XSLT document.
The XML data is formatted as follows:
<AnswerSet title = "Test File">
<Answer name = "ABGRbefore">
<Answer name = "Female">
<Answer name = "ticdesc">
<TextValue>my collection of teapots</TextValue>
There are around 3,000 elements in the XML file in total.
I have so far worked out that at a simplistic level I can use the
following XSL for extracting the data:
<xsl:if test="TFValue = 'true'">
<p>the user is female.</p>
</xsl:if>
<xsl:if test="TFValue = 'false'">
<p>the user is male.</p>
</xsl:if>
Is this the most efficient way of extracting the data?
It is not inefficient. Whether it is the most efficient depends on what you
mean by "extract".
Each time I want to extract a value, is the Processor having to loop
through the XML file or does it do it in a single pass?
Generally, the latter. That is, since an XSLT processor usually works with
an entire tree of data already parsed in memory, it doesn't have to "loop
through the file" in the way you might think of it. But actually, how the
processor does it need not concern you. You only need to understand (a)
that processing is exactly optimized for this kind of stuff, and (b) what
the general XSLT processing model is and how it can be applied to your
problem.
In effect, this is exactly what you are asking:
I could break the template down into more manageable chunks, but am not
sure how to import one template into another.
Which is exactly the point. (And keep an eye on other ongoing threads:
several people are asking related questions.)
How a stylesheet is architected in XSLT depends primarily on the relation
between the structure of the source, and the structure of the result. If
the structure of the result mainly mirrors that of the source (as an
XML-encoded document may be transformed into an HTML "styled" version that
pretty much presents the same information organized in the same way), the
XSLT engine can be put to work by a stylesheet very straightforwardly -- by
default (without your having to do anything) it works by traversing the
input tree and building output as it goes. This is done by your mainly
staying out of the way; stylesheets of this kind have nothing but templates
written to match nodes from the input to be processed as they are
encountered, which can be very simple and elegant even in cases where
source documents vary widely in the particulars of their organization. (You
wouldn't ordinarily expect a set of technical manuals all to have exactly
the same organization; with this method, one stylesheet can cope with the
whole range). These are called "push" stylesheets in the business.
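To make that concrete, here is a minimal sketch of a push stylesheet. The
element names (manual, section, title, para) are hypothetical, chosen only
for illustration; they are not taken from your data.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input vocabulary: manual, section, title, para -->
  <xsl:template match="manual">
    <html>
      <body>
        <!-- Hand control back to the processor: it walks the children -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Sections may nest to any depth; this one template handles them all -->
  <xsl:template match="section">
    <div>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <xsl:template match="section/title">
    <h2>
      <xsl:apply-templates/>
    </h2>
  </xsl:template>

  <xsl:template match="para">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Note that no template says where sections or paragraphs occur or in what
order. Each one only says what to do with a node when the processor
arrives at it.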
If your data has to be rearranged significantly, however, its content not
merely presented and embellished but funnelled into an entirely unrelated
organization, the simple push technique doesn't work. At this point
templates are used not just to catch things as they come and mark them as
they go, but actually to step in and rearrange things. They can become like
miniature queries into the source, breaking things out, performing tests,
wrapping up the data in different ways, or even directing the processor
where to go next. This is what is described as the "pull" model.
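By way of contrast, a pull version over your data might look something
like the sketch below. It assumes that TFValue is a child of the Answer
named "Female" (which your tests suggest, though your sample doesn't show
it), and the sentence around the teapots is invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- One template does everything: the output structure is fixed here,
       and each value is fetched from the source by an explicit XPath -->
  <xsl:template match="/">
    <html>
      <body>
        <!-- Assumption: TFValue sits inside Answer name="Female" -->
        <xsl:choose>
          <xsl:when test="AnswerSet/Answer[@name='Female']/TFValue = 'true'">
            <p>the user is female.</p>
          </xsl:when>
          <xsl:otherwise>
            <p>the user is male.</p>
          </xsl:otherwise>
        </xsl:choose>
        <!-- Hypothetical boilerplate sentence around a pulled value -->
        <p>
          <xsl:text>I give </xsl:text>
          <xsl:value-of
              select="AnswerSet/Answer[@name='ticdesc']/TextValue"/>
          <xsl:text> to my executor.</xsl:text>
        </p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

The xsl:otherwise branch also fires when the answer is missing
altogether, which may or may not be what you want; two separate tests, as
in your code, treat that case differently.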
Most actual working stylesheets include a combination of pull and push.
They'll have pull where they need to rearrange the data into some known
structure, but they'll use push (characterized by template matches and
apply-templates instructions) where their output's structure mirrors their
input. Often templates that match (handle) particular pieces of the input
document will have miniature pulls inside of them.
Your code above, with the tests, the for-each and the XPath in select
attributes, is characteristic of "pull" code, and seems to come naturally
to people who are experienced with database querying technologies (since
that's similar to what you're doing). The best XSLT practitioners also let
the processor do plenty of pushing, however. (I actually think of it as
being like tai chi, the Chinese martial art, but that's another topic.)
(Interestingly, what is often left out of the discussion about "push" and
"pull", particularly when we're singing the virtues of push, is that the
entire stylesheet by default is a big "pull", which is why push works so
nicely. When you start pulling, you're beginning to mess with what the
stylesheet does by itself, so you can easily get into trouble by pulling
when you could just allow it to push.)
Now, the interesting thing about a merge-type application as compared to
the "classic" or plain-vanilla XSLT transform is that you have two input
documents (or input streams), not just one, in addition to your stylesheet.
This raises the questions: pull or push? and if you're going to rely on
push, which source document does the pushing? (You could actually let both
do some pushing, but let's not go into that. :-)
The best answer to this is prompted by seeing what the different documents'
roles are in your architecture, as well as long-term maintenance concerns
(how and whether this stylesheet and its sources may need to evolve in future).
A pure "pull" approach might work for you if you are literally doing
nothing but populating a known document with values snagged from another
one. Look up the little-used feature "literal result element as
stylesheet" if you want to see a shortcut into this approach. (Eventually
you will also need an extension function to create multiple output files
... but worry about that later.)
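A rough sketch of that shortcut, which the XSLT 1.0 Recommendation
describes under "Literal Result Element as Stylesheet": the whole
stylesheet is just your result document with an xsl:version attribute on
its outermost element and XSLT instructions dropped in where values are
needed. The heading and sentence here are invented, and the path to
TFValue is an assumption about your data.

```xml
<?xml version="1.0"?>
<!-- The outermost element is a literal result element, not
     xsl:stylesheet; xsl:version marks the file as a stylesheet -->
<html xsl:version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <body>
    <!-- Hypothetical boilerplate -->
    <h1>Last Will and Testament</h1>
    <p>
      <xsl:text>I give </xsl:text>
      <xsl:value-of
          select="/AnswerSet/Answer[@name='ticdesc']/TextValue"/>
      <xsl:text> to my executor.</xsl:text>
    </p>
    <!-- Assumption: TFValue sits inside Answer name="Female" -->
    <xsl:if test="/AnswerSet/Answer[@name='Female']/TFValue = 'true'">
      <p>the user is female.</p>
    </xsl:if>
  </body>
</html>
```

You run this with the answers file as the source document. There are no
templates at all, so it is pure pull; the price is that you cannot use
top-level constructs such as xsl:template or xsl:include.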
If things get at all complex, however, you may find you need to do some
pushing, at which point you have to contrive things more flexibly. Since
you have one document serving as a "template" (non-XSLT sense of that
term), another as a kind of little database, it may make sense to let the
processor push the template through, querying the data in the other
document only where it needs to (pulling it).
In this case you would have a set of templates that match nodes in your
boilerplate document (which you might want to make your main source file).
Mostly these templates just copy the boilerplate through. (You will want to
look up "identity transforms" to see how this is done.) Occasionally,
however, they query into the resource document to snag particular bits of
information. (Look up the document() function for this.)
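A minimal sketch of that arrangement follows. It assumes two things that
are not in your post: that the answers live in a file called answers.xml
next to the stylesheet, and that the boilerplate marks each merge point
with a hypothetical placeholder element such as <insert name="ticdesc"/>.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: answers.xml is a hypothetical file name; a relative
       URI given as a string is resolved against the stylesheet -->
  <xsl:variable name="answers"
      select="document('answers.xml')/AnswerSet"/>

  <!-- Identity transform: copy the boilerplate through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical placeholder element in the boilerplate, e.g.
       <insert name="ticdesc"/>: replace it with the matching answer -->
  <xsl:template match="insert">
    <xsl:value-of
        select="$answers/Answer[@name = current()/@name]/TextValue"/>
  </xsl:template>

</xsl:stylesheet>
```

The boilerplate is the main source document here, so it does the pushing;
the answers are only consulted where a placeholder turns up. Conditional
passages could be handled the same way, with another placeholder element
whose template tests a TFValue before applying templates to its children.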
Longer-term, there is still a problem with this approach in that you have
to run the processor once per output document, which can be a chore if you
have a big pile of names. This can be handled as well by wrapping your
logic in a routine that iterates over the set of names (again either by
pulling or letting them be pushed); but you need to implement the
per-document processing first.
Before I embark on a massive conversion process, I'm just wondering if
I am going down the right route.
What alternatives would you consider? XML lets you take control of your own
data at every level all the way down. This can be considered a good thing
for very many reasons.
A good book or two on XSLT, plus searching the net for such keywords as
"XSL processing model" would be a help. In particular, you want to
understand how templates match, how apply-templates works, what the
processor does when you don't tell it anything else (there are built-in
templates), and what role XPath plays (look at the difference between
"select" and "match").
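As a small illustration of those points, here is a sketch run against
data shaped like yours. The output format (a bulleted list of names and
values) is invented purely to show the mechanics.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- match holds a pattern: it says which nodes this template can
       handle, wherever the processor happens to meet them -->
  <xsl:template match="AnswerSet">
    <ul>
      <!-- select holds an expression, evaluated relative to the current
           node: it says which nodes the processor should visit next -->
      <xsl:apply-templates select="Answer[TextValue]"/>
    </ul>
  </xsl:template>

  <xsl:template match="Answer">
    <li>
      <xsl:value-of select="@name"/>
      <xsl:text>: </xsl:text>
      <xsl:apply-templates select="TextValue"/>
    </li>
  </xsl:template>

  <!-- Nothing here matches the root node or TextValue, so the built-in
       templates take over: the one for the root and for elements just
       applies templates to the children, and the one for text nodes
       copies the text to the output -->

</xsl:stylesheet>
```

Try deleting the select attribute from the first apply-templates and see
what changes in the output; it is a quick way to get a feel for what the
processor does when left to itself.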
Also, keep in mind that building a merge routine is not really a beginner
application. Though it's not all that hard to do, it raises architectural
issues that can be hard to understand when you don't yet know how XSLT
works with a single input document.
"Thus I make my own use of the telegraph, without consulting
the directors, like the sparrows, which I perceive use it
extensively for a perch." -- Thoreau
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list