Hi Doug,
At 03:07 PM 3/11/2005, you wrote:
> I have learned on this list that matching is almost always better than
> selecting.
Well, actually I'd say the two go together, and each needs to be used in
light of what you're doing with the other. (For example, remember that
apply-templates instructions can also select nodes from the tree, and this
is actually quite useful and important.)
Getting the hang of how matching and selecting interact across templates,
and how templates therefore work to "steer" the input tree into the output,
is basic to mastering XSLT.
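To make the point about apply-templates selecting nodes concrete, here is a minimal sketch run against the doc/element sample quoted further down in this message. The p wrapper in the output is purely an assumption for illustration; nothing in the question asks for it.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Selecting: at doc, visit only the element children.
       The stray text directly inside doc is never selected. -->
  <xsl:template match="doc">
    <xsl:apply-templates select="element"/>
  </xsl:template>

  <!-- Matching: decide what happens to each node that was selected.
       The text inside element is handled by the built-in template. -->
  <xsl:template match="element">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

With that sample input, the result should be a single p element containing "desired text". The select decides which nodes get visited, and the match patterns decide what is done with them.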
As so often, there are good reasons for doing either, but those reasons
often don't apply when newbies do one thing or another for no particular
reason at all, other than that they're rattling the code until they happen
to get something to work. This is certainly fair, as far as it goes (I too
learn interfaces by poking at them) -- but when you want to go further ...
it helps to know "why".
> So if you find that your script cares about some text() elements and not
> others, then you probably do not want to use the ignore text() elements
> template because it then forces your script to use a select to get the
> desired text() element.
Pretty much, yes, subject to certain refinements. I might use an analogy
and suggest that it's like setting your spam filter to throw everything
away except what you tell it to (whitelisting), when you could more easily
tell it just what to throw away (blacklisting). Sometimes whitelisting is
in fact a better approach (and this is like XSLT "pulling" of values from
the source: nothing gets in but what you ask for). But in most cases (at
least in XSLT) it's simpler and easier to let things through and filter
out only what you don't want. Then you're not caught by surprise
because something you wanted, but neglected to ask for (for whatever
reason), fails to appear.
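As a rough sketch of the "tell it just what to throw away" approach, assuming the doc/element sample from the question and assuming plain text output is wanted, a complete stylesheet could be as small as this:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption for this sketch: we want plain text output. -->
  <xsl:output method="text"/>

  <!-- Throw away only the text that sits directly inside doc.
       Everything else falls through to the built-in templates,
       which copy text nodes to the output. -->
  <xsl:template match="doc/text()"/>

</xsl:stylesheet>
```

Run against that sample, this should output just "desired text". If the input later grows new elements with text you want, they come through without any change to the stylesheet.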
Since text nodes are by definition "leaf" nodes in the data model (they
have no children), the practical differences in this particular case only
emerge when things get complex -- but typically, at least on
loosely-structured data such as most "documentary" data, that happens
pretty soon; and because the complexity can be in the source data, the
stylesheet itself doesn't have to get very complex for things to go awry.
But managing exactly this kind of complexity is what the XSLT processing
model is really good at, so there's rarely a good reason to work against it.
Sometimes XSLT newbies try using <xsl:template match="*"/> (suppress all
elements by default) in a similar way to solve such "problems" as are
introduced by an over-quick reliance on xsl:value-of and such constructs,
instead of on the default processing. This can really cause havoc, since
an element suppressed this way takes all its descendants with it: their
templates never get a chance to fire.
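Here is a small sketch of the kind of havoc I mean, again assuming the doc/element sample from the question. The template for element looks like a reasonable exception to the rule, but it is never reached.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Suppress every element by default. -->
  <xsl:template match="*"/>

  <!-- Intended exception. It never fires for the sample input:
       doc matches "*" above, produces nothing, and never applies
       templates to its children, so element is never visited. -->
  <xsl:template match="element">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

For that input the output should be empty. To get anything back you would have to add explicit templates or selects for every ancestor on the way down, which is exactly the work the default processing was doing for you.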
> Is there a big difference between select="./text()" and select="." in
> the examples below? How does this impact performance and scalability?
> Example:
> <doc>
> unwanted text
> <element>desired text</element>
> </doc>
> <xsl:template match="text()"/>
> <xsl:template match="element">
>   <xsl:value-of select="./text()"/>
> </xsl:template>
> vs.
> <xsl:template match="element">
>   <xsl:apply-templates />
> </xsl:template>
> <xsl:template match="element/text()">
>   <xsl:value-of select="."/>
> </xsl:template>
As for efficiency and performance of processing, I doubt there's any
significant difference between these. But I don't much care, either, since
method #2 is clearly, to my eye, preferable and will scale better. Using
method #2 I don't have to write explicit instructions for every other kind
of text node I want, whereas in method #1, every new text node I want gives
me work to do, to override my override.
But the really interesting thing here is that the templates you've offered
in method #2 are, in fact, perfect echoes of the templates that would apply
to those same nodes if you provided no templates at all. The built-in
templates
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>
do the same thing with the "element" element and its child text node as
the templates above, in #2. This means you could leave those templates
out and get exactly the same result.
In other words, you don't need to do #2 because it's what the processor
will already do without asking.
Doing nothing at all is both reasonably efficient (just let the processor
do its thing) and really easy to maintain.
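For what it's worth, "doing nothing at all" can be taken quite literally. As a sketch (the text output method is my assumption, just to keep the result easy to read), a stylesheet with no templates still produces output:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption for this sketch: plain text output. -->
  <xsl:output method="text"/>

  <!-- No templates at all. The built-in templates walk the tree
       from the root and copy every text node to the result. -->

</xsl:stylesheet>
```

Against the doc/element sample this should emit both "unwanted text" and "desired text" (plus the whitespace between them), which is the same thing the two templates of method #2 produce on their own.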
Finally, a minor nit:
select="./text()" is short for
select="self::node()/child::text()".
This amounts to exactly the same thing as
select="child::text()", which is long for
select="text()".
So you can say
select="text()"
(leaving off the first step in the path), and things will be fine.
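If you want to convince yourself, a quick check along these lines should do it. The check, long, dotted and short element names are made up for the purpose, and the input is assumed to be the doc/element sample from the question.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Keep the stray text inside doc out of the result. -->
  <xsl:template match="doc/text()"/>

  <!-- Evaluate the three spellings of the same path side by side. -->
  <xsl:template match="element">
    <check>
      <long><xsl:value-of select="self::node()/child::text()"/></long>
      <dotted><xsl:value-of select="./text()"/></dotted>
      <short><xsl:value-of select="text()"/></short>
    </check>
  </xsl:template>

</xsl:stylesheet>
```

All three children of check should come out containing "desired text".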
Cheers,
Wendell
======================================================================
Wendell Piez mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================