Re: [xsl] XSLT to remove characters and whitespaces

Subject: Re: [xsl] XSLT to remove characters and whitespaces
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Mon, 10 Jul 2006 12:03:51 -0400

Georg,

At 04:35 AM 7/10/2006, you wrote:
> Thank you very much for your help. My approach rested on some
> misunderstandings of the XSLT elements and XPath functions I used. For
> "normalize-space" I thought that "whitespace" meant only spaces, and not
> tab, newline, and carriage return too. Now that I know, I will have to
> remove many translate() functions in my previous stylesheets :-).

Heh. Fortunately, the extra calls to translate() are probably doing no harm except slowing things down a bit. (IIRC there was one small potential bug in your handling of the LF character.)
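
A minimal sketch of the normalize-space() point, assuming the goal is to clean up every text node (the original stylesheet is not shown here, so this template is illustrative only). normalize-space() strips leading and trailing whitespace and collapses each internal run of spaces, tabs, line feeds, and carriage returns into a single space, so no separate translate() call is needed for those characters:

```xml
<!-- Illustrative only: normalize whitespace in every text node -->
<xsl:template match="text()">
  <!-- handles space, tab (#x9), LF (#xA), and CR (#xD) in one call -->
  <xsl:value-of select="normalize-space(.)"/>
</xsl:template>
```

One caveat: because leading and trailing whitespace is removed from each text node, words on either side of an inline element in mixed content can end up run together.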


> As you point out, your solution is similar to the "identity template"
> in Michael's "XSLT 2.0", page 243, which I didn't mention before. I
> wonder why he uses "@*|node()" instead of "*" for the matching.

Well, "@*|node()" is shorthand for


"attribute::* | child::node()"

which is to say, any attribute or any child node whatsoever (whether element, text node, comment, or processing instruction). As a match pattern, that means any node at all, except the root and those pesky namespace nodes they're tangling with in another thread. To match the root node you have to say "/".

"*" is short for "child::*", that is, any child element (not text nodes, comments, or processing instructions), which I guessed would be sufficient for your case.
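
To see the difference between these node tests, here is a small diagnostic stylesheet. It is a sketch for experimenting, not part of the solution: it emits one line of text per node it visits, so running it over any sample document shows which kinds of node "@*|node()" reaches beyond the elements that "*" matches.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- "*" matches elements only; the select then visits each
       element's attributes and all of its child nodes -->
  <xsl:template match="*">
    <xsl:value-of select="concat('element ', name(), '&#10;')"/>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:value-of select="concat('attribute ', name(), '&#10;')"/>
  </xsl:template>

  <!-- the remaining three kinds of node covered by node() -->
  <xsl:template match="text()">
    <xsl:text>text node&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="comment()">
    <xsl:text>comment&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="processing-instruction()">
    <xsl:value-of select="concat('processing instruction ', name(), '&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```

Whitespace-only text nodes between elements are reported as text nodes too, unless the processor or an xsl:strip-space declaration removes them.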

I know these things because my head is crammed with lots of bits of information about XPath and XSLT that are not taught in school (unless you count this list as a kind of school). They were picked up over time from the likes of Tony Graham, David Carlisle, Mike Kay, and many other developers and users who have frequented the halls of XSLT especially in the early years. (Sometimes those are metaphorical halls, as here; sometimes they're literal, as when we meet at conferences. :-)

> If it matches an attribute (@*), what would the template do with it?
> Your solution using "*" seems more logical to me and does the job too.
> The question is: why?

Matching attributes was unnecessary in your case since we had the line


<xsl:copy-of select="@*"/>

which copies attributes, thus making it unnecessary to match them with templates of their own. (If you have no attributes to copy it makes it even easier.)

You can certainly match attributes, as the classic identity transformation does (the one in Mike's book, which he copies from the XSLT Recommendation at http://www.w3.org/TR/xslt). But as always, such a match will never occur (even given the template) unless attributes are also -selected- by an apply-templates instruction at some point. The reason to do this would be if you wished to modify the attributes too, not just copy them. If your requirement had included whitespace-munging attribute values as well, we might have done this.

Thus:

<xsl:template match="*"> <!-- matches elements -->
  <xsl:copy>
    <xsl:copy-of select="@*"/> <!-- copies attributes -->
    <xsl:apply-templates/> <!-- selects child nodes for template processing -->
  </xsl:copy>
</xsl:template>


and

<xsl:template match="*|@*"> <!-- matches elements and attributes -->
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/> <!-- selects
      attributes and child nodes for template processing -->
  </xsl:copy>
</xsl:template>

do exactly the same thing. The first is slightly more direct; the second allows you to customize your handling of particular kinds of attributes (by overriding this template with their own particular templates) as well as elements. (This was a non-issue for you since you were only messing with the text nodes.)
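
A sketch of such an override, assuming (hypothetically) that the requirement had included normalizing whitespace in the values of "title" attributes; the attribute name is invented for illustration. It only takes effect when attributes are selected for template processing, as in the second template above. With the first template, xsl:copy-of copies the attributes untouched and this template never fires.

```xml
<!-- Hypothetical override for one particular attribute.
     A pattern with a name, such as "@title", has default priority 0,
     so it wins over the general "@*" pattern (priority -0.5). -->
<xsl:template match="@title">
  <xsl:attribute name="title">
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:attribute>
</xsl:template>
```

Matching the specific name rather than "@*" avoids a priority conflict with the general template, which would otherwise match the same nodes at the same priority.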

Neither of them matches comments or processing instructions, so neither copies them; if any of those are present in the source and you don't wish to drop them, you have to select and match them too. That's when the catch-all node test, "node()", becomes useful.
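
For comparison, this is the identity transformation as given in the XSLT 1.0 Recommendation (section 7.5, "Copying"). Because it both matches and selects "@*|node()", it copies comments and processing instructions along with elements, attributes, and text:

```xml
<xsl:template match="@*|node()">
  <xsl:copy>
    <!-- attributes first, then every child node of any kind -->
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>
```

Any more specific template in the same stylesheet (for example one matching text()) takes precedence over this one for the nodes it matches, which is what makes the pattern a convenient starting point.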

> Sorry if I'm a little bit slow-witted on this issue.

I don't consider it slow-witted not to be able to guess things that are in themselves rather cryptic and obscure. The important thing is to be willing to ask questions and seek out answers for them. By rehearsing the answers on this list, we create more experts who can provide guidance in future to anyone who might get stuck -- as well as innovate solutions not yet in the literature, on the basis not just of haphazard guesswork but of a sound understanding.


Cheers,
Wendell
