Re: [xsl] XSLT to remove characters and whitespaces

Subject: Re: [xsl] XSLT to remove characters and whitespaces
From: "Georg Hohmann" <georg.hohmann@xxxxxxxxx>
Date: Mon, 10 Jul 2006 10:35:34 +0200
Hello Wendell,

Thank you very much for your help. My approach rested on some
misunderstandings of the XSLT elements and XPath functions I used. I
thought the "whitespace" that normalize-space() handles meant only
spaces, not tabs, newlines, and carriage returns as well. Now that I
know, I will have to remove many translate() calls from my previous
stylesheets :-).

As you point out, your solution is similar to the "identity template"
in Michael's "XSLT 2.0", page 243, which I didn't mention before. I
wonder why he uses "@*|node()" instead of "*" for the match. If it
matches an attribute (@*), what does the template do with it? Your
solution using "*" seems more logical to me and does the job too. So
the question is: why "@*|node()"?

Sorry if I'm a little bit slow-witted on this issue.

Regards,
Georg




2006/7/7, Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>:
Hi Georg,

A couple of things:

I'm unsure of why you are normalizing the spaces after converting CRs
and tabs to spaces, and stripping line feeds, with translate() (in
two separate operations). Why not simply normalize the spaces, since
that takes care of line feeds and tabs? (The parser should already
have normalized CRs away so they shouldn't even be there.)
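
If you want to convince yourself, a throwaway stylesheet along these
lines should do. It is only a minimal sketch; the test string is made
up for illustration and contains a tab (&#9;) and a line feed (&#10;)
written as character references:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- leading/trailing whitespace is trimmed; each inner run of
         spaces, tabs and line feeds collapses to a single space -->
    <xsl:value-of select="normalize-space('  a&#9;b&#10;  c  ')"/>
  </xsl:template>
</xsl:stylesheet>

Run against any input document, this writes out "a b c", with no
translate() involved.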

More basically, and this is what accounts for your problem: you are
matching elements, creating new elements with the same names (any
reason not to use the simpler xsl:copy instruction?), writing out
their string values (i.e. all the text inside the elements) and then
descending the tree to do the same. This results in your string
values being written out over and over again, every time an ancestor
element gets processed.

So if your input were

<greeting>
   <to>Georg</to>
   <from>XSL-List</from>
   <text>Hey, how are tricks?</text>
</greeting>

you'll get

<greeting>GeorgXSL-ListHey, how are tricks?
   <to>Georg</to>
   <from>XSL-List</from>
   <text>Hey, how are tricks?</text>
</greeting>

since the greeting element gets its text value copied before its own
element contents are traversed.

Instead of this, you only want to normalize values of the *text*
nodes, letting element nodes take care of themselves ... so:

<xsl:template match="text()">
   <xsl:value-of select="normalize-space()"/>
</xsl:template>

<xsl:template match="*">
   <xsl:copy>
     <xsl:copy-of select="@*"/>
     <xsl:apply-templates/>
   </xsl:copy>
</xsl:template>

... as you can see, fairly simple, and a garden-variety near-identity
transform.
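
Put together as a complete stylesheet, it might look roughly like the
sketch below. The xsl:output and xsl:strip-space settings are carried
over from your original; version="1.0" is my assumption, since nothing
here needs 2.0, and I have left out your xsl:result-document wrapper,
so the result simply goes to the main output.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- drop text nodes that consist only of whitespace -->
  <xsl:strip-space elements="*"/>

  <!-- text nodes: trim both ends, collapse inner whitespace runs -->
  <xsl:template match="text()">
    <xsl:value-of select="normalize-space()"/>
  </xsl:template>

  <!-- elements: shallow copy, keep attributes, recurse into children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Two caveats: comments and processing instructions are not copied,
since the built-in templates for them do nothing; and in mixed content
the trimming also removes the spaces next to inline elements, so
"some <b>bold</b> text" would lose the spaces around the b element.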

Cheers,
Wendell

  At 06:33 AM 7/7/2006, you wrote:
>Hello,
>
>I have an XML file whose content contains some unwanted carriage
>returns and whitespace. Now I'm trying to write a stylesheet that
>makes an exact copy of the source file, but without the returns and
>whitespace. I thought this should work:
>
><?xml version="1.0" encoding="UTF-8"?>
><xsl:stylesheet version="2.0"
>xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
><xsl:output name="stripped" method="xml" version="1.0"
>encoding="UTF-8" indent="yes"/>
><xsl:strip-space elements="*"/>
><xsl:template match="/">
>   <xsl:result-document format="stripped" href="result.xml">
>      <xsl:apply-templates/>
>   </xsl:result-document>
></xsl:template>
><xsl:template match="*">
>   <xsl:element name="{name()}">
>      <xsl:value-of select="normalize-space(translate(translate(.,
>'&#x0d;&#x0a;', ' '), '&#09;', ' '))"/>
>   <xsl:apply-templates/>
>   </xsl:element>
></xsl:template>
></xsl:stylesheet>
>
>But the output is a mess in parts. What am I doing wrong?
