Re: [xsl] XSLT to remove characters and whitespaces

Subject: Re: [xsl] XSLT to remove characters and whitespaces
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Fri, 07 Jul 2006 11:08:41 -0400
Hi Georg,

A couple of things:

I'm not sure why you are normalizing the spaces after converting CRs and tabs to spaces, and stripping line feeds, with translate() (in two separate operations). Why not simply normalize the spaces, since normalize-space() already takes care of line feeds and tabs? (The parser should already have normalized CRs away, so they shouldn't even be there.)
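
To illustrate what normalize-space() does on its own, here is a minimal sketch. The variable name "messy" and its sample string are made up for the illustration; the string contains a tab and two line feeds, written as character references so they survive attribute parsing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Hypothetical sample: leading/trailing spaces, a tab, two line feeds -->
    <xsl:variable name="messy"
                  select="'  one&#x9;two&#xA;&#xA;  three  '"/>
    <xsl:text>[</xsl:text>
    <!-- Trims both ends and collapses each whitespace run to one space -->
    <xsl:value-of select="normalize-space($messy)"/>
    <xsl:text>]</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run against any well-formed input, this should print [one two three], with no translate() call involved. One related point: translate() deletes any character in its second argument that has no counterpart in the third, so stripping line feeds before normalizing can run together words that sat on adjacent lines.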

More basically, and this is what accounts for your problem: you are matching elements, creating new elements with the same names (any reason not to use the simpler xsl:copy instruction?), writing out their string values (i.e. all the text inside the elements) and then descending the tree to do the same. This results in your string values being written out over and over again, every time an ancestor element gets processed.

So if your input were

<greeting>
  <to>Georg</to>
  <from>XSL-List</from>
  <text>Hey, how are tricks?</text>
</greeting>

you'll get

<greeting>GeorgXSL-ListHey, how are tricks?
  <to>Georg</to>
  <from>XSL-List</from>
  <text>Hey, how are tricks?</text>
</greeting>

since the greeting element gets its text value copied before its own element contents are traversed.
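
The duplication comes from how an element's string value is defined: it is the concatenation of all its descendant text nodes. A small sketch that shows this on the greeting example follows; it assumes the sample document above as input and keeps the xsl:strip-space declaration from your stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <!-- String value of the outer element: all descendant text, concatenated -->
    <xsl:value-of select="normalize-space(greeting)"/>
    <xsl:text>&#xA;</xsl:text>
    <!-- String value of a leaf element: just its own text -->
    <xsl:value-of select="normalize-space(greeting/to)"/>
  </xsl:template>

</xsl:stylesheet>
```

With the greeting document as input, the first line printed should be GeorgXSL-ListHey, how are tricks? and the second Georg. Writing out the string value at every level of the tree therefore repeats the same text once per ancestor.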

Instead of this, you only want to normalize values of the *text* nodes, letting element nodes take care of themselves ... so:

<xsl:template match="text()">
  <xsl:value-of select="normalize-space()"/>
</xsl:template>

<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

... as you can see, fairly simple, and a garden-variety near-identity transform.
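
For completeness, here is one way the two templates might sit in a full stylesheet. This is a sketch under a few assumptions: version="1.0" and the xsl:output settings are my choices rather than anything your setup requires, and the xsl:result-document wrapper from your 2.0 stylesheet is left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed output settings; adjust to taste -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Normalize each text node; whitespace-only nodes become empty strings -->
  <xsl:template match="text()">
    <xsl:value-of select="normalize-space()"/>
  </xsl:template>

  <!-- Copy each element with its attributes, then process its children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Two caveats with this sketch. Comments and processing instructions are dropped, since the built-in templates for them do nothing. And in mixed content, normalizing each text node separately also removes the spaces next to inline elements, so it is best suited to data-oriented documents.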

Cheers,
Wendell

At 06:33 AM 7/7/2006, you wrote:
Hello,

I have an XML file with some content in it which contains some unwanted
carriage returns and whitespaces. Now I'm trying to write a stylesheet
which makes an exact copy of the source file but without the returns
and whitespaces. I thought this should work:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output name="stripped" method="xml" version="1.0"
encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:result-document format="stripped" href="result.xml">
<xsl:apply-templates/>
</xsl:result-document>
</xsl:template>
<xsl:template match="*">
<xsl:element name="{name()}">
<xsl:value-of select="normalize-space(translate(translate(.,
'&#x0d;&#x0a;', ' '), '&#09;', ' '))"/>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>


But the output is a mess in parts. What am I doing wrong?
