Subject: Re: [xsl] XSLT to remove characters and whitespaces
From: "Georg Hohmann" <georg.hohmann@xxxxxxxxx>
Date: Mon, 10 Jul 2006 10:35:34 +0200
Thank you very much for your help. My approach rested on some misunderstandings of the XSLT elements and XPath functions I used. With "normalize-space" I thought that "whitespace" meant only spaces, not tabs, newlines, and carriage returns as well. Now that I know, I will have to remove many translate() calls from my previous stylesheets :-).
As you point out, your solution is similar to the "identity template" in Michael's "XSLT 2.0", page 243, which I didn't mention before. I wonder why he uses "@*|node()" instead of "*" for the match. If it matches an attribute (@*), what would the template do with it? Your solution using "*" seems more logical to me and does the job too. The question is: why?
Regards, Georg
Hi Georg,
A couple of things:
I'm unsure of why you are normalizing the spaces after converting CRs and tabs to spaces, and stripping line feeds, with translate() (in two separate operations). Why not simply normalize the spaces, since that takes care of line feeds and tabs? (The parser should already have normalized CRs away so they shouldn't even be there.)
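To make the point about normalize-space() concrete, here is a minimal sketch. The variable name and its test value are made up for illustration; the character references &#9; and &#10; stand for a tab and a line feed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical test value: leading/trailing spaces, a tab (&#9;),
       a line feed (&#10;) and runs of several spaces -->
  <xsl:variable name="messy"
      select="'  Hey,&#9;how&#10;  are   tricks?  '"/>

  <xsl:template match="/">
    <!-- normalize-space() trims both ends and collapses every run of
         spaces, tabs, line feeds and carriage returns to one space -->
    <xsl:text>[</xsl:text>
    <xsl:value-of select="normalize-space($messy)"/>
    <xsl:text>]</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should write [Hey, how are tricks?], with no translate() calls needed.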
More basically, and this is what accounts for your problem: you are matching elements, creating new elements with the same names (any reason not to use the simpler xsl:copy instruction?), writing out their string values (i.e. all the text inside the elements) and then descending the tree to do the same. This results in your string values being written out over and over again, every time an ancestor element gets processed.
So if your input were
<greeting>
  <to>Georg</to>
  <from>XSL-List</from>
  <text>Hey, how are tricks?</text>
</greeting>
you'll get
<greeting>GeorgXSL-ListHey, how are tricks?
  <to>Georg</to>
  <from>XSL-List</from>
  <text>Hey, how are tricks?</text>
</greeting>
since the greeting element gets its text value copied before its own element contents are traversed.
Instead of this, you only want to normalize values of the *text* nodes, letting element nodes take care of themselves ... so:
<xsl:template match="text()">
  <xsl:value-of select="normalize-space()"/>
</xsl:template>

<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
... as you can see, fairly simple, and a garden-variety near-identity transform.
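Put together as a complete stylesheet, the two templates might look like the sketch below. The xsl:output and xsl:strip-space settings are assumptions carried over from the stylesheet quoted further down, not requirements of the technique, and the xsl:result-document wrapper is left out so that the sketch also runs on an XSLT 1.0 processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed output settings, taken from the quoted stylesheet -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Drop whitespace-only text nodes from the source tree -->
  <xsl:strip-space elements="*"/>

  <!-- Text nodes: write out the whitespace-normalized value only -->
  <xsl:template match="text()">
    <xsl:value-of select="normalize-space()"/>
  </xsl:template>

  <!-- Elements: shallow copy, keep attributes, then process children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Two caveats with this sketch: comments and processing instructions fall through to the built-in templates and are dropped, and in mixed content the normalization also removes the space between a text node and a neighbouring inline element.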
Cheers, Wendell
At 06:33 AM 7/7/2006, you wrote:
>Hello,
>
>i have a xml file with some content in it which contains some unwanted
>carriage returns and whitespaces. Now I'm trying to write a stylesheet
>which makes an exact copy of the source file but without the returns
>and whitespaces. I thought this should work:
>
><?xml version="1.0" encoding="UTF-8"?>
><xsl:stylesheet version="2.0"
>xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
><xsl:output name="stripped" method="xml" version="1.0"
>encoding="UTF-8" indent="yes"/>
><xsl:strip-space elements="*"/>
><xsl:template match="/">
>  <xsl:result-document format="stripped" href="result.xml">
>    <xsl:apply-templates/>
>  </xsl:result-document>
></xsl:template>
><xsl:template match="*">
>  <xsl:element name="{name()}">
>    <xsl:value-of select="normalize-space(translate(translate(.,
>'
', ' '), '	', ' '))"/>
>    <xsl:apply-templates/>
>  </xsl:element>
></xsl:template>
></xsl:stylesheet>
>
>But the output is a mess in parts. What am I doing wrong?