Re: [xsl] normalize-space and sequence

Subject: Re: [xsl] normalize-space and sequence
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Thu, 27 Sep 2007 00:24:27 +0200
Mathieu Malaterre wrote:

Hum. I am still not clear why this is called implicitly with {} but I
need to call string-join explicitly before normalize-space.

If you have a series of items, that is, a sequence, then turning it into a string (as xsl:value-of and attribute value templates do in XSLT 2.0) puts a single space between the items by default. Compare:


<xsl:value-of select=" 'test1', 'test2', 'test3' " />

and

<xsl:value-of select=" 'test1', 'test2', 'test3' " separator="" />

The output of the first is: test1 test2 test3
while the output of the second is: test1test2test3

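The instruction xsl:value-of (and, in XSLT 2.0, xsl:attribute) has this option where you can specify a separator. An attribute value template in {} has no such option and always joins the items with a single space. If you want a different separator there, you have to use string-join. normalize-space() will not help you here: it accepts a single string only, so you have to apply it item by item, and the result is still a sequence: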

<xsl:value-of select="for $i in ('test1', 'test2', 'test3') return normalize-space($i)" />

will still put a space between each item in the sequence. However, if you surround the whole expression with string-join (with an empty string as the separator), it will yield the same result as the @separator attribute does.
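
To make the difference concrete, here is a minimal sketch of a complete XSLT 2.0 stylesheet. The variable name $items, the element name result and its attribute names are made up for illustration; the stylesheet can be run against any input document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <!-- Hypothetical sample data: a sequence of three strings -->
    <xsl:variable name="items" select="('test1', 'test2', 'test3')"/>

    <!-- An attribute value template joins the items with a single space;
         string-join with '' as separator joins them without one -->
    <result default="{$items}"
            joined="{string-join($items, '')}"/>
  </xsl:template>
</xsl:stylesheet>
```

The default attribute comes out as "test1 test2 test3" and the joined attribute as "test1test2test3". Note that this relies on version="2.0": in backwards compatible mode an attribute value template uses only the first item of the sequence.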

I replaced it with:

<xsl:template match="@*">
<xsl:attribute name="{name()}" select="normalize-space(.)"/>
</xsl:template>

See:
http://gdcm.svn.sourceforge.net/viewvc/gdcm/Sandbox/oo2.xsl?r1=1144&r2=1145

which will make the default templates on attributes and text normalize
space.

Running the xsl code still leaves whitespace at end/beginning of my output strings:

http://gdcm.svn.sourceforge.net/viewvc/gdcm/trunk/Source/InformationObjectDefinition/ModuleAttributes.xml

Am I missing something ? I'd like to avoid duplicating the string-join
+ normalize-space all over my xsl code.

Maybe I am missing something. I find the links rather confusing. The XSLT that you point at clearly has xsl:output method="xml", but the XML you are pointing at is not valid XML, not even valid XHTML. The document behind the XML link looks as if it was produced with an output method="html". Do you mix stylesheets? Can you show a small bit of input and output that illustrates your problem? Why do we see wrong XHTML?


But back to your problem. Your question still holds, of course: how to get rid of extra whitespace at end or beginning of your strings. I wonder whether you mean at the beginning and end of strings that are part of a sequence, or whether you mean at the end and beginning of the resulting string after serialization (allowing spaces inside the string). Again, a small but explanatory input/output + xslt code sample would really help, along with what you want the output to become.

Sorry, I am adrift.... Surely Michael Kay and/or David meant using these default templates when the nodes still have to be processed. If the nodes are already being processed, it will add little to use an extra default template. You can use xsl:next-match, but that can be tricky to get right in your situation. Instead, I'd use a different opening template and re-process the nodes from which you want to remove the extra whitespace:

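<xsl:template match="/">
  <!-- First pass: run the normal processing and capture its result -->
  <xsl:variable name="original">
     <xsl:apply-templates />
  </xsl:variable>

  <!-- Second pass: re-process the captured result to strip the whitespace -->
  <xsl:apply-templates select="$original/*" mode="remove-ws" />
</xsl:template>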

<!-- The templates below run in the second pass only, hence mode="remove-ws" -->
<xsl:template match="node()" mode="remove-ws">
   <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="#current" />
   </xsl:copy>
</xsl:template>

<xsl:template match="*/text()" mode="remove-ws">
  <xsl:sequence select="replace(., '^ +| +$', '')" />
</xsl:template>

<xsl:template match="@*" mode="remove-ws">
   <xsl:attribute name="{name()}" select="replace(., '^ +| +$', '')" />
</xsl:template>


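This little addition removes the need to strip the whitespace on every single node by hand. Note that the pattern '^ +| +$' only strips space characters; use '^\s+|\s+$' if leading and trailing tabs and newlines must go as well. Instead of the replace() function you can use normalize-space, but remember that it also collapses whitespace inside the string, which matters if you have significant whitespace elsewhere.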
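
If you would rather call something explicitly in a few places, another way to avoid repeating string-join plus replace() everywhere is a stylesheet function. The following is only a sketch: the function name my:trim, its namespace URI and the element name entry are hypothetical and not taken from your stylesheet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:local-functions"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: join all items without a separator,
       then strip leading and trailing whitespace -->
  <xsl:function name="my:trim" as="xs:string">
    <xsl:param name="items" as="item()*"/>
    <xsl:sequence select="replace(
        string-join(for $i in $items return string($i), ''),
        '^\s+|\s+$', '')"/>
  </xsl:function>

  <!-- Example use on a hypothetical 'entry' element -->
  <xsl:template match="entry">
    <entry name="{my:trim(@name)}">
      <xsl:value-of select="my:trim(text())"/>
    </entry>
  </xsl:template>
</xsl:stylesheet>
```

With an input such as <entry name="  Patient ID ">  some text </entry>, this sketch writes <entry name="Patient ID">some text</entry>. Whitespace inside the string is left alone, unlike with normalize-space.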


Finally, if you need this more than once (i.e. for several of your stylesheets), then put the above in a separate stylesheet and use xsl:import on it. Now you may wonder how the imported match="/" template ever gets called if your master stylesheet has its own match="/" template, which takes precedence. I think the best answer is to use the following in your master stylesheet: it makes sure the template in the imported stylesheet is called, while the normal processing still takes place (with precedence for the templates of your master stylesheet):

<xsl:template match="/">
   <xsl:next-match />
</xsl:template>
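
Put together, a master stylesheet could look roughly like the sketch below. It assumes that the whitespace-stripping templates shown earlier were saved as remove-ws.xsl; that file name and the entry template are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:import must come before any other top-level declaration;
       remove-ws.xsl is a hypothetical file name -->
  <xsl:import href="remove-ws.xsl"/>

  <xsl:output method="xml"/>

  <!-- Hand the document node on to the imported match="/" template -->
  <xsl:template match="/">
    <xsl:next-match/>
  </xsl:template>

  <!-- Ordinary template of the master stylesheet (hypothetical example) -->
  <xsl:template match="entry">
    <entry name="{@name}">
      <xsl:value-of select="."/>
    </entry>
  </xsl:template>
</xsl:stylesheet>
```

The imported match="/" template then runs the master templates in its first pass and strips the whitespace from their result in the second pass.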


Maybe I totally misunderstood your question, but if I did, please respond with a tiny example of what you are after: a small but working xslt document, a small input and a small output xml document, plus an extra output xml showing what you want to be different. You know, with xml it is like with pictures: one xml says more than a thousand words, but a thousand xmls (or very large ones) say nothing. ;)


Cheers,
-- Abel Braaksma
