RE: [xsl] Mistake in tokenizing under Saxon 8.2

Subject: RE: [xsl] Mistake in tokenizing under Saxon 8.2
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 21 Jan 2005 10:40:49 -0000
The spaces that you get in your output are not being copied from the input,
they are being generated by virtue of the rule that a single space is
inserted between adjacent atomic values delivered in the result of a content
constructor. This space isn't inserted between a string and a node, only
between two strings.

(The reason for this rule is primarily for the case where you are generating
list-valued simple content, e.g. ("red", "blue", "green"). It's less
satisfactory when generating complex content. It means that you need to
understand the rather subtle distinction between a string and a text node:
<xsl:value-of select="'a'"/> outputs a text node, while <xsl:sequence
select="'a'"/> outputs a string. <xsl:copy-of/> produces whatever it is
given, which in this case is a string. Text nodes are output without
generating separator spaces.)
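
A minimal sketch of the rule described above, assuming an XSLT 2.0 processor. The template name "main" and the element names demo, atomic, text-nodes and mixed are made up for illustration and do not come from the original stylesheet.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="no"/>

  <!-- Named template (hypothetical name "main") so the sketch can be
       run as the initial template, without a source document. -->
  <xsl:template name="main">
    <demo>
      <!-- Three adjacent strings (atomic values):
           a single space is inserted between each pair. -->
      <atomic>
        <xsl:sequence select="('red', 'blue', 'green')"/>
      </atomic>

      <!-- Three adjacent text nodes: merged with no separator. -->
      <text-nodes>
        <xsl:value-of select="'red'"/>
        <xsl:value-of select="'blue'"/>
        <xsl:value-of select="'green'"/>
      </text-nodes>

      <!-- String, element node, string: the strings are not adjacent
           to each other, so no space appears on either side of <a>. -->
      <mixed>
        <xsl:sequence select="'text'"/>
        <a>link</a>
        <xsl:sequence select="'text'"/>
      </mixed>
    </demo>
  </xsl:template>

</xsl:stylesheet>
```

Under the rule as stated, the three children should come out as <atomic>red blue green</atomic>, <text-nodes>redbluegreen</text-nodes> and <mixed>text<a>link</a>text</mixed>. The last case is the one that matches the symptom in the original question.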

I think that if you output a text node containing a single space either side
of the <a> element, you will get the required effect. You can do this with

<xsl:text> </xsl:text> 
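
As a sketch only, this is how the suggestion might be applied to the posted template. The match pattern is simplified to "p" for brevity, and the regex and the use of regex-group(2) and regex-group(3) are kept exactly as in the posted snippet; nothing here has been run against the original data.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="p">
    <xsl:copy>
      <xsl:for-each select="tokenize(., '\s+')">
        <xsl:choose>
          <xsl:when test="matches(., '\[(.*),(.*),(.*)\]')">
            <xsl:analyze-string select="." regex="\[(.*),(.*),(.*)\]">
              <xsl:matching-substring>
                <!-- Explicit text node: one space before the link. -->
                <xsl:text> </xsl:text>
                <a href="{regex-group(1)}">
                  <xsl:attribute name="alt">
                    <xsl:value-of
                        select="replace(regex-group(3), '_', ' ')"/>
                  </xsl:attribute>
                  <xsl:value-of
                      select="replace(regex-group(2), '_', ' ')"/>
                </a>
                <!-- Explicit text node: one space after the link. -->
                <xsl:text> </xsl:text>
              </xsl:matching-substring>
            </xsl:analyze-string>
          </xsl:when>
          <xsl:otherwise>
            <!-- Still a string, so neighbouring plain tokens keep
                 their automatically generated separator space. -->
            <xsl:copy-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

One consequence of this approach: a link that is the first or last token of the paragraph gets a leading or trailing space, and two links in a row end up with two spaces between them. If that matters, an alternative (not what is suggested above) is to output every token with <xsl:value-of/> and emit the separating <xsl:text> </xsl:text> yourself for every token whose position() is greater than 1.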

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Nicholas Hemley [mailto:Nicholas.Hemley@xxxxxxxxxxxxxxx] 
> Sent: 21 January 2005 09:50
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Mistake in tokenizing under Saxon 8.2
> 
> Hello,
> 
> I presume that I have made a mistake somewhere in the stylesheet when
> using the tokenize function under Saxon 8.2 - for some reason I am
> losing the whitespace chars around the matched regular expression.
> 
> For example, the following pattern:
> text text [link,alt,link_text] text
> 
> should be transformed to:
> 
> text text <a href="link" alt="alt">link text</a> text
> 
> BUT
> 
> I am losing the whitespace characters around the <a> as follows:
> 
> text text<a href="link" alt="alt">link text</a>text
>          ^                                     ^
> Why is this please? All the other whitespace chars are copied OK, even
> though I am tokenising on whitespace.
> 
> If I use a &nbsp; in the stylesheet to compensate for the loss, it adds
> two spaces, not one, which is weird, so this is not currently a viable
> solution.
> 
> Any input appreciated!
> 
> Many thanks,
> Nic.
> 
> Appendix: Stylesheet Snippet
> 
>   <xsl:template match="/html/body/P|p">
>     <!-- copy node plus select contents -->
>     <xsl:copy>
> 
>               <xsl:variable name="tokens" select="tokenize(.,'\s+')"/>
> 
>               <xsl:for-each select="$tokens">
> 
>                 <xsl:choose>
>                   <xsl:when test='matches(.,"\[(.*),(.*),(.*)\]")'>
> 
>                     <xsl:variable name="elValue" select="."/>
> 
>                       <xsl:analyze-string select="$elValue"
> regex="\[(.*),(.*),(.*)\]">
> 
>                         <xsl:matching-substring>
>                           <a href="{regex-group(1)}">
>                                   <xsl:attribute name='alt'>
>                                     <xsl:value-of
> select='replace(regex-group(3), "_"," ")'/>
>                                   </xsl:attribute>
>                                   <xsl:value-of
> select='replace(regex-group(2), "_"," ")'/>
>                            </a>
>                         </xsl:matching-substring>
> 
>                       </xsl:analyze-string>
> 
>                   </xsl:when>
>                   <xsl:otherwise>
>                     <xsl:copy-of select="."/>
>                   </xsl:otherwise>
>                 </xsl:choose>
>               </xsl:for-each>
>     </xsl:copy>
>   </xsl:template>
> 
> 
> 
