RE: [xsl] disappearing line breaks within an element

Subject: RE: [xsl] disappearing line breaks within an element
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 22 Jan 2009 16:26:32 -0000
Interesting.

If I take this document

<e>a&#xd;b&#xd;c&#xd;d</e>

and do an identity copy with method="xml", I get

<e>a&#xD;b&#xD;c&#xD;d</e>

but if I use method="html", I get

<e>abcd</e>
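
For anyone wanting to reproduce this, a minimal sketch of such an identity copy might look like the following. The stylesheet wrapper and the omit-xml-declaration setting are assumptions added for illustration, not details of the original test; switch the method attribute between "xml" and "html" to compare the two serializations.

  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Assumption: change method to "html" for the second run -->
    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <!-- Copy the whole input tree unchanged to the result -->
    <xsl:template match="/">
      <xsl:copy-of select="."/>
    </xsl:template>

  </xsl:stylesheet>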

The code seems to be doing this quite deliberately, but at the moment I can't
see why. There doesn't seem to be anything to justify it in the
serialization spec. On the other hand, outputting &#xD; to an HTML document
isn't going to do much good in terms of what you see in the browser.

The &#xd; is there in the input tree, and operations that manipulate it seem
to work fine, for example 

  <xsl:output method="html" use-character-maps="crlf"/>
  <xsl:character-map name="crlf">
    <xsl:output-character character="&#xd;" string="#"/>
  </xsl:character-map>

  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>

gives me the output

<e>a#b#c#d</e>
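
Applied to the original question, the same mechanism could presumably be used to turn each carriage return into an HTML line break. What follows is a minimal sketch, assuming the processor applies the character map under method="html" as it did in the example above. The map name "cr-to-br" is made up for illustration.

  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- A character map has no effect unless it is named here -->
    <xsl:output method="html" use-character-maps="cr-to-br"/>

    <xsl:character-map name="cr-to-br">
      <!-- Replacement strings are written without escaping,
           so this emits markup rather than &lt;br&gt; -->
      <xsl:output-character character="&#xd;" string="&lt;br&gt;"/>
    </xsl:character-map>

    <!-- Identity copy of the input, as in the example above -->
    <xsl:template match="/">
      <xsl:copy-of select="."/>
    </xsl:template>

  </xsl:stylesheet>

By analogy with the "#" result above, one would expect a <br> wherever the input had &#xd;. Note that character maps are applied only during serialization, so they do nothing if the result tree is passed on without being serialized by the XSLT processor.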

Michael Kay
http://www.saxonica.com/
 

> -----Original Message-----
> From: cavecatem@xxxxxxxxxxxxx [mailto:cavecatem@xxxxxxxxxxxxx] 
> Sent: 22 January 2009 17:03
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] disappearing line breaks within an element
> 
> Dear List,
> 
> I've been struggling with an entity problem all day, and the
> things I tried didn't work out.
> 
> I'm working with Saxon 8B and XSLT 2.0.
> 
> I have an input file which contains XML output from a Sybase database.
> Long text fields may contain line breaks which appear in the 
> XML as &#xD;
> 
> 
> Example:
> <UserField Type="longtext" Name="some name">I'm a veeery long 
> text.I'm a veeery long text in some text.I'm
>                         a veeery long text in some text.I'm a 
> veeery long text in
>                         some text.&#xD;I'm a veeery long text 
> in .I'm a veeery long text in some text.I'm a veeery long
>                         text in some text.I'm a veeery long 
> text in .&#xD;&#xD;&#xD;I'm a veeery long text in 
>                         .I'm a veeery long text in some 
> text.I'm a veeery long
>                         text in some text.I'm a veeery long 
> text in.</UserField>
>                         
> When I transform the XML to HTML, they just disappear.
> I searched in the  mailing list archive and I seem to have 
> understood that when the file is parsed, the  &#xD; becomes a 
> line break. I suppose this is why I fail with the following 
> tokenization?
> 
>  <xsl:for-each select="fn:tokenize(., '&#xD;')">
>      <xsl:if test="fn:string-length(fn:normalize-space(.)) &gt; 0">
>          <xsl:value-of select="."/>
>          <br/>
>      </xsl:if>
>  </xsl:for-each>
> 
> 
> If so, how do I do this?
> 
> I also tried using a character map, but still it does not work out.
> <xsl:character-map name="break">
>     <xsl:output-character character="&#160;" string="&amp;nbsp;"/>
>     <xsl:output-character character="&#xD;" string="&lt;br/>"/>
> </xsl:character-map>
>     
>  Someone mentioned that with XSLT 2.0 there was a way to 
> convert the parsed entity back to an entity, but I didn't 
> understand how and where I would have to do that.
>  Could someone explain or point me to a book? I tried with 
> Frank Bongers' but unfortunately, I haven't found anything 
> dealing with line break codes within an element (the 
> ebook version my library provides has no bookmarks or
> cross-references, which makes it rather unwieldy, so maybe I
> failed to find it).
>  
>  I'd appreciate any help (and will remember to send my
> thanks to the list address and not the digest address as
> happened last time ;-)
>  
>  
>  Regards
>  CJ
