[xsl] Where did my tabs go? Trying to understand xsl:value-of, tabs and the separator attribute

Subject: [xsl] Where did my tabs go? Trying to understand xsl:value-of, tabs and the separator attribute
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Mon, 16 Oct 2006 20:11:18 +0200
Hi List!

I do a lot of XML-to-text processing, some of it tab-delimited. I have trouble understanding what <xsl:value-of /> (and other constructs) do with tab characters when you explicitly do not want them normalized to spaces. Here is what I found by trial and error:

Two entities:
<!ENTITY tab "&#x09;" >
<!ENTITY separator "&#xE0F1;" >

A character map:
   <xsl:character-map name="separator">
       <xsl:output-character character="&separator;" string="&tab;"/>
   </xsl:character-map>

An applied output method:
<xsl:output method="text" indent="no" use-character-maps="separator" />

A variable:
<xsl:variable name="tabchar" select="'&#x09;'" />
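
Assembled into one stylesheet, the pieces above would look roughly like this. It is only a minimal sketch: the `row` template and the input shape shown in the comment are placeholders for illustration, and version="2.0" is assumed because character maps and the separator attribute are XSLT 2.0 features.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY tab "&#x09;">
  <!ENTITY separator "&#xE0F1;">
]>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Character map as described above: the private-use placeholder
       character is meant to be replaced by a tab on serialization -->
  <xsl:character-map name="separator">
    <xsl:output-character character="&separator;" string="&tab;"/>
  </xsl:character-map>

  <xsl:output method="text" indent="no" use-character-maps="separator"/>

  <xsl:variable name="tabchar" select="'&#x09;'"/>

  <!-- Hypothetical input, for illustration only:
       <row><somenode>a</somenode><somenode>b</somenode></row> -->
  <xsl:template match="row">
    <xsl:value-of select="somenode" separator="{$tabchar}"/>
    <xsl:text>&#x0A;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```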

The following statements give:
(a tab)   <xsl:value-of select="$tabchar" />
(a tab)   <xsl:value-of select="'&#x09;'" />
(no tab)  <xsl:value-of select="'&tab;'" />
(no tab)  <xsl:value-of select="'&separator;'" />

(tabs)    <xsl:value-of select="somenode" separator="{$tabchar}" />
(tabs)    <xsl:value-of select="somenode" separator="&#x09;" />
(no tabs) <xsl:value-of select="somenode" separator="{&tab;}" />
(no tabs) <xsl:value-of select="somenode" separator="{&separator;}" />

Basically the same story applies to other instructions, like copy-of, <xsl:text> etc. I was under the impression that it made no difference whether you used a numeric character reference or a named entity reference. If I replace the tab in the mapping with something different, say " ", the spaces are kept. The same goes for "|||": the string is kept.

Though I can work around this with a global variable, or by interspersing my code with xsl:text everywhere, things get worse when using functions etc. that return strings containing tabs. Mostly, the tabs are lost somewhere along the way. So my hope was to use a character map, so that I can freely substitute the separator with a tab character at the end. It works for everything, including a bunch of spaces, but not for tab characters.

With newlines, btw, it is slightly different again: it works when I put the numeric character reference into the character map (the equivalent does not work for tabs), but it does not work if I put the named entity inside the character map.

I am sure I missed a theory lesson somewhere. And I am not so sure anymore if this has a nice solution. If someone can enlighten me, I would be most thankful. I have been working around this issue with odd solutions for weeks now.

-- Abel Braaksma

NB: I have tried the character map string with a literal tab, a numeric character reference and a named entity reference, all with the same results.
