Re: [xsl] Where did my tabs go? Trying to understand xsl:value-of, tabs and the separator attribute

Subject: Re: [xsl] Where did my tabs go? Trying to understand xsl:value-of, tabs and the separator attribute
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Mon, 16 Oct 2006 23:06:54 +0200
David Carlisle wrote:

>> why can't I use the following to the same success?
>
> now I have a moral dilemma, do I answer "it's obvious that won't work"
> or do I answer "that's what I tried in my first draft reply"... hmmmmm
>
> :-)
>
> Essentially, the reason is that amp is _defined_ to be already double
> quoted, precisely so that use of amp survives this entity expansion
> still as a quoted character that is not taken as markup, but
>
> Basically this is an edge case where intuition or simplifications like
> "entities expanded first" don't really help. The XML spec specifies a
> particular algorithm for normalising attribute values, and how it
> interacts with character and entity references. Most of the time it just
> does "the obvious thing", but sometimes, as here, there's no avoiding
> just stepping through the algorithm and seeing what happens. (Or, as I
> in fact just did, use a parser like rxp and trust that Richard read the
> specified algorithm carefully.)
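David's point about attribute-value normalisation is easy to check with any conforming parser. The sketch below uses Python's standard-library xml.etree.ElementTree (built on expat) purely as a convenient stand-in for "a parser like rxp"; the element name `e` and attribute `sep` are made up for illustration. It contrasts a literal TAB typed into an attribute value with the character reference &#x09;.

```python
import xml.etree.ElementTree as ET

# A literal TAB character typed inside the attribute value ("\t" in the
# Python string). Attribute-value normalisation replaces each literal
# whitespace character (#x20, #x9, #xA, #xD) with a single space.
literal_tab = ET.fromstring('<e sep="a\tb"/>').get("sep")

# A character reference to TAB: the referenced character is appended
# to the normalised value as-is, so it survives.
charref_tab = ET.fromstring('<e sep="a&#x09;b"/>').get("sep")

print(repr(literal_tab))   # 'a b'
print(repr(charref_tab))   # 'a\tb'
```

This is why &#x09; is the dependable way to get a TAB into an attribute such as separator, while a TAB typed directly into the attribute arrives as a space.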

I guess this is more about XML than it is about XSLT, but since XSLT *is* XML, I thought I'd look it up, now that I understand where to look. The following example from the spec explained enough just by looking at it (http://www.w3.org/TR/xml/#intern-replacement):


The following declarations:
<!ENTITY % pub    "&#xc9;ditions Gallimard" >
<!ENTITY   rights "All rights reserved" >
<!ENTITY   book   "La Peste: Albert Camus,
&#xA9; 1947 %pub;. &rights;" >

Produce the following replacement string for "book":
La Peste: Albert Camus,
© 1947 Éditions Gallimard. &rights;

Then:
The general-entity reference "&rights;" would be expanded should the reference "&book;" appear in the document's content or an attribute value.
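The rule behind this example (character references and parameter-entity references in an entity literal are expanded when the declaration is read, while general-entity references are left in place until the entity is used) can be modelled in a few lines. This is a deliberately simplified toy, not a real parser: the function name `replacement_text`, the regular expression and the `params` dictionary are all hypothetical, and it ignores nested declarations, error checking and the well-formedness rule that forbids parameter-entity references inside declarations in the internal subset.

```python
import re

# One token: a hex character reference (&#xHH;), a decimal character
# reference (&#NN;), a parameter-entity reference (%name;) or a
# general-entity reference (&name;).
TOKEN = re.compile(
    r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|%([A-Za-z_][\w.-]*);|&([A-Za-z_][\w.-]*);"
)

def replacement_text(literal, params=None):
    """Toy model: turn an entity literal into its replacement text."""
    params = params or {}

    def expand(m):
        hex_ref, dec_ref, pe_name, ge_name = m.groups()
        if hex_ref is not None:
            return chr(int(hex_ref, 16))  # character reference: expanded now
        if dec_ref is not None:
            return chr(int(dec_ref))      # character reference: expanded now
        if pe_name is not None:
            return params[pe_name]        # parameter entity: included now
        return m.group(0)                 # general entity: bypassed, kept verbatim

    # re.sub scans the literal once, so text produced by an expansion is
    # NOT rescanned; that matches the single pass the spec describes.
    return TOKEN.sub(expand, literal)

pub = replacement_text("&#xc9;ditions Gallimard")
book = replacement_text("La Peste: Albert Camus,\n&#xA9; 1947 %pub;. &rights;",
                        params={"pub": pub})
print(book)
# La Peste: Albert Camus,
# © 1947 Éditions Gallimard. &rights;
```

The same single pass maps the literal `&amp;#x09;` to itself (the `&amp;` is a general-entity reference and is bypassed) but maps `&#38;#x09;` to `&#x09;` (the `&#38;` is a character reference and is expanded immediately).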


My guess is that the same happens with the predefined named entity references such as &amp;. So, you are absolutely right: there's a difference in how named entity references and numeric character references are expanded. The latter are replaced in place when the declaration is read, whereas the former are kept as-is in the replacement string and actually end up in the document, where they are expanded in turn. In our scenario this would give:

This declaration
<!ENTITY tab "&amp;#x09;">

Produces this replacement string: &amp;#x09;

This replacement string is used inside the XML document (which is the XSLT stylesheet), and after parsing becomes the literal string "&#x09;". Only if it were parsed again would it be replaced with a [tab] character. By contrast, consider the following (from your solution):

This declaration
<!ENTITY tab "&#38;#x09;">

Produces this replacement string: &#x09;

This, then, is expanded in the XML to a literal [tab] character.
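Both declarations can be checked end to end with a real parser. The sketch again uses Python's xml.etree.ElementTree (expat) only as a convenient conforming parser; the helper name `expand` and the root element `r` are made up for illustration. It also shows the "only if it were parsed again" step by feeding the first result back in as markup.

```python
import xml.etree.ElementTree as ET

def expand(decl_value):
    """Declare &tab; with the given literal, then return the text of <r>a&tab;b</r>."""
    doc = f'<!DOCTYPE r [<!ENTITY tab "{decl_value}">]><r>a&tab;b</r>'
    return ET.fromstring(doc).text

via_amp = expand("&amp;#x09;")   # &amp; is bypassed when the declaration is read
via_38 = expand("&#38;#x09;")    # &#38; is expanded when the declaration is read

print(repr(via_amp))   # 'a&#x09;b'  (the six characters & # x 0 9 ;)
print(repr(via_38))    # 'a\tb'      (a real TAB)

# Parsing the first result a second time finally yields the TAB.
reparsed = ET.fromstring(f"<r>{via_amp}</r>").text
print(repr(reparsed))  # 'a\tb'
```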

I never knew there was so much trickery involved. I wonder what normalize-space(normalize-space('&tab;')) would do. Will it remove the whitespace? Well, never mind: to be sure of correct handling, I think I'll stick to the safe haven of character maps, now that I've learned how to apply them.
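On the normalize-space() question: XPath's normalize-space() strips leading and trailing whitespace and collapses internal runs to a single space, where whitespace means exactly #x20, #x9, #xD and #xA. A minimal Python sketch of that definition (the function name is made up, and this is not an XSLT processor) suggests that a string holding only a TAB normalises to the empty string, and a second application changes nothing. Which input the function actually sees depends on the entity declaration in use: a real TAB, or the six characters &#x09;, which contain no whitespace at all.

```python
import re

# XPath whitespace: space, tab, carriage return and line feed only.
XML_WS_RUN = re.compile(r"[\x20\x09\x0D\x0A]+")

def normalize_space(s):
    """Sketch of XPath normalize-space(): collapse whitespace runs, then trim."""
    return XML_WS_RUN.sub(" ", s).strip(" ")

print(repr(normalize_space("\t")))                   # ''
print(repr(normalize_space(normalize_space("\t"))))  # ''
print(repr(normalize_space("&#x09;")))               # '&#x09;'
```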

Thanks for all the help, David!

Cheers,
-- Abel Braaksma
