Subject: Re: [xsl] Why does the tokenize() function behave strangely when I use ENTITIES and variables?
From: "G. Ken Holman g.ken.holman@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 7 Apr 2016 14:20:40 -0000
I have a stylesheet which reads a text file and tokenizes it. The token delimiter is two consecutive newline characters (hex 0A, hex 0A).
If I use the tokenize() function like this:
tokenize($text-file, '&#x0A;&#x0A;')
then the text file is correctly tokenized.
But if I create an entity:
<!DOCTYPE xsl:stylesheet [ <!ENTITY line-separator '&#x0A;'> ]>
and a variable whose value is two line-separators:
<xsl:variable name="rule-separator" select="'&line-separator;&line-separator;'"/>
and then use the variable with the tokenize() function:
tokenize($text-file, $rule-separator)
then the text file is not tokenized correctly. Specifically, the XSLT processor uses two consecutive space characters (hex 20, hex 20) as the token delimiter rather than two consecutive newline characters (hex 0A, hex 0A).
Do you know why this is happening?
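See https://www.w3.org/TR/REC-xml/#AVNormalize. In step 3 of the attribute-value normalization algorithm, bullet 1 states that for a character reference the referenced character is appended to the normalized value. Bullet 2 states that for an entity reference step 3 is applied recursively to the entity's replacement text. Bullet 3 states that any white-space character (#x20, #xD, #xA, #x9) found in the attribute value is normalized to a space (#x20).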
How do I fix it?
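https://www.w3.org/TR/REC-xml/#sec-references "An entity reference refers to the *content* of a named entity." (my emphasis) The content of your entity is the newline character itself, not a character reference to it, so the white-space rule applies when the entity is used in the select attribute. Declare the entity so that its content is the character reference, as in the second entity of this test: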
t:\>type ent.xsl
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY line-separator1 '&#x0A;'>
<!ENTITY line-separator2 '&#38;#x0A;'>
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:template match="/">
  <xsl:value-of select="'1=',string-to-codepoints('&line-separator1;'),
                        '&#x0A;2=',string-to-codepoints('&line-separator2;'),
                        '&#x0A;3=',string-to-codepoints('&#x0A;')"/>
</xsl:template>

</xsl:stylesheet>
t:\>xslt2 ent.xsl ent.xsl
1= 32
2= 10
3= 10
t:\>
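Applied to the original stylesheet, the second form from the test might look like the sketch below. The inline value of $text-file is a hypothetical stand-in for the real text file, and $rule-separator-alt is an assumed extra variable that shows an entity-free way to build the same delimiter with codepoints-to-string(); neither appears in the original question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
  <!-- The character reference for the ampersand is expanded when this
       declaration is read, so the replacement text of the entity is a
       character reference to U+000A, not a literal newline. -->
  <!ENTITY line-separator '&#38;#x0A;'>
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="text"/>

  <!-- Hypothetical stand-in for the text file: two rules separated by
       a blank line; the first rule spans two lines. -->
  <xsl:variable name="text-file"
                select="'rule 1, line 1&#x0A;rule 1, line 2&#x0A;&#x0A;rule 2'"/>

  <!-- Via the entity: each reference survives normalization as U+000A. -->
  <xsl:variable name="rule-separator"
                select="'&line-separator;&line-separator;'"/>

  <!-- Entity-free alternative (assumed name): same delimiter built in XPath. -->
  <xsl:variable name="rule-separator-alt"
                select="codepoints-to-string((10, 10))"/>

  <xsl:template match="/">
    <!-- Token counts for both delimiters, then the delimiter's codepoints. -->
    <xsl:value-of select="count(tokenize($text-file, $rule-separator)),
                          count(tokenize($text-file, $rule-separator-alt)),
                          string-to-codepoints($rule-separator)"/>
  </xsl:template>

</xsl:stylesheet>
```

With an XSLT 2.0 processor, both counts should come out as 2 and the codepoints as 10 10. If the entity were declared with a plain '&#x0A;' instead, the codepoints would be 32 32 and the first count would drop to 1, since the sample text contains no pair of spaces.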
--
G. Ken Holman
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/s/