RE: [xsl] Character entities in attribute values

Subject: RE: [xsl] Character entities in attribute values
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Wed, 23 Apr 2003 19:17:28 +0100

The explanation looks simple: you were using a product with a
serious bug in it.
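
For comparison, here is a minimal sketch of what should happen to a
character reference in an attribute value. It uses Python's standard
xml.etree.ElementTree as a stand-in for a conformant parser and
serializer; it is not the E3 toolchain, and the shortened xref element
and the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Shortened version of the xref element from the quoted sample (assumption).
source = '<xref target_title="D&#xe9;finition des entit&#xe9;s"/>'

# The parser resolves each character reference to the character itself,
# so the attribute value held in the tree contains no ampersand.
title = ET.fromstring(source).get("target_title")

# Copy the attribute value into the text of a result element,
# roughly what xsl:value-of does in the quoted template.
link = ET.Element("A")
link.text = title

# Serializing to ASCII may write character references again,
# but it has no ampersand in the data to escape.
html = ET.tostring(link, encoding="us-ascii").decode("ascii")

print(title)
print(html)
```

If the output contains &amp;#xe9; instead, something upstream must have
put the literal characters "&#xe9;" into the tree as text.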

Michael Kay
Software AG
home: Michael.H.Kay@xxxxxxxxxxxx
work: Michael.Kay@xxxxxxxxxxxxxx 

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx 
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of 
> mark_fletcher@xxxxxxxxxxxxxx
> Sent: 23 April 2003 18:01
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] Character entities in attribute values
> 
> 
> 
> Hi Mike (and others who have responded),
> 
> First, I've found and fixed the problem.  I'm using 
> Arbortext's E3 product to do my processing, and there was an 
> instruction in their internal code to write out non-ASCII 
> characters as numeric character references.  That's how 
> the accented Unicode characters in the attribute values became 
> character references.  Once I fixed that, the HTML 
> output was fine, as there were no longer any ampersands in the 
> attribute values.
> 
> However, it still sounds like you're all saying that even 
> when a character reference does exist in an attribute value, 
> I should not be seeing escaped ampersands when that attribute 
> value is output as text.  Well, if anyone's interested (and 
> I'm not sure why you would be, at this point ;-) here's a 
> sample of my previous input and output data and my xsl code 
> that demonstrates the problem I was having:
> 
> source xml tag:
> 
> <xref linkend="i090f42a68009c2c9" book_code="cmkt"
>   book_title="Guide Marketing du syst&#xe8;me GRC de PeopleSoft, version 8.8"
>   chapter_title="D&#xe9;finition des entit&#xe9;s de l'application Marketing de PeopleSoft"
>   XREF_type="3"
>   target_title="D&#xe9;finition des entit&#xe9;s de l'application Marketing de PeopleSoft"
>   chapter_type="Chapitre" file_name="cmkt03.htm"/>
> 
> xsl template for this element:
> 
> <xsl:template name="xref">
>   <A HREF="../../{@book_code}/htm/{@file_name}#{@linkend}">
>     <xsl:value-of select="@target_title"/>
>   </A>
> </xsl:template>
> 
> html output:
> 
> <A HREF="../../cmkt/htm/cmkt03.htm#i090f42a68009c2c9">D&amp;#xe9;finition des entit&amp;#xe9;s de l'application Marketing de PeopleSoft</A>
> 
> 
> 
> 
> Mark Fletcher
> PeopleSoft Language Engineering
> 925.694.3753
> mark_fletcher@xxxxxxxxxxxxxx
> 
> 
> 
> "Mike Brown" <mike@xxxxxxxx>
> Sent by: owner-xsl-list@xxxxxxxxxxxrrytech.com
> 04/23/2003 06:05 AM
> Please respond to xsl-list
> 
> To:       xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> cc:
> Subject:  Re: [xsl] Character entities in attribute values
> 
> mark_fletcher@xxxxxxxxxxxxxx wrote:
> > the output text looks something like this: &amp;eacute; instead of 
> > this: &eacute;
> 
> First, please realize that when you output XML or HTML, the 
> XSLT processor is (in effect, if not literally) running a 
> node tree through a serializer, and it is the serializer that 
> escapes "&", "<", and certain other characters wherever they 
> would otherwise be confused with markup.
> 
> If you're getting &amp;eacute; in the output, then you must 
> have put the 8 characters "&" "e" "a" "c" "u" "t" "e" ";" 
> into an attribute node (or text node, but you mentioned 
> attribute) in your result tree, perhaps by copying this text 
> from the source tree. Since you told the processor you wanted the
> *node* to contain those 8 characters, rather than 1 entity 
> reference, it serialized the node in such a way that you'd 
> get the characters when the output document is parsed. In 
> other words, it preserved the semantics of the data, clearly 
> distinguishing between character data and the structures 
> implied by markup.
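
A minimal sketch of that round trip, using Python's standard
xml.etree.ElementTree as a stand-in for an XSLT processor's serializer
(an assumption; any conformant serializer should behave the same way for
this case). The element name "p" is illustrative.

```python
import xml.etree.ElementTree as ET

# A text node holding the 8 literal characters & e a c u t e ;
node = ET.Element("p")
node.text = "&eacute;"

# The serializer escapes the ampersand so the data survives as text.
serialized = ET.tostring(node, encoding="unicode")
print(serialized)  # <p>&amp;eacute;</p>

# Parsing the serialized output gives back the same 8 characters.
reparsed = ET.fromstring(serialized).text
print(reparsed)  # &eacute;
```

The escaping is what keeps the round trip lossless: the tree held
character data, so the output must parse back to character data.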
> 
> Given that the XML parser feeding parsed data to the XSLT 
> processor would have interpreted "&eacute;" in your original 
> source document as a reference to the entity named eacute, 
> there's no way the 8 characters could have ended up in your 
> result tree unless you did one of the following:
>  - explicitly constructed that string in your stylesheet
>  - copied text that was originally written like &amp;eacute;
>  - copied text that was originally written like <![CDATA[&eacute;]]>
> 
> The latter two mean exactly the same thing. Since one of the 
> most common misconceptions on this list concerns what CDATA 
> sections are, I'm going to guess that whoever produced your 
> XML tried to use it as a transport for entity-laden, 
> non-well-formed HTML, declaring that this data is just text, 
> not markup. Then you tried to use XSLT to copy it through, 
> and were surprised to find that you can't use XSLT to pretend 
> character data is actually markup.
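
The equivalence of the escaped form and the CDATA form can be checked
with a short sketch, again using Python's standard
xml.etree.ElementTree as a stand-in parser (an assumption; the element
name "p" is illustrative).

```python
import xml.etree.ElementTree as ET

escaped = ET.fromstring("<p>&amp;eacute;</p>")
cdata = ET.fromstring("<p><![CDATA[&eacute;]]></p>")

# Both spellings parse to the same character data: the 8 characters "&eacute;".
print(escaped.text == cdata.text)  # True

# After parsing, nothing records which spelling was used,
# so both serialize the same way.
print(ET.tostring(cdata, encoding="unicode"))  # <p>&amp;eacute;</p>
```

A CDATA section is only an alternative way of escaping text in the
source file; it does not mark its content as markup to be passed through.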
> 
> However, as others have mentioned, this is just a wild guess. 
> Please explain more about what you're doing, with a brief 
> code sample.
> 
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

