Subject: RE: [xsl] Character entities in attribute values
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Wed, 23 Apr 2003 19:17:28 +0100
It looks like a simple explanation - you were using a product with a serious bug in it.

Michael Kay
Software AG
home: Michael.H.Kay@xxxxxxxxxxxx
work: Michael.Kay@xxxxxxxxxxxxxx

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of
> mark_fletcher@xxxxxxxxxxxxxx
> Sent: 23 April 2003 18:01
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] Character entities in attribute values
>
> Hi Mike (and others who have responded),
>
> First, I've found and fixed the problem. I'm using
> Arbortext's E3 product to do my processing and there was an
> instruction in their internal code to write out non-ASCII
> characters as numeric character references. So, that's how
> the accented Unicode characters in the tag attributes became
> character references. Once I fixed that problem, the HTML
> output was fine, as there were no ampersands in any of the
> attribute values.
>
> However, it still sounds like you're all saying that even
> when a character reference does exist in an attribute value,
> I should not be seeing escaped ampersands when that attribute
> value is output as text.
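That expectation matches how XML parsing works: a conforming parser delivers the same attribute value whether a character is written literally or as a numeric character reference, so emitting references for non-ASCII characters should not, by itself, lead to escaped ampersands later. A minimal sketch, assuming Python's standard xml.etree.ElementTree as a stand-in for the parser and serializer inside an XSLT processor (the attribute value is a shortened sample, not the full one from the thread):

```python
import xml.etree.ElementTree as ET

# The same attribute value written two ways: a literal U+00E9 character,
# and a hexadecimal numeric character reference (as E3 was emitting).
literal = '<xref target_title="D\u00e9finition"/>'
as_reference = '<xref target_title="D&#xe9;finition"/>'

a = ET.fromstring(literal).get("target_title")
b = ET.fromstring(as_reference).get("target_title")

# The parser resolves the reference, so both yield identical strings.
print(a == b)   # True
print(len(b))   # 10: the reference became a single character

# Serializing to an ASCII-only encoding writes the character back out
# as a (decimal) character reference; no "&amp;" appears anywhere.
elem = ET.Element("A")
elem.text = b
print(ET.tostring(elem, encoding="us-ascii"))  # b'<A>D&#233;finition</A>'
```

Whether a serializer spells a character literally or as a reference is purely an output-encoding choice; the data in the tree is the same either way.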
>
> Well, if anyone's interested (and
> I'm not sure why you would be, at this point ;-) here's a
> sample of my previous input and output data and my xsl code
> that demonstrates the problem I was having:
>
> source xml tag:
>
> <xref linkend="i090f42a68009c2c9" book_code="cmkt"
> book_title="Guide Marketing du système GRC de
> PeopleSoft, version 8.8" chapter_title="Définition des
> entités de l'application Marketing de PeopleSoft"
> XREF_type="3" target_title="Définition des entités
> de l'application Marketing de PeopleSoft"
> chapter_type="Chapitre" file_name="cmkt03.htm"/>
>
> xsl template for this element:
>
> <xsl:template name="xref">
> <A
> HREF="../../{@book_code}/htm/{@file_name}#{@linkend}"><xsl:value-of
> select="@target_title"/></A>
> </xsl:template>
>
> html output:
>
> <A
> HREF="../../cmkt/htm/cmkt03.htm#i090f42a68009c2c9">D&#xe9;finition
> des entit&#xe9;s de l'application Marketing de PeopleSoft</A>
>
> Mark Fletcher
> PeopleSoft Language Engineering
> 925.694.3753
> mark_fletcher@xxxxxxxxxxxxxx
>
>
> "Mike Brown" <mike@xxxxxxxx>
> Sent by: owner-xsl-list@xxxxxxxxxxx
> 04/23/2003 06:05 AM
> Please respond to xsl-list
>
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> cc:
> Subject: Re: [xsl] Character entities in attribute values
>
> mark_fletcher@xxxxxxxxxxxxxx wrote:
> > the output text looks something like this: &eacute; instead of
> > this: é
>
> First please realize that when you output XML or HTML, the
> XSLT processor is (effectively, not necessarily) running a
> node tree through a serializer, and the serializer is what is
> escaping "&" and "<" and certain other characters appearing
> in places where they would otherwise be confused with markup.
>
> If you're getting &eacute; in the output, then you must
> have put the 8 characters "&" "e" "a" "c" "u" "t" "e" ";"
> into an attribute node (or text node, but you mentioned
> attribute) in your result tree, perhaps by copying this text
> from the source tree. Since you told the processor you wanted the
> *node* to contain those 8 characters, rather than 1 entity
> reference, it serialized the node in such a way that you'd
> get the characters when the output document is parsed. In
> other words, it preserved the semantics of the data, clearly
> distinguishing between character data and the structures
> implied by markup.
>
> Given that the XML parser feeding parsed data to the XSLT
> processor would have interpreted "&eacute;" in your original
> source document as a reference to the entity named eacute,
> there's no way the 8 characters could have ended up in your
> source tree unless you did one of the following:
> - explicitly constructed that string in your stylesheet
> - copied text that was originally written like &amp;eacute;
> - copied text that was originally written like <![CDATA[&eacute;]]>
>
> Both of the latter two mean exactly the same thing, and since
> the most common FAQ and misconception on this list (well, one
> of the most common) is the mistaken assumptions people make
> about what CDATA sections are, I'm going to guess that
> whoever made your XML decided to try to use it as a transport
> for entity-laden, non-well-formed HTML, saying that this data
> is just text, not markup. Then you tried to use XSLT to copy
> it through, and were surprised to see that you can't use XSLT
> to pretend character data is actually markup.
>
> However, as others have mentioned, this is just a wild guess.
> Explain more about what you're doing, with sample code (brief).
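A minimal sketch of the two points above: the two source spellings put identical data into the tree, and a serializer must escape the "&" of a node that really contains the 8 characters. It assumes Python's standard xml.etree.ElementTree as a stand-in for the XSLT processor's parser and serializer (a real processor's HTML output method may differ in detail); serialize_link is a hypothetical helper that only mimics the xref template quoted earlier.

```python
import xml.etree.ElementTree as ET

def serialize_link(title: str) -> str:
    """Hypothetical helper: put `title` into an <A> element and serialize
    it, roughly what <xsl:value-of select="@target_title"/> does."""
    link = ET.Element("A", HREF="../../cmkt/htm/cmkt03.htm#i090f42a68009c2c9")
    link.text = title
    return ET.tostring(link, encoding="unicode")

# Both spellings put the same 8 characters "&eacute;" into the parsed tree.
via_amp = ET.fromstring("<t>&amp;eacute;</t>").text
via_cdata = ET.fromstring("<t><![CDATA[&eacute;]]></t>").text
print(via_amp == via_cdata == "&eacute;")   # True

# A node holding those 8 characters gets its "&" escaped on output, so
# that re-parsing the result gives back the same 8 characters.
print(serialize_link("D" + via_amp + "finition"))
# <A HREF="../../cmkt/htm/cmkt03.htm#i090f42a68009c2c9">D&amp;eacute;finition</A>

# A node holding the single character U+00E9 needs no escaping at all.
print(serialize_link("D\u00e9finition"))
```

Parsing the escaped output again returns "D&eacute;finition" as plain character data, which is the sense in which the serializer preserves the semantics of the data rather than turning text into markup.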
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list