Subject: Re: [xsl] HTML tags from XML content
From: Mike Brown <mike@xxxxxxxx>
Date: Sat, 10 Feb 2001 02:20:19 -0700 (MST)
Mingbo Qin wrote:
> Hell,

Well, damn! :) Hello, I think you mean.

> I am trying to transform an XML file to HTML format. The original XML
> elements contain some HTML formatting tags in "&amp;lt;P&amp;gt;" format.

Garbage in, garbage out! The usual problem is the XML contains "&lt;P&gt;", but you've got it doubly-escaped, don't you?

This indicates a bigger problem. You are using XML for something it was really not designed for. You (or whoever is writing this XML) are cramming structured markup into the character data content of an element, and then trying to extract it and treat it as something other than the text it literally represents.

First, acknowledge the fact that "&amp;lt;P&amp;gt;" in XML means nothing more than the sequence of 9 characters

  & l t ; P & g t ;

That is, as far as the XML parser and the XPath/XSLT processor are concerned, this is not a <P> start tag representing one of the boundaries of a 'P' element; it's just a text string.

Next, consider that an XSLT processor, if you tell it to emit HTML, is going to treat such a string in the result tree as just text, and it will serialize it in such a way that it will not be confused with markup. Thus, the "&" characters are going to be escaped as "&amp;" again upon output. The remaining characters don't need to be escaped. So the output will be "&amp;lt;P&amp;gt;" in the HTML. Your browser will render this as "&lt;P&gt;", as you have noted.

> I want this to be converted to "&lt;P&gt;". My guess is even if somehow I can make
> this to happen, a "<P>" string will be displayed on the browser.

XSLT gives processors the option of supporting the disable-output-escaping="yes" attribute on the XSLT instruction elements that result in the creation of text nodes (xsl:value-of and xsl:text). If the processor supports it, the text node will be emitted with output escaping disabled, so you could conceivably get "&lt;P&gt;" in the HTML, which will be rendered as "<P>" in the browser. So your guess is correct.

There are reasons why disable-output-escaping is bad. It is optional, for one thing, so you can't be assured that your code will be portable. It can also result in output that does not conform to standards and thus may not be readable when it is parsed again. In the case of plain old HTML this is of little concern, since browsers expect to get tag soup anyway, but in the case of XHTML or any other XML, it's a big deal and something you want to avoid.

What you want to do is translate occurrences of the characters & l t ; and & g t ; in a string to just < and >, respectively. Since the XPath translate() function only works with single characters, you will have to do this with a recursive named template. I have an example of this technique at http://skew.org/xml/stylesheets/replace/

Good luck, and try to do something about that XML. XML is just not a good carrier for HTML. Maybe run the HTML through Tidy first to make it XHTML, so you don't have to worry about this stuff and can just use xsl:copy-of?

- Mike
____________________________________________________________________
Mike J. Brown, software engineer at webb.net in Denver, Colorado, USA
My XML/XSL resources: http://skew.org/xml/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
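
For readers who want to see the shape of the recursive named template mentioned above, here is a minimal XSLT 1.0 sketch. It is not the stylesheet published at skew.org; the template name "replace-string", its parameter names, and the source element name "body" are all assumptions made up for this illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Hypothetical helper: copies $text to the result, replacing every
       occurrence of the string $from with the string $to. -->
  <xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="from"/>
    <xsl:param name="to"/>
    <xsl:choose>
      <!-- The empty-string guard avoids endless recursion,
           because contains(x, '') is always true. -->
      <xsl:when test="$from != '' and contains($text, $from)">
        <xsl:value-of select="substring-before($text, $from)"/>
        <xsl:value-of select="$to"/>
        <!-- Recurse on whatever follows the first match. -->
        <xsl:call-template name="replace-string">
          <xsl:with-param name="text" select="substring-after($text, $from)"/>
          <xsl:with-param name="from" select="$from"/>
          <xsl:with-param name="to" select="$to"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- No match left: emit the remaining text unchanged. -->
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- "body" is an assumed name for the element holding the escaped text. -->
  <xsl:template match="body">
    <!-- Pass 1: the four characters & l t ; become the single character <.
         In this stylesheet, '&amp;lt;' is how that four-character string
         is written. -->
    <xsl:variable name="step1">
      <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="string(.)"/>
        <xsl:with-param name="from" select="'&amp;lt;'"/>
        <xsl:with-param name="to" select="'&lt;'"/>
      </xsl:call-template>
    </xsl:variable>
    <!-- Pass 2: the four characters & g t ; become the single character >. -->
    <xsl:call-template name="replace-string">
      <xsl:with-param name="text" select="string($step1)"/>
      <xsl:with-param name="from" select="'&amp;gt;'"/>
      <xsl:with-param name="to" select="'&gt;'"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

Note that the result of the replacement is still a text node, so a conforming serializer will escape the "<" again when it writes the output. The sketch removes one level of escaping from the string; it does not turn the string into element nodes. Getting real 'P' elements in the result still means fixing the source XML, as suggested above.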