Subject: Re: [xsl] extracting data in CDATA block of a XML document
From: Mike Brown <mike@xxxxxxxx>
Date: Fri, 23 Aug 2002 10:09:44 -0600 (MDT)

Srinivas Ch wrote:
> Now I need to extract all the elements between the
> <![CDATA[ and ]]> and write it into a new xml file.

This is a FAQ, but we all like to give long-winded answers rather than point you to www.dpawson.co.uk.

The other answers to your question so far have been trying to tell you:

1. What you want is not possible with XSLT, at least not in a way that is reliable. We aren't going to tell you the unreliable way, because you need to approach this problem differently if you don't want to get burned.

2. It was a poor design decision to embed structured markup in the character data content of an XML element. Character data is by definition NOT MARKUP.

3. CDATA sections are a convenience for document authors and are relevant for input only. They just keep you from having to escape "<" and "&" in character data. A CDATA section means "this looks like markup but it isn't really".

The idea is that

  <foo><![CDATA[<bar/>]]></foo>

and

  <foo>&lt;bar/&gt;</foo>

mean exactly the same thing: an element named 'foo' containing the 6 characters '<bar/>'; NOT an element named 'foo' containing an empty element named 'bar'. If you wanted the latter, you'd have written

  <foo><bar/></foo>

In XPath/XSLT you deal with a node tree that is set up quite similarly:

  element 'foo' in no namespace
    |
    |__text '<bar/>'

The text node is going to be what you see there, regardless of whether you used a CDATA section in the original document.

Since you want XML output, your question is how you produce a result tree that looks like this:

  element 'bar' in no namespace

And the answer is, that's pretty darn difficult, because you would have to mimic the duties of an XML parser, tearing apart the string in the text node in order to build the right nodes in the result tree.

The workaround that some idiot is going to suggest, with a "hey, it works for me!" and without realizing how unportable it is, involves leaving the text node unchanged but flagging it as an exceptional case for unmodified serialization, so that it will be emitted as a string of what could very well be total garbage in the middle of proper, well-formed XML. And that's assuming you're serializing the result tree, which isn't always a good assumption (in a browser-based processor you're likely to be passing it on as a DOM).

- Mike
____________________________________________________________________________
  mike j. brown                   |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
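
Point 3 of the message can be checked with any XML parser. The following is a minimal sketch, not something from the thread: it assumes Python and its standard xml.etree.ElementTree module purely for illustration, and the variable names are made up. It shows that the CDATA spelling and the escaped spelling give the same text-only 'foo' element, and that real markup gives a child element instead. The last step, handing the string to a parser in a second pass outside XSLT, is one possible reading of "approach this problem differently"; it only works when the embedded string is itself well-formed XML.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same character data, plus one document with real markup.
cdata_doc = "<foo><![CDATA[<bar/>]]></foo>"
escaped_doc = "<foo>&lt;bar/&gt;</foo>"
markup_doc = "<foo><bar/></foo>"

cdata_foo = ET.fromstring(cdata_doc)
escaped_foo = ET.fromstring(escaped_doc)
markup_foo = ET.fromstring(markup_doc)

# len(element) is the number of child elements.
# Both character-data spellings: text '<bar/>' and no children.
print(repr(cdata_foo.text), len(cdata_foo))      # '<bar/>' 0
print(repr(escaped_foo.text), len(escaped_foo))  # '<bar/>' 0

# Real markup: no text, one child element named 'bar'.
print(repr(markup_foo.text), [child.tag for child in markup_foo])  # None ['bar']

# Assumed second pass: let a real parser turn the string into nodes.
# This raises ParseError if the embedded text is not well-formed XML.
reparsed = ET.fromstring(cdata_foo.text)
print(reparsed.tag)  # bar
```

The parser reports nothing about whether a CDATA section was used, which is why a stylesheet working on the node tree cannot tell the first two documents apart.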