Re: [xsl] problem with processing CDATA tags in xml

Subject: Re: [xsl] problem with processing CDATA tags in xml
From: David Carlisle <davidc@xxxxxxxxx>
Date: Thu, 08 Apr 2010 13:32:26 +0100
On 08/04/2010 13:01, Robby Pelssers wrote:
Ok....

I need to clarify one thing...

Their product schema does not allow a <Value> to have subtags...

Or rather, such elements don't have element content.


That's why they use CDATA.

A bad workaround (compared to fixing the input schema), in particular because you lose a lot of validation that the input is at least well formed, which is the cause of the present difficulty.


And in my opinion that's not so bad, since from a data point of view these html tags are purely a rendition thing.


If you are going to quote the XML fragment as CDATA, it is your responsibility to check that what you are quoting is well-formed XML, since the XML parser will not do so. The posted fragment was not well formed, so it seems reasonable that an error is generated at some point once the fragment is unquoted.
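
As a rough illustration of that check (a sketch only, not part of the original exchange): the Python snippet below unquotes a CDATA section and tests whether its content is well formed by wrapping it in a dummy element and parsing it. The <Product>/<Value> sample document, the fragment text and the function name are all made up for the example.

```python
import xml.etree.ElementTree as ET


def fragment_is_well_formed(fragment: str) -> bool:
    """Return True if the text parses as well-formed XML element content.

    The fragment is wrapped in a dummy element because it may contain
    several top-level nodes (text mixed with <sub>/<sup> and so on).
    """
    try:
        ET.fromstring(f"<wrapper>{fragment}</wrapper>")
        return True
    except ET.ParseError:
        return False


# Hypothetical input: the CDATA section hides a bare '<' from the XML parser,
# so the outer document parses without complaint.
doc = ET.fromstring(
    "<Product><Value><![CDATA[I<sub>max</sub> < 5 A]]></Value></Product>"
)
quoted = doc.findtext("Value")  # the unquoted text: 'I<sub>max</sub> < 5 A'

# The problem only shows up once the quoted text is treated as markup.
print(fragment_is_well_formed(quoted))            # False
print(fragment_is_well_formed("H<sub>2</sub>O"))  # True
```

Running a check like this when the data is produced would report the error at the source rather than at transformation time.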


If you want to do automatic fixup of the quoted fragments (which is often necessary when processing feeds, for example, with spurious "html" markup in them), then the thing to do is parse the fragment using an excessively lenient parser such as TagSoup, Tidy or my own htmlparse, but exactly what errors they will tolerate depends on the parser. I'm not sure what those three do with an unquoted < as occurs in your fragment, for example.
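
To make the idea of a lenient fixup pass concrete, here is a minimal sketch using Python's standard html.parser module as a stand-in for the tools named above (it is not TagSoup, Tidy or htmlparse, and its tolerance for odd input will differ from theirs). It assumes that only <sub> and <sup> should survive as markup; that whitelist, the class name and the function name are assumptions made for the example.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: only these elements are wanted in the cleaned fragment.
ALLOWED = {"sub", "sup"}


class FragmentCleaner(HTMLParser):
    """Rebuild a fragment, keeping whitelisted tags and escaping all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the cleaned fragment
        self.open_tags = []  # whitelisted tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED:   # attributes are dropped, other tags ignored
            self.out.append(f"<{tag}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in ALLOWED and tag in self.open_tags:
            # Close any inner tags first so the output nests properly.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        # Text is re-escaped, so stray markup characters become references.
        self.out.append(escape(data, quote=False))


def clean_fragment(text: str) -> str:
    parser = FragmentCleaner()
    parser.feed(text)
    parser.close()
    while parser.open_tags:  # close anything the input left open
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


print(clean_fragment("x<sup>2"))            # x<sup>2</sup>
print(clean_fragment("T > 25 <b>max</b>"))  # T &gt; 25 max
```

The cleaned string can then be parsed as XML and processed as ordinary element content.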


But basically I see 2 options from the responses:

(1): Use cdata-section-elements attribute on <xsl:output>
(2): make changes to the schema for all elements which may have html tags as children


I still see a problem with (1)... in the end, when serializing to html, I still want to disable-output-escaping so the browser will recognize <sub> and <sup> as tags instead of plain text... but then the greater than '>' will result in invalid xml.
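
For comparison, a small sketch (in Python rather than XSLT, with made-up sample content) of what happens when <sub> is a real element instead of quoted text: the serializer writes the tags as tags and escapes the stray '>' in the text on its own, so nothing needs to be switched off.

```python
import xml.etree.ElementTree as ET

# Hypothetical content in which <sub> is real markup, not CDATA-quoted text.
# The '>' in the text is written as &gt; here only for clarity.
value = ET.fromstring("<span>I<sub>max</sub> &gt; 5 A</span>")

# Serializing with the html method keeps <sub> as a tag and escapes the
# '>' character in the text.
html = ET.tostring(value, encoding="unicode", method="html")
print(html)  # <span>I<sub>max</sub> &gt; 5 A</span>
```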


And I'm not sure if (2) will be accepted since this will involve quite a bit of work to implement the changes.

