Re: [xsl] Handling CDATA element

Subject: Re: [xsl] Handling CDATA element
From: Jon Gorman <jonathan.gorman@xxxxxxxxx>
Date: Thu, 9 Feb 2006 10:36:58 -0600
On 2/9/06, Thorsten Scherler <thorsten@xxxxxxxxxx> wrote:
> Hi all,
> I have a question regarding the CDATA element.

*cough* not an element *cough*

Look in the archives of this list for CDATA and I'm sure you'll find
plenty of people dealing with this problem one way or another.
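
To make the "not an element" point concrete, here is a small Python
sketch (standard library only; the <d> element and its content are made
up for illustration).  A CDATA section is just another way of escaping
text, so the parser hands back the same string either way and no child
elements appear:

```python
import xml.etree.ElementTree as ET

# The same content, once inside a CDATA section and once entity-escaped.
with_cdata = ET.fromstring("<d><![CDATA[<b>hi</b>]]></d>")
escaped = ET.fromstring("<d>&lt;b&gt;hi&lt;/b&gt;</d>")

# Both parse to identical character data; the markup is only text.
same_text = with_cdata.text == escaped.text
child_count = len(with_cdata)
```

This is why a stylesheet sees only a string inside the feed's
description and cannot match on the tags in it.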

> My problem is the following. I have a rss feed like:

*shudders* embedded html in RSS is always a pain.  Of course, most
newsreaders would probably burst into flames if you used namespaces or
the like.  (Can't remember if that's even allowed.  Some sites have
multiple feed types, bless them).

> That looses the markup information but result in well-formed markup. I
> prefer well-formed over well-presented, but best would be both. ;-)

In other words, you want to have your cake and eat it too.  Well,
there's no good way to do this in XSLT as far as I know.  XSLT requires
its input to be well-formed XML.  You could attempt some odd multi-pass
solution.  The exact method would depend on the quality of the html.

If you can't trust the html to even be correct, you could generate an
in-between format that clearly marks the html section.  Then create
several html files from them.  Convert the html files using something
like tidy, then re-assemble the files into one file.  It would be a
pain though.
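
A rough sketch of that extract, clean, and re-assemble idea, done in
memory with Python's standard library rather than with separate files.
The function name rebuild_feed and the pluggable clean callable are
assumptions made for illustration; clean stands in for whatever tidying
step you trust (tidy itself is not called here).  Anything that still
fails to parse after cleaning is left as escaped text, so the output
stays well-formed either way:

```python
import xml.etree.ElementTree as ET

def rebuild_feed(rss_text, clean):
    """Turn each <description>'s embedded html into real markup when it parses."""
    root = ET.fromstring(rss_text)
    # list() so the tree is not modified while it is being walked.
    for desc in list(root.iter("description")):
        raw = desc.text or ""
        try:
            # Wrap in a dummy <div> so fragments with several top-level
            # nodes, or plain text, still parse as a single element.
            fragment = ET.fromstring("<div>" + clean(raw) + "</div>")
        except ET.ParseError:
            continue  # keep the escaped text: well-formed, if not well-presented
        desc.text = None
        desc.append(fragment)
    return ET.tostring(root, encoding="unicode")

# Hypothetical two-item feed: one description is clean, one is not.
feed = (
    "<rss><channel>"
    "<item><description><![CDATA[<p>Hi <b>there</b></p>]]></description></item>"
    "<item><description><![CDATA[<p>broken<br>]]></description></item>"
    "</channel></rss>"
)
result = rebuild_feed(feed, clean=lambda html: html)
```

With the identity clean used above, the first item becomes real markup
and the second stays escaped.  A better clean function moves more items
into the first group.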

If you think the input is mostly well-formed sgml you can skip the
tidy part, have it generate the in-between format as an sgml file, and
use a converter to make that XML, then rerun it through an XSLT
processor.  This will take care of things like <br> and convert them
to <br />.
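
As an illustration of what such a converter does, here is a minimal
sketch built on Python's html.parser.  It is a stand-in, not a real
sgml converter: the class name, the html_to_xml helper, and the short
VOID list are all assumptions, and it ignores corner cases such as
duplicate attributes.  It closes void elements like <br>, escapes text
and attribute values, and closes any tags left open:

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never have content (a partial list, for illustration).
VOID = {"br", "hr", "img", "input", "meta", "link"}

class XmlSerializer(HTMLParser):
    """Re-emit parsed html events as well-formed XML text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "checked") get their own name as value.
        attr_text = "".join(
            " %s=%s" % (name, quoteattr(value if value is not None else name))
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append("<%s%s />" % (tag, attr_text))
        else:
            self.out.append("<%s%s>" % (tag, attr_text))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID or tag not in self.open_tags:
            return  # stray or redundant close tag: drop it
        # Close everything opened since the matching start tag.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

def html_to_xml(fragment):
    parser = XmlSerializer()
    parser.feed(fragment)
    parser.close()
    # Close whatever the input left open.
    while parser.open_tags:
        parser.out.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.out)
```

For example, html_to_xml("<p>one<br>two") gives "<p>one<br />two</p>",
which an XSLT processor will accept once it is wrapped in a root
element.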

Otherwise you'll need to create an sgml parser in XSLT.  Good luck.

Jon Gorman
