Re: [xsl] manipulating string

Subject: Re: [xsl] manipulating string
From: Jon Gorman <jonathan.gorman@xxxxxxxxx>
Date: Fri, 16 Sep 2005 16:48:17 -0500
> Basically I
> want to convert the HTML tags in CDATA to equivalent
> FO tags using XSL. How can this be accomplished?

This is an FAQ.  In fact, it's asked so often that searching for
"CDATA xslt" returns this list's FAQ entry as the top hit in both
Google and Yahoo ;).  Granted, that entry talks more about producing
CDATA as output than about taking it in as input.

A CDATA section does not contain tags or elements.  It contains things
that look like tags and elements, but they're not.  CDATA just says "in
this area, all characters are escaped", so writing a CDATA section is just
the same as (I'm escaping characters here just in case they get

 <description><![CDATA[<ul><li> TouchTex</li>
               <li> Convertible</li>
               <li> Pockets <br/>with pencil</li></ul>]]>

is the same as

 <description>&lt;ul>&lt;li> TouchTex&lt;/li>
               &lt;li> Convertible&lt;/li>
               &lt;li> Pockets &lt;br/>with pencil&lt;/li>&lt;/ul>
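
A quick way to convince yourself of this is to feed both forms to any
XML parser and compare what comes back. Below is a minimal sketch in
Python (not XSLT, simply because it is easy to run standalone) using the
standard library's ElementTree. The shortened sample strings and the
variable names are my own, not from the original input.

```python
import xml.etree.ElementTree as ET

# The same content written twice: once in a CDATA section, once escaped.
cdata_form = "<description><![CDATA[<ul><li> TouchTex</li></ul>]]></description>"
escaped_form = "<description>&lt;ul>&lt;li> TouchTex&lt;/li>&lt;/ul></description>"

cdata_elem = ET.fromstring(cdata_form)
escaped_elem = ET.fromstring(escaped_form)

# Both parse to a single string of text; neither has any child elements.
same_text = cdata_elem.text == escaped_elem.text
child_count = len(cdata_elem) + len(escaped_elem)
print(same_text, child_count)
```

The parser reports plain text in both cases, which is why a template
matching ul or li never fires on this input.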

So what you are really asking is how to treat strings that look like
tags as if they were real tags.

There are a couple of options.

1)  The first and foremost is to change the input to use namespaces.
Most people don't have the luxury of changing their input though.

2)  Transform those escaped < characters back into markup in a two-step
process.  This might result in a file that isn't well-formed and can't
be processed.  Tidy might, just might, save you.  Use either XSLT or
Perl for this (and check out d-o-e, i.e. disable-output-escaping).

3)  Convert the string yourself and put the result into an element.  For
simple cases this can be straightforward, but I would be nervous about
relying on it heavily.  There was a post on ignoring html not too long
ago.  You could try to adapt that if the input is somewhat predictable
(of course, since there is no way to validate it, you can't be sure).
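
To make the two-step idea in option 2 concrete, here is a rough sketch,
again in Python rather than XSLT or Perl. Step one pulls the escaped
string out of the element; step two parses that string as XML in its own
right. The function name and the "embedded" wrapper element are
hypothetical, and the whole thing assumes the embedded markup happens to
be well-formed. When it isn't, the second parse fails, which is exactly
the risk mentioned above.

```python
import xml.etree.ElementTree as ET

source = """<description><![CDATA[
<ul><li> TouchTex</li>
<li> Convertible</li>
<li> Pockets <br/>with pencil</li></ul>]]></description>"""

def reparse_embedded_markup(description_xml):
    """Return the embedded markup as real elements, or None if ill-formed."""
    # Step 1: the parser hands back the CDATA content as an ordinary string.
    text = ET.fromstring(description_xml).text or ""
    try:
        # Step 2: parse that string as XML. "embedded" is a hypothetical
        # wrapper, needed because a document must have a single root.
        return ET.fromstring("<embedded>" + text + "</embedded>")
    except ET.ParseError:
        return None

fragment = reparse_embedded_markup(source)
# The li elements are now real nodes that a later step could map to FO.
items = [li.text.strip() for li in fragment.iter("li")]
print(items)

# Typical HTML sloppiness (an unclosed br) makes the second step fail.
bad = "<description><![CDATA[<ul><li>Pockets <br>with pencil</li></ul>]]></description>"
print(reparse_embedded_markup(bad))
```

Once the fragment is real XML, an ordinary stylesheet can match ul, li
and br and emit the FO equivalents. Running something like Tidy between
the two steps is one way to rescue input that fails the second parse.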

Searching the archives for this problem returned a lot of hits, but
here is a link from Wendell Piez that might explain it better than I
can.  (This problem is also described occasionally as "embedded html",
even though it's not really "html".)

Jon Gorman
