RE: [xsl] Un-escape and re-transform

Subject: RE: [xsl] Un-escape and re-transform
From: "Robert C. Lyons" <boblyons@xxxxxxxxxx>
Date: Tue, 10 Apr 2001 10:47:02 -0400
Bas writes:
> My Content Provider delivers XML files with partially escaped HTML tags,
> example:
> <content>
>         <web>
>                 &lt;P>This is text.&lt;/P>
>                 &lt;P>This is more text.&lt;/P>
>         </web>
> </content>
> My quest is to replace the "&lt;" by the un-escaped "<" character, and
> redo the XSLT for that <P>...</P> bit.


I would beg the Content Provider to place well-formed
HTML (or XHTML) in the XML documents, rather than HTML
with escaped markup.

A few weeks ago, we had the exact same problem.
We were lucky, since the sender of the XML data was
willing to embed well-formed HTML in the XML document.

I hope that you are as lucky.
If not, then perhaps you could use the following XSLT
stylesheet to unescape the markup:

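<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Write text nodes out without escaping "<" and "&". -->
  <xsl:template match="text()">
    <xsl:value-of disable-output-escaping="yes" select="."/>
  </xsl:template>

  <xsl:template priority="-1"
                match="@* | * | text() | processing-instruction() | comment()">
    <!-- Identity transformation. -->
    <xsl:copy>
      <xsl:apply-templates
           select="@* | * | text() | processing-instruction() | comment()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>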
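
If the escaped HTML only ever shows up inside <web> (as in
Bas's example), it may be safer to narrow the text() template
so that other text in the document keeps its normal escaping.
A minimal sketch, assuming the element really is named "web"
in the real data:

```xml
<!-- Sketch: use this in place of the match="text()" template.
     Only text directly inside <web> is written unescaped; all other
     text nodes fall through to the identity template (priority -1)
     and are escaped as usual. -->
<xsl:template match="web/text()">
  <xsl:value-of disable-output-escaping="yes" select="."/>
</xsl:template>
```

Keep in mind that disable-output-escaping is an optional
feature: a processor is not required to support it, and it
only has an effect when the processor serializes the result
itself.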


The problem with this approach is that it
will unescape markup characters that are
not really markup. For example:

    &lt;P>C'est dommage. :-&lt; &lt;/P>

If there's any chance that the escaped
HTML will contain markup characters that are
not really markup, then I think you'll need
to write a more sophisticated unescape routine.
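
One way such a routine might look in XSLT 1.0 itself is a
recursive named template that emits a raw "<" only when the
next character is a letter or "/", and leaves every other "<"
escaped. This is only a sketch: the template name
"unescape-tags" is made up for illustration, and the test is
a heuristic rather than a real HTML parser, so text such as
"a<b" would still be mistaken for a tag.

```xml
<!-- Sketch: use this in place of the match="text()" template. -->
<xsl:template match="text()">
  <xsl:call-template name="unescape-tags">
    <xsl:with-param name="s" select="."/>
  </xsl:call-template>
</xsl:template>

<!-- Hypothetical helper: walks the string one "<" at a time. -->
<xsl:template name="unescape-tags">
  <xsl:param name="s"/>
  <xsl:choose>
    <xsl:when test="contains($s, '&lt;')">
      <xsl:variable name="rest" select="substring-after($s, '&lt;')"/>
      <xsl:variable name="next" select="substring($rest, 1, 1)"/>
      <!-- Everything before this "<" is written out unescaped. -->
      <xsl:value-of disable-output-escaping="yes"
                    select="substring-before($s, '&lt;')"/>
      <xsl:choose>
        <!-- Looks like a start or end tag: emit a raw "<". -->
        <xsl:when test="$next != '' and contains(
            '/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',
            $next)">
          <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
        </xsl:when>
        <!-- Anything else (space, digit, end of text): keep it escaped. -->
        <xsl:otherwise>
          <xsl:text>&lt;</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
      <!-- Continue with the text after this "<". -->
      <xsl:call-template name="unescape-tags">
        <xsl:with-param name="s" select="$rest"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <!-- No "<" left: write out the remainder unescaped. -->
      <xsl:value-of disable-output-escaping="yes" select="$s"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

With a processor that supports disable-output-escaping, the
":-<" example above should come out with real <P> and </P>
tags while the "<" of the smiley stays escaped as "&lt;".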

Hope this helps.


<sig name    = 'Bob Lyons'
     title   = 'B2B Integration Consultant'
     company = 'Unidex, Inc.'
     phone   = '+1-732-975-9877'
     email   = 'boblyons@xxxxxxxxxx'
     url     = ''
     product = 'XML Convert: transforms flat files to XML and vice versa' />
