RE: [xsl] Un-escape and re-transform

Subject: RE: [xsl] Un-escape and re-transform
From: "Robert C. Lyons" <boblyons@xxxxxxxxxx>
Date: Tue, 10 Apr 2001 10:47:02 -0400
Bas writes:
> My Content Provider delivers XML files with partially escaped HTML tags,
> for example:
> <content>
>         <web>
>                 &lt;P>This is text.&lt;/P>
>                 &lt;P>This is more text.&lt;/P>
>         </web>
> </content>
>
> My quest is to replace the "&lt;" by the un-escaped "<" character, and
> then redo the XSLT for that <P>...</P> bit.

Bas,

I would beg the Content Provider to place well-formed
HTML (or XHTML) in the XML documents (rather than HTML,
in which the markup is escaped).

A few weeks ago, we had the exact same problem.
We were lucky, since the sender of the XML data was
willing to embed well-formed HTML in the XML document.

I hope that you are as lucky.
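If not, then perhaps you could use the following XSLT
stylesheet to unescape the markup. It relies on
disable-output-escaping, which XSLT 1.0 processors are
not required to support, and which only takes effect
when the processor serializes the result itself: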

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="text()">
    <xsl:value-of disable-output-escaping="yes" select="."/>
  </xsl:template>

  <xsl:template priority="-1"
                match="@* | * | text() | processing-instruction() | comment()">
    <!-- Identity transformation. -->
    <xsl:copy>
      <xsl:apply-templates
           select="@* | * | text() | processing-instruction() | comment()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

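The problem with this approach is that it
unescapes every escaped markup character, including
ones that are not really markup. In the following
input, the "&lt;" in the emoticon would also become
"<", so the output would not be well-formed: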

<content>
  <web>
    &lt;P>C'est dommage. :-&lt; &lt;/P>
  </web>
</content>

If there's any chance that the escaped
HTML will contain markup characters that are
not really markup, then I think you'll need
to write a more sophisticated unescape
algorithm.
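
As a rough illustration of what "more sophisticated" could
mean, here is a minimal Python sketch (XSLT 1.0 has no
regular expressions, so this would run as a pre-processing
step on the serialized XML, before the transformation). It
unescapes "&lt;" only where it starts something that looks
like a tag from a whitelist. The whitelist KNOWN_TAGS and the
function name unescape_known_tags are my own assumptions for
the example, not part of any library, and the pattern is a
heuristic: it skips tags whose attributes contain "&", and it
does not guarantee that the result is well-formed (an
unclosed <br> would still break the parse).

```python
import re

# Hypothetical whitelist: tag names we are willing to treat as real markup.
KNOWN_TAGS = ("p", "br", "b", "i", "em", "strong", "a", "ul", "ol", "li")

# Matches an escaped start or end tag such as "&lt;P>", "&lt;/P>" or
# "&lt;br/&gt;". The closing bracket may be literal or escaped.
ESCAPED_TAG = re.compile(
    r"&lt;(/?)(" + "|".join(KNOWN_TAGS) + r")\b([^<>&]*?)(?:>|&gt;)",
    re.IGNORECASE,
)


def unescape_known_tags(escaped: str) -> str:
    """Unescape only the "&lt;" sequences that open a whitelisted tag."""
    return ESCAPED_TAG.sub(
        lambda m: "<" + m.group(1) + m.group(2) + m.group(3) + ">",
        escaped,
    )


if __name__ == "__main__":
    # The emoticon's "&lt;" is left escaped; the <P> tags are restored.
    print(unescape_known_tags("&lt;P>C'est dommage. :-&lt; &lt;/P>"))
```

A stray "&lt;" that happens to be followed directly by a
whitelisted name would still be treated as markup, so this
only narrows the problem rather than solving it.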

Hope this helps.

Bob

<sig name    = 'Bob Lyons'
     title   = 'B2B Integration Consultant'
     company = 'Unidex, Inc.'
     phone   = '+1-732-975-9877'
     email   = 'boblyons@xxxxxxxxxx'
     url     = 'http://www.unidex.com/'
     product = 'XML Convert: transforms flat files to XML and vice versa' />


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

