Re: [xsl] Process HTML with XSL

Subject: Re: [xsl] Process HTML with XSL
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Fri, 08 Jul 2005 18:03:49 -0400
Dear Esteban,

At 05:25 PM 7/8/2005, you wrote:
The problem, XML page:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="xsl/main.xsl" type="text/xsl"?>
<home title="Title Test">
<![CDATA[
    <table width="100%" border="0" cellspacing="3" cellpadding="3">
      <tr>
        <td width="50%"><img src="/images/logo.png" / alt=""></td>
        <td width="50%"><p>Text</p></td>
      </tr>
    </table>
]]>
</home>


XSL page (extract).


[...]
<xsl:value-of select='/home' disable-output-escaping="no" />


What I want is to put this HTML content inside the XSL page and, of course, draw the tables and everything else. But it only writes the text as is; the browser does not process the HTML, it just writes it :(.

No, the browser isn't processing the HTML ... because it isn't really HTML.


What has happened here is that in your source document, your HTML has been hidden from the parser -- that's what the CDATA marked section is doing. Its entire purpose is to tell the downstream application "don't treat this like code, treat it like text". Accordingly, when an XML structure is built from the input, it doesn't contain any HTML elements at all -- only text that happens to be HTML code (but the system doesn't know that of course).
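
To see this for yourself, here is a minimal sketch in Python rather than XSLT. It assumes the standard-library ElementTree parser as a stand-in for whatever XML parser your processor uses, and the variable names are only illustrative. It parses your sample document (XML declaration and stylesheet instruction left out for brevity) and checks what the parser actually built:

```python
import xml.etree.ElementTree as ET

# The sample document from the question, minus the XML declaration
# and the xml-stylesheet processing instruction.
SOURCE = """<home title="Title Test">
<![CDATA[
    <table width="100%" border="0" cellspacing="3" cellpadding="3">
      <tr>
        <td width="50%"><img src="/images/logo.png" / alt=""></td>
        <td width="50%"><p>Text</p></td>
      </tr>
    </table>
]]>
</home>
"""

home = ET.fromstring(SOURCE)

# The CDATA section contributes only character data, so <home> has
# no child elements at all.
child_count = len(list(home))

# The "HTML" is present, but only as a string inside the text node.
hidden_text = home.text

print(child_count)
print("<table" in hidden_text)
```

The element count is zero and the table markup shows up only inside the string, which is the same picture an XSLT processor has of this document.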

When you write the value out in your XSLT, what you get is just a string -- the characters that would ordinarily be read as markup (in this case "<", which is written out as "&lt;") are escaped on output, so the browser displays the code instead of rendering it.
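
The escaping step can be sketched the same way. This is again Python with the standard-library ElementTree serializer, used here only as an analogy for an XSLT processor's output stage; the element name "body" and the variable names are made up for the example:

```python
import xml.etree.ElementTree as ET

# A string that happens to look like HTML, as pulled out of a CDATA section.
hidden_html = "<p>Text</p>"

# Put the string into a result element as text, the way xsl:value-of would.
body = ET.Element("body")
body.text = hidden_html

# On serialization the markup characters in the text are escaped.
serialized = ET.tostring(body, encoding="unicode")
print(serialized)
```

The serialized result contains "&lt;p&gt;" rather than a real p element, which is why the browser shows the code as text.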

Actually one solution to this is right in front of you: change disable-output-escaping to "yes". As the name of the attribute indicates, this will switch off the usual behavior of escaping "<" characters, so when the file is written the result will contain the HTML without "hiding" it.

Unfortunately, although that solution is easy, it's not a very good one (you can read all about disable-output-escaping, and why the experts say not to use it, in the archives or in the XSL FAQ). For one thing, d-o-e is an optional feature of the language, and it only takes effect when the result is serialized. If you're not writing a file (for example, if you're processing in a Mozilla client), d-o-e won't work for you. For another, even if d-o-e does work, you have no guarantee that the HTML code is actually parseable. It could be broken; there's no good way to know without trying it.

A stronger method would be to pre-process this input to "unhide" the HTML, parse it and make it XML if necessary, and take it from there. That way you'd uncover all problems early, and wouldn't have unpleasant surprises.
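
As a rough illustration of "uncovering problems early", the sketch below (Python, standard library only; the helper name parse_hidden_markup is hypothetical) tries to parse a piece of the hidden markup as XML and reports failure instead of passing it along blindly. The first fragment is a cell copied from the sample in the question; the second is one possible hand-corrected version, which is my assumption about what was intended:

```python
import xml.etree.ElementTree as ET


def parse_hidden_markup(text):
    """Return the parsed root element, or None if text is not well-formed XML."""
    try:
        return ET.fromstring(text.strip())
    except ET.ParseError:
        return None


# Taken from the sample: the stray "/" before alt and the unclosed
# img element make this unparseable as XML.
broken = '<td width="50%"><img src="/images/logo.png" / alt=""></td>'

# A hand-corrected version (an assumption about the intended markup).
fixed = '<td width="50%"><img src="/images/logo.png" alt="" /></td>'

broken_result = parse_hidden_markup(broken)
fixed_result = parse_hidden_markup(fixed)

print(broken_result is None)
print(fixed_result is not None)
```

Note that the markup in the question does fail this check as written, so it would need repair before a stylesheet could treat it as elements.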

For this, you could use disable-output-escaping, write to a file, then check that file for correctness and/or fix it. A common tool for this job is called "HTML Tidy" (find with Google), which will turn most HTML into well-formed XML for you. At that point, you'd have HTML that your stylesheet processor could actually do something with (since it wouldn't be hidden).

My guess is that what's actually happening here will be more visible to you if you inspect the result of your transform directly (don't just look at it in a browser, but look at its source code). Then try flipping the switch on disable-output-escaping from "no" to "yes", run the transform and look again ... that'll give you an idea of what's happening.

But you didn't say what processor you're using; they don't all support disable-output-escaping.

Good luck,
Wendell


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
