Re: [xsl] Generating beautiful HTML Source Code

Subject: Re: [xsl] Generating beautiful HTML Source Code
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Wed, 15 Oct 2003 16:34:39 -0400

indent="yes" says "include whatever whitespace you like to pretty-print the serialized output" -- BUT

1. It's optional and not all processors do it
2. Even where it's supported, the implementors' idea of what the output should look like is almost sure to be at odds with your idea. Beauty is in the eye of the beholder, etc.

At 02:25 PM 10/15/2003, you wrote:
So - set indent not to "yes" but to "no".  Hmmm... I
assumed this property was a feature to enable the whitespace which I
specifically *placed* there as in:

<xsl:template name="example">
     <block>    </block>

</xsl:template>

Wouldn't this property set "yes" preserve the above?

No, the indent setting on xsl:output has no bearing on how whitespace in the stylesheet is handled: it's only a hint to the serializer to format your output for you when it writes your transform result to a file. How it does that is generally not in your control.
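As a minimal sketch of where that hint lives (the surrounding stylesheet is only a placeholder): the attribute goes on a top-level xsl:output element. If I read the XSLT 1.0 spec correctly, the html output method defaults to indent="yes", so to keep the serializer's hands off you have to say "no" explicitly.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- A hint to the serializer only: with "no", the processor should
       add no whitespace of its own when it writes out the result. -->
  <xsl:output method="html" indent="no"/>
  <!-- templates go here -->
</xsl:stylesheet>
```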

All whitespace in the stylesheet will be stripped unless you (a) include non-whitespace text with it (in the same text node), or (b) wrap it in an <xsl:text> instruction. (I think setting xml:space="preserve" is also supposed to work, though I've never tried it.)
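To illustrate cases (a) and (b), here is a small sketch reusing the <block> element from your example; the <doc> wrapper is just something I made up so the result is well-formed.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>
  <xsl:template match="/">
    <doc>
      <!-- whitespace-only text node: stripped from the stylesheet -->
      <block>    </block>
      <!-- the same spaces inside xsl:text: kept in the result -->
      <block><xsl:text>    </xsl:text></block>
      <!-- whitespace sharing a text node with other text: kept -->
      <block>  x  </block>
    </doc>
  </xsl:template>
</xsl:stylesheet>
```

A conforming processor should give you an empty first <block>, four spaces in the second, and the padded "x" in the third, all run together on one line, since the line breaks between the elements are themselves whitespace-only text nodes and get stripped too.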

I would also let everyone know that I found the following note in the MSXML

Note   The Microsoft® XSLT processor will process all of a document's white
space only if the preserveWhiteSpace property has been set to True prior to
loading the document into the DOM. For more information, see How the MSXML
Processor Parses White Space.

I am determined to find a solution!  Wendell, your suggestions were fairly
complicated and I did not understand.  Might you illustrate a simple example
or lead me somewhere on the net that they are doing this?

Basically my suggestion boils down to:

1. Transform your document in such a way that *no* new whitespace is introduced by any component (stylesheet or serializer), and gratuitous whitespace is removed from the source using xsl:strip-space. (If you have a DTD or schema it's easy to determine where whitespace in the source will be "insignificant" or gratuitous -- it's within any element that contains element-only content. Elements that may contain text content, or mixed, should be set to preserve-space.)
2. Run a second pass over your result that does nothing but insert whitespace right where you want it in your HTML. Indent will also be set to "no" (since you still want the serializer to keep its mitts off); but the stylesheet will include rules like:

<xsl:template match="li">
  <xsl:for-each select="ancestor::ul | ancestor::ol">
    <xsl:text>  </xsl:text>
  </xsl:for-each>
  <xsl:copy>
    <xsl:copy-of select="@*"/><xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

...which has the effect of indenting an <li> two spaces for every <ol> or <ul> ancestor it has. (This stylesheet is an identity transform except for templates to match whatever HTML elements require whitespace.)
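For step 1 above, the top of the first-pass stylesheet might look something like the sketch below. The element names in xsl:preserve-space are hypothetical; the real list would come from your own DTD or schema.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- the serializer adds nothing of its own -->
  <xsl:output method="xml" indent="no"/>
  <!-- drop whitespace-only text nodes throughout the source... -->
  <xsl:strip-space elements="*"/>
  <!-- ...except inside elements that may hold text or mixed content
       (hypothetical names, for illustration only) -->
  <xsl:preserve-space elements="para title code"/>
  <!-- your existing templates, written to emit no stray whitespace -->
</xsl:stylesheet>
```

The named elements in xsl:preserve-space take precedence over the "*" wildcard in xsl:strip-space, so the two declarations can sit side by side.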

Note that many would argue that XSLT is not an optimal tool for performing what is, in effect, fancy string-munging: how the result is written to a file (if it is at all) is out of scope for XSLT viewed narrowly.

But an advantage of this method is that stylesheet #2 can be reused on any XHTML. It's basically a "reformat" stylesheet that follows the Karl rules for pretty-printing HTML (whatever they may be). While it's work, it only needs to be done once.
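A complete, if bare-bones, version of such a "reformat" stylesheet could be sketched like this. It assumes the HTML elements are in no namespace (namespaced XHTML would need a prefix in the match patterns), and the line break before each <li> is my own guess at one plausible rule, not a recommendation.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>

  <!-- identity transform: copy everything through unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- start each list item on a new line, indented two spaces
       for every enclosing ul or ol -->
  <xsl:template match="li">
    <xsl:text>&#10;</xsl:text>
    <xsl:for-each select="ancestor::ul | ancestor::ol">
      <xsl:text>  </xsl:text>
    </xsl:for-each>
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Further rules for other elements (closing tags of lists, table rows, and so on) would follow the same pattern: emit exactly the whitespace you want, then copy the element.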

This kind of post-process (whether XSLT or not) is the *only* way to get your output file formatted exactly the way you want it. Given this, most people sigh and learn to live with what their XSL engine's serializer does, or what Tidy or a similar routine does, which won't give you such fine control, but is often Good Enough.


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
