Re: [xsl] Wrapping pieces of content separately

Subject: Re: [xsl] Wrapping pieces of content separately
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Fri, 25 Aug 2006 10:31:04 -0400
Hi,

To add to what Jay says, this is going to be a common problem as the requirement to generate schema-valid XHTML becomes more common, as it now seems to be doing.

Jay's approach is one way to do it. There are others, and which is best probably depends on your circumstances. If you are maintaining your stylesheets in connection with a source format (with a schema), tying the paragraph-splitting logic to the declarations of that schema may be the most dependable. Jay's approach, which examines the placement of text nodes, has the virtue of flexibility and requires no extra effort when the source format changes. One downside of Jay's way is that you may have issues with white space (for example, white-space-only text nodes between elements turning into empty paragraphs), though those too could be coded around.
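
As a rough illustration of coding around the white-space issue, here is a minimal XSLT 1.0 sketch. It assumes the two templates would be merged into Jay's stylesheet quoted below, with the second one replacing his text-node template; the explicit priority value is my own choice, not something from his code.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop text nodes that hold only white space inside a p that has
       element children, so they do not become empty paragraphs.
       The explicit priority (an assumption of this sketch) makes this
       rule win over the wrapping rule below, which matches the same nodes. -->
  <xsl:template match="text()[parent::p[*]][not(normalize-space())]"
      priority="1"/>

  <!-- Wrap the remaining text nodes, trimming leading and trailing
       white space and collapsing internal runs of it. -->
  <xsl:template match="text()[parent::p[*]]">
    <p><xsl:value-of select="normalize-space(.)"/></p>
  </xsl:template>

</xsl:stylesheet>
```

Using normalize-space() is only safe here because each text node ends up as its own paragraph; if the text stayed inline next to other elements, trimming would swallow the spaces between words and tags.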

Another difference will be whether you can use XSLT 2.0 or not. With 2.0 pipelining there are various ways you can annotate your input document to make it easier to split the paragraphs around the lists, tables etc.; and you also have very nice grouping constructs native to the language.
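
To give an idea of what the 2.0 grouping constructs buy you, here is a minimal sketch using xsl:for-each-group with group-adjacent. It assumes an XSLT 2.0 processor (Saxon, for instance), input like Emily's sample wrapped in a document element with the ul closed, and that table, ul and nested p are the only "block" children that should split a paragraph; that list is my assumption and would need adjusting to the real source format.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy anything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A p with block-level children: split its content into runs of
       adjacent block nodes and runs of adjacent inline nodes. -->
  <xsl:template match="p[table or ul or p]">
    <xsl:for-each-group select="node()"
        group-adjacent="boolean(self::table or self::ul or self::p)">
      <xsl:choose>
        <!-- Block nodes are emitted on their own, outside any p. -->
        <xsl:when test="current-grouping-key()">
          <xsl:apply-templates select="current-group()"/>
        </xsl:when>
        <!-- Inline runs are wrapped in a p, unless the run consists
             of nothing but white-space text. -->
        <xsl:when test="current-group()[self::* or
                        self::text()[normalize-space()]]">
          <p><xsl:apply-templates select="current-group()"/></p>
        </xsl:when>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

  <!-- The ul wrapper itself is dropped; only its element children
       are kept. -->
  <xsl:template match="ul">
    <xsl:apply-templates select="*"/>
  </xsl:template>

</xsl:stylesheet>
```

Because inline elements such as r are grouped together with the text around them, a hyperlink stays inside its sentence instead of becoming a paragraph of its own.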

Eventually we'll probably see web tutorials about this. Unfortunately for me, this probably goes on list #3 (things I'd do if I had not only another life to do them, but extra time in that other life) so someone else may get the credit (and deserve it).

Cheers,
Wendell

At 05:07 PM 8/24/2006, Jay wrote:
Hi, Emily,

As it happens, I've solved this problem in the past. The trick is
processing text nodes according to their context. If a p element has
children other than text nodes, then you don't want that p element to be a p
element; you want it to be a series of elements. If a p element has just
text nodes (which really means just one text node, but that doesn't matter),
then it should end up in a p element.

The way to do it is to catch the text nodes of p elements that have
non-text-node children and wrap those text nodes in p elements. I've done
that in the following stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

  <xsl:template match="doc">
    <out>
      <xsl:apply-templates/>
    </out>
  </xsl:template>

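  <!-- A p with element children is not output as a p; its children
       are processed individually instead. -->
  <xsl:template match="p[*]">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- A p that contains only text stays a p. -->
  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Each text node inside a mixed-content p gets its own p wrapper. -->
  <xsl:template match="text()[parent::p[*]]">
    <p><xsl:value-of select="."/></p>
  </xsl:template>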

  <xsl:template match="ul">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="r">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="table">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="li">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>

I got the desired output when I applied this stylesheet to your input (after
I corrected it to have a document element and a closing tag for the ul
element).

HTH

Jay Bryant
Bryant Communication Services

----- Original Message -----
From: <Emily.Garrett@xxxxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Thursday, August 24, 2006 1:41 PM
Subject: [xsl] Wrapping pieces of content separately




I'm trying to convert XML into WordML and going from a recursive structure to linear is difficult. I would like to try to read in XML and output in tags like this in a recursive (normal) manner as an intermediate step:

<p>title</p>
<p>paragraph</p>
<p>Here is a table from <r>hyperlink</r> which is below. <table>sldfkj
lsdfj lsd f</table> The above table was very small. <ul> <p>list
title</p> <p><li>Item 1</li></p><p><li>Item 2</li></p>The rest of it.
</p>

I would want to end up with this structure shown below (I don't need to
keep the <ul>), where I would take the nested <p> elements out and
structure them in a linear manner.  The content in the root level <p>
gets wrapped in separate <p> elements. This is the structure I need to
create, but I would use WordML format in this step:

<p>title</p>
<p>paragraph</p>
<p>Here is a table from <r>hyperlink</r> which is below. </p>
<table>sldfkj lsdfj lsd f</table>
<p>The above table was very small. </p>
<p>list title</p>
<p><li>Item 1</li></p>
<p><li>Item 2</li></p>
<p>The rest of it.</p>

The problem is that content at the root needs to be in <p> tags, such as
"The above table was very small" and "The rest of it".   How do I
instruct the processing of it, not knowing the order?   Using
xsl:apply-templates will put <p> elements nested in other <p> elements
which won't work.  I need them to be divided into linear segments. I
thought if I could change them into a simpler set of tags, it might be
easier.  But I may not even need the intermediate step.

Thanks for any help you can provide.


Emily Garrett
