[xsl] Problems with mixed content and inline elements when transforming XHTML into another XML format

From: Tony Kinnis <kinnist@xxxxxxxxx>
Date: Wed, 22 Feb 2006 14:28:53 -0800 (PST)

Hello all,

I have been trying to solve this problem for a few days now and I have
had no luck. I am hoping someone here can help me out with this.

I need to parse XHTML and transform it into another XML format. I am
sure that the XHTML is valid and well-formed (I am running it through
HTMLTidy). The first problem I encountered was mixed content, that is,
elements that contain both text and child elements. Something like...

<div>
    My name is <b>bob</b>. What is yours?
    <ul>
        <li>foo</li>
        <li>bar</li>
    </ul>
</div>

I found a utility script on the web that can turn mixed content into
element content. I am guessing some of you have seen this script
before.

<xsl:template match="text()[normalize-space(.)][../*]">
    <xsl:element name="textnode">
        <xsl:value-of select="."/>
    </xsl:element>
</xsl:template>

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

This makes the above input look like...

<div>
    <textnode>My name is </textnode><b>bob</b><textnode>. What is yours?</textnode>
    <ul>
        <li>foo</li>
        <li>bar</li>
    </ul>
</div>

However, what I would really like to do is have the bold tags included
inside of the textnode tag so that it looks like...

<div>
    <textnode>My name is <b>bob</b>. What is yours?</textnode>
    <ul>
        <li>foo</li>
        <li>bar</li>
    </ul>
</div>

In other words, I would like to treat the <b> element as part of the
surrounding text rather than as a separate element. There is a finite
set of tags I would like to be treated this way, namely the ones
considered in-line elements in HTML: <b>, <i>, <em>, <strong> and <u>.
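
As a rough sketch of the output being asked for (not a confirmed
solution), XSLT 2.0 grouping could keep adjacent text and in-line
elements together. It assumes the elements are in no namespace, as in
the sample above. Real XHTML would presumably also need
xpath-default-namespace="http://www.w3.org/1999/xhtml" on the
stylesheet element. Like the [../*] test in the script above, it only
touches elements that have both non-blank text and child elements.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy anything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Elements with both non-blank text children and element children. -->
  <xsl:template match="*[text()[normalize-space(.)]][*]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- The grouping key is true for text and the in-line tags,
           so each run of adjacent "text-like" nodes forms one group. -->
      <xsl:for-each-group select="node()"
          group-adjacent="self::text() or self::b or self::i
                          or self::em or self::strong or self::u">
        <xsl:choose>
          <!-- A text-like run containing something other than whitespace. -->
          <xsl:when test="current-grouping-key()
                          and current-group()[normalize-space(.)]">
            <textnode>
              <xsl:copy-of select="current-group()"/>
            </textnode>
          </xsl:when>
          <!-- Block-level children and whitespace-only runs. -->
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With the sample <div>, this should put "My name is", the <b> element
and ". What is yours?" inside a single textnode and leave the <ul>
alone. The leading and trailing whitespace of the run ends up inside
the textnode as well.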

An alternative, and better, solution would be to wrap all text
throughout the document in the textnode element, including the in-line
elements mentioned above. The XML I will finally output from the
transformation of the XHTML requires that all text be wrapped in a
special displaytext tag, in-line elements included. With every piece
of text, together with its in-line tags, placed in a textnode, I could
easily pass the document through another template that says...

<xsl:template match="textnode[normalize-space(.)]">
    <xsl:element name="displaytext">
        <xsl:apply-templates/>
    </xsl:element>
</xsl:template>

This would make things much easier.
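
For illustration only, the two passes could be chained inside one
XSLT 2.0 stylesheet by using modes and a temporary tree. The mode
names wrap and display and the variable name wrapped are made up for
this sketch. The first pass reuses the mixed-content template from
earlier in this post, so the in-line elements are not yet pulled into
the textnode here.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Entry point: run pass 1 into a temporary tree, then pass 2 over it. -->
  <xsl:template match="/">
    <xsl:variable name="wrapped">
      <xsl:apply-templates mode="wrap"/>
    </xsl:variable>
    <xsl:apply-templates select="$wrapped" mode="display"/>
  </xsl:template>

  <!-- Identity copy shared by both passes. -->
  <xsl:template match="@*|node()" mode="wrap display">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="#current"/>
    </xsl:copy>
  </xsl:template>

  <!-- Pass 1: wrap non-blank text that has element siblings. -->
  <xsl:template match="text()[normalize-space(.)][../*]" mode="wrap">
    <textnode>
      <xsl:value-of select="."/>
    </textnode>
  </xsl:template>

  <!-- Pass 2: turn each non-empty textnode into displaytext. -->
  <xsl:template match="textnode[normalize-space(.)]" mode="display">
    <displaytext>
      <xsl:apply-templates mode="#current"/>
    </displaytext>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the intermediate result in a variable avoids running the
processor twice, at the cost of holding the whole wrapped document in
memory.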

Below are the XSLT processor and XSLT version. I am not tied to Saxon if
another processor could do the job, provided it can be used within Java
and ports across platforms (Windows, Unix, etc.).

Processor: Saxon8B
XSL Version: 2.0

Thanks in advance for your help.
