RE: [xsl] Problems with mixed content and inline elements when transforming XHTML into another XML format

Subject: RE: [xsl] Problems with mixed content and inline elements when transforming XHTML into another XML format
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 22 Feb 2006 23:40:55 -0000
You're using XSLT 2.0 so this can be solved using grouping constructs.

Forget the templates that create <textnode> elements.

You want something like this, which causes adjacent "inline" nodes to be
grouped under a new element, with a function to decide whether a node is an
"inline" node:

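<xsl:template match="div">
  <xsl:copy>
    <!-- Group adjacent children by whether each one is an "inline" node -->
    <xsl:for-each-group select="node()" group-adjacent="f:is-inline(.)">
      <xsl:choose>
        <!-- Key is true: a run of text and inline elements -->
        <xsl:when test="current-grouping-key()">
          <textnode><xsl:copy-of select="current-group()"/></textnode>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>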

<xsl:function name="f:is-inline" as="xs:boolean">
  <xsl:param name="node" as="node()"/>
  <xsl:sequence select="$node instance of text() or
                        $node[self::u|self::b|self::i]"/>
</xsl:function>
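
For completeness, here is a fuller sketch of how the pieces above might sit in one stylesheet. Several things in it are my assumptions rather than anything given in the question: the f namespace URI is an arbitrary placeholder, the input elements are taken to be in no namespace (as in the sample), the inline list is widened to the five tags named in the question, whitespace-only runs are dropped rather than wrapped, and non-inline children are passed through an identity template instead of being copied directly.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/local-functions"
    exclude-result-prefixes="xs f">
  <!-- The f namespace URI above is a placeholder; any URI will do. -->

  <!-- Identity template: copies anything not matched more specifically. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="div">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each-group select="node()" group-adjacent="f:is-inline(.)">
        <xsl:choose>
          <!-- Inline run containing some non-whitespace text: wrap it. -->
          <xsl:when test="current-grouping-key() and
                          normalize-space(string-join(current-group(), ''))">
            <textnode><xsl:copy-of select="current-group()"/></textnode>
          </xsl:when>
          <!-- Inline run that is whitespace only: drop it (assumption). -->
          <xsl:when test="current-grouping-key()"/>
          <!-- Block-level content: process it with the other templates. -->
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- True for text nodes and for the inline elements listed in the question. -->
  <xsl:function name="f:is-inline" as="xs:boolean">
    <xsl:param name="node" as="node()"/>
    <xsl:sequence select="$node instance of text() or
        exists($node[self::b or self::i or self::u or self::em or self::strong])"/>
  </xsl:function>

</xsl:stylesheet>
```

If the real input puts its elements in the XHTML namespace, the unprefixed names here (div, b and so on) will not match as written; setting xpath-default-namespace="http://www.w3.org/1999/xhtml" on xsl:stylesheet is one way to deal with that. To emit displaytext directly, changing the literal textnode element to displaytext should be enough.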

Michael Kay
http://www.saxonica.com/
   

> -----Original Message-----
> From: Tony Kinnis [mailto:kinnist@xxxxxxxxx] 
> Sent: 22 February 2006 22:29
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Problems with mixed content and inline 
> elements when transforming XHTML into another XML format
> 
> Hello all,
> 
> I have been trying to solve this problem for a few days now and I have
> had no luck. I am hoping someone here can help me out with this.
> 
> I need to parse XHTML and transform it into another XML format. I am
> sure that the XHTML is valid and well formed (I am running it through
> HTMLTidy). The first problem I encountered was the notion of mixed
> elements. Something like...
> 
> <div>
>      My name is <b>bob</b>. What is yours?
>     <ul>
>          <li>foo</li>
>          <li>bar</li>
>     </ul>
> </div>
> 
> I found a utility script on the web that can turn mixed content into
> element content. I am guessing some of you have seen this script
> before.
> 
> <xsl:template match="text()[normalize-space(.)][../*]">        
>         <xsl:element name="textnode">
>             <xsl:value-of select="."/>
>         </xsl:element>
>     </xsl:template>
>     
>     <xsl:template match="@*|node()">   
>         <xsl:copy>            
>             <xsl:apply-templates select="@*|node()"/>
>         </xsl:copy>
>     </xsl:template>
> 
> This makes the above post look like...
> 
> <div>
>      <textnode>My name is </textnode><b>bob</b><textnode>. What is yours?</textnode>
>     <ul>
>          <li>foo</li>
>          <li>bar</li>
>     </ul>
> </div>
> 
> However, what I would really like to do is have the bold tags included
> inside of the textnode tag so that it looks like...
> 
> <div>
>      <textnode>My name is <b>bob</b>. What is yours?</textnode>
>     <ul>
>          <li>foo</li>
>          <li>bar</li>
>     </ul>
> </div>
> 
> In other words, I would like to treat the <b> element as text and not as
> an element. There is a finite set of tags I would like to be treated as
> simple text. These are considered in-line elements in HTML.
> <b><i><em><strong><u>
> 
> An alternative, and better, solution would be to wrap all text
> throughout the document in the textnode element, including the in-line
> elements mentioned above. The XML I will finally output from the
> transformation of the XHTML requires all text to be wrapped in a special
> displaytext tag, including those in-line elements. By placing every piece
> of text, including the in-line tags, in a textnode, I could easily pass
> the document through another template that says...
> 
>    <xsl:template match="textnode[normalize-space(.)]">
>         <xsl:element name="displaytext">
>             <xsl:apply-templates/>
>         </xsl:element>
>     </xsl:template> 
> 
> This would make things much easier.
> 
> Below are the XSL processor and XSL version. I am not tied to Saxon if
> another processor could do the job, provided it can be used within Java
> and ports across platforms (Windows, Unix, etc.).
> 
> Processor: Saxon8B
> XSL Version: 2.0
> 
> Thanks in advance for your help.
