Subject: [xsl] Problems with mixed content and inline elements when transforming XHTML into another XML format
From: Tony Kinnis <kinnist@xxxxxxxxx>
Date: Wed, 22 Feb 2006 14:28:53 -0800 (PST)
Hello all,

I have been trying to solve this problem for a few days now with no luck, and I am hoping someone here can help me out.

I need to parse XHTML and transform it into another XML format. I am sure that the XHTML is valid and well formed (I am running it through HTMLTidy). The first problem I encountered was mixed content. Something like...

    <div>
      My name is <b>bob</b>. What is yours?
      <ul>
        <li>foo</li>
        <li>bar</li>
      </ul>
    </div>

I found a utility script on the web that can turn mixed content into element content. I am guessing some of you have seen this script before.

    <xsl:template match="text()[normalize-space(.)][../*]">
      <xsl:element name="textnode">
        <xsl:value-of select="."/>
      </xsl:element>
    </xsl:template>

    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>

This makes the above input look like...

    <div>
      <textnode>My name is </textnode><b>bob</b><textnode>. What is yours?</textnode>
      <ul>
        <li>foo</li>
        <li>bar</li>
      </ul>
    </div>

However, what I would really like is to have the bold tags included inside the textnode tag, so that it looks like...

    <div>
      <textnode>My name is <b>bob</b>. What is yours?</textnode>
      <ul>
        <li>foo</li>
        <li>bar</li>
      </ul>
    </div>

In other words, I would like to treat the <b> element as text and not as an element. There is a finite set of tags I would like treated as simple text. These are considered in-line elements in HTML:

    <b> <i> <em> <strong> <u>

An alternative, and better, solution would be to wrap all text throughout the document in the textnode element, including the in-line elements mentioned above. The XML I will finally output from the transformation of the XHTML requires that all text be wrapped in a special displaytext tag, including those in-line elements. By placing every piece of text, including the in-line tags, in a textnode, I could easily pass the document through another template that says...
    <xsl:template match="textnode[normalize-space(.)]">
      <xsl:element name="displaytext">
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:template>

This would make things much easier. Below are the XSL processor and XSL version. I am not tied to Saxon if another processor could do the job, provided it can be used within Java and ports across platforms (Windows, Unix, etc.).

Processor: Saxon8B
XSL Version: 2.0

Thanks in advance for your help.
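For readers following the thread, here is a minimal sketch of one way the first goal (in-line elements kept inside a single textnode) might be approached in XSLT 2.0 with xsl:for-each-group and group-adjacent. It is not the poster's code and has not been checked against his real input. It assumes the elements are in no namespace, as in the sample above; XHTML in the http://www.w3.org/1999/xhtml namespace would need something like xpath-default-namespace on the stylesheet element. The list of in-line names is taken from the message.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy anything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Elements with mixed content: at least one non-whitespace text
       child and at least one element child. -->
  <xsl:template match="*[text()[normalize-space(.)]][*]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Group adjacent children by a boolean key: true for text nodes
           and the in-line elements, false for everything else.
           Assumption: elements are in no namespace. -->
      <xsl:for-each-group select="node()"
          group-adjacent="boolean(self::text() or self::b or self::i
                                  or self::em or self::strong or self::u)">
        <xsl:choose>
          <!-- A run of text/in-line nodes that is not just whitespace
               is wrapped as one unit, in-line elements included. -->
          <xsl:when test="current-grouping-key()
                          and normalize-space(string-join(current-group(), ''))">
            <textnode>
              <xsl:copy-of select="current-group()"/>
            </textnode>
          </xsl:when>
          <!-- Block-level children and whitespace-only runs are
               processed as usual. -->
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample div, this should put "My name is <b>bob</b>. What is yours?" (plus the surrounding whitespace) into a single textnode and leave the ul to the identity template. Because the template only matches elements with mixed content, an li containing only text is left unwrapped; the "wrap all text" variant described in the message would need a broader match pattern.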