Subject: RE: [xsl] Problems with mixed content and inline elements when transforming XHTML into another XML format
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 22 Feb 2006 23:40:55 -0000
You're using XSLT 2.0, so this can be solved using grouping constructs. Forget the templates that create <textnode> elements. You want something like this, which causes adjacent "inline" nodes to be grouped under a new element, with a function to decide whether a node is an "inline" node:

<xsl:template match="div">
  <xsl:copy>
    <xsl:for-each-group select="node()" group-adjacent="f:is-inline(.)">
      <xsl:choose>
        <xsl:when test="current-grouping-key()">
          <textnode><xsl:copy-of select="current-group()"/></textnode>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

<xsl:function name="f:is-inline" as="xs:boolean">
  <xsl:param name="node" as="node()"/>
  <xsl:sequence select="$node instance of text() or $node[self::u|self::b|self::i]"/>
</xsl:function>

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Tony Kinnis [mailto:kinnist@xxxxxxxxx]
> Sent: 22 February 2006 22:29
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Problems with mixed content and inline
> elements when transforming XHTML into another XML format
>
> Hello all,
>
> I have been trying to solve this problem for a few days now and I have
> had no luck. I am hoping someone here can help me out with this.
>
> I need to parse XHTML and transform it into another XML format. I am
> sure that the XHTML is valid and well formed (I am running it through
> HTMLTidy). The first problem I encountered was mixed content.
> Something like...
>
> <div>
>   My name is <b>bob</b>. What is yours?
>   <ul>
>     <li>foo</li>
>     <li>bar</li>
>   </ul>
> </div>
>
> I found a utility script on the web that can turn mixed content into
> element content. I am guessing some of you have seen this script
> before.
>
> <xsl:template match="text()[normalize-space(.)][../*]">
>   <xsl:element name="textnode">
>     <xsl:value-of select="."/>
>   </xsl:element>
> </xsl:template>
>
> <xsl:template match="@*|node()">
>   <xsl:copy>
>     <xsl:apply-templates select="@*|node()"/>
>   </xsl:copy>
> </xsl:template>
>
> This makes the above post look like...
>
> <div>
>   <textnode>My name is </textnode><b>bob</b><textnode>. What is
> yours?</textnode>
>   <ul>
>     <li>foo</li>
>     <li>bar</li>
>   </ul>
> </div>
>
> However, what I would really like to do is have the bold tags included
> inside of the textnode tag so that it looks like...
>
> <div>
>   <textnode>My name is <b>bob</b>. What is yours?</textnode>
>   <ul>
>     <li>foo</li>
>     <li>bar</li>
>   </ul>
> </div>
>
> In other words I would like to treat the <b> element as text and not
> an element. There is a finite set of tags I would like to be treated as
> simple text. These are considered inline elements in HTML:
> <b>, <i>, <em>, <strong> and <u>.
>
> An alternative, and better, solution would be wrapping all text throughout
> the document in the textnode element, including the inline elements
> mentioned above. The XML I will finally output from the transformation
> of the XHTML requires all text to be wrapped in a special displaytext tag,
> including the inline elements mentioned above. By placing every piece
> of text, including the inline text tags above, in a textnode I could
> easily pass the document through another template that says...
>
> <xsl:template match="textnode[normalize-space(.)]">
>   <xsl:element name="displaytext">
>     <xsl:apply-templates/>
>   </xsl:element>
> </xsl:template>
>
> This would make things much easier.
>
> Below are the XSL processor and XSL version. I am not tied to Saxon if
> another processor could do the job, provided it can be used within Java
> and ports across platforms (Windows, Unix, etc.).
>
> Processor: Saxon8B
> XSL Version: 2.0
>
> Thanks in advance for your help.
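As a minimal sketch, assuming an XSLT 2.0 processor such as Saxon, the fragment from the reply above can be wrapped into a complete stylesheet like the one below. The stylesheet wrapper, the placeholder namespace URI bound to f: (http://example.com/functions), the identity template, and the extra self::em and self::strong tests (taken from the tag list in the question) are illustrative additions, not part of the original reply. Whitespace-only runs are copied as plain text instead of being wrapped, mirroring the normalize-space(.) test in the original utility script. If the Tidy output declares the XHTML namespace, add xpath-default-namespace="http://www.w3.org/1999/xhtml" to xsl:stylesheet, otherwise match="div" and the self:: tests will not match the input elements.

```xml
<?xml version="1.0"?>
<!-- Sketch only: groups adjacent inline nodes inside div under textnode.
     The f: namespace URI is a placeholder; any URI of your own works. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- Identity template: copies attributes and nodes not matched elsewhere -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="div">
    <xsl:copy>
      <!-- Attributes must be added before any child content -->
      <xsl:apply-templates select="@*"/>
      <!-- Each child node is the context item when its key is computed,
           so "." is what gets passed to the function -->
      <xsl:for-each-group select="node()" group-adjacent="f:is-inline(.)">
        <xsl:choose>
          <!-- A run of inline nodes containing some non-whitespace text -->
          <xsl:when test="current-grouping-key()
                          and normalize-space(string-join(current-group(), ''))">
            <textnode><xsl:copy-of select="current-group()"/></textnode>
          </xsl:when>
          <!-- Block elements and whitespace-only runs. Changed from
               xsl:copy-of to apply-templates so nested divs are grouped too -->
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- True for text nodes and for the inline elements listed in the question -->
  <xsl:function name="f:is-inline" as="xs:boolean">
    <xsl:param name="node" as="node()"/>
    <xsl:sequence select="$node instance of text()
        or exists($node[self::b | self::i | self::u | self::em | self::strong])"/>
  </xsl:function>

</xsl:stylesheet>
```

With the sample div from the question, the text before the ul, together with the <b> element, ends up inside one textnode, while the ul is copied unchanged.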
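The "better solution" from the question, wrapping every run of text and inline markup throughout the document in displaytext, can be sketched in a single pass by applying the same adjacent-grouping idea to every non-inline element that has inline children with non-whitespace text, and emitting displaytext directly instead of an intermediate textnode. The generalised match pattern, the placeholder f: namespace URI (http://example.com/functions) and the inline tag list are assumptions for illustration, and the sketch expects the input elements to be in no namespace.

```xml
<?xml version="1.0"?>
<!-- Sketch only: wraps every run of text and inline elements in displaytext,
     in one pass. The f: namespace URI is a placeholder. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- Identity template for everything not matched below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any non-inline element (div, p, li, ...) with at least one inline
       child that carries non-whitespace text; default priority 0.5 beats
       the identity template's -0.5 -->
  <xsl:template match="*[not(f:is-inline(.))][node()[f:is-inline(.)][normalize-space()]]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each-group select="node()" group-adjacent="f:is-inline(.)">
        <xsl:choose>
          <!-- Inline run with real text: wrap it, keeping <b> etc. intact -->
          <xsl:when test="current-grouping-key()
                          and normalize-space(string-join(current-group(), ''))">
            <displaytext><xsl:copy-of select="current-group()"/></displaytext>
          </xsl:when>
          <!-- Block children and whitespace-only runs: recurse normally -->
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- True for text nodes and for the inline elements listed in the question -->
  <xsl:function name="f:is-inline" as="xs:boolean">
    <xsl:param name="node" as="node()"/>
    <xsl:sequence select="$node instance of text()
        or exists($node[self::b | self::i | self::u | self::em | self::strong])"/>
  </xsl:function>

</xsl:stylesheet>
```

With the sample input from the question, the div's text run (including the <b> element) ends up inside a single displaytext element, the ul is copied as-is because it has only whitespace text directly under it, and each li gets its own displaytext child.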