Subject: [xsl] Grouping by character runs (and keeping element structure)
From: "Christian Roth" <roth@xxxxxxxxxxxxxx>
Date: Thu, 27 Jul 2006 12:34:33 +0200
Continuing my grouping issues: XSLT 2.0 handles grouping at the node level quite conveniently. However, adding structure to legacy, rather flat content (i.e. character runs) still poses challenges for grouping. The following applies mainly to document-centric (as opposed to data-centric) XML.

__ EXAMPLE 1 __

    <p>Note #4: Don't tumble dry your pet.</p>

TASK: Group the leading paragraph text "Note #4:" using <marker> so that the result looks like (indented for readability):

    <p><marker>Note #4:</marker> Don't tumble dry your pet.</p>

SOLUTION: The solution is easy, as we can just work on the text without having to worry about markup:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:template match="p">
        <xsl:copy>
          <xsl:analyze-string select="." regex="^Note\s#\d+:">
            <!-- the leading "Note #N:" run becomes the marker -->
            <xsl:matching-substring>
              <marker>
                <xsl:value-of select="." />
              </marker>
            </xsl:matching-substring>
            <!-- copy the rest unchanged; normalize-space() here would
                 drop the space that follows the marker -->
            <xsl:non-matching-substring>
              <xsl:value-of select="." />
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>

However, in "real" documents, you will likely have something like this:

__ EXAMPLE 2 __

    <p><b>Note</b> <i>#4</i>: Don't tumble dry your pet.</p>

TASK: Group the leading paragraph text "Note #4:", including any contained markup, using <marker> so that the result looks like:

    <p><marker><b>Note</b> <i>#4</i>:</marker> Don't tumble dry your pet.</p>

SOLUTION: Here it starts to get really complicated. The text now contains markup that we need to retain, but the text run still has to be considered from the <p> level (so that we can test for "starts with pattern" using '^'). <xsl:analyze-string/> works on the string value only and discards the child elements, so it does not seem to do the trick in this case.
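One direction that might work, offered only as a rough sketch and not as an established pattern: use the regex on the string value of <p> just to measure how many characters the marker covers, then walk the children twice and copy only the slice of each text node that falls inside the wanted character range. The function f:start, the mode name "slice" and the namespace urn:x-example:local are names made up for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:f="urn:x-example:local"
                exclude-result-prefixes="xs f"
                version="2.0">

  <!-- Hypothetical helper: number of characters in $root's string value
       that precede node $n (sum of the lengths of all earlier text nodes). -->
  <xsl:function name="f:start" as="xs:integer">
    <xsl:param name="n" as="node()"/>
    <xsl:param name="root" as="element()"/>
    <xsl:sequence select="sum(for $t in $root//text()[. &lt;&lt; $n]
                              return string-length($t))"/>
  </xsl:function>

  <xsl:template match="p">
    <!-- Length of the leading "Note #N:" run, or 0 if the text does not start with it. -->
    <xsl:variable name="len" as="xs:integer"
        select="if (matches(., '^Note\s#\d+:'))
                then string-length(replace(., '^(Note\s#\d+:).*$', '$1', 's'))
                else 0"/>
    <xsl:copy>
      <xsl:choose>
        <xsl:when test="$len gt 0">
          <!-- First pass: characters [0, len) go inside the marker. -->
          <marker>
            <xsl:apply-templates mode="slice">
              <xsl:with-param name="root" select="." tunnel="yes"/>
              <xsl:with-param name="from" select="0" tunnel="yes"/>
              <xsl:with-param name="to" select="$len" tunnel="yes"/>
            </xsl:apply-templates>
          </marker>
          <!-- Second pass: characters [len, end) follow the marker. -->
          <xsl:apply-templates mode="slice">
            <xsl:with-param name="root" select="." tunnel="yes"/>
            <xsl:with-param name="from" select="$len" tunnel="yes"/>
            <xsl:with-param name="to" select="string-length(.)" tunnel="yes"/>
          </xsl:apply-templates>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="node()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>

  <!-- Copy an element only if some of its text overlaps the range.
       Limitation of this sketch: an empty element sitting exactly on a
       range boundary is dropped. -->
  <xsl:template match="*" mode="slice">
    <xsl:param name="root" as="element()" tunnel="yes"/>
    <xsl:param name="from" as="xs:integer" tunnel="yes"/>
    <xsl:param name="to" as="xs:integer" tunnel="yes"/>
    <xsl:variable name="start" select="f:start(., $root)"/>
    <xsl:variable name="end" select="$start + string-length(.)"/>
    <xsl:if test="$start lt $to and $end gt $from">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates mode="slice"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <!-- Emit only the part of a text node that lies inside the range. -->
  <xsl:template match="text()" mode="slice">
    <xsl:param name="root" as="element()" tunnel="yes"/>
    <xsl:param name="from" as="xs:integer" tunnel="yes"/>
    <xsl:param name="to" as="xs:integer" tunnel="yes"/>
    <xsl:variable name="start" select="f:start(., $root)"/>
    <xsl:variable name="a" select="max(($from - $start, 0))"/>
    <xsl:variable name="b" select="min(($to - $start, string-length(.)))"/>
    <xsl:if test="$b gt $a">
      <xsl:value-of select="substring(., $a + 1, $b - $a)"/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Because both passes visit the same tree, an element that straddles the cut point is emitted once on each side, each copy holding only its share of the text. Recomputing the offset for every node is quadratic in the size of the paragraph, which is probably tolerable for short paragraphs but is an obvious weakness of this sketch.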
A worst-case scenario, of course, would be:

__ EXAMPLE 3 __

    <p><ul><b>Note</b> <i>#4</i>: Don't tumble dry your pet</ul>.</p>

TASK: Group the leading paragraph text "Note #4:", including any contained markup, using <marker> as a child of <p> so that the result looks like:

    <p><marker><ul><b>Note</b> <i>#4</i>:</ul></marker> <ul>Don't tumble dry your pet</ul>.</p>

SOLUTION: Same problems as in EXAMPLE 2, but additionally note that the <ul> element must be split/duplicated so that <marker> can be a child of <p>, yet retain the full formatting information in the form of the contained element structure.

Is there a certain pattern for tackling this kind of problem in XSLT, or is the language just not the tool of choice for this kind of transformation?

-Christian