RE: [xsl] Tricky XSLT 2.0 grouping problem

Subject: RE: [xsl] Tricky XSLT 2.0 grouping problem
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 10 Oct 2008 15:36:30 +0100

In general the problem is insoluble, because given the sequence

g - first level
h - first level
1 - second level
2 - second level
A - third level
B - third level
i - first level? or fourth level?

the (i) could be either first level or fourth level.

Your input data structure is badly designed, because it gives no way of
distinguishing these two cases.

A reasonable way to proceed if you're stuck with this input would be to
assume that (i) is first level if and only if the previous first-level
number is (h). To do this, though, you'll need to use sibling recursion
rather than pattern-based grouping.
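
The heuristic above could be sketched roughly as follows. This is an untested illustration rather than a definitive solution. The function f:level, its namespace URI, the mode "annotate", the named template "nest" and the temporary level attribute are all names invented for the sketch. It assumes the <body> sample from the original post, one <pnum> per section, and levels that are never skipped. Sibling recursion first records a level on each section, carrying along the most recent number at each open level; an ordinary grouping pass then nests on that level.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:levels"
    exclude-result-prefixes="xs f">

  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>

  <!-- Hypothetical helper: level of number $n, given $open, the most recent
       number at each currently open level (first level first). -->
  <xsl:function name="f:level" as="xs:integer">
    <xsl:param name="n" as="xs:string"/>
    <xsl:param name="open" as="xs:string*"/>
    <xsl:variable name="letter" select="replace($n, '[()]', '')"/>
    <xsl:choose>
      <xsl:when test="matches($n, '^\([ivx]\)$')">
        <!-- Ambiguous: first level only if the previous first-level number
             is the preceding letter, e.g. (h) for (i); otherwise fourth. -->
        <xsl:variable name="previous"
            select="codepoints-to-string(string-to-codepoints($letter) - 1)"/>
        <xsl:sequence
            select="if ($open[1] = concat('(', $previous, ')')) then 1 else 4"/>
      </xsl:when>
      <xsl:when test="matches($n, '^\([a-z]\)$')"><xsl:sequence select="1"/></xsl:when>
      <xsl:when test="matches($n, '^\([1-9]\)$')"><xsl:sequence select="2"/></xsl:when>
      <xsl:when test="matches($n, '^\([A-Z]\)$')"><xsl:sequence select="3"/></xsl:when>
      <!-- Multi-letter roman numerals such as (ii), and anything unrecognised -->
      <xsl:otherwise><xsl:sequence select="4"/></xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <!-- Pass 1, sibling recursion: copy one section with its level attached,
       then move on to the next sibling with the updated list of open numbers. -->
  <xsl:template match="section" mode="annotate">
    <xsl:param name="open" as="xs:string*"/>
    <xsl:variable name="n" select="normalize-space(pnum)"/>
    <xsl:variable name="level" select="f:level($n, $open)"/>
    <section level="{$level}">
      <xsl:copy-of select="@*, node()"/>
    </section>
    <xsl:apply-templates select="following-sibling::section[1]" mode="annotate">
      <xsl:with-param name="open"
          select="(subsequence($open, 1, $level - 1), $n)"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Pass 2: nest the annotated sections by level. -->
  <xsl:template match="body">
    <xsl:variable name="flat">
      <xsl:apply-templates select="section[1]" mode="annotate"/>
    </xsl:variable>
    <body>
      <xsl:call-template name="nest">
        <xsl:with-param name="items" select="$flat/section"/>
        <xsl:with-param name="level" select="1"/>
      </xsl:call-template>
    </body>
  </xsl:template>

  <xsl:template name="nest">
    <xsl:param name="items" as="element(section)*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:for-each-group select="$items"
        group-starting-with="section[@level = $level]">
      <section>
        <!-- Drop the temporary level attribute from the result -->
        <xsl:copy-of select="@* except @level, node()"/>
        <xsl:call-template name="nest">
          <xsl:with-param name="items" select="current-group() except ."/>
          <xsl:with-param name="level" select="$level + 1"/>
        </xsl:call-template>
      </section>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

For the sample in the original post this should assign levels 1, 2, 3, 4, 4, 3, 2, 3, so that (i) and (ii) nest under the first (A). The underlying ambiguity remains: any (i) appearing while (h) is the current first-level number is taken as first level, even if it was meant as a fourth-level number.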

Michael Kay
http://www.saxonica.com/

 

> -----Original Message-----
> From: James Sulak [mailto:jsulak@xxxxxxxxxxxxxxxx] 
> Sent: 10 October 2008 15:04
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Tricky XSLT 2.0 grouping problem
> 
> I have a tricky grouping problem that I'm running into a wall 
> with.  I thought it might be a fun challenge to throw out 
> there.  I'm attempting to group a flat list of <section> 
> elements into a hierarchy based on matching its number 
> against different regular expressions.  The list is assumed 
> to be in the correct order.  I have it working (the code is
> below) with one exception:  roman numerals.  
> 
> For example, the XML:
> 
> <body>
> <section><pnum>(a)</pnum><p>First-level section</p></section>
> <section><pnum>(1)</pnum><p>Second-level section</p></section>
> <section><pnum>(A)</pnum><p>Third-level section</p></section>
> <section><pnum>(i)</pnum><p>Fourth-level section</p></section>
> <section><pnum>(ii)</pnum><p>Fourth-level section</p></section>
> <section><pnum>(B)</pnum><p>Third-level section</p></section>
> <section><pnum>(2)</pnum><p>Second-level section</p></section>
> <section><pnum>(A)</pnum><p>Third-level section</p></section>
> </body>
> 
> Should give the result:
> 
> <body>
> <section>
>   <pnum>(a)</pnum><p>First-level section</p>
>   <section>
>     <pnum>(1)</pnum><p>Second-level section</p>
>     <section>
>       <pnum>(A)</pnum><p>Third-level section</p>
>       <section><pnum>(i)</pnum><p>Fourth-level section</p></section>
>       <section><pnum>(ii)</pnum><p>Fourth-level section</p></section>
>     </section>
>     <section>
>       <pnum>(B)</pnum><p>Third-level section</p>
>     </section>
>   </section>
>   <section>
>     <pnum>(2)</pnum><p>Second-level section</p>
>     <section><pnum>(A)</pnum><p>Third-level section</p></section>
>   </section>
> </section>
> </body>
> 
> The problem is that the number "(i)", which is supposed to be 
> a fourth-level section, is ambiguous with an "(i)" that would 
> be a first-level section.  My transform ends up treating it 
> like a first-level section, and so gives the following, 
> incorrect output:
> 
> <body>
> <section>
>   <pnum>(a)</pnum><p>First-level section</p>
>   <section>
>     <pnum>(1)</pnum><p>Second-level section</p>
>     <section>
>       <pnum>(A)</pnum><p>Third-level section</p>
>     </section>
>   </section>
> </section>
> <section>
>   <pnum>(i)</pnum><p>Fourth-level section</p>
>   <section>
>     <pnum>(ii)</pnum><p>Fourth-level section</p>
>     <section>
>       <pnum>(B)</pnum><p>Third-level section</p>
>     </section>
>   </section>
>   <section>
>     <pnum>(2)</pnum><p>Second-level section</p>
>     <section>
>       <pnum>(A)</pnum><p>Third-level section</p>
>     </section>
>   </section>
> </section>
> </body>
> 
> I've included my current transform below.  The grouping_keys 
> variable is a sequence of regex strings that match each 
> successive level of section nesting.  Does anybody have an 
> alternative way of tackling this?
> 
> Thanks,
> 
> -James 
> 
> 
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">
> 
>     <xsl:variable name="grouping_keys" as="xs:string+"
>                   select="('\([a-z]\)', '\([1-9]\)', '\([A-Z]\)', '\([ivx]{1,4}\)')" />
>     
>     <!-- Start the grouping here -->
>     <xsl:template match="codebody">
>         <codebody>
>             <xsl:copy-of select="@*"/>
>             <xsl:for-each-group select="*"
>                 group-starting-with="section[matches(pnum, string($grouping_keys[1]))]">
>                 <xsl:apply-templates select="." mode="group">
>                     <xsl:with-param name="level" select="1" as="xs:integer"/>
>                 </xsl:apply-templates>
>             </xsl:for-each-group>
>         </codebody>
>     </xsl:template>
> 
>     <!-- This template copies the current section and groups
>          any "nested" sections -->
>     <xsl:template match="section" mode="group">
>         <xsl:param name="level" as="xs:integer"/>
>         <section>
>             <xsl:copy-of select="@*, *"/>
>             <xsl:if test="$level &lt; count($grouping_keys)">
>                 <xsl:for-each-group select="current-group() except ."
>                     group-starting-with="section[matches(pnum, string($grouping_keys[$level + 1]))]">
>                     <xsl:apply-templates select="." mode="group">
>                         <xsl:with-param name="level" select="$level + 1" as="xs:integer"/>
>                     </xsl:apply-templates>
>                 </xsl:for-each-group>
>             </xsl:if>
>         </section>
>     </xsl:template>
> 
>     <xsl:template match="element()" mode="#all">
>         <xsl:copy>
>             <xsl:apply-templates select="@*,node()" mode="#current"/>
>         </xsl:copy>
>     </xsl:template>
> 
>     <xsl:template match="attribute()|text()|comment()|processing-instruction()"
>                   mode="#all">
>         <xsl:copy/>
>     </xsl:template>
> 
> 
> </xsl:stylesheet>
