RE: [xsl] Structuring linear input

Subject: RE: [xsl] Structuring linear input
From: "Michael Kay" <michael.h.kay@xxxxxxxxxxxx>
Date: Tue, 22 Oct 2002 13:57:59 +0100
> I need to change a linear input into something more structured, e.g.
> 	<A/> <B/> <C/> <D/> <D/> <C/> <B/> <D/> <A/> <B/>
> 
> The result should look like this:
> 	<A>
> 		<B>
> 			<block>
> 				<C/><D/>
> 			</block>
> 			<block>
> 				<D/>
> 			</block>
> 			<block>
> 				<C/>
> 			</block>
> 		</B>
> 		<B>
> 			<block>
> 				<D/>
> 			</block>
> 		</B>
> 	</A>
> 	<A>
> 		<B/>
> 	</A>
> 
> The linear input can be described by the following regular expression:
> 	(A (B (C* D*)*)*)*

I'm not sure why the second D has gone into a new block.
> 
> Does anyone have any idea how this problem can be solved?
> 
It's tricky. Even the grouping facilities in XSLT 2.0 aren't capable of
matching arbitrary regular grammars: they rely on identifying groups by
common values and/or a distinguished start or end node. One way you
could do it in XSLT 2.0 is to form a string that assembles the names of
the elements (space-separated, say), use regular-expression matching to
manipulate the string and insert the parentheses to indicate grouping,
then construct the grouped tree structure by fetching the elements
corresponding to particular names in the string, matching them by
position.
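A rough sketch of that string-based idea, written in Python rather than XSLT 2.0 so it runs standalone. Assumptions: every element name is a single character (so an offset in the joined string is also the element's position), the input is wrapped in a hypothetical root <r>, and match offsets are used instead of literally inserting parentheses into the string. The block pattern C*D|C+ is an assumed reading of the (C*D)* grouping discussed below, with a trailing run of Cs kept as its own block so the example output is reproduced.

```python
import re
import xml.etree.ElementTree as ET

# The question's flat input, inside a hypothetical root element <r>.
FLAT = "<r><A/><B/><C/><D/><D/><C/><B/><D/><A/><B/></r>"

# Assumes single-character names: string offset == element position.
A_GROUP = re.compile(r"A[BCD]*")
B_GROUP = re.compile(r"B[CD]*")
# Assumed (C*D)* reading, plus a trailing run of Cs as its own block.
BLOCK = re.compile(r"C*D|C+")


def structure(root):
    elems = list(root)
    names = "".join(e.tag for e in elems)
    # (C* D*)* accepts exactly the strings over {C, D}, so [CD]* is
    # equivalent here and avoids nested quantifiers.
    if not re.fullmatch(r"(?:A(?:B[CD]*)*)*", names):
        raise ValueError(f"{names!r} does not match (A (B (C* D*)*)*)*")
    out = ET.Element("r")
    for a in A_GROUP.finditer(names):
        a_el = ET.SubElement(out, "A")
        # pos/endpos restrict the search to this A group; offsets stay absolute.
        for b in B_GROUP.finditer(names, a.start() + 1, a.end()):
            b_el = ET.SubElement(a_el, "B")
            for blk in BLOCK.finditer(names, b.start() + 1, b.end()):
                block_el = ET.SubElement(b_el, "block")
                for i in range(blk.start(), blk.end()):
                    # Fetch the original element by position and copy it.
                    ET.SubElement(block_el, elems[i].tag, elems[i].attrib)
    return out


print(ET.tostring(structure(ET.fromstring(FLAT)), encoding="unicode"))
```

Real element names would need a name-to-character mapping before the regular expressions could be applied this way.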

The A and B groups are relatively easy; it's the C D group that's hard.
If the group really is (C*D*)* then you can match the first node as
C[not(preceding-sibling::*[1][self::C])] |
D[not(preceding-sibling::*[1][self::C or self::D])], but your example appears to be
using a grouping of (C*D)*. I guess the corresponding condition here is:

(C|D)[not(preceding-sibling::*[1][self::C])]

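i.e. any C or D that isn't immediately preceded by a C starts a new group.
(preceding-sibling::*[1] selects the immediately preceding element;
preceding-sibling::node()[1] would pick up any whitespace text node
between the elements unless it is stripped with xsl:strip-space.)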
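To see how the two readings differ on the example, here is a small Python sketch (not XSLT) that evaluates each start condition over the flat sequence, treating the previous element in document order as preceding-sibling::*[1]. The (C*D*)* condition is written out from the grammar as "a C not right after a C, or a D not right after a C or D"; the function names and the wrapper element <r> are hypothetical.

```python
import xml.etree.ElementTree as ET

# The question's flat input, inside a hypothetical root element <r>.
FLAT = "<r><A/><B/><C/><D/><D/><C/><B/><D/><A/><B/></r>"


def starts_c_star_d(elems, i):
    """(C*D)* reading: (C|D)[not(preceding-sibling::*[1][self::C])]."""
    prev = elems[i - 1].tag if i > 0 else None  # preceding-sibling::*[1]
    return elems[i].tag in ("C", "D") and prev != "C"


def starts_c_star_d_star(elems, i):
    """(C*D*)* reading: a C not right after a C, or a D not right after a C or D."""
    prev = elems[i - 1].tag if i > 0 else None
    if elems[i].tag == "C":
        return prev != "C"
    if elems[i].tag == "D":
        return prev not in ("C", "D")
    return False


elems = list(ET.fromstring(FLAT))
# Positions are 0-based among the children of <r>.
cd_starts = [i for i in range(len(elems)) if starts_c_star_d(elems, i)]
cdstar_starts = [i for i in range(len(elems)) if starts_c_star_d_star(elems, i)]
print(cd_starts)
print(cdstar_starts)
```

The first list comes out as [2, 4, 5, 7] and the second as [2, 5, 7]: under (C*D)* the second D (position 4) starts a block of its own, while under (C*D*)* it stays with the first C and D.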

If you know the expression that selects the first node in a group, then
in XSLT 2.0 you can form the groups with <xsl:for-each-group
group-starting-with>. In XSLT 1.0 you can use Muenchian grouping,
defining the grouping key as the generate-id() of the most recent (self
or preceding-sibling) node that satisfies this predicate.
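Both strategies can be emulated in Python to check that they agree on the example. This is a sketch, not XSLT: group_starting_with mimics the group-starting-with behaviour (a new group starts at the first item of the population and at every item matching the pattern), and muenchian stands in for an xsl:key whose value is the generate-id() of the most recent self-or-preceding-sibling start node, using positions in place of generated IDs. All names are hypothetical. For brevity the C/D population spans the whole input rather than being computed per B group; that only works here because the first C or D after a B always starts a group.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# The question's flat input, inside a hypothetical root element <r>.
FLAT = "<r><A/><B/><C/><D/><D/><C/><B/><D/><A/><B/></r>"


def is_start(elems, i):
    # (C|D)[not(preceding-sibling::*[1][self::C])]; elems[i - 1] plays the
    # role of preceding-sibling::*[1].
    prev = elems[i - 1].tag if i > 0 else None
    return elems[i].tag in ("C", "D") and prev != "C"


def group_starting_with(population, matches):
    """Mimic group-starting-with: the first item, and every item for which
    matches() is true, begins a new group."""
    groups = []
    for item in population:
        if not groups or matches(item):
            groups.append([])
        groups[-1].append(item)
    return groups


def muenchian(elems):
    """Mimic the XSLT 1.0 key: each C/D is keyed by the position of the most
    recent self-or-preceding sibling that starts a group (standing in for its
    generate-id()); nodes sharing a key form one group."""
    by_key = defaultdict(list)  # insertion-ordered, like document order
    leader = None
    for i, e in enumerate(elems):
        if is_start(elems, i):
            leader = i
        if e.tag in ("C", "D"):
            by_key[leader].append(i)
    return list(by_key.values())


def tag_lists(groups, elems):
    # Turn groups of positions back into groups of element names.
    return [[elems[i].tag for i in g] for g in groups]


elems = list(ET.fromstring(FLAT))
cd = [i for i, e in enumerate(elems) if e.tag in ("C", "D")]
v2 = tag_lists(group_starting_with(cd, lambda i: is_start(elems, i)), elems)
v1 = tag_lists(muenchian(elems), elems)
print(v2)
print(v1)
```

Both lines print the same blocks, [C D], [D], [C] under the first B and [D] under the second, which matches the grouping in the requested output.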

Michael Kay
Software AG
home: Michael.H.Kay@xxxxxxxxxxxx
work: Michael.Kay@xxxxxxxxxxxxxx 
      


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

