[xsl] Better Way to Group Siblings By Start/End Markers?

From: Eliot Kimber <ekimber@xxxxxxxxxxxx>
Date: Mon, 23 Jun 2008 17:04:09 -0500

I am experimenting with using XSLT to convert Office Open XML into InCopy
INCX (the CS3 Word import fails to capture some things I need captured from
the Word data).

One challenge is handling Word fields, which need to be converted to any
number of different, and differently-structured, INCX constructs (whose
details are not important here).

A Word field is organized as a sequence of w:r elements within a larger
sequence of w:r elements. A field start is indicated by a w:r with a field
start indicator and the field end is indicated by another w:r with a field
end indicator. The w:r elements between these two marker elements comprise
the field data, which can be any number of things, including w:r elements
that could just as easily occur outside the scope of the field (e.g., a w:r
containing literal document content).

Here is a typical sample:

        <w:t xml:space="preserve">-  </w:t>
        <w:instrText>HYPERLINK "http://www.example.com/"</w:instrText>
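
For orientation, the general shape of a complete field in WordprocessingML is sketched below. This is an illustration of the begin/end marker pattern described above, not the original sample: the run contents, the surrounding text, and the "separate" marker run are assumptions added to make the sequence complete.

```xml
<!-- Illustrative only: a paragraph whose w:r sequence contains one field. -->
<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <!-- Ordinary run before the field -->
  <w:r><w:t xml:space="preserve">See </w:t></w:r>
  <!-- Field start marker -->
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <!-- Field data: the field instruction -->
  <w:r><w:instrText xml:space="preserve"> HYPERLINK "http://www.example.com/" </w:instrText></w:r>
  <!-- Separator between the instruction and the displayed result -->
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <!-- Field data: a run that looks just like literal document content -->
  <w:r><w:t>example</w:t></w:r>
  <!-- Field end marker -->
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
  <!-- Ordinary run after the field -->
  <w:r><w:t xml:space="preserve"> for details.</w:t></w:r>
</w:p>
```

The grouping goal is to collect the five runs from the "begin" marker through the "end" marker into one group and leave the first and last runs outside it.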

I have this for-each-group that seems to group correctly, but I'm wondering
if there's a simpler expression that does what I want:

<xsl:for-each-group select="w:r"
  group-adjacent="
string(self::*[w:fldChar[@w:fldCharType = 'begin' or @w:fldCharType =
'end']] or
(self::*[preceding-sibling::*/w:fldChar[@w:fldCharType = 'begin']] and
self::*[following-sibling::*/w:fldChar[@w:fldCharType = 'end']] and
count((self::*[preceding-sibling::*/w:fldChar[@w:fldCharType =
'begin']])[1]/(*[following-sibling::*/w:fldChar[@w:fldCharType = 'end']])[1] |
(self::*[following-sibling::*/w:fldChar[@w:fldCharType = 'end']])[1]) = 1))">

In prose (at least this is what I intend the above expression to mean): if
the w:r has a child w:fldChar whose @w:fldCharType is 'begin' or 'end', or if
the w:r has both a preceding sibling w:r with a w:fldChar of type 'begin' and
a following sibling w:r with a w:fldChar of type 'end' AND the nearest
preceding sibling field start has the same nearest following sibling field
end as the current node, then return the grouping key "true"; otherwise
return the grouping key "false".


I can't think of a simpler way to say this. Is there one?

I realize I could factor some of the complexity of the expression out into a
function or two, which I will probably do.
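
A minimal sketch of what such a factoring might look like in XSLT 2.0 follows. It is not the expression above rewritten verbatim: instead of comparing nearest field ends, it checks whether the nearest preceding begin/end marker is a 'begin'. It assumes fields are not nested. The `local` namespace, the function names `local:is-field-marker` and `local:in-field`, and the `field` wrapper element are all hypothetical names chosen for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:local="urn:example:local-functions"
    exclude-result-prefixes="xs w local">

  <!-- Hypothetical helper: true if the run carries a field start or end marker. -->
  <xsl:function name="local:is-field-marker" as="xs:boolean">
    <xsl:param name="run" as="element(w:r)"/>
    <xsl:sequence
        select="exists($run/w:fldChar[@w:fldCharType = ('begin', 'end')])"/>
  </xsl:function>

  <!-- Hypothetical helper: true if the run is a marker itself, or if the
       nearest preceding begin/end marker among its siblings is a 'begin'
       (that is, a field has been opened and not yet closed).
       Assumption: fields are not nested. -->
  <xsl:function name="local:in-field" as="xs:boolean">
    <xsl:param name="run" as="element(w:r)"/>
    <xsl:variable name="last-marker" as="element(w:fldChar)?"
        select="($run/preceding-sibling::w:r
                 /w:fldChar[@w:fldCharType = ('begin', 'end')])[last()]"/>
    <xsl:sequence
        select="local:is-field-marker($run)
                or $last-marker/@w:fldCharType = 'begin'"/>
  </xsl:function>

  <!-- Group adjacent runs by whether they belong to a field. -->
  <xsl:template match="w:p">
    <xsl:for-each-group select="w:r" group-adjacent="local:in-field(.)">
      <xsl:choose>
        <xsl:when test="current-grouping-key()">
          <!-- Placeholder wrapper; the real output would be an INCX construct. -->
          <field>
            <xsl:copy-of select="current-group()"/>
          </field>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

One limitation of grouping on a plain true/false key: two fields with no run between them (an 'end' run immediately followed by a 'begin' run) fall into the same group, so they would need a second pass, for example a nested for-each-group with group-starting-with on the 'begin' runs.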



