Subject: Re: [xsl] exercise in complex grouping
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 12 May 2020 09:45:25 -0000
I have a moderately sizable TEI file (~31,000 text nodes with ~100,400 "words" or ~688,000 characters; ~20,000 elements, ~15,000 attributes). Somewhere in all that mess there are a few pairs of elements for which I need some special processing.
Say each pair is an <A> and a <B>. I can find each <B> by XPath quite trivially. In addition, for every pair, <B> has a @target that points to the corresponding <A> via a bare name identifier URL. Furthermore, every <B> in the document is part of such a pair. (Which is why it is so trivial to find them via XPath. The same can not be said for <A>: there are *lots* of <A> elements that are not part of an <A>-<B> pair; but none, of course, that bear that particular @xml:id, so they can be found by XPath. It's just easy, not trivial. :-)
In general, there can be other nodes between <A> and <B>, and there will be cases in which <B> precedes rather than follows the <A> it points to. E.g.,
blah blah blah <d><e>blah</e> blah <B target="#A1">blort</B> <f>monkey</f> shines <A xml:id="A1">snort</A> blah</d>
I want to be able to handle these cases, too.
For the foreseeable future, there will never be another <B> in between a <B> and the <A> it points to, and each <B> will be a child of the same element as the <A> it points to. (I.e., no overlap problems.) But as soon as I say these complications will never happen, the very next day the editors will gleefully send e-mail saying they have found such a case. But for now, if needed, I'm willing to write code that presumes it won't happen.
What I want for output is to be able to wrap the <B> with the <A> it points to, *and everything in between* in a <C>.
blah blah blah <d><e>blah</e> blah <C xml:id="A1Container"> <B target="#A1">blort</B> <f>monkey</f> shines <A xml:id="A1">snort</A> </C> blah</d>
I am 90% confident I can write some messy XSLT 1.0 Muenchian grouping code that does this. (Although I suspect it would take two passes, one for <A> precedes <B>, another for <B> precedes <A>; but I don't care about two passes at all, and would not even care if it took N passes.) But I am equally confident there is a much better <xsl:for-each-group> method that, at the moment, I simply can't wrap my head around.
In XSLT 2 and later you have <xsl:for-each-group select="node()" group-starting-with="B[@target]">. Furthermore, using `id(substring(@target, 2))` would give you the A element, so you can either use the << operator or use a nested group-ending-with to identify the A and the items in between.
I have not understood what you want to do for input where the B follows the A element.
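A minimal sketch of the group-starting-with suggestion, for the case where the B precedes its A. It assumes XSLT 2.0 or later, that the processor treats xml:id attributes as IDs so that id() resolves them, and that each B and the A it points to are siblings, as described in the question. The variable name $a and the decision to leave an "A before B" pair untouched are choices made for this illustration only, not something settled in the thread. It uses >> rather than << only to avoid escaping the operator inside attribute values.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any element that has a pointing B among its children. -->
  <xsl:template match="*[B[@target]]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Start a new group at every pointing B. -->
      <xsl:for-each-group select="node()" group-starting-with="B[@target]">
        <!-- The A this group's B points to; empty if the group does not
             start with a B or the target does not resolve. -->
        <xsl:variable name="a" select="self::B/id(substring(@target, 2))"/>
        <xsl:choose>
          <!-- Only wrap when the A comes after the B in document order. -->
          <xsl:when test="$a >> .">
            <C xml:id="{$a/@xml:id}Container">
              <!-- Everything from the B up to and including the A. -->
              <xsl:apply-templates select="current-group()[not(. >> $a)]"/>
            </C>
            <!-- The rest of the group, after the A, stays outside the C. -->
            <xsl:apply-templates select="current-group()[. >> $a]"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- Assumption: groups without a following A are copied as-is. -->
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample <d> element from the question, this should put the B, the <f>, the intervening text, and the A inside a <C xml:id="A1Container">, with the trailing text left outside. If the A is not a sibling of the B, the wrapping will not be meaningful, so that case would need separate handling.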