[xsl] Splitting text nodes - xsl:iterate?

Subject: [xsl] Splitting text nodes - xsl:iterate?
From: "Tom Cleghorn tcleghorn@xxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 12 Nov 2014 14:10:22 -0000

Hi list,

Given an input document looking something like this:
<doc>
  <head><foo/><bar/><baz/></head>
  <body>
    <sec>
      <para>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<box outline="maybe"><para quack="y">Proin id <?foo bar?>bibendum urna, <baz>ut ornare</baz> mi.</para></box></para>
      <para>Aenean dui risus, <qux>sodales quis leo sit amet, ornare consequat</qux> metus. Ut vel massa congue, egestas nibh et, rutrum odio.</para>
    </sec>
  </body>
</doc>

(i.e. document markup consisting of arbitrary text and element nodes 
nested to some unknown depth)

and the requirement for two separate outputs looking like these:
<doc>
  <head><foo/><bar/><baz/></head>
  <body>
    <sec>
      <para><new:start/>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<box outline="maybe"><para quack="y">Proin id <?foo bar?>bibendum urna, <baz>ut ornare</baz> mi.</para></box></para>
      <para>Aenean dui risus, <qux>sodales quis <new:end/>leo sit amet, ornare consequat</qux> metus. Ut vel massa congue, egestas nibh et, rutrum odio.</para>
    </sec>
  </body>
</doc>

<sec>
  <para>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<box outline="maybe"><para quack="y">Proin id <?foo bar?>bibendum urna, <baz>ut ornare</baz> mi.</para></box></para>
  <para>Aenean dui risus, <qux>sodales quis [...]</qux></para>
</sec>

(i.e. a copy of the input, with new:start and new:end elements marking the 
first 20 words of the document; and, separately, a copy of those first 
20 words, preserving all markup within them and adding an ellipsis at the 
end)

...how might I fruitfully approach the transformation in an XSLT idiom? I 
feel that there should be some neat declarative way of doing it, possibly 
using xsl:iterate and/or accumulators, that I'm just failing to see. XSLT 
3.0 is available (Saxon 9.6), but the source documents are old content and 
not open to adjustment, sadly. I've tried using xsl:iterate, but I seem to 
be falling down in keeping track of whether or not I'm processing the 
specific text node in which the break needs to occur.
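
To make the "which text node holds the break" bookkeeping concrete, here is 
a rough, untested sketch of the first output only. It uses plain XSLT 2.0 
with the preceding:: axis rather than xsl:iterate or accumulators, so it is 
not the neat 3.0 idiom asked about. Assumptions, none of which come from the 
real documents: the new: prefix is bound to a made-up URI (urn:example:new), 
the f: functions and the $limit parameter are hypothetical names, a "word" 
is any whitespace-delimited token, a word never straddles two text nodes, 
and counting starts inside a single body element.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    xmlns:new="urn:example:new"
    exclude-result-prefixes="xs f">

  <!-- Number of words to mark; 20 in the example. -->
  <xsl:param name="limit" as="xs:integer" select="20"/>

  <!-- Words in one string; whitespace-only strings count as zero. -->
  <xsl:function name="f:words" as="xs:integer">
    <xsl:param name="t" as="xs:string"/>
    <xsl:sequence select="count(tokenize(normalize-space($t), ' '))"/>
  </xsl:function>

  <!-- Words in all body text nodes that end before node $n starts. -->
  <xsl:function name="f:before" as="xs:integer">
    <xsl:param name="n" as="node()"/>
    <xsl:sequence select="sum(for $t in $n/preceding::text()[ancestor::body]
                              return f:words($t))"/>
  </xsl:function>

  <!-- Identity copy for everything that is not body text. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="body//text()">
    <xsl:variable name="before" select="f:before(.)"/>
    <xsl:variable name="here" select="f:words(.)"/>
    <!-- Words still needed from this node to reach the limit. -->
    <xsl:variable name="k" select="$limit - $before"/>

    <!-- First text node that actually contains a word. -->
    <xsl:if test="$before = 0 and $here gt 0">
      <new:start/>
    </xsl:if>

    <xsl:choose>
      <!-- The break falls inside (or at the end of) this node. -->
      <xsl:when test="$k ge 1 and $here ge $k">
        <!-- Group 1: the first $k words plus trailing whitespace;
             group 3: the remainder of the node. -->
        <xsl:variable name="re"
            select="concat('^(\s*(\S+\s*){', $k, '})(.*)$')"/>
        <xsl:value-of select="replace(., $re, '$1', 's')"/>
        <new:end/>
        <xsl:value-of select="replace(., $re, '$3', 's')"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Recounting preceding::text() for every text node is quadratic in the number 
of text nodes, which is presumably where an accumulator or xsl:iterate would 
earn its keep. If the document has fewer than $limit words, no new:end is 
emitted. The same "words before this node" test could probably drive the 
second output as well, by dropping any node for which the count has already 
reached the limit.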

Am I making a rod for my own back here? Should I just be breaking out to a 
custom Java function and crossing my fingers that I manage to avoid 
ill-formed output? Any advice will be very gratefully received!

Thanks!
