Re: [xsl] Splitting text nodes - xsl:iterate?

Subject: Re: [xsl] Splitting text nodes - xsl:iterate?
From: "David Rudel fwqhgads@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 12 Nov 2014 14:44:59 -0000

Hi Tom,
I suspect that there is a straightforward way to do this with
accumulators, but I haven't used them much. I think one could use an
iterate with the @select value being all nodes, but I don't think that is
necessary, nor is it particularly "declarative," as per your interest.

To accomplish the first goal (ignoring processing instructions):

Consider creating a named template that keeps track of how much text has
been seen so far using a template parameter and self-propels through the
tree in the following manner:

If the template is called on a text node, and the counter keeping track of
the number of words seen is 0, it inserts a <new:start/> element.

If the template is called on a text node, and the number of words in the
text node would keep the total word count under 20 (or if the counter is
already at 20 or more), then it increments its counter (template parameter)
and copies the text.

If the template is called on a text node that would bring the total to 20
or more words, it copies the text but inserts a <new:end/> at the
appropriate place.

If the template is called on an element node, it makes a shallow copy of
that element and, inside the shallow copy, calls itself on the first child
of the element, unless the element has no children, in which case it calls
itself on the first following::node().

This should allow the template to eat all the elements and text nodes one
by one until it hits the text node that brings the word count to 20, at
which point it shouldn't be hard to insert the <new:end/> tag.
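
A minimal, untested sketch of the marking step, under some assumptions. It
is XSLT 2.0, and rather than threading a counter through a self-propelling
template it recounts the words in preceding::text() for every text node,
which is simpler to write but quadratic on large documents. The "new"
namespace URI, the f:words helper and the limit parameter are made up for
illustration, and a "word" is taken to be a whitespace-delimited token
within a single text node. Processing instructions are copied but never
counted.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:new="urn:example:new"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical parameter: how many words the markers should enclose. -->
  <xsl:param name="limit" as="xs:integer" select="20"/>

  <!-- Hypothetical helper: whitespace-delimited word count of a string. -->
  <xsl:function name="f:words" as="xs:integer">
    <xsl:param name="text" as="xs:string"/>
    <xsl:sequence select="count(tokenize(normalize-space($text), ' '))"/>
  </xsl:function>

  <!-- Identity template: elements, attributes, PIs and comments pass through. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes that contain at least one word. -->
  <xsl:template match="text()[normalize-space()]">
    <!-- Words in all earlier text nodes (recounted each time: O(n^2)). -->
    <xsl:variable name="before" as="xs:integer"
        select="sum(for $t in preceding::text() return f:words($t))"/>
    <xsl:variable name="here" as="xs:integer" select="f:words(.)"/>

    <!-- The first text node with any words gets the start marker. -->
    <xsl:if test="$before eq 0">
      <new:start/>
    </xsl:if>

    <xsl:choose>
      <!-- This text node contains the word that reaches the limit. -->
      <xsl:when test="$before lt $limit and $before + $here ge $limit">
        <xsl:variable name="n" as="xs:integer" select="$limit - $before"/>
        <!-- Everything up to and including the n-th word and the
             whitespace that follows it. -->
        <xsl:variable name="head" as="xs:string"
            select="replace(., concat('^(\s*(\S+\s*){', $n, '})(.*)$'), '$1', 's')"/>
        <xsl:value-of select="$head"/>
        <new:end/>
        <xsl:value-of select="substring(., string-length($head) + 1)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

One consequence of counting per text node is that a word split by inline
markup (say "foo<b>bar</b>") would be counted as two words.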

Once you have done the above, it should be fairly trivial to accomplish the
second objective by saving the first output to a file and using the new
<new:start/> and <new:end/> tags.
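
A rough, untested sketch of that second pass, assuming the marked-up
document has been saved and is used as the input, that it contains exactly
one <new:start/> and one <new:end/>, and that the markers live in the same
made-up "new" namespace URI used when they were inserted. It copies the
lowest element containing both markers, drops everything that comes after
<new:end/> in document order, and writes the ellipsis where the end marker
was. Nodes that come before <new:start/> inside that element are not
removed here.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:new="urn:example:new"
    exclude-result-prefixes="new">

  <!-- The two markers (assumed to occur once each). -->
  <xsl:variable name="start" select="(//new:start)[1]"/>
  <xsl:variable name="end" select="(//new:end)[1]"/>

  <!-- Begin at the lowest common ancestor of the two markers. -->
  <xsl:template match="/">
    <xsl:apply-templates
        select="($start/ancestor::* intersect $end/ancestor::*)[last()]"/>
  </xsl:template>

  <!-- Identity template for everything that is kept. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop every node that follows the end marker in document order.
       Ancestors of the marker precede it, so they are still copied. -->
  <xsl:template match="node()[. &gt;&gt; $end]" priority="2"/>

  <!-- The start marker is not wanted in the excerpt. -->
  <xsl:template match="new:start" priority="1"/>

  <!-- The end marker becomes the ellipsis. -->
  <xsl:template match="new:end" priority="1">[...]</xsl:template>

</xsl:stylesheet>
```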

On Wed, Nov 12, 2014 at 3:10 PM, Tom Cleghorn tcleghorn@xxxxxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hi list,
>
> Given an input document looking something like this:
> <doc>
>   <head><foo/><bar/><baz/></head>
>   <body>
>     <sec>
>       <para>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<box
> outline="maybe"><para quack="y">Proin id <?foo bar?>bibendum urna, <baz>ut
> ornare</baz> mi.</para></box></para>
>       <para>Aenean dui risus, <qux>sodales quis leo sit amet, ornare
> consequat</qux> metus. Ut vel massa congue, egestas nibh et, rutrum
> odio.</para>
>     </sec>
>   </body>
> </doc>
>
> (i.e. document markup consisting of arbitrary text and element nodes
> nested to some unknown depth)
>
> and the requirement for two separate outputs looking like these:
> <doc>
>   <head><foo/><bar/><baz/></head>
>   <body>
>     <sec>
>       <para><new:start/>Lorem ipsum dolor sit amet, consectetur adipiscing
> elit.<box outline="maybe"><para quack="y">Proin id <?foo bar?>bibendum
> urna, <baz>ut ornare</baz> mi.</para></box></para>
>       <para>Aenean dui risus, <qux>sodales quis <new:end/>leo sit amet,
> ornare consequat</qux> metus. Ut vel massa congue, egestas nibh et, rutrum
> odio.</para>
>     </sec>
>   </body>
> </doc>
>
> <sec>
>   <para>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<box
> outline="maybe"><para quack="y">Proin id <?foo bar?>bibendum urna, <baz>ut
> ornare</baz> mi.</para></box></para>
>   <para>Aenean dui risus, <qux>sodales quis [...]</qux></para>
> </sec>
>
> (i.e. a copy of the input, with new:start and new:end elements marking the
> first 20 words of the document; and separately a copy of those first twenty
> words, preserving all markup within them and adding ellipses at the end)
>
> ...how might I fruitfully approach the transformation in an XSLT idiom? I
> feel that there should be some neat declarative way of doing it, possibly
> using xsl:iterate and/or accumulators, that I'm just failing to see. XSLT
> 3.0 is available (Saxon 9.6), but the source documents are old content and
> not open to adjustment, sadly. I've tried using xsl:iterate, but I seem to
> be falling down in keeping track of whether or not I'm processing the
> specific text node in which the break needs to occur.
>
> Am I making a rod for my own back here? Should I just be breaking out to a
> custom Java function and crossing my fingers that I manage to avoid
> ill-formed output? Any advice will be very gratefully received!
>
> Thanks!



-- 

"A false conclusion, once arrived at and widely accepted is not dislodged
easily, and the less it is understood, the more tenaciously it is held." -
Cantor's Law of Preservation of Ignorance.
