Subject: Re: [xsl] Splitting text nodes - xsl:iterate?
From: "David Rudel fwqhgads@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 12 Nov 2014 14:44:59 -0000
Hi Tom,

I suspect that there is a straightforward way to do this with accumulators, but I haven't used them much. I think one could use xsl:iterate with the @select value being all nodes, but I don't think that is necessary, nor is it particularly "declarative," as per your interest.

To accomplish the first goal (ignoring processing instructions), consider creating a named template that uses a template parameter to keep track of how many words have been seen so far, and that self-propels through the tree in the following manner:

- If the template is called on a text node and the word counter is 0, it inserts a <new:start/> element.

- If the template is called on a text node and the words in that node would keep the total below 20, or if the counter has already reached 20, it increments its counter (the template parameter) and copies the text.

- If the template is called on a text node that would bring the total to 20 or more words, it copies the text but inserts a <new:end/> at the appropriate place.

- If the template is called on an element node, it makes a shallow copy of the element and, inside that shallow copy, calls itself on the element's first child. If the element has no children, it calls itself on the first following::node() instead.

This should allow the template to eat all the elements and text nodes one by one until it hits the one that brings the word count to 20, and then it shouldn't be hard to insert the <new:end/> tag.

Once you have done the above, it should be fairly trivial to accomplish the second objective by saving the first output to a file and using the new <new:start/> and <new:end/> tags.
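For what it is worth, here is a minimal, untested sketch of the first objective only. It is not the threaded-counter template described above: to stay short it recounts the words in preceding::text() for every text node, which is simpler but quadratic in the worst case. Several things in it are assumptions rather than anything from the thread: the namespace URI bound to the "new" prefix, the helper function f:words, the $limit parameter, and the definition of a "word" as a whitespace-delimited token within a single text node (so a word split by markup counts twice). It uses XSLT 2.0 features only, so it should also run under an XSLT 3.0 processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:local-functions"
    xmlns:new="urn:example:new"
    exclude-result-prefixes="xs f">

  <!-- Both namespace URIs above are placeholders (assumptions). -->

  <!-- Number of words to mark; 20 in the original question. -->
  <xsl:param name="limit" as="xs:integer" select="20"/>

  <!-- Hypothetical helper: whitespace-delimited words in a string. -->
  <xsl:function name="f:words" as="xs:integer">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence select="count(tokenize(normalize-space($s), ' '))"/>
  </xsl:function>

  <!-- Identity template: copies everything not handled below,
       including processing instructions and whitespace-only text. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes that contain at least one word. -->
  <xsl:template match="text()[normalize-space()]">
    <!-- Words seen in all earlier text nodes (recounted each time). -->
    <xsl:variable name="before" as="xs:integer"
        select="sum(for $t in preceding::text() return f:words($t))"/>
    <xsl:variable name="after" as="xs:integer"
        select="$before + f:words(.)"/>

    <!-- The first word-bearing text node gets the start marker. -->
    <xsl:if test="$before eq 0">
      <new:start/>
    </xsl:if>

    <xsl:choose>
      <!-- This node contains the word that reaches the limit. -->
      <xsl:when test="$before lt $limit and $after ge $limit">
        <!-- Leading whitespace plus the remaining words, each with
             its trailing whitespace; the 's' flag lets '.' match newlines. -->
        <xsl:variable name="head" as="xs:string"
            select="replace(.,
                      concat('^(\s*(\S+\s*){', $limit - $before, '}).*$'),
                      '$1', 's')"/>
        <xsl:value-of select="$head"/>
        <new:end/>
        <xsl:value-of select="substring(., string-length($head) + 1)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

If the document holds fewer than $limit words, this sketch emits a <new:start/> and never a <new:end/>, so the second pass would need to allow for that. For large documents, threading the count through the traversal as a parameter, as described above, avoids the repeated recounting.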
On Wed, Nov 12, 2014 at 3:10 PM, Tom Cleghorn tcleghorn@xxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hi list,
>
> Given an input document looking something like this:
> <doc>
> <head><foo/><bar/><baz/></head>
> <body>
> <sec>
> <para>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<box
> outline="maybe"><para quack="y">Proin id <?foo bar?>bibendum urna, <baz>ut
> ornare</baz> mi.</para></box></para>
> <para>Aenean dui risus, <qux>sodales quis leo sit amet, ornare
> consequat</qux> metus. Ut vel massa congue, egestas nibh et, rutrum
> odio.</para>
> </sec>
> </body>
> </doc>
>
> (i.e. document markup consisting of arbitrary text and element nodes
> nested to some unknown depth)
>
> and the requirement for two separate outputs looking like these:
> <doc>
> <head><foo/><bar/><baz/></head>
> <body>
> <sec>
> <para><new:start/>Lorem ipsum dolor sit amet, consectetur adipiscing
> elit.<box outline="maybe"><para quack="y">Proin id <?foo bar?>bibendum
> urna, <baz>ut ornare</baz> mi.</para></box></para>
> <para>Aenean dui risus, <qux>sodales quis <new:end/>leo sit amet,
> ornare consequat</qux> metus. Ut vel massa congue, egestas nibh et, rutrum
> odio.</para>
> </sec>
> </body>
> </doc>
>
> <sec>
> <para>Lorem ipsum dolor sit amet, consectetur adipiscing elit.<box
> outline="maybe"><para quack="y">Proin id <?foo bar?>bibendum urna, <baz>ut
> ornare</baz> mi.</para></box></para>
> <para>Aenean dui risus, <qux>sodales quis [...]</qux></para>
> </sec>
>
> (i.e. a copy of the input, with new:start and new:end elements marking the
> first 20 words of the document; and separately a copy of those first twenty
> words, preserving all markup within them and adding ellipses at the end)
>
> ...how might I fruitfully approach the transformation in an XSLT idiom? I
> feel that there should be some neat declarative way of doing it, possibly
> using xsl:iterate and/or accumulators, that I'm just failing to see. XSLT
> 3.0 is available (Saxon 9.6), but the source documents are old content and
> not open to adjustment, sadly. I've tried using xsl:iterate, but I seem to
> be falling down in keeping track of whether or not I'm processing the
> specific text node in which the break needs to occur.
>
> Am I making a rod for my own back here? Should I just be breaking out to a
> custom Java function and crossing my fingers that I manage to avoid
> ill-formed output? Any advice will be very gratefully received!
>
> Thanks!
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>

--
"A false conclusion, once arrived at and widely accepted is not dislodged easily, and the less it is understood, the more tenaciously it is held." - Cantor's Law of Preservation of Ignorance.