Re: [xsl] XSLT splitting (grouping?) hierarchical structure

Subject: Re: [xsl] XSLT splitting (grouping?) hierarchical structure
From: "Joel Kalvesmaki director@xxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 11 Feb 2022 02:14:30 -0000
Hello Matthieu,

I've approached this task in XSLT 3.0 with the TAN functions tan:tree-to-sequence() and tan:sequence-to-tree(), which are the core processes of about four different applications and functions that do something similar to what you want to do.

The best way to see them in action is to look at tan:chop-tree(), which takes a tree and a set of integers (the integers being a proxy for your <split>s), and splits/chops the tree at those positions in the overall text value of the tree:
https://github.com/textalign/TAN-2021/blob/90ce604d2834f1a26aab20dbbf5a5c612a3e5d3e/functions/nodes/TAN-fn-nodes-standard.xsl#L1824


Best wishes,

jk

On 2022-02-10 02:02, Michael Kay mike@xxxxxxxxxxxx wrote:
Since you're looking for design patterns, in Jackson Structured
Programming ( revisited using modern terminology at
http://mcs.open.ac.uk/mj665/JSPDDevt.pdf ) this is known as a
"boundary clash" problem, and the usual solution is to flatten the
hierarchy into a sequence of leaf nodes each containing details of its
own ancestry, and then reconstruct the new hierarchy by a grouping
operation on this sequence of leaf nodes. The original JSP book from
1975 is quite tough going nowadays, it all rather assumes you're well
versed in sort-merge processing of hierarchical data files on magnetic
tape. But the overall philosophy of transforming hierarchies using a
pipeline of successive tree-walking transformations is isomorphic to
the world we live in.
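
A minimal sketch of the flatten-and-regroup idea described above, written in Python with the standard-library ElementTree rather than XSLT. It is an illustration under assumptions, not code from this thread: the function names (flatten, rebuild, split_by_regrouping) are hypothetical, the split marker is assumed to be an element named "split", and each chunk is rebuilt as its own small document (repeating <title> in every chunk and placing the <split/> elements between chunks is left out).

```python
import xml.etree.ElementTree as ET

def flatten(elem, path=()):
    """Yield (ancestry, text) leaves in document order.

    ancestry is a tuple of source elements from the root down to the
    element that owns the text. A split marker yields text=None.
    """
    if elem.tag == "split":          # assumed marker element name
        yield (path, None)
        return
    here = path + (elem,)
    if elem.text:
        yield (here, elem.text)
    elif len(elem) == 0:
        yield (here, "")             # keep otherwise-empty elements
    for child in elem:
        yield from flatten(child, here)
        if child.tail:
            yield (here, child.tail)

def rebuild(leaves):
    """Regroup a run of leaves into one tree, reusing an output element
    for as long as consecutive leaves share the same source ancestor."""
    root = None
    open_src, open_out = [], []
    for path, text in leaves:
        keep = 0
        while (keep < len(open_src) and keep < len(path)
               and open_src[keep] is path[keep]):
            keep += 1
        del open_src[keep:]
        del open_out[keep:]
        for src in path[keep:]:
            new = ET.Element(src.tag, dict(src.attrib))
            if open_out:
                open_out[-1].append(new)
            else:
                root = new
            open_src.append(src)
            open_out.append(new)
        target = open_out[-1]
        if len(target):              # text after a child goes in its tail
            target[-1].tail = (target[-1].tail or "") + text
        else:
            target.text = (target.text or "") + text
    return root

def split_by_regrouping(xml_text):
    """Return one serialized document per run of leaves between splits."""
    groups = [[]]
    for path, text in flatten(ET.fromstring(xml_text)):
        if text is None:
            groups.append([])        # a split starts a new group
        else:
            groups[-1].append((path, text))
    return [ET.tostring(rebuild(g), encoding="unicode") for g in groups if g]
```

For example, split_by_regrouping('<root><p>x <b>y<split/>z</b> w</p></root>') gives two documents, with the <b> element closed in the first and re-created in the second. Whitespace-only text nodes from indented input are treated as ordinary leaves here, so real input would probably need them filtered first.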

Although it's instinctive to reach for an XSLT solution, I think I
once solved a problem like this at the SAX level: keep a stack of open
elements, and when you hit a <split/>, emit endElement events to close
open elements up to a certain level, then output the <split/>, then
re-open the elements that you closed, in reverse order; you've then
got a structure that's relatively easy to break into sections using
conventional grouping.
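
The SAX-level approach described above might look roughly like the following Python sketch, using the standard xml.sax module. This is an assumption-laden illustration, not the original solution: the class name SplitHoister, the helper hoist_splits, and the keep_open parameter (how many outermost elements stay open across a split) are hypothetical, and the marker element is assumed to be named "split".

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

class SplitHoister(xml.sax.ContentHandler):
    """Close open elements before each <split/>, emit it, then re-open them."""

    def __init__(self, out, keep_open=1):
        super().__init__()
        self.gen = XMLGenerator(out, encoding="utf-8")
        self.keep_open = keep_open   # outermost elements left open
        self.stack = []              # (name, attrs) of open elements

    def startElement(self, name, attrs):
        if name == "split":
            closed = self.stack[self.keep_open:]
            # Close innermost first.
            for tag, _ in reversed(closed):
                self.gen.endElement(tag)
            self.gen.startElement(name, attrs)
            self.gen.endElement(name)
            # Re-open in reverse order of closing (outermost first).
            for tag, saved in closed:
                self.gen.startElement(tag, saved)
        else:
            self.stack.append((name, attrs.copy()))
            self.gen.startElement(name, attrs)

    def endElement(self, name):
        if name == "split":
            return                   # already written in startElement
        self.stack.pop()
        self.gen.endElement(name)

    def characters(self, content):
        self.gen.characters(content)

def hoist_splits(xml_text, keep_open=1):
    """Return xml_text with every <split/> lifted to depth keep_open."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), SplitHoister(out, keep_open))
    return out.getvalue()
```

With keep_open=1 the root stays open and each <split/> ends up as a direct child of it, between two copies of the enclosing <section>. Re-opened elements carry the same attributes as the originals, so any id attributes would be duplicated and may need handling in a later step.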

Michael Kay
Saxonica

On 10 Feb 2022, at 08:20, Matthieu Ricaud-Dussarget
ricaudm@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

Dear XSL List,

It's not the first time I'm facing a splitting problem working with
publishing documents.
In the past I found rather tricky/verbose solutions, but I'm wondering
whether I'm missing something obvious, especially with the new
features in XSLT 3.0?

My XML looks like this:

<root>
<section>
<title>Title</title>
<content>
<p>paragraph #1</p>
<p>paragraph #2 to split <split id="split-1"/> here</p>
<p>paragraph #3</p>
<p>paragraph #4 <strong> to split <split id="split-2"/>
here</strong> if possible</p>
<p>paragraph #5</p>
<ul>
<li>Item #1</li>
<li>Item #2 to split <split id="split-3"/> here</li>
<li>Item #3</li>
<li>
<ul>
<li>Item #4</li>
<li>Item #5 to <em>split <split id="split-4"/></em> here
if possible</li>
<li>Item #6</li>
</ul>
</li>
</ul>
<p>paragraph #6</p>
</content>
</section>
</root>

The goal is to split the section on every <split> element (just like
a page would break the flowing text anywhere in the structure).

Expected result :

<root>
<section>
<title>Title</title>
<content>
<p>paragraph #1</p>
<p>paragraph #2 to split</p>
</content>
</section>
<split id="split-1"/>
<section>
<title>Title</title>
<content>
<p>paragraph #3</p>
<p>paragraph #4 <strong> to split</strong></p>
</content>
</section>
<split id="split-2"/>
<section>
<title>Title</title>
<content>
<p><strong> here</strong> if possible</p>
<p>paragraph #5</p>
<ul>
<li>Item #1</li>
<li>Item #2 to split </li>
</ul>
</content>
</section>
<split id="split-3"/>
<section>
<title>Title</title>
<content>
<ul>
<li> here</li>
<li>Item #3</li>
<li>
<ul>
<li>Item #4</li>
<li>Item #5 to <em>split</em></li>
</ul>
</li>
</ul>
</content>
</section>
<split id="split-4"/>
<section>
<title>Title</title>
<content>
<ul>
<li>
<ul>
<li> here if possible</li>
<li>Item #6</li>
</ul>
</li>
</ul>
<p>paragraph #6</p>
</content>
</section>
</root>

My idea was to iterate from 1 to the number of split elements + 1,
processing the section each time with tunnel params so that I can test
whether each node is before, after, or between the current split
elements, and then decide whether to keep the node based on that position.
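
The multi-pass idea above can be sketched in Python with the standard-library ElementTree (in XSLT the chunk number would be the tunnel parameter). This is only an illustration under assumptions: the names copy_chunk and split_by_passes are hypothetical, the marker element is assumed to be named "split", and each chunk is returned as its own document without repeating <title>.

```python
import xml.etree.ElementTree as ET

def copy_chunk(elem, chunk, state):
    """Copy elem, keeping only text that occurs after exactly `chunk` splits.

    state["seen"] counts the split markers passed so far in document
    order. Returns None when nothing inside elem belongs to this chunk.
    """
    out = ET.Element(elem.tag, dict(elem.attrib))
    kept = False
    if elem.text and state["seen"] == chunk:
        out.text = elem.text
        kept = True
    for child in elem:
        if child.tag == "split":     # assumed marker element name
            state["seen"] += 1
        else:
            sub = copy_chunk(child, chunk, state)
            if sub is not None:
                out.append(sub)
                kept = True
        if child.tail and state["seen"] == chunk:
            if len(out):             # text after a kept child
                out[-1].tail = (out[-1].tail or "") + child.tail
            else:                    # no kept child yet: leading text
                out.text = (out.text or "") + child.tail
            kept = True
    return out if kept else None

def split_by_passes(xml_text):
    """Run one filtering pass per chunk (number of splits + 1 passes)."""
    root = ET.fromstring(xml_text)
    n_splits = sum(1 for _ in root.iter("split"))
    chunks = []
    for i in range(n_splits + 1):
        piece = copy_chunk(root, i, {"seen": 0})
        if piece is not None:
            chunks.append(ET.tostring(piece, encoding="unicode"))
    return chunks
```

Inline splits are handled the same way as block-level ones, because the test is made per text node rather than per element. The cost is one full walk of the section per chunk, and in this sketch elements with no text at all are dropped.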

I already used this kind of solution on a similar problem, a long time
ago. So I'll give it a try, though I'm not totally confident in
it (because split elements can appear as inline content here).

Please let me know if you have ideas, and whether my solution is the
right or wrong way to go.
Are there special design patterns for this kind of problem?
And last, if you have ever faced this kind of splitting issue, any
feedback is welcome :)

Cheers,
Matthieu Ricaud-Dussarget

--
Matthieu Ricaud-Dussarget
+33 6.63.25.95.58

-- Joel Kalvesmaki Director, Text Alignment Network http://textalign.net
