Re: [xsl] XSLT splitting (grouping?) hierarchical structure

Subject: Re: [xsl] XSLT splitting (grouping?) hierarchical structure
From: "Matthieu Ricaud-Dussarget ricaudm@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 11 Feb 2022 07:28:31 -0000
Hi all,

Thanks a lot, Joel, for the pointer. TAN looks like quite an impressive
piece of work in XSLT 3.0!
Your approach with tan:tree-to-sequence() and tan:sequence-to-tree() is
really close to what Michael describes: "the usual solution is to flatten
the hierarchy into a sequence of leaf nodes each containing details of its
own ancestry, and then reconstruct the new hierarchy by a grouping
operation on this sequence of leaf nodes".
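
To make that pattern concrete, here is a minimal, untested sketch of it for
the simple <split/> input from my first message. The flattening is implicit:
the leaves are used as they are, since each node already carries its
ancestry through the ancestor axis. The function name f:rebuild and the
namespace urn:example:regroup are made up for the illustration; this is
neither the TAN code nor the solution I ended up using.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:f="urn:example:regroup"
  exclude-result-prefixes="f" version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Hypothetical helper: rebuild the hierarchy above a run of leaves.
       Adjacent leaves are grouped by the child of $parent they sit under;
       that child is copied shallowly and the function recurses into it. -->
  <xsl:function name="f:rebuild" as="node()*">
    <xsl:param name="leaves" as="node()*"/>
    <xsl:param name="parent" as="node()"/>
    <xsl:for-each-group select="$leaves"
      group-adjacent="generate-id(ancestor-or-self::node()[.. is $parent])">
      <xsl:variable name="child"
        select="ancestor-or-self::node()[.. is $parent]"/>
      <xsl:choose>
        <!-- The leaf is itself a child of $parent: copy it as is. -->
        <xsl:when test="$child is .">
          <xsl:copy-of select="."/>
        </xsl:when>
        <!-- Otherwise re-open the ancestor and go one level down. -->
        <xsl:otherwise>
          <xsl:copy select="$child">
            <xsl:copy-of select="$child/@*"/>
            <xsl:sequence select="f:rebuild(current-group(), $child)"/>
          </xsl:copy>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:function>

  <xsl:template match="section[.//split]">
    <xsl:variable name="section" select="."/>
    <!-- Leaves = nodes without children (text, empty elements, <split/>). -->
    <xsl:for-each-group select="content//node()[not(node())]"
      group-ending-with="split">
      <section>
        <xsl:copy-of select="$section/title"/>
        <content>
          <xsl:sequence
            select="f:rebuild(current-group()[not(self::split)], $section/content)"/>
        </content>
      </section>
      <!-- Emit the split that closed this group, if there is one. -->
      <xsl:copy-of select="current-group()[last()][self::split]"/>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

An element that straddles a <split/> (such as <strong> or the nested <ul>)
is re-opened once in each chunk, because its leaves end up in two different
groups. Whitespace-only text nodes are leaves too, so indentation is kept.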

I finally tried Gerrit's "upward projection" method with the nice example
given by Martin.
It didn't work so easily in my real use case because:
1) My <split> element has content inside, so instead of
current-group()[not(self::split)] in the tunnel "nodes" parameter I had
to use current-group()[not(ancestor-or-self::split)] (because the
grouping selects every descendant node, so the split's own content is in
the population too).
2) I still had a problem at this point because my split element might also
be a container of the subtree to split. A quick solution was to put a copy
of the subtree into a variable, so I'm sure the surrounding context doesn't
interfere with the splitting strategy.

Actually my split element is not a special element for splitting: it's a
common element with content, and it may be nested anywhere. My goal here
was to deal with the special case where this element appears within text as
inline content, which is not allowed in the result schema.

Here is the full example with XSpec.
The split element is <element type="choice">.

XSpec:
<?xml version="1.0" encoding="UTF-8"?>
<x:description xmlns:x="http://www.jenitennison.com/xslt/xspec"
  stylesheet="../../main/xsl/SEF2Oppus.xsl">

  <x:scenario label="Split inline choice">
    <x:context>
      <BLOCK>
        <element type="paragraph">
          <id>xxx</id>
          <content>
            <p>paragraph #1</p>
            <p>paragraph #2 to split <element type="choice" id="split-1"/> here</p>
            <p>paragraph #3</p>
            <p>paragraph #4 <strong> to split <element type="choice" id="split-2"/> here</strong> if possible</p>
            <p>paragraph #5</p>
            <ul>
              <li>Item #1</li>
              <li>Item #2 to split <element type="choice" id="split-3"/> here</li>
              <li>Item #3</li>
              <li>
                <ul>
                  <li>Item #4</li>
                  <li>Item #5 to <em>split <element type="choice" id="split-4"/></em> here if possible</li>
                  <li>Item #6</li>
                </ul>
              </li>
            </ul>
            <p>paragraph #6</p>
          </content>
        </element>
      </BLOCK>
    </x:context>
    <x:expect label="OK">
      <BLOCK>
        <element type="paragraph">
          <id>xxx</id>
          <content>
            <p>paragraph #1</p>
            <p>paragraph #2 to split </p>
          </content>
        </element>
        <element type="choice" id="split-1" />
        <element type="inline">
          <id>xxx</id>
          <content>
            <p> here</p>
            <p>paragraph #3</p>
            <p>paragraph #4 <strong> to split </strong>
            </p>
          </content>
        </element>
        <element type="choice" id="split-2" />
        <element type="inline">
          <id>xxx</id>
          <content>
            <p>
              <strong> here</strong> if possible</p>
            <p>paragraph #5</p>
            <ul>
              <li>Item #1</li>
              <li>Item #2 to split </li>
            </ul>
          </content>
        </element>
        <element type="choice" id="split-3" />
        <element type="inline">
          <id>xxx</id>
          <content>
            <ul>
              <li> here</li>
              <li>Item #3</li>
              <li>
                <ul>
                  <li>Item #4</li>
                  <li>Item #5 to <em>split </em>
                  </li>
                </ul>
              </li>
            </ul>
          </content>
        </element>
        <element type="choice" id="split-4" />
        <element type="inline">
          <id>xxx</id>
          <content>
            <ul>
              <li>
                <ul>
                  <li> here if possible</li>
                  <li>Item #6</li>
                </ul>
              </li>
            </ul>
            <p>paragraph #6</p>
          </content>
        </element>
      </BLOCK>
    </x:expect>
  </x:scenario>

  <x:scenario label="Element choice with content">
    <x:context>
      <element type="paragraph">
        <id>xxx</id>
        <content>
          <p>one</p>
          <p>two <element type="choice"><id>yyy</id></element> text</p>
        </content>
      </element>
    </x:context>
    <x:expect label="OK">
      <element type="paragraph">
        <id>xxx</id>
        <content>
          <p>one</p>
          <p>two </p>
        </content>
      </element>
      <element type="choice"><id>yyy</id></element>
      <element type="inline">
        <id>xxx</id>
        <content>
          <p> text</p>
        </content>
      </element>
    </x:expect>
  </x:scenario>

  <x:scenario label="Nested elements choice">
    <x:context>
      <element type="choice">
        <id>yyy</id>
        <choiceID>zzz</choiceID>
        <label>label</label>
        <content>
          <element type="paragraph">
            <id>xxx</id>
            <content>
              <p>one</p>
              <p>two <element type="choice"><id>yyy</id></element> text</p>
            </content>
          </element>
        </content>
      </element>
    </x:context>
    <x:expect label="OK">
      <element type="choice">
        <id>yyy</id>
        <choiceID>zzz</choiceID>
        <label>label</label>
        <content>
          <element type="paragraph">
            <id>xxx</id>
            <content>
              <p>one</p>
              <p>two </p>
            </content>
          </element>
          <element type="choice"><id>yyy</id></element>
          <element type="inline">
            <id>xxx</id>
            <content>
              <p> text</p>
            </content>
          </element>
        </content>
      </element>
    </x:expect>
  </x:scenario>

</x:description>

Nota bene:
- You might be surprised that the <id> elements are copied here, but it's
not a problem in my case: they only contain a processing instruction saying
"generate an id" (which is done after the XSLT, in a Java program), so
copying <id> won't duplicate ids.
- As you may have noticed, the first new element produced by the splitting
has type="paragraph" whereas the ones generated after it have
type="inline", so that the split blocks may still appear on the same line
in the output format. In the XSLT I didn't find a way to detect the first
current-group(), so I did a kind of hack with the first child node to
distinguish the first group from the others.
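
For what it's worth, inside xsl:for-each-group the context position is the
position of the current group among all groups, so position() eq 1 should
identify the first group without the first-child comparison. A minimal
sketch, assuming a simplified flat input with hypothetical <split/> markers
as direct children of <content> (this is not my real stylesheet):

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Assumed input: <content> whose children include <split/> markers. -->
  <xsl:template match="content">
    <xsl:for-each-group select="node()" group-ending-with="split">
      <!-- position() is the position of the current group here,
           not the position of a node inside the group. -->
      <element type="{if (position() eq 1) then 'paragraph' else 'inline'}">
        <xsl:copy-of select="current-group()[not(self::split)]"/>
      </element>
      <!-- Emit the split that closed this group, if there is one. -->
      <xsl:copy-of select="current-group()[last()][self::split]"/>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Likewise, last() inside the xsl:for-each-group gives the number of groups,
should the final chunk ever need special treatment.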

The XSLT looks like this:

<xsl:mode name="split" on-no-match="shallow-copy"/>
<!-- the unnamed (default) mode is declared by omitting @name -->
<xsl:mode on-no-match="shallow-copy"/>

<!-- Mode "split": keep a node only if it is one of $nodes or an ancestor
     of one of them; everything else is dropped. -->
<xsl:template match="node()" mode="split">
  <xsl:param name="nodes" as="node()*" tunnel="yes"/>
  <xsl:if test=". intersect $nodes/ancestor-or-self::node()">
    <xsl:next-match/>
  </xsl:if>
</xsl:template>

<xsl:template match="element[@type = 'paragraph']">
  <!-- Copy the subtree here so that nested choice elements elsewhere in
       the whole structure have no side effects on the $nodes param. -->
  <xsl:variable name="self" as="element()">
    <xsl:copy-of select="."/>
  </xsl:variable>
  <xsl:variable name="firstChildNode" select="$self/node()[1]" as="node()"/>
  <!-- Each group ends with a choice element (except possibly the last). -->
  <xsl:for-each-group select="$self//node()"
    group-ending-with="element[@type = 'choice']">
    <element type="{if (current-group()[1] is $firstChildNode)
                    then 'paragraph' else 'inline'}">
      <xsl:apply-templates select="$self/node()" mode="split">
        <xsl:with-param name="nodes" as="node()*" tunnel="yes"
          select="current-group()[not(ancestor-or-self::element[@type = 'choice'])]
                  | $self/id/descendant-or-self::node()"/>
      </xsl:apply-templates>
    </element>
    <!-- Output the choice element that closed this group, if any. -->
    <xsl:sequence
      select="current-group()[last()][self::element[@type = 'choice']]"/>
  </xsl:for-each-group>
</xsl:template>

This works fine. Thanks again to all of you for your help!

Cheers
Matthieu



On Fri, 11 Feb 2022 at 03:14, Joel Kalvesmaki director@xxxxxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hello Matthieu,
>
> I've approached this task in XSLT 3.0 with the TAN functions
> tan:tree-to-sequence() and tan:sequence-to-tree(), which are the core
> processes of about four different applications and functions that do
> something similar to what you want to do.
>
> The best way to see them in action is to look at tan:chop-tree(), which
> takes a tree and a set of integers (the integers being a proxy for your
> <split>s), and splits/chops the tree at those positions in the overall
> text value of the tree:
>
>
> https://github.com/textalign/TAN-2021/blob/90ce604d2834f1a26aab20dbbf5a5c612a3e5d3e/functions/nodes/TAN-fn-nodes-standard.xsl#L1824
>
> Best wishes,
>
> jk
>
> On 2022-02-10 02:02, Michael Kay mike@xxxxxxxxxxxx wrote:
> > Since you're looking for design patterns, in Jackson Structured
> > Programming ( revisited using modern terminology at
> > http://mcs.open.ac.uk/mj665/JSPDDevt.pdf ) this is known as a
> > "boundary clash" problem, and the usual solution is to flatten the
> > hierarchy into a sequence of leaf nodes each containing details of its
> > own ancestry, and then reconstruct the new hierarchy by a grouping
> > operation on this sequence of leaf nodes. The original JSP book from
> > 1975 is quite tough going nowadays, it all rather assumes you're well
> > versed in sort-merge processing of hierarchical data files on magnetic
> > tape. But the overall philosophy of transforming hierarchies using a
> > pipeline of successive tree-walking transformations is isomorphic to
> > the world we live in.
> >
> > Although it's instinctive to reach for an XSLT solution, I think I
> > once solved a problem like this at the SAX level: keep a stack of open
> > elements, and when you hit a <split/>, emit endElement events to close
> > open elements up to a certain level, then output the <split/>, then
> > re-open the elements that you closed, in reverse order; you've then
> > got a structure that's relatively easy to break into sections using
> > conventional grouping.
> >
> > Michael Kay
> > Saxonica
> >
> >> On 10 Feb 2022, at 08:20, Matthieu Ricaud-Dussarget
> >> ricaudm@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> Dear XSL List,
> >>
> >> It's not the first time I'm facing a splitting problem working with
> >> publishing documents.
> >> I used to find kind of tricky/verbose solutions but I'm wondering if
> >> I'm missing something obvious, especially with XSLT 3.0 new features
> >> ?
> >>
> >> My XML looks like this:
> >> <root>
> >> <section>
> >> <title>Title</title>
> >> <content>
> >> <p>paragraph #1</p>
> >> <p>paragraph #2 to split <split id="split-1"/> here</p>
> >> <p>paragraph #3</p>
> >> <p>paragraph #4 <strong> to split <split id="split-2"/>
> >> here</strong> if possible</p>
> >> <p>paragraph #5</p>
> >> <ul>
> >> <li>Item #1</li>
> >> <li>Item #2 to split <split id="split-3"/> here</li>
> >> <li>Item #3</li>
> >> <li>
> >> <ul>
> >> <li>Item #4</li>
> >> <li>Item #5 to <em>split <split id="split-4"/></em> here
> >> if possible</li>
> >> <li>Item #6</li>
> >> </ul>
> >> </li>
> >> </ul>
> >> <p>paragraph #6</p>
> >> </content>
> >> </section>
> >> </root>
> >>
> >> The goal is to split the section on every <split> element (just like
> >> a page would break the flowing text anywhere in the structure).
> >>
> >> Expected result :
> >>
> >> <root>
> >> <section>
> >> <title>Title</title>
> >> <content>
> >> <p>paragraph #1</p>
> >> <p>paragraph #2 to split</p>
> >> </content>
> >> </section>
> >> <split id="split-1"/>
> >> <section>
> >> <title>Title</title>
> >> <content>
> >> <p>paragraph #3</p>
> >> <p>paragraph #4 <strong> to split</strong></p>
> >> </content>
> >> </section>
> >> <split id="split-2"/>
> >> <section>
> >> <title>Title</title>
> >> <content>
> >> <p><strong> here</strong> if possible</p>
> >> <p>paragraph #5</p>
> >> <ul>
> >> <li>Item #1</li>
> >> <li>Item #2 to split </li>
> >> </ul>
> >> </content>
> >> </section>
> >> <split id="split-3"/>
> >> <section>
> >> <title>Title</title>
> >> <content>
> >> <ul>
> >> <li> here</li>
> >> <li>Item #3</li>
> >> <li>
> >> <ul>
> >> <li>Item #4</li>
> >> <li>Item #5 to <em>split</em></li>
> >> </ul>
> >> </li>
> >> </ul>
> >> </content>
> >> </section>
> >> <split id="split-4"/>
> >> <section>
> >> <title>Title</title>
> >> <content>
> >> <ul>
> >> <ul>
> >> <li> here if possible</li>
> >> </ul>
> >> <li>Item #6</li>
> >> </ul>
> >> <p>paragraph #6</p>
> >> </content>
> >> </section>
> >> </root>
> >>
> >> My idea was to iterate from 1 to the number of split elements + 1
> >> and working on the section with tunnel params so I can test for each
> >> node if it's before / after / in between (current) splits elements,
> >> and then decide to keep the node or not according to this position.
> >>
> >> I already used this kind of solution on a similar problem, long time
> >> ago. So I'll give it a try though I'm not totally confident with
> >> it (because split elements can appear as inline content here).
> >>
> >> Please let me know if you have ideas, if my solution is the right or
> >> wrong way to go?
> >> Are there special design patterns for this kind of problem ?
> >> And last, have you ever faced this kind of splitting issue, any
> >> feedback welcome :)
> >>
> >> Cheers,
> >> Matthieu Ricaud-Dussarget
> >>
> >> --
> >> Matthieu Ricaud-Dussarget
> >> +33 6.63.25.95.58
> >>
>
> --
> Joel Kalvesmaki
> Director, Text Alignment Network
> http://textalign.net
>
>
>

--
Matthieu Ricaud-Dussarget
+33 6.63.25.95.58
