Re: [xsl] Joining list fragments

Subject: Re: [xsl] Joining list fragments
From: "Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 6 May 2020 06:25:49 -0000
Ok, it turned out that some recursion is necessary.

Michael (Müller-Hillebrand) sent me an updated test file and the expected results. As one can expect, the problem is even more complex than Michael's initial sample input suggests, due to the merging on multiple levels that is required.

But today I take pride in saying that the self-declared king of grouping (I) was able to solve it!

https://github.com/gimsieke/join-list-fragments

The solution is remotely similar to what I presented about "upward projection" at XML Prague 2019 (https://subversion.le-tex.de/common/presentations/2019-02-09_xmlprague_xslt-upward-projection/slides/) in that leaf nodes are grouped and the surrounding subtree is later reconstructed.

If you run the example (apply xsl/join-list-fragments to test/sample_html.xml in #default mode), you will notice that a file debug1_atomic-items.xml is created. This is a somewhat flattened input that I looked at intensely and that I gradually modified when I set up the grouping. I can't stress enough how much looking at this semi-flattened file and the ad-hoc attributes that I created informed the evolution of the grouping. Without this debugging output, it would have been too complex to understand what is going on and what should happen in the recursive grouping.

The debugging output has the following additional attributes:

list-level: 0 for uninteresting elements; no attribute for elements that need to be collected with the preceding list item; any positive value indicates the nesting depth at which a new list item will be created for the group starting at that element

start: 'true' for an element that will become the first item of a (re-) created top-level ol element

start-level: the depth at which a re-created ol element will be created (2 indicates an ol/li/ol). This attribute is not used for top-level lists, where @start is used.

It may be that an additional recursion is necessary if there is more variation than start-level="2". Maybe MMH can create more input that also contains such a case, but it might well be that it isn't relevant for their problem.

I might eventually add more documentation to the XSLT. At this stage, even with what I wrote above, it's a bit obscure -- write-only code -- which often is the case for recursive grouping. Running it in oXygen debugger with appropriate breakpoints and with inspecting current-group() might further illustrate how it works.

Gerrit

On 03.05.2020 15:44, Imsieke, Gerrit, le-tex wrote:
There were two redundancies; I put a modified version into this Gist:
https://gist.github.com/gimsieke/56311eee455bf43b2f685e9cfa699c37

On 03.05.2020 15:28, Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx wrote:
Ok, it's a triply-nested grouping now (starting-with/adjacent/starting-with):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
  xmlns:my="http://localhost/"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="my xs">
  <xsl:mode on-no-match="shallow-copy"/>


  <xsl:template match="*[ol[@data-meta = 'listlevel=start']]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- named template: then you can call it recursively for lower list levels
           (not used here) -->
      <xsl:call-template name="collect">
        <!-- We are grouping a) the list *items* in both special list types and
             b) any other children of the current div. These a) and b) nodes are
             returned by my:atomic-items() in document order: -->
        <xsl:with-param name="nodes" select="my:atomic-items(.)"/>
      </xsl:call-template>
    </xsl:copy>
  </xsl:template>


  <xsl:function name="my:atomic-items" as="node()*">
    <xsl:param name="context" as="element()"/>
    <xsl:variable name="special-lists" as="element(ol)*"
      select="$context/(  ol[@data-meta = 'listlevel=start']
                        | ol[@data-meta = 'listlevel=continue']
                       )" />
    <xsl:sequence
      select="$context/* except $special-lists | $special-lists/li"/>
  </xsl:function>

  <xsl:template name="collect">
    <xsl:param name="nodes" as="node()*"/>
    <!-- First grouping: Start with
         ol[@data-meta = 'listlevel=start']/li[1] -->
    <xsl:for-each-group select="$nodes"
      group-starting-with="li[parent::ol[@data-meta = 'listlevel=start']]
                             [. is ../*[1]]">
      <xsl:choose>
        <!-- The first group can, in principle, start with any other node
             that precedes the actual first list starting item. Such an
             uninteresting initial group will be processed in otherwise. -->
        <xsl:when test="parent::ol[@data-meta = 'listlevel=start']">
          <!-- This is a grouping that may result in at most two groups:
               The current list and anything non-collected and non-continuing
               that might come after it (<p>Other arbitrary content</p>): -->
          <xsl:for-each-group select="current-group()"
            group-adjacent="exists(self::li)
                            or
                            @data-meta = ('collect', 'listlevel=continue')">
            <xsl:choose>
              <xsl:when test="current-grouping-key()">
                <!-- We re-create the surrounding ol: -->
                <ol>
                  <!-- The context element is the first li
                       in an ol[@data-meta = 'listlevel=start'] -->
                  <xsl:copy-of select="../@data-meta"/>
                  <xsl:for-each-group select="current-group()"
                    group-starting-with="li[not(@data-meta =
                                                'listitem=continue')]">
                    <!-- Create a new li: -->
                    <xsl:copy>
                      <xsl:copy-of select="@data-meta"/>
                      <xsl:apply-templates
                        select="  current-group()/self::li/node()
                                | current-group()[empty(self::li)]"/>
                    </xsl:copy>
                  </xsl:for-each-group>
                </ol>
              </xsl:when>
              <xsl:otherwise>
                <xsl:apply-templates select="current-group()"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>


</xsl:stylesheet>

Result:

<div>
  <h2 id="E2">Item with content to be joined follows div to collect</h2>
  <div>
    <ol data-meta="listlevel=start">
      <li>
        <p>1st item</p>
        <div class="box" data-meta="collect">
          <p>Hint</p>
        </div>
        <p>Para ff</p>
      </li>
      <li>
        <p>2nd item</p>
      </li>
    </ol>
    <p>Other arbitrary content</p>
  </div>
</div>

This grouping, although it might seem convoluted, is better than cherry-picking and reassembling nodes, because the input is always completely covered (selected) and dealt with (apart from the special ol elements that are not part of the grouping population and need to be recreated). It's less prone to losing some of the input, and in general also less prone to duplicating content.

I hope it does the job right. If not, I'll create a Gist on Github and we can edit it until it works.

Gerrit


On 03.05.2020 12:07, Michael Kay mike@xxxxxxxxxxxx wrote:


On 3 May 2020, at 09:54, gerrit.imsieke@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

This looks like a nested group-starting-with / group-adjacent to me at first glance.

My reaction too. Of course the various options on xsl:for-each-group are "stereotypes" - they handle 90% of grouping requirements but there are cases they can't cope with, and in that case you have to go to lower-level solutions. The "window" clause in XQuery is more powerful - perhaps it takes you from the 90% level to 99%. There's also a small minority of cases where XQuery's "for $x at $p in..." can be very valuable. But generally, anything that involves manipulating a sequence using integer positions smells of desperation, and makes one look hard to see if there isn't some better way.


Michael Kay
Saxonica




-----Original Message-----
From: "Michael Müller-Hillebrand mmh@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Sent: Sun, 03 May 2020 10:33
Subject: Re: [xsl] Joining list fragments (was: Handling position in non-atomic sequences)


Am 02.05.2020 um 13:43 schrieb Michael Kay:

You're talking about a solution, but I don't think you've told us what the problem is. What are you trying to achieve?

Fair enough, sometimes one is so deeply wrapped up in a detail problem that the broader picture gets lost.

We want to join list fragments and some content in between them. An HTML-ish version of the input looks like this:

<div>
  <h2 id="E2">Item with content to be joined follows div to collect</h2>
  <div>
    <ol data-meta="listlevel=start">
      <li>
        <p>1st item</p>
      </li>
    </ol>
    <div class="box" data-meta="collect">
      <p>Hint</p>
    </div>
    <ol data-meta="listlevel=continue">
      <li data-meta="listitem=continue">
        <p>Para ff</p>
      </li>
      <li>
        <p>2nd item</p>
      </li>
    </ol>
    <p>Other arbitrary content</p>
  </div>
</div>

Every broken list sequence starts with data-meta="listlevel=start" and a list or a list item that is supposed to be joined with the start list is marked using data-meta="listlevel=continue" and data-meta="listitem=continue". There can be any number of collect items between lists and multiple continue lists, but it is guaranteed that whatever needs to be collected will end with a list. In DTD content model notation: startList, (collectItem*, continueList)+

The lists are not limited to a single level. Fortunately, if there is a "listitem=continue" in a continue list, it is guaranteed to be at the same level at which the previous list ends.

The task is to add to the last item of the previous list:
* all content marked "collect" between the lists; other content would break the process
* content of the next list's first list item if marked "listitem=continue"
The remaining content of each continue list would be added as additional items to the start list.


The desired result for the input data above would look like this:

<div>
  <h2 id="E2">Item with content to be joined follows div to collect</h2>
  <div>
    <ol data-meta="listlevel=start">
      <li>
        <p>1st item</p>
        <div class="box" data-meta="collect">
          <p>Hint</p>
        </div>
        <p>Para ff</p>
      </li>
      <li>
        <p>2nd item</p>
      </li>
    </ol>
    <p>Other arbitrary content</p>
  </div>
</div>

All the "collect" content and continue lists are at the same hierarchical level as the start list. So my initial strategy was to begin with the start list and collect all "valid" siblings (collect, listlevel="continue") by walking the following-sibling axis one by one. The result would be a sequence of elements that have to be processed into the start list at various locations.
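The sibling walk described above might be sketched roughly as follows. This is only an illustration under assumptions, not code from this thread: the function name my:run-after, the mode name demo and the wrapper element collected are made up. It simply takes every following sibling up to the first one that is neither "collect" nor "listlevel=continue", and it does not check that the run ends with a list.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
  xmlns:my="http://localhost/"
  exclude-result-prefixes="my">

  <!-- Hypothetical helper: the siblings after a start list that belong to it -->
  <xsl:function name="my:run-after" as="element()*">
    <xsl:param name="start" as="element(ol)"/>
    <!-- First following sibling that is neither a collect item
         nor a continue list (empty if there is none) -->
    <xsl:variable name="stop" as="element()?"
      select="$start/following-sibling::*
                [not(@data-meta = ('collect', 'listlevel=continue'))][1]"/>
    <!-- Every following sibling that precedes that stop element -->
    <xsl:sequence
      select="$start/following-sibling::*[empty($stop) or . &lt;&lt; $stop]"/>
  </xsl:function>

  <!-- Demo only: wrap the collected run so that it can be inspected -->
  <xsl:template match="ol[@data-meta = 'listlevel=start']" mode="demo">
    <collected>
      <xsl:copy-of select="my:run-after(.)"/>
    </collected>
  </xsl:template>

</xsl:stylesheet>
```

For the sample input, the run would consist of the div marked "collect" and the continue list; the p with other arbitrary content acts as the stop element.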

My original question was how to find the position of the next <ol> in that sequence to easily use subsequence(). And I got some helpful pointers for that (I even looked at xsltfunctions.com <http://xsltfunctions.com/>, but due to my xsl:copy-of decision node identity seemed not to be an option), thanks a lot!
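One way to find that position without relying on node identity is to test the item type at each position. The following is a minimal sketch, assuming an XSLT 3.0 processor; the variable names seq and ol-pos, the result element and the inline sample sequence are invented for illustration.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <xsl:template name="xsl:initial-template">
    <!-- Hypothetical sequence of copied (parentless) elements -->
    <xsl:variable name="seq" as="element()*">
      <div data-meta="collect"><p>Hint</p></div>
      <ol data-meta="listlevel=continue"><li><p>2nd item</p></li></ol>
      <p>Other arbitrary content</p>
    </xsl:variable>
    <!-- Map each item to a boolean, then look up the first true():
         this is the position of the first ol in the sequence -->
    <xsl:variable name="ol-pos" as="xs:integer?"
      select="index-of($seq ! (. instance of element(ol)), true())[1]"/>
    <result ol-pos="{$ol-pos}">
      <!-- Everything before the first ol; the whole sequence if there is no ol -->
      <xsl:sequence
        select="if (exists($ol-pos))
                then subsequence($seq, 1, $ol-pos - 1)
                else $seq"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Because the test looks only at what kind of item sits at each position, it should behave the same for copies made with xsl:copy-of as for the original nodes.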

But currently I am doubting my general strategy and I have the feeling I am missing one very obvious thing. The task is basic tree transformation, and my current ideas all look very complicated. For the simplified data I cannot yet share XSLT code.

Basically I use a mode to process the continue lists and their items. Within each template I add special handling for "continue" content (ignoring the elements, just processing their content). For the very last item in each list I add rules to process the collect elements. Getting to the content of a <li data-meta="listitem=continue"> and making sure it is processed only once is where I am stuck.

Currently I think of putting more effort in the collect phase, e.g. to already split continue lists in two parts: the continue item, which is then easily processed with the collect items, and the rest of the list, which will always be new items or sub-lists.
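Such a split might look roughly like this. It is a sketch only: the mode name split and the wrapper elements split, to-previous-item and new-items are hypothetical, and it assumes that a continue item, if present, is always the first li of a continue list.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">

  <xsl:template match="ol[@data-meta = 'listlevel=continue']" mode="split">
    <!-- The first li, but only if it is marked as a continuation -->
    <xsl:variable name="continue-item" as="element(li)?"
      select="li[1][@data-meta = 'listitem=continue']"/>
    <split>
      <!-- Part 1: content that belongs to the last item of the previous list -->
      <to-previous-item>
        <xsl:copy-of select="$continue-item/node()"/>
      </to-previous-item>
      <!-- Part 2: the remaining items, which become new items of the start list -->
      <new-items>
        <xsl:copy-of select="li except $continue-item"/>
      </new-items>
    </split>
  </xsl:template>

</xsl:stylesheet>
```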

Pointers from anyone who has had success with a problem like this would be very welcome. (And I wonder if I have to deal with position in element sequences at all.)

- Michael Müller-Hillebrand