Re: [xsl] Finding Only Initial Following Siblings That Meet Some Criteria

Subject: Re: [xsl] Finding Only Initial Following Siblings That Meet Some Criteria
From: "Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 6 Feb 2020 00:09:06 -0000
On 06.02.2020 00:42, Eliot Kimber ekimber@xxxxxxxxxxxx wrote:
> In my case, I must start with the first instance of the matching phrase anywhere in the source document (I'm pulling stuff that could be anywhere to a specific location) and then only want to consider things that immediately follow that specific <ph> element.

You didn't show that you were matching the first occurrence of ph[@outputclass] in the whole document. Well, if you did and if ph[@outputclass] may occur at other places than following siblings of that first occurrence, your solution won't group them all. It won't consider ph[@outputclass] in the next <p>, for example.
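
To illustrate with a hypothetical input (the <body> wrapper and the second paragraph are assumptions, not taken from the original post):

```xml
<body>
  <p>Text <ph outputclass="x">1</ph><ph outputclass="x">2</ph> more text</p>
  <!-- These ph elements have a different parent, so they are never
       following siblings of the first ph in the document. -->
  <p><ph outputclass="x">3</ph><ph outputclass="x">4</ph></p>
</body>
```

A selection that starts at the first ph[@outputclass] in the document and walks its following-sibling axis reaches only the two ph elements in the first <p>; the adjacent pair in the second <p> is left ungrouped.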


If, though, you want to match the first ph[@outputclass] in each p, and the content that precedes this occurrence ought to be preserved (but not duplicated), you need to process the preceding siblings, too. The danger in looking ahead and behind, instead of processing all nodes by grouping the parent's children, is twofold: you might end up processing nodes twice if you just do an apply-templates in p, or not processing them at all if you start with the first occurrence and don't process its preceding siblings.
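
To make that danger concrete, here is a minimal sketch of what a look-ahead approach has to take care of when the parent's children are processed with a plain apply-templates. It is not Eliot's code: the `$stop` variable, the second (suppressing) template, and the explicit priority are assumptions made for illustration. The first template merges a run starting at its head; the second suppresses the members that were already absorbed, since otherwise they would be output twice.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0">

  <!-- Copy everything that no other template handles -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Head of a run: look ahead along the following-sibling axis -->
  <xsl:template match="ph[@outputclass]">
    <xsl:variable name="this" as="element()" select="."/>
    <!-- First following sibling node that is NOT a ph with the same
         @outputclass; empty if the run extends to the end of the parent -->
    <xsl:variable name="stop" as="node()?"
      select="following-sibling::node()
                [not(self::ph[@outputclass = $this/@outputclass])][1]"/>
    <xsl:element name="{@outputclass}">
      <xsl:value-of separator=""
        select="$this, following-sibling::ph[empty($stop) or . &lt;&lt; $stop]"/>
    </xsl:element>
  </xsl:template>

  <!-- Member already absorbed by the head of its run: suppress it.
       Explicit priority is needed because both patterns would otherwise
       have the same default priority. -->
  <xsl:template priority="1"
    match="ph[@outputclass]
             [preceding-sibling::node()[1][self::ph]/@outputclass = @outputclass]"/>

</xsl:stylesheet>
```

Forgetting the second template duplicates content, and matching only the first ph without also processing its preceding siblings drops content. Grouping all children of the parent avoids having to balance the two.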


> So unless I'm missing a subtlety of your solution, I don't think it would do quite what I want because it's too inclusive.

I'd argue that what I proposed is not so subtly different from your solution. As a principle, it's "always try to process all nodes in a given context with for-each-group, and avoid cherry-picking specific child nodes that will or will not also group some of their siblings".


But unless I know which node you matched and what else you processed in that context, I don't know whether you made sure to avoid duplicating or neglecting content.

Gerrit


> Cheers,
>
> E.
> --
> Eliot Kimber
> http://contrext.com


On 2/5/20, 5:07 PM, "Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

Grouping should liberate you from looking ahead or behind. So instead of
matching the first <ph outputclass="x">, you'd match <p> (or more
generally '*[ph[@outputclass]]') and do the group-adjacent grouping for
the child nodes, like this:
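<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0">

  <!-- Match any element that has at least one ph[@outputclass] child -->
  <xsl:template match="*[ph[@outputclass]]">
    <xsl:copy>
      <xsl:apply-templates select="@*" mode="#current"/>
      <!-- The grouping key is the @outputclass value for ph children and
           the empty string for every other child node -->
      <xsl:for-each-group select="node()"
        group-adjacent="string(self::ph/@outputclass)">
        <xsl:choose>
          <!-- Non-empty key: a run of adjacent ph with the same @outputclass -->
          <xsl:when test="current-grouping-key()">
            <xsl:element name="{current-grouping-key()}">
              <xsl:value-of select="current-group()"
                separator=""/>
            </xsl:element>
          </xsl:when>
          <!-- Empty key: process the nodes as usual -->
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"
              mode="#current"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- Copy everything else unchanged -->
  <xsl:mode on-no-match="shallow-copy"/>

</xsl:stylesheet>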
This is not shorter in terms of lines of code than what you suggested.
In terms of performance, it could be a bit more efficient than your
solution, depending on the cost of identifying the first
ph[@outputclass] and its following siblings, compared to the cost of
identifying a parent of ph[@outputclass] and selecting its children.
But as I wanted to say above, in terms of idiomatic XSLT 2+ purity, I'd
always prefer a solution that doesn't look along the preceding/following
axes, even when it is done just once for selecting the for-each-group
population.
Gerrit
On 05.02.2020 23:29, Eliot Kimber ekimber@xxxxxxxxxxxx wrote:
> In my XML I can have adjacent elements that should be processed as a unit, where the adjacent elements all have the same value for a given attribute. Other elements with the same attribute could be following siblings but separated by other elements or text nodes, i.e.:
>
> <p>Text <ph outputclass="x">1</ph><ph outputclass="x">2</ph> more text <ph outputclass="x">New sequence</ph></p>
>
> Where the rendered result should combine the first two <ph> elements but not the third, i.e.:
>
> <p>Text <x>12</x> more text <x>New sequence</x></p>
>
> Processing is applied to the first element in the document with the @outputclass value "x" and then I want to grab any immediately following siblings with the same @outputclass value and no intervening text or element nodes.
>
> My solution is to use for-each-group like so:
>
> <xsl:variable name="this" as="element()" select="."/>
> <xsl:variable name="adjacent-sibs" as="element()+">
>   <xsl:for-each-group select="($this, $this/following-sibling::node())"
>                       group-adjacent="string(@outputclass)">
>     <xsl:if test=". is $this">
>       <xsl:sequence select="current-group()"/>
>     </xsl:if>
>   </xsl:for-each-group>
> </xsl:variable>
>
> Which works, but I'm thinking there must be a more compact way to do the same selection, but the formulation is escaping me.
>
> Is there a more compact or more efficient way to make this selection of only immediately-adjacent following siblings?
>
> Thanks,
>
> E.
> --
> Eliot Kimber
> http://contrext.com
