Re: [xsl] Moving element up hierarchy unless text nodes

Subject: Re: [xsl] Moving element up hierarchy unless text nodes
From: "James Cummings james@xxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 6 Apr 2015 13:21:54 -0000
I _finally_ had a chance to test, and make sure I understand, the clever
solution Wendell came up with for moving <pb/> elements before or after
their ancestors when nothing but whitespace, comments or other <pb/>
elements separates them from the start or end. I must apologise to him for
delaying so long in doing so. Mea culpa.

I've added some comments to the XSL to make sure I understood what was
going on. I've never really been good with key()s, but the bits that
confused me most were:
===
    <!-- copy pb only if both keys bound it to itself (it is neither
         leading nor trailing), so it stays put -->
    <xsl:template match="pb">
        <xsl:if test="(. is key('leading-pb',generate-id())) and
            (. is key('trailing-pb',generate-id()))">
            <xsl:copy-of select="."/>
        </xsl:if>
    </xsl:template>
===
If I understand it, a <pb/> is only copied in place when
key('leading-pb', generate-id()) and key('trailing-pb', generate-id()) both
return the pb itself, i.e. each key has bound it to itself. (That is, it has
content on both sides, either non-whitespace text or a sibling element other
than pb, so it stays where it is.)
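Spelling the rule out as a tiny Python model may help. This is an illustration only: `placements` and the tuple format are made up for this sketch, and nothing in the stylesheet is called that. Each pb has one binding per key, either itself or the ancestor where the climb stopped; the pb template copies it in place only when both bindings are itself, otherwise the `*` template on the bound ancestor copies it before (leading) or after (trailing) that ancestor.

```python
from typing import List, Optional, Tuple

# None means "this key bound the pb to itself"; otherwise the name of the
# ancestor element where the climb stopped (a made-up representation).
Binding = Optional[str]


def placements(leading: Binding, trailing: Binding) -> List[Tuple[str, str]]:
    """Where the stylesheet writes a pb, given its two key bindings."""
    if leading is None and trailing is None:
        # match="pb": both keys bound it to itself, so copy it where it is
        return [("in place", "pb")]
    result = []
    if leading is not None:
        # match="*" on that ancestor: key('leading-pb', ...) copies it before
        result.append(("before", leading))
    if trailing is not None:
        # match="*" on that ancestor: key('trailing-pb', ...) copies it after
        result.append(("after", trailing))
    return result


print(placements(None, None))   # like pb 3: [('in place', 'pb')]
print(placements("div", None))  # like pb 1: [('before', 'div')]
print(placements(None, "lg"))   # like pb 2: [('after', 'lg')]
print(placements("l", "l"))     # <l><pb/></l>: [('before', 'l'), ('after', 'l')]
```

The last call is a case worth testing: a pb that is the only content of its element has no text on either side, so both keys bind it to an ancestor and it would be copied twice. The same rule suggests why Wendell's match-pattern variant needs a companion `<xsl:template match="pb">` that simply copies the pb: without one, a pb that stays put falls through to the `*` template and, as far as I can tell, is written three times (once from each key plus the xsl:copy).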

The other confusing bit for me was the test in the leading-pb/trailing-pb
mode templates that match any element, but on closer inspection I think I
understand it (though I never would have thought of it...). In trailing-pb
mode it tests that there are no following-sibling elements other than pb
and no following-sibling text that isn't just whitespace. If so, it applies
templates to the parent in the same mode; otherwise it returns the current
element's generate-id().
===
        <xsl:choose>
            <xsl:when test="empty(following-sibling::*/(. except self::pb) |
                following-sibling::text()[matches(.,'\S')])">
                <xsl:apply-templates select=".." mode="trailing-pb"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:sequence select="generate-id()"/>
            </xsl:otherwise>
        </xsl:choose>
===
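Read as plain code, that test ignores pb elements, comments, processing instructions and whitespace-only text; any other sibling element, or any text containing a non-whitespace character, counts as content. A rough Python equivalent using xml.dom.minidom (the function names are hypothetical, and `[^ \t\n\r]` stands in for XPath's `\S`):

```python
import re
from xml.dom import minidom

NON_WS = re.compile(r"[^ \t\n\r]")  # stands in for XPath's \S (XSD regex)


def following_siblings(node):
    sib = node.nextSibling
    while sib is not None:
        yield sib
        sib = sib.nextSibling


def has_content(nodes):
    """True if the XPath union inside empty(...) would be non-empty."""
    for n in nodes:
        if n.nodeType == n.ELEMENT_NODE and n.tagName != "pb":
            return True  # following-sibling::*/(. except self::pb)
        if n.nodeType == n.TEXT_NODE and NON_WS.search(n.data):
            return True  # following-sibling::text()[matches(., '\S')]
        # pb elements, comments, PIs and whitespace-only text are skipped
    return False


def trailing_climbs(fragment, tag):
    """Would trailing-pb mode go up to the parent of the first <tag>?"""
    doc = minidom.parseString(fragment)
    node = doc.getElementsByTagName(tag)[0]
    return not has_content(following_siblings(node))


print(trailing_climbs('<l>some text here <pb n="2"/></l>', "pb"))  # True
print(trailing_climbs('<l><pb n="1"/> some text here</l>', "pb"))  # False
print(trailing_climbs('<l>x<pb n="a"/> <!-- c --> <pb n="b"/> </l>', "pb"))  # True
print(trailing_climbs('<lg><l>text<pb/></l><anchor/></lg>', "l"))  # False
```

The last case is easy to miss: an empty element such as `<anchor/>` still counts as content, because `(. except self::pb)` only excludes pb.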

I think I understand all the individual bits of this but still have
difficulty thinking through the whole thing.

It does seem to work on all the tests I've tried.
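One way to see the whole thing at once is to replay both climbs on the sample input from the original question. The Python model below is a sketch of the mode templates, not Wendell's code, and its names are made up; for each pb it reports whether each key binds it to itself or to an ancestor. If I am reading the XPath correctly, an empty `<anchor/>` counts as content, so pb 4 is bound to its own `<l>` (landing just after the first anchor) and pb 5 to its own `<l>` (landing just before the second anchor), rather than passing over the anchors as the desired output in the original question does.

```python
import re
from xml.dom import minidom

# Sample input from the original question.
SAMPLE = """<body>
    <div>
        <lg>
            <l><pb n="1"/> some text here</l>
            <l>some text here <pb n="2"/></l>
        </lg>
        <lg>
            <l>some text <pb n="3"/> some text</l>
            <anchor xml:id="test"/>
            <l><pb n="4"/>some text here</l>
            <l>some text here <pb n="5"/></l>
            <anchor xml:id="test2"/>
        </lg>
    </div>
    <div>
        <head>Some Text</head>
        <lg>
            <!-- A comment here -->
            <l><pb n="6"/>Some text</l>
            <l>Some text<pb n="7"/></l>
        </lg>
    </div>
</body>"""

NON_WS = re.compile(r"[^ \t\n\r]")  # stands in for XPath's \S


def siblings(node, forward):
    """following-sibling::node() if forward, else preceding-sibling::node()."""
    sib = node.nextSibling if forward else node.previousSibling
    while sib is not None:
        yield sib
        sib = sib.nextSibling if forward else sib.previousSibling


def has_content(nodes):
    """Any sibling element other than pb, or any non-whitespace text?"""
    return any(
        (n.nodeType == n.ELEMENT_NODE and n.tagName != "pb")
        or (n.nodeType == n.TEXT_NODE and NON_WS.search(n.data) is not None)
        for n in nodes
    )


def climb(node, forward):
    """leading-pb mode (forward=False) or trailing-pb mode (forward=True)."""
    while True:
        parent = node.parentNode
        if parent is None or parent.nodeType != parent.ELEMENT_NODE:
            raise ValueError("no body ancestor")  # the sketch just gives up
        if parent.tagName == "body":
            return node  # match="body/*": stop and return this id
        if has_content(siblings(node, forward)):
            return node  # xsl:otherwise: return this id
        node = parent  # xsl:when: apply-templates to ".." in the same mode


def label(node, pb):
    return "self" if node is pb else node.tagName


def bindings(xml):
    """Map each pb's n attribute to (leading-pb binding, trailing-pb binding)."""
    doc = minidom.parseString(xml)
    result = {}
    for pb in doc.getElementsByTagName("pb"):
        result[pb.getAttribute("n")] = (
            label(climb(pb, forward=False), pb),
            label(climb(pb, forward=True), pb),
        )
    return result


for n, (lead, trail) in bindings(SAMPLE).items():
    print(f"pb {n}: leading-pb -> {lead}, trailing-pb -> {trail}")
```

If the anchors should be skipped, the sibling test itself would need to change; the model makes it easy to try variants of `has_content` before touching the XSLT.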

Thanks Wendell!

-James

=====full xslt===
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- comments, processing instructions, text nodes and attributes:
         copied unchanged -->
    <xsl:template match="comment() | processing-instruction() | text() | @*">
        <xsl:copy-of select="."/>
    </xsl:template>

    <!-- copy elements one at a time so pb elements can be moved -->
    <xsl:template match="*">
        <!-- first, any pb the leading-pb key has bound to this element -->
        <xsl:copy-of select="key('leading-pb',generate-id())"/>
        <!-- copy the element, its attributes, and process its children -->
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
        <!-- last, any pb the trailing-pb key has bound to this element -->
        <xsl:copy-of select="key('trailing-pb',generate-id())"/>
    </xsl:template>

    <!-- copy pb in place only if both keys bound it to itself (it is
         neither leading nor trailing), so it stays put -->
    <xsl:template match="pb">
        <xsl:if test="(. is key('leading-pb',generate-id())) and
            (. is key('trailing-pb',generate-id()))">
            <xsl:copy-of select="."/>
        </xsl:if>
    </xsl:template>

    <!-- leading-pb key: each pb is indexed under the id returned by
         applying templates to it in leading-pb mode -->
    <xsl:key name="leading-pb" match="pb">
        <xsl:apply-templates select="." mode="leading-pb"/>
    </xsl:key>
    <!-- trailing-pb key: the same, using trailing-pb mode -->
    <xsl:key name="trailing-pb" match="pb">
        <xsl:apply-templates select="." mode="trailing-pb"/>
    </xsl:key>

    <!-- a child of body stops the climb in either mode: return its id -->
    <xsl:template match="body/*" mode="leading-pb trailing-pb">
        <xsl:sequence select="generate-id()"/>
    </xsl:template>

    <!-- if there are no preceding sibling elements (other than pb) and no
         preceding non-whitespace text, climb to the parent in leading-pb
         mode; otherwise return this element's id -->
    <xsl:template match="*" mode="leading-pb">
        <xsl:choose>
            <xsl:when test="empty(preceding-sibling::*/(. except self::pb) |
                preceding-sibling::text()[matches(.,'\S')])">
                <xsl:apply-templates select=".." mode="leading-pb"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:sequence select="generate-id()"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <!-- if there are no following sibling elements (other than pb) and no
         following non-whitespace text, climb to the parent in trailing-pb
         mode; otherwise return this element's id -->
    <xsl:template match="*" mode="trailing-pb">
        <xsl:choose>
            <xsl:when test="empty(following-sibling::*/(. except self::pb) |
                following-sibling::text()[matches(.,'\S')])">
                <xsl:apply-templates select=".." mode="trailing-pb"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:sequence select="generate-id()"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
 =====
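Wendell's description (pb elements assigned to an ancestor, then retrieved for that ancestor with a key) maps onto an inverted index. Below is a self-contained Python model of the whole stylesheet, a sketch with made-up names rather than a replacement for the XSLT: `build_index` plays the part of xsl:key, filing each pb under the node its climb stops at, and `write` follows the `*` and pb templates. Unlike the stylesheet, it simply raises an error if a pb has no body ancestor, and it serializes by hand, so empty elements come out as `<x/>`.

```python
import re
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr

NON_WS = re.compile(r"[^ \t\n\r]")  # stands in for XPath's \S


def has_content(node, forward):
    """Non-pb element or non-whitespace text among the siblings on one side?"""
    sib = node.nextSibling if forward else node.previousSibling
    while sib is not None:
        if sib.nodeType == sib.ELEMENT_NODE and sib.tagName != "pb":
            return True
        if sib.nodeType == sib.TEXT_NODE and NON_WS.search(sib.data):
            return True
        sib = sib.nextSibling if forward else sib.previousSibling
    return False


def climb(node, forward):
    """The leading-pb (forward=False) / trailing-pb (forward=True) templates."""
    while True:
        parent = node.parentNode
        if parent is None or parent.nodeType != parent.ELEMENT_NODE:
            raise ValueError("no body ancestor")  # unlike the XSLT, give up
        if parent.tagName == "body" or has_content(node, forward):
            return node
        node = parent


def build_index(doc, forward):
    """Model of xsl:key: file each pb under id() of the node its climb stops at."""
    index = {}
    for pb in doc.getElementsByTagName("pb"):
        index.setdefault(id(climb(pb, forward)), []).append(pb)
    return index


def write(node, leading, trailing, out):
    """Follow the default-mode templates, appending serialized pieces to out."""
    if node.nodeType == node.TEXT_NODE:
        out.append(escape(node.data))
    elif node.nodeType in (node.COMMENT_NODE, node.PROCESSING_INSTRUCTION_NODE):
        out.append(node.toxml())
    elif node.nodeType == node.ELEMENT_NODE:
        if node.tagName == "pb":
            # match="pb": in place only if both keys bound it to itself
            if id(node) in leading and id(node) in trailing:
                out.append(node.toxml())
            return
        # match="*": leading pbs, then the element, then trailing pbs
        out.extend(pb.toxml() for pb in leading.get(id(node), []))
        attrs = "".join(f" {name}={quoteattr(value)}"
                        for name, value in node.attributes.items())
        if node.childNodes:
            out.append(f"<{node.tagName}{attrs}>")
            for child in node.childNodes:
                write(child, leading, trailing, out)
            out.append(f"</{node.tagName}>")
        else:
            out.append(f"<{node.tagName}{attrs}/>")
        out.extend(pb.toxml() for pb in trailing.get(id(node), []))


def move_pbs(xml):
    doc = minidom.parseString(xml)
    leading = build_index(doc, forward=False)
    trailing = build_index(doc, forward=True)
    out = []
    write(doc.documentElement, leading, trailing, out)
    return "".join(out)


print(move_pbs('<body><div><lg><l><pb n="1"/>some text</l>'
               '<l>more text<pb n="2"/></l></lg>'
               '<lg><l>a <pb n="3"/> b</l></lg></div></body>'))
```

On the small, whitespace-free input above, pb 1 moves before the `<div>`, pb 2 moves after the first `<lg>`, and pb 3 stays where it is.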

On Wed, Mar 4, 2015 at 12:36 AM, James Cummings james@xxxxxxxxxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>
> Cool Wendell!
>
> I've not had a chance to test this out yet; I may have to come back to you
> with some questions, as I'm really not sure I understand that match
> pattern. I'll have a play with it.
>
> Many thanks!
>
> -James
>
> On Tue, Mar 3, 2015 at 7:48 PM, Wendell Piez wapiez@xxxxxxxxxxxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>> Hi again James,
>>
>> Looking again at the code I posted yesterday, I realized at least one
>> more interesting improvement is possible.
>>
>> Instead of
>>
>> <xsl:template match="pb">
>>   <!-- Only copy the pb if no ancestor considers it 'leading' or
>> 'trailing'. -->
>>   <xsl:if test="empty(ancestor::*/
>>         (key('leading-pb',generate-id()) |
>>          key('trailing-pb',generate-id())) intersect . )  ">
>>     <xsl:copy-of select="."/>
>>   </xsl:if>
>> </xsl:template>
>>
>> we could have, more directly and efficiently:
>>
>>   <xsl:template match="pb">
>>     <xsl:if test="(. is key('leading-pb',generate-id())) and
>>             (. is key('trailing-pb',generate-id()))">
>>       <xsl:copy-of select="."/>
>>     </xsl:if>
>>   </xsl:template>
>>
>>
>> Or even (if you are crazy for match patterns, and who isn't)
>>
>> <xsl:template match="pb[empty(key('leading-pb',generate-id())) or
>>       empty(key('trailing-pb',generate-id()))]"/>
>>
>> These work because the keys bind pb elements to themselves when they
>> are not 'leading' or 'trailing' (i.e. when they belong where they are,
>> inside their parent, rather than outside it).
>>
>> Cheers, Wendell
>>
>> On Mon, Mar 2, 2015 at 2:11 PM, Wendell Piez wapiez@xxxxxxxxxxxxxxx
>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> > Hi James,
>> >
>> > So, try this. It works by assigning 'pb' elements to ancestors that
>> > consider them 'leading' (they start the element off) or 'trailing'
>> > (they end it). Each pb can then be retrieved for that ancestor using
>> > a key.
>> >
>> > Lightly tested.
>> >
>> > <xsl:template match="comment() | processing-instruction() | text() | @*">
>> >   <xsl:copy-of select="."/>
>> > </xsl:template>
>> >
>> > <xsl:template match="*">
>> >   <xsl:copy-of select="key('leading-pb',generate-id())"/>
>> >   <xsl:copy>
>> >     <xsl:apply-templates select="@* | node()"/>
>> >   </xsl:copy>
>> >   <xsl:copy-of select="key('trailing-pb',generate-id())"/>
>> > </xsl:template>
>> >
>> > <xsl:template match="pb">
>> >   <!-- Only copy the pb if no ancestor considers it 'leading' or
>> >        'trailing'. -->
>> >   <xsl:if test="empty(
>> >     ancestor::*/(key('leading-pb',generate-id()) |
>> > key('trailing-pb',generate-id())) intersect . )  ">
>> >     <xsl:copy-of select="."/>
>> >   </xsl:if>
>> > </xsl:template>
>> >
>> > <xsl:key name="leading-pb" match="pb">
>> >   <xsl:apply-templates select="." mode="leading-pb"/>
>> > </xsl:key>
>> >
>> > <xsl:key name="trailing-pb" match="pb">
>> >   <xsl:apply-templates select="." mode="trailing-pb"/>
>> > </xsl:key>
>> >
>> > <xsl:template match="body/*" mode="leading-pb trailing-pb">
>> >   <xsl:sequence select="generate-id()"/>
>> > </xsl:template>
>> >
>> > <xsl:template match="*" mode="leading-pb">
>> >   <xsl:choose>
>> >     <xsl:when test="empty(preceding-sibling::*/(. except self::pb) |
>> > preceding-sibling::text()[matches(.,'\S')])">
>> >       <xsl:apply-templates select=".." mode="leading-pb"/>
>> >     </xsl:when>
>> >     <xsl:otherwise>
>> >       <xsl:sequence select="generate-id()"/>
>> >     </xsl:otherwise>
>> >   </xsl:choose>
>> > </xsl:template>
>> >
>> > <xsl:template match="*" mode="trailing-pb">
>> >   <xsl:choose>
>> >     <xsl:when test="empty(following-sibling::*/(. except self::pb) |
>> > following-sibling::text()[matches(.,'\S')])">
>> >       <xsl:apply-templates select=".." mode="trailing-pb"/>
>> >     </xsl:when>
>> >     <xsl:otherwise>
>> >       <xsl:sequence select="generate-id()"/>
>> >     </xsl:otherwise>
>> >   </xsl:choose>
>> > </xsl:template>
>> >
>> > Feel free to ask for any explanation needed. It *seems* to work
>> > (although I often do not trust my lying eyes) ... :-)
>> >
>> > Cheers, Wendell
>> >
>> > On Fri, Feb 27, 2015 at 6:51 PM, James Cummings
>> > james@xxxxxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
>> > wrote:
>> >>
>> >> Hi there.
>> >>
>> >> We've been looking at canonicalising the use of <pb/> in a large
>> >> collection of TEI P5 XML texts. What we want to do is move each <pb/>
>> >> up the hierarchy unless there is text before or after it, stopping
>> >> only when there is a sibling element with textual content or when it
>> >> hits the body/back/front elements.
>> >> i.e. someone might have encoded:
>> >>
>> >>
>> >> ====input====
>> >> <body>
>> >>     <div>
>> >>         <lg>
>> >>             <l><pb n="1"/> some text here</l>
>> >>             <l>some text here <pb n="2"/></l>
>> >>         </lg>
>> >>         <lg>
>> >>             <l>some text <pb n="3"/> some text</l>
>> >>             <anchor xml:id="test"/>
>> >>             <l><pb n="4"/>some text here</l>
>> >>             <l>some text here <pb n="5"/></l>
>> >>             <anchor xml:id="test2"/>
>> >>         </lg>
>> >>     </div>
>> >>     <div>
>> >>         <head>Some Text</head>
>> >>         <lg>
>> >>             <!-- A comment here -->
>> >>             <l><pb n="6"/>Some text</l>
>> >>             <l>Some text<pb n="7"/></l>
>> >>         </lg>
>> >>     </div>
>> >> </body>
>> >> =====
>> >>
>> >> And what we'd want to end up with is:
>> >>
>> >> =====
>> >> <body>
>> >>     <pb n="1"/>
>> >>     <div>
>> >>         <lg>
>> >>             <l> some text here</l>
>> >>             <l>some text here </l>
>> >>         </lg>
>> >>         <pb n="2"/>
>> >>         <lg>
>> >>             <l>some text <pb n="3"/> some text</l>
>> >>             <pb n="4"/>
>> >>             <anchor xml:id="test"/>
>> >>             <l>some text here</l>
>> >>             <l>some text here </l>
>> >>             <anchor xml:id="test2"/>
>> >>         </lg>
>> >>     </div>
>> >>     <pb n="5"/>
>> >>     <div>
>> >>         <head>Some Text</head>
>> >>         <pb n="6"/>
>> >>         <lg>
>> >>             <!-- A comment here -->
>> >>             <l>Some text</l>
>> >>             <l>Some text</l>
>> >>         </lg>
>> >>     </div>
>> >>     <pb n="7"/>
>> >> </body>
>> >> =====
>> >>
>> >> So where a <pb/> has text before and after it, it stays where it is.
>> >> Otherwise it should move to the level in the hierarchy where its
>> >> preceding-sibling::node()[1] has text, passing over other empty
>> >> elements or comments. (Of course, as you might expect, the markup
>> >> could use any element names; I just use div/lg/l here because it is
>> >> short and nicely hierarchical as an example.) My approach so far has
>> >> been, on every element, to test whether there is text() between where
>> >> I currently am and the following::pb[1], by selecting everything
>> >> between the start and the pb and looking at its normalised
>> >> string-length. But so far these tests aren't working right, and I
>> >> haven't even got my head round how to do it in reverse for a <pb/> at
>> >> the end.
>> >>
>> >> Has anyone done something like this before that I could look at? Any
>> >> suggestions?
>> >>
>> >> Thanks for any help!
>> >>
>> >> -James Cummings
>> >
>> >
>> >
>> > --
>> > Wendell Piez | http://www.wendellpiez.com
>> > XML | XSLT | electronic publishing
>> > Eat Your Vegetables
>> > _____oo_________o_o___ooooo____ooooooo_^
>> >
>>
>>
>>
>>
>>