Subject: Re: [xsl] Moving element up hierarchy unless text nodes
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 6 Apr 2015 19:07:05 -0000

Dear James,

I am relieved it seems to have passed all the tests so far!

One thing that might shed light on the operation of this is the single
edge case for which I think its behavior would be ... interesting,
namely:

<div><lg><l><pb/></l></lg></div>

I hope and trust this never happens in your data.

Cheers, Wendell

On Mon, Apr 6, 2015 at 9:22 AM, James Cummings james@xxxxxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> I _finally_ had a chance to test and make sure I think I understand the
> clever solution Wendell came up with for moving <pb/> elements before or
> after nodes with no text content and/or whitespace-only nodes. I must
> apologise to him for delaying so long in doing so. Mea culpa.
>
> I've added some comments to the XSL to ensure I understood what was going
> on. Although I've never really been good with key()s the bits that confused
> me most were:
> ===
> <!-- copy pb in place only if both keys bind it to itself, i.e. it is
>      neither leading nor trailing, thus stays put -->
> <xsl:template match="pb">
>   <xsl:if test="(. is key('leading-pb',generate-id())) and
>                 (. is key('trailing-pb',generate-id()))">
>     <xsl:copy-of select="."/>
>   </xsl:if>
> </xsl:template>
> ===
> Where if I understand it, a <pb/> is only copied if its generate-id is equal
> to both its leading-pb and trailing-pb key. (i.e. it is in the middle of some
> elements with text, or a text node, or similar, so it stays where it is.)
>
> The other confusing bit for me was the test in the leading/trailing-pb mode
> matching any element, but on closer inspection I think I understand it.
> (Though never would have thought of it...) For trailing-pb mode this tests
> that the result is empty for the following-sibling nodes or text that isn't
> just whitespace. Otherwise it generates an id.
> ===
> <xsl:choose>
>   <xsl:when test="empty(following-sibling::*/(. except self::pb) |
>                   following-sibling::text()[matches(.,'\S')])">
>     <xsl:apply-templates select=".." mode="trailing-pb"/>
>   </xsl:when>
>   <xsl:otherwise>
>     <xsl:sequence select="generate-id()"/>
>   </xsl:otherwise>
> </xsl:choose>
> ===
>
> I think I understand all the individual bits to this but still have
> difficulty thinking through the whole thing.
>
> It does seem to work on all the tests I've tried.
>
> Thanks Wendell!
>
> -James
>
> =====full xslt===
> <!-- comments, processing instructions, text nodes and attributes -->
> <xsl:template match="comment() | processing-instruction() | text() |
>                      @*">
>   <xsl:copy-of select="."/>
> </xsl:template>
>
> <!-- copy elements separately so can move pb elements -->
> <xsl:template match="*">
>   <!-- first emit any pb that this element considers leading -->
>   <xsl:copy-of select="key('leading-pb',generate-id())"/>
>   <!-- copy the element, attributes, and process nodes -->
>   <xsl:copy>
>     <xsl:apply-templates select="@* | node()"/>
>   </xsl:copy>
>   <!-- then emit any pb that this element considers trailing -->
>   <xsl:copy-of select="key('trailing-pb',generate-id())"/>
> </xsl:template>
>
> <!-- copy pb in place only if both keys bind it to itself, i.e. it is
>      neither leading nor trailing, thus stays put -->
> <xsl:template match="pb">
>   <xsl:if test="(. is key('leading-pb',generate-id())) and
>                 (. is key('trailing-pb',generate-id()))">
>     <xsl:copy-of select="."/>
>   </xsl:if>
> </xsl:template>
>
> <!-- key for leading pb applying templates in leading-pb mode -->
> <xsl:key name="leading-pb" match="pb">
>   <xsl:apply-templates select="." mode="leading-pb"/>
> </xsl:key>
> <!-- key for trailing pb applying templates in trailing-pb mode -->
> <xsl:key name="trailing-pb" match="pb">
>   <xsl:apply-templates select="." mode="trailing-pb"/>
> </xsl:key>
>
> <!-- everything directly under body generates an id -->
> <xsl:template match="body/*" mode="leading-pb trailing-pb">
>   <xsl:sequence select="generate-id()"/>
> </xsl:template>
>
> <!-- when there are no preceding siblings other than pb elements or
>      whitespace-only text, apply-templates in leading-pb to the parent;
>      otherwise generate an id -->
> <xsl:template match="*" mode="leading-pb">
>   <xsl:choose>
>     <xsl:when test="empty(preceding-sibling::*/(. except self::pb) |
>                     preceding-sibling::text()[matches(.,'\S')])">
>       <xsl:apply-templates select=".." mode="leading-pb"/>
>     </xsl:when>
>     <xsl:otherwise>
>       <xsl:sequence select="generate-id()"/>
>     </xsl:otherwise>
>   </xsl:choose>
> </xsl:template>
>
> <!-- when there are no following siblings other than pb elements or
>      whitespace-only text, apply-templates in trailing-pb to the parent;
>      otherwise generate an id -->
> <xsl:template match="*" mode="trailing-pb">
>   <xsl:choose>
>     <xsl:when test="empty(following-sibling::*/(. except self::pb) |
>                     following-sibling::text()[matches(.,'\S')])">
>       <xsl:apply-templates select=".." mode="trailing-pb"/>
>     </xsl:when>
>     <xsl:otherwise>
>       <xsl:sequence select="generate-id()"/>
>     </xsl:otherwise>
>   </xsl:choose>
> </xsl:template>
> =====
>
>
> On Wed, Mar 4, 2015 at 12:36 AM, James Cummings james@xxxxxxxxxxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> Cool Wendell!
>>
>> I've not had a chance to test this out yet, I may have to come back to you
>> with some questions as I'm really not sure I understand that match pattern.
>> I'll have a play with it.
>>
>> Many thanks!
>>
>> -James
>>
>> On Tue, Mar 3, 2015 at 7:48 PM, Wendell Piez wapiez@xxxxxxxxxxxxxxx
>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hi again James,
>>>
>>> So in the code I posted yesterday I realized at least one more
>>> interesting improvement is possible.
>>>
>>> Instead of
>>>
>>> <xsl:template match="pb">
>>>   <!-- Only copy the pb if no ancestor considers it 'leading' or
>>>        'trailing'. -->
>>>   <xsl:if test="empty(ancestor::*/
>>>                 (key('leading-pb',generate-id()) |
>>>                  key('trailing-pb',generate-id())) intersect . ) ">
>>>     <xsl:copy-of select="."/>
>>>   </xsl:if>
>>> </xsl:template>
>>>
>>> We could have more directly and efficiently
>>>
>>> <xsl:template match="pb">
>>>   <xsl:if test="(. is key('leading-pb',generate-id())) and
>>>                 (. is key('trailing-pb',generate-id()))">
>>>     <xsl:copy-of select="."/>
>>>   </xsl:if>
>>> </xsl:template>
>>>
>>>
>>> Or even (if you are crazy for match patterns, and who isn't)
>>>
>>> <xsl:template match="pb[empty(key('leading-pb',generate-id())) or
>>>                         empty(key('trailing-pb',generate-id()))]"/>
>>>
>>> These work because the keys bind pb elements to themselves when they
>>> are not 'leading' or 'trailing' (i.e. correctly outside not inside
>>> their parent).
>>>
>>> Cheers, Wendell
>>>
>>> On Mon, Mar 2, 2015 at 2:11 PM, Wendell Piez wapiez@xxxxxxxxxxxxxxx
>>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> > Hi James,
>>> >
>>> > So, try this. It works by assigning 'pb' elements to ancestors that
>>> > consider them 'leading' (start the element off) or 'trailing'. They
>>> > can be retrieved from (for) said ancestor using a key.
>>> >
>>> > Lightly tested.
>>> >
>>> > <xsl:template match="comment() | processing-instruction() | text() |
>>> >                      @*">
>>> >   <xsl:copy-of select="."/>
>>> > </xsl:template>
>>> >
>>> > <xsl:template match="*">
>>> >   <xsl:copy-of select="key('leading-pb',generate-id())"/>
>>> >   <xsl:copy>
>>> >     <xsl:apply-templates select="@* | node()"/>
>>> >   </xsl:copy>
>>> >   <xsl:copy-of select="key('trailing-pb',generate-id())"/>
>>> > </xsl:template>
>>> >
>>> > <xsl:template match="pb">
>>> >   <!-- Only copy the pb if no ancestor considers it 'leading' or
>>> >        'trailing'. -->
>>> >   <xsl:if test="empty(
>>> >                 ancestor::*/(key('leading-pb',generate-id()) |
>>> >                 key('trailing-pb',generate-id())) intersect . ) ">
>>> >     <xsl:copy-of select="."/>
>>> >   </xsl:if>
>>> > </xsl:template>
>>> >
>>> > <xsl:key name="leading-pb" match="pb">
>>> >   <xsl:apply-templates select="." mode="leading-pb"/>
>>> > </xsl:key>
>>> >
>>> > <xsl:key name="trailing-pb" match="pb">
>>> >   <xsl:apply-templates select="." mode="trailing-pb"/>
>>> > </xsl:key>
>>> >
>>> > <xsl:template match="body/*" mode="leading-pb trailing-pb">
>>> >   <xsl:sequence select="generate-id()"/>
>>> > </xsl:template>
>>> >
>>> > <xsl:template match="*" mode="leading-pb">
>>> >   <xsl:choose>
>>> >     <xsl:when test="empty(preceding-sibling::*/(. except self::pb) |
>>> >                     preceding-sibling::text()[matches(.,'\S')])">
>>> >       <xsl:apply-templates select=".." mode="leading-pb"/>
>>> >     </xsl:when>
>>> >     <xsl:otherwise>
>>> >       <xsl:sequence select="generate-id()"/>
>>> >     </xsl:otherwise>
>>> >   </xsl:choose>
>>> > </xsl:template>
>>> >
>>> > <xsl:template match="*" mode="trailing-pb">
>>> >   <xsl:choose>
>>> >     <xsl:when test="empty(following-sibling::*/(. except self::pb) |
>>> >                     following-sibling::text()[matches(.,'\S')])">
>>> >       <xsl:apply-templates select=".." mode="trailing-pb"/>
>>> >     </xsl:when>
>>> >     <xsl:otherwise>
>>> >       <xsl:sequence select="generate-id()"/>
>>> >     </xsl:otherwise>
>>> >   </xsl:choose>
>>> > </xsl:template>
>>> >
>>> > Feel free to ask for any explanation needed. It *seems* to work
>>> > (although I often do not trust my lying eyes) ... :-)
>>> >
>>> > Cheers, Wendell
>>> >
>>> > On Fri, Feb 27, 2015 at 6:51 PM, James Cummings
>>> > james@xxxxxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
>>> > wrote:
>>> >>
>>> >> Hi there.
>>> >>
>>> >> We've been looking at canonicalising use of <pb/> in a large
>>> >> collection of TEI P5 XML texts. What we want to do is move this up
>>> >> the hierarchy unless there is text before or after it, only stopping
>>> >> when there is a sibling element with textual content or when it hits
>>> >> the body/back/front elements. i.e. someone might have encoded:
>>> >>
>>> >>
>>> >> ====input====
>>> >> <body>
>>> >>   <div>
>>> >>     <lg>
>>> >>       <l><pb n="1"/> some text here</l>
>>> >>       <l>some text here <pb n="2"/></l>
>>> >>     </lg>
>>> >>     <lg>
>>> >>       <l>some text <pb n="3"/> some text</l>
>>> >>       <anchor xml:id="test"/>
>>> >>       <l><pb n="4"/>some text here</l>
>>> >>       <l>some text here <pb n="5"/></l>
>>> >>       <anchor xml:id="test2"/>
>>> >>     </lg>
>>> >>   </div>
>>> >>   <div>
>>> >>     <head>Some Text</head>
>>> >>     <lg>
>>> >>       <!-- A comment here -->
>>> >>       <l><pb n="6"/>Some text</l>
>>> >>       <l>Some text<pb n="7"/></l>
>>> >>     </lg>
>>> >>   </div>
>>> >> </body>
>>> >> =====
>>> >>
>>> >> And what we'd want to end up with is:
>>> >>
>>> >> =====
>>> >> <body>
>>> >>   <pb n="1"/>
>>> >>   <div>
>>> >>     <lg>
>>> >>       <l> some text here</l>
>>> >>       <l>some text here </l>
>>> >>     </lg>
>>> >>     <pb n="2"/>
>>> >>     <lg>
>>> >>       <l>some text <pb n="3"/> some text</l>
>>> >>       <pb n="4"/>
>>> >>       <anchor xml:id="test"/>
>>> >>       <l>some text here</l>
>>> >>       <l>some text here </l>
>>> >>       <anchor xml:id="test2"/>
>>> >>     </lg>
>>> >>   </div>
>>> >>   <pb n="5"/>
>>> >>   <div>
>>> >>     <head>Some Text</head>
>>> >>     <pb n="6"/>
>>> >>     <lg>
>>> >>       <!-- A comment here -->
>>> >>       <l>Some text</l>
>>> >>       <l>Some text</l>
>>> >>     </lg>
>>> >>   </div>
>>> >>   <pb n="7"/>
>>> >> </body>
>>> >> =====
>>> >>
>>> >> So as the <pb/> has text before/after it, it stays where it is. It
>>> >> should move to the level in the hierarchy where its
>>> >> preceding-sibling::node()[1] has text, passing over other empty
>>> >> elements or comments. (Of course, as you might expect) the markup
>>> >> could be any element names, I just use div/lg/l here because it is
>>> >> short and nicely hierarchical as an example. My approach so far has
>>> >> been, on every element to try to test if there is text() between
>>> >> where I currently am and the following::pb[1] by selecting everything
>>> >> between the start and the pb and looking at its normalised
>>> >> string-length. But so far these tests aren't working right, and I
>>> >> haven't even got my head round how to do it in reverse for <pb/> at
>>> >> the end.
>>> >>
>>> >> Has anyone done something like this before that I could look at? Any
>>> >> suggestions?
>>> >>
>>> >> Thanks for any help!
>>> >>
>>> >> -James Cummings
>>> >> XSL-List info and archive
>>> >> EasyUnsubscribe (by email)
>>> >
>>> > --
>>> > Wendell Piez | http://www.wendellpiez.com
>>> > XML | XSLT | electronic publishing
>>> > Eat Your Vegetables
>>> > _____oo_________o_o___ooooo____ooooooo_^
>>> >
>>>
>>> --
>>> Wendell Piez | http://www.wendellpiez.com
>>> XML | XSLT | electronic publishing
>>> Eat Your Vegetables
>>> _____oo_________o_o___ooooo____ooooooo_^
>>>

--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^