Re: [xsl] breaking up XML on page break element

Subject: Re: [xsl] breaking up XML on page break element
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 7 Jul 2014 17:55:32 -0000
Uh-oh, I wasn't reading ...

... compare my solution here:

https://github.com/wendellpiez/MITH_XSLT/blob/master/xslt/p-promote.xsl

plus there's an older version here:

http://piez.org/wendell/projects/Interedition2011/lib/p5o-browser-html.xsl

In Luminescent (my "hobby" LMNL processing framework) there's a fair
amount of this stuff (reducing and promoting hierarchies). The fact
that we can generalize methods to do this in XSLT 2.0 is fantastic.
:-)

Cheers, Wendell


On Mon, Jul 7, 2014 at 4:53 AM, Geert Bormans
geert@xxxxxxxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
wrote:
> Hi Gerrit,
>
> First my congratulations to the German team
> (I admit they should have scored an extra goal...
> given I made a bet at the office for 2-0, that would have brought me some
> cash :-)
>
> Thanks very much for this solution.
> It is exactly what I was looking for.
> It seems robust and elegant, and I love patterns with a name ;-)
>
> Thanks a ton
>
> Geert
>
>
> At 20:20 4/07/2014, you wrote:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>   version="2.0">
>>
>>   <xsl:output indent="yes"/>
>>
>>   <xsl:template match="* | @*" mode="#default">
>>     <xsl:copy>
>>       <xsl:apply-templates select="@*, node()" mode="#current"/>
>>     </xsl:copy>
>>   </xsl:template>
>>
>>   <xsl:template match="book" mode="#default">
>>     <xsl:variable name="context" select="." as="element(book)" />
>>     <xsl:copy>
>>       <xsl:for-each-group select="descendant::node()[not(node())]"
>>         group-starting-with="pb">
>>         <xsl:copy-of select="self::pb"/>
>>         <xsl:apply-templates select="$context/*" mode="split">
>>           <xsl:with-param name="restricted-to"
>>             select="current-group()/ancestor-or-self::node()" tunnel="yes"/>
>>         </xsl:apply-templates>
>>       </xsl:for-each-group>
>>     </xsl:copy>
>>   </xsl:template>
>>
>>   <xsl:template match="node()" mode="split">
>>     <xsl:param name="restricted-to" as="node()+" tunnel="yes" />
>>     <xsl:if test="exists(. intersect $restricted-to)">
>>       <xsl:copy>
>>         <xsl:copy-of select="@*" />
>>         <xsl:apply-templates mode="#current" />
>>       </xsl:copy>
>>     </xsl:if>
>>   </xsl:template>
>>
>>   <xsl:template match="pb" mode="split"/>
>>
>> </xsl:stylesheet>
>>
>> On 04.07.2014 18:31, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx wrote:
>>>
>>> Thanks Gerrit,
>>> (I admit I need to read this twice to get it, but that might be caused
>>> by the 0-1 and me trying not to miss all of the fun in Rio)
>>> I will look into it after the match
>>>
>>>
>>> At 17:18 4/07/2014, you wrote:
>>>>
>>>> I tackle it by what I call "upward projection":
>>>>
>>>>
>>>> When processing the top-level element, do a for-each-group of all
>>>> descendants that are terminal nodes (those without children), with a
>>>> group-starting-with at the splitting points.
>>>>
>>>> For each group, process the book (or the HTML body, or whatever common
>>>> ancestor there is) once in another mode, with a tunneled parameter
>>>> 'restricted-to' that contains, for each group, the terminal nodes and
>>>> their ancestors.
>>>>
>>>> When processing each group, for each node that you encounter, test
>>>> whether the node is contained in the tunneled variable (using
>>>> intersect). If it is, reproduce the node and continue in this mode, if
>>>> it isn't contained, do nothing.
>>>>
>>>>
>>>> There may be an option to discard or to reproduce the splitting
>>>> elements.
>>>>
>>>> Examples for this technique are in
>>>> https://subversion.le-tex.de/common/evolve-hub/evolve-hub.xsl, modes
>>>> hub:split-at-tab and hub:split-at-br
>>>>
>>>> They are a bit more complex than your case because they split
>>>> paragraphs that may contain tables or footnotes that in turn can
>>>> contain other paragraphs. I introduced the function
>>>> hub:same-scope($splitting-element, $containing-element) to split only
>>>> at splitting elements that are contained within the paragraph that
>>>> should be split, rather than in a paragraph that is contained in a
>>>> footnote or table cell that is somehow contained in the given paragraph.
>>>>
>>>> I might prepare a synthetic standalone example if anyone is
>>>> interested, and furthermore on the condition that interested parties
>>>> root for Germany instead of France today.
>>>>
>>>> Gerrit
>>>>
>>>> On 04.07.2014 16:43, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Here is a fun one I thought I could share
>>>>>
>>>>> I have a nicely nested XML (a bit TEI like)
>>>>> and markers for page breaks can happen everywhere in the document (as
>>>>> empty elements)
>>>>>
>>>>> Now I want to break the document per page, reconstructing the structure
>>>>> So in a first step, I want to isolate the pagebreak to the highest
>>>>> level
>>>>>
>>>>> <book>
>>>>> <title>...</title>
>>>>> <section>
>>>>> <para>aaa<pb/>bbb</para>
>>>>> </section>
>>>>> </book>
>>>>>
>>>>> to become
>>>>>
>>>>> <book>
>>>>> <title>...</title>
>>>>> <section>
>>>>> <para>aaa</para>
>>>>> </section>
>>>>> <pb/>
>>>>> <section>
>>>>> <para>bbb</para>
>>>>> </section>
>>>>> </book>
>>>>>
>>>>> Bearing in mind I need a generic solution
>>>>> and pagebreaks can happen at every level
>>>>>
>>>>> Any thoughts?
>>>>> I am not looking for code, just curious about how people would attack this
>>>>>
>>>>> Thanks
>>>>>
>>>>> Geert
>>>>
>>>>
>>>> --
>>>> Gerrit Imsieke
>>>> Geschäftsführer / Managing Director
>>>>
>>>> le-tex publishing services GmbH
>>>> Weissenfelser Str. 84, 04229 Leipzig, Germany
>>>> Phone +49 341 355356 110, Fax +49 341 355356 510
>>>> gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
>>>>
>>>> Registergericht / Commercial Register: Amtsgericht Leipzig
>>>> Registernummer / Registration Number: HRB 24930
>>>>
>>>> Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
>>>> Thomas Schmidt, Dr. Reinhard Vöckler
>>
>>
>> --
>> Gerrit Imsieke
>> Geschäftsführer / Managing Director
>> le-tex publishing services GmbH
>> Weissenfelser Str. 84, 04229 Leipzig, Germany
>> Phone +49 341 355356 110, Fax +49 341 355356 510
>> gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
>>
>> Registergericht / Commercial Register: Amtsgericht Leipzig
>> Registernummer / Registration Number: HRB 24930
>>
>> Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
>> Thomas Schmidt, Dr. Reinhard Vöckler
>>
>
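
Gerrit's description quoted above mentions that there may be an option to
discard or to reproduce the splitting elements. A minimal sketch of how that
switch could look, assuming the same book/pb vocabulary as the quoted
stylesheet: the parameter name keep-pb and its 'yes' default are my own
invention for illustration, not something taken from evolve-hub.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <!-- Hypothetical switch, not in the quoted stylesheet:
       'yes' reproduces each pb between the pieces, any other value drops it. -->
  <xsl:param name="keep-pb" select="'yes'"/>

  <!-- Identity template for anything outside the element being split. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="book">
    <xsl:variable name="context" select="." as="element(book)"/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Group all terminal nodes (nodes without children);
           every pb starts a new group. -->
      <xsl:for-each-group select="descendant::node()[not(node())]"
        group-starting-with="pb">
        <!-- The first node of a group is the pb that started it, except in
             the leading group. Copy it only if the switch says so. -->
        <xsl:copy-of select="self::pb[$keep-pb = 'yes']"/>
        <!-- Re-process the children of book once per group, restricted to
             the group's terminal nodes and their ancestors. -->
        <xsl:apply-templates select="$context/*" mode="split">
          <xsl:with-param name="restricted-to" tunnel="yes"
            select="current-group()/ancestor-or-self::node()"/>
        </xsl:apply-templates>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- Reproduce a node only if it belongs to the current projection. -->
  <xsl:template match="node()" mode="split">
    <xsl:param name="restricted-to" as="node()+" tunnel="yes"/>
    <xsl:if test="exists(. intersect $restricted-to)">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates mode="#current"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <!-- The splitter is never reproduced at its original position. -->
  <xsl:template match="pb" mode="split"/>

</xsl:stylesheet>
```

With Saxon 9 this could be run as something like
java -jar saxon9he.jar -s:book.xml -xsl:split.xsl keep-pb=no
(the file names are placeholders). With the default setting, Geert's sample
input should come out in the shape he asked for, whitespace aside. One caveat
I haven't tried to address: two pb elements in a row produce a group holding
only a pb, so its ancestors are still projected as empty elements.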



--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^
