Re: [xsl] breaking up XML on page break element

Subject: Re: [xsl] breaking up XML on page break element
From: "Michael Müller-Hillebrand mmh@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 16 Jul 2014 16:46:18 -0000
Hi Gerrit,

What a great, cool solution! I am thinking of applying this to split FO
page-sequences.

Because we will have to deal with very large documents, the question of
streaming comes to mind.
I have not yet looked up the many restrictions on streamable expressions,
but does any of you have a feeling for whether streaming could be used here?
For example, by processing each segment of relevant nodes separately.

Thanks for your suggestions,

- Michael

On 04.07.2014 at 20:20, Imsieke, Gerrit, le-tex
gerrit.imsieke@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>  version="2.0">
>
>  <xsl:output indent="yes"/>
>
>  <xsl:template match="* | @*" mode="#default">
>    <xsl:copy>
>      <xsl:apply-templates select="@*, node()" mode="#current"/>
>    </xsl:copy>
>  </xsl:template>
>
>  <xsl:template match="book" mode="#default">
>    <xsl:variable name="context" select="." as="element(book)" />
>    <xsl:copy>
>      <xsl:for-each-group select="descendant::node()[not(node())]"
>                          group-starting-with="pb">
>        <xsl:copy-of select="self::pb"/>
>        <xsl:apply-templates select="$context/*" mode="split">
>          <xsl:with-param name="restricted-to"
>                          select="current-group()/ancestor-or-self::node()"
>                          tunnel="yes"/>
>        </xsl:apply-templates>
>      </xsl:for-each-group>
>    </xsl:copy>
>  </xsl:template>
>
>  <xsl:template match="node()" mode="split">
>    <xsl:param name="restricted-to" as="node()+" tunnel="yes" />
>    <xsl:if test="exists(. intersect $restricted-to)">
>      <xsl:copy>
>        <xsl:copy-of select="@*" />
>        <xsl:apply-templates mode="#current" />
>      </xsl:copy>
>    </xsl:if>
>  </xsl:template>
>
>  <xsl:template match="pb" mode="split"/>
>
> </xsl:stylesheet>
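
A quick sanity check of the stylesheet above, as a sketch only. It assumes
Geert's sample from the bottom of this thread as input and no whitespace-only
text nodes (for example with <xsl:strip-space elements="*"/> added, which is
an assumption and not part of the code above). The terminal nodes are then
"...", "aaa", <pb/> and "bbb", which group-starting-with="pb" splits into
("...", "aaa") and (<pb/>, "bbb"). The result should be the following, though
the exact indentation depends on the processor:

    <book>
      <title>...</title>
      <section>
        <para>aaa</para>
      </section>
      <pb/>
      <section>
        <para>bbb</para>
      </section>
    </book>

The title shows up only in the first segment because none of the second
group's terminal nodes has it as an ancestor.
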
>
> On 04.07.2014 18:31, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx wrote:
>> Thanks Gerrit,
>> (I admit I need to read this twice to get it, but that might be caused
>> by the 0-1 and me trying not to miss all of the fun in Rio)
>> I will look into it after the match
>>
>>
>> At 17:18 4/07/2014, you wrote:
>>> I tackle it by what I call "upward projection":
>>>
>>> When processing the top-level element, do a for-each-group of all
>>> descendants that are terminal nodes (those without children), with a
>>> group-starting-with at the splitting points.
>>>
>>> For each group, process the book (or the HTML body, or whatever common
>>> ancestor there is) once in another mode, with a tunneled parameter
>>> 'restricted-to' that contains, for each group, the terminal nodes and
>>> their ancestors.
>>>
>>> When processing each group, for each node that you encounter, test
>>> whether the node is contained in the tunneled variable (using
>>> intersect). If it is, reproduce the node and continue in this mode; if
>>> it isn't contained, do nothing.
>>>
>>> There may be an option to discard or to reproduce the splitting elements.
>>>
>>> Examples for this technique are in
>>> https://subversion.le-tex.de/common/evolve-hub/evolve-hub.xsl, modes
>>> hub:split-at-tab and hub:split-at-br
>>>
>>> They are a bit more complex than your case because they split
>>> paragraphs that may contain tables or footnotes that in turn can
>>> contain other paragraphs. I introduced the function
>>> hub:same-scope($splitting-element, $containing-element) to split only
>>> at splitting elements that are contained within the paragraph that
>>> should be split, rather than in a paragraph that is contained in a
>>> footnote or table cell that is somehow contained in the given paragraph.
>>>
>>> I might prepare a synthetic standalone example if anyone is
>>> interested, and furthermore on the condition that interested parties
>>> root for Germany instead of France today.
>>>
>>> Gerrit
>>>
>>> On 04.07.2014 16:43, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx wrote:
>>>> Hi all,
>>>>
>>>> Here is a fun one I thought I could share
>>>>
>>>> I have a nicely nested XML (a bit TEI like)
>>>> and markers for page breaks can happen everywhere in the document (as
>>>> empty elements)
>>>>
>>>> Now I want to break the document per page, reconstructing the structure
>>>> So in a first step, I want to isolate the pagebreak to the highest level
>>>>
>>>> <book>
>>>> <title>...</title>
>>>> <section>
>>>> <para>aaa<pb/>bbb</para>
>>>> </section>
>>>> </book>
>>>>
>>>> to become
>>>>
>>>> <book>
>>>> <title>...</title>
>>>> <section>
>>>> <para>aaa</para>
>>>> </section>
>>>> <pb/>
>>>> <section>
>>>> <para>bbb</para>
>>>> </section>
>>>> </book>
>>>>
>>>> Bearing in mind I need a generic solution
>>>> and pagebreaks can happen at every level
>>>>
>>>> Any thoughts?
>>>> I am not looking for code, just curious about how people would attack this
>>>>
>>>> Thanks
>>>>
>>>> Geert
