Subject: Re: [xsl] breaking up XML on page break element
From: "Michael Müller-Hillebrand mmh@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 16 Jul 2014 16:46:18 -0000

Hi Gerrit,

What a great, cool solution! I am thinking of applying this to split FO page-sequences.

Because we will have to deal with very large documents, the question of streaming comes to mind. I have not yet looked up the many restrictions on streamable expressions, but does any of you have a feeling for whether streaming could be used here? For example: process each segment of relevant nodes separately.

Thanks for your suggestions,

- Michael

On 04.07.2014 at 20:20, Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>   version="2.0">
>
>   <xsl:output indent="yes"/>
>
>   <xsl:template match="* | @*" mode="#default">
>     <xsl:copy>
>       <xsl:apply-templates select="@*, node()" mode="#current"/>
>     </xsl:copy>
>   </xsl:template>
>
>   <xsl:template match="book" mode="#default">
>     <xsl:variable name="context" select="." as="element(book)" />
>     <xsl:copy>
>       <xsl:for-each-group select="descendant::node()[not(node())]" group-starting-with="pb">
>         <xsl:copy-of select="self::pb"/>
>         <xsl:apply-templates select="$context/*" mode="split">
>           <xsl:with-param name="restricted-to" select="current-group()/ancestor-or-self::node()" tunnel="yes"/>
>         </xsl:apply-templates>
>       </xsl:for-each-group>
>     </xsl:copy>
>   </xsl:template>
>
>   <xsl:template match="node()" mode="split">
>     <xsl:param name="restricted-to" as="node()+" tunnel="yes" />
>     <xsl:if test="exists(. intersect $restricted-to)">
>       <xsl:copy>
>         <xsl:copy-of select="@*" />
>         <xsl:apply-templates mode="#current" />
>       </xsl:copy>
>     </xsl:if>
>   </xsl:template>
>
>   <xsl:template match="pb" mode="split"/>
>
> </xsl:stylesheet>
>
> On 04.07.2014 18:31, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx wrote:
>> Thanks Gerrit,
>> (I admit I need to read this twice to get it, but that might be caused
>> by the 0-1 and me not trying to miss all of the fun in Rio)
>> I will look into it after the match
>>
>>
>> At 17:18 4/07/2014, you wrote:
>>> I tackle it by what I call “upward projection”:
>>>
>>> When processing the top-level element, do a for-each-group of all
>>> descendants that are terminal nodes (those without children), with a
>>> group-starting-with at the splitting points.
>>>
>>> For each group, process the book (or the HTML body, or whatever common
>>> ancestor there is) once in another mode, with a tunneled parameter
>>> 'restricted-to' that contains, for each group, the terminal nodes and
>>> their ancestors.
>>>
>>> When processing each group, for each node that you encounter, test
>>> whether the node is contained in the tunneled variable (using
>>> intersect). If it is, reproduce the node and continue in this mode; if
>>> it isn’t contained, do nothing.
>>>
>>> There may be an option to discard or to reproduce the splitting elements.
>>>
>>> Examples for this technique are in
>>> https://subversion.le-tex.de/common/evolve-hub/evolve-hub.xsl, modes
>>> hub:split-at-tab and hub:split-at-br
>>>
>>> They are a bit more complex than your case because they split
>>> paragraphs that may contain tables or footnotes that in turn can
>>> contain other paragraphs. I introduced the function
>>> hub:same-scope($splitting-element, $containing-element) to split only
>>> at splitting elements that are contained within the paragraph that
>>> should be split, rather than in a paragraph that is contained in a
>>> footnote or table cell that is somehow contained in the given paragraph.
>>>
>>> I might prepare a synthetic standalone example if anyone is
>>> interested, and furthermore on the condition that interested parties
>>> root for Germany instead of France today.
>>>
>>> Gerrit
>>>
>>> On 04.07.2014 16:43, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx wrote:
>>>> Hi all,
>>>>
>>>> Here is a fun one I thought I could share
>>>>
>>>> I have a nicely nested XML (a bit TEI like)
>>>> and markers for page breaks can happen everywhere in the document (as
>>>> empty elements)
>>>>
>>>> Now I want to break the document per page, reconstructing the structure
>>>> So in a first step, I want to isolate the pagebreak to the highest level
>>>>
>>>> <book>
>>>>   <title>...</title>
>>>>   <section>
>>>>     <para>aaa<pb/>bbb</para>
>>>>   </section>
>>>> </book>
>>>>
>>>> to become
>>>>
>>>> <book>
>>>>   <title>...</title>
>>>>   <section>
>>>>     <para>aaa</para>
>>>>   </section>
>>>>   <pb/>
>>>>   <section>
>>>>     <para>bbb</para>
>>>>   </section>
>>>> </book>
>>>>
>>>> Bearing in mind I need a generic solution
>>>> and pagebreaks can happen at every level
>>>>
>>>> Any thoughts?
>>>> I am not looking for code, just curious on how people would attack this
>>>>
>>>> Thanks
>>>>
>>>> Geert
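
For anyone who wants to trace the "upward projection" steps outside an XSLT processor, here is a minimal Python sketch of the same idea, using only the standard library's xml.dom.minidom. It is not from the thread and is not a replacement for Gerrit's stylesheet: the function names (terminal_nodes, group_starting_with, project, split_at) are made up for illustration, and the sample input is Geert's example written without indentation so that whitespace-only text nodes do not show up as extra terminal nodes.

```python
from xml.dom import minidom

# Geert's example input, without indentation (assumption: no whitespace-only text nodes).
SAMPLE = "<book><title>...</title><section><para>aaa<pb/>bbb</para></section></book>"


def terminal_nodes(node):
    """Yield the descendants of `node` that have no children, in document order."""
    for child in node.childNodes:
        if child.hasChildNodes():
            yield from terminal_nodes(child)
        else:
            yield child


def is_split(node, split_name):
    """True if `node` is a splitting element such as <pb/>."""
    return node.nodeType == node.ELEMENT_NODE and node.tagName == split_name


def group_starting_with(leaves, split_name):
    """Group leaves so that each splitting element starts a new group."""
    groups = []
    for leaf in leaves:
        if not groups or is_split(leaf, split_name):
            groups.append([])
        groups[-1].append(leaf)
    return groups


def project(node, restricted, target, split_name):
    """Copy `node` into `target` only if it is in the restricted set (the 'split' mode)."""
    if id(node) not in restricted or is_split(node, split_name):
        return
    copy = node.cloneNode(False)  # shallow copy: element with attributes, or a text node
    target.appendChild(copy)
    for child in node.childNodes:
        project(child, restricted, copy, split_name)


def split_at(xml_text, split_name="pb"):
    """Hoist each splitting element to the top level, rebuilding the structure around it."""
    root = minidom.parseString(xml_text).documentElement
    result = root.cloneNode(False)
    for group in group_starting_with(terminal_nodes(root), split_name):
        # The restricted set: the group's terminal nodes plus all of their ancestors.
        restricted = set()
        for leaf in group:
            node = leaf
            while node is not None:
                restricted.add(id(node))
                node = node.parentNode
        if is_split(group[0], split_name):
            result.appendChild(group[0].cloneNode(False))  # reproduce the splitting element
        for child in root.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                project(child, restricted, result, split_name)
    return result.toxml()
```

Calling split_at(SAMPLE) walks the top-level children once per group, just as the stylesheet applies templates to $context/* once per group, so the cost grows with the number of page breaks times the size of the tree. That repeated pass over the whole tree is worth keeping in mind for very large documents.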