Subject: Re: [xsl] breaking up XML on page break element
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 7 Jul 2014 17:55:32 -0000
Uhoh I wasn't reading ...

... compare my solution here:

https://github.com/wendellpiez/MITH_XSLT/blob/master/xslt/p-promote.xsl

plus there's an older version here:

http://piez.org/wendell/projects/Interedition2011/lib/p5o-browser-html.xsl

In Luminescent (my "hobby" LMNL processing framework) there's a fair amount of this stuff (reducing and promoting hierarchies).

The fact that we can generalize methods to do this in XSLT 2.0 is fantastic. :-)

Cheers,
Wendell

On Mon, Jul 7, 2014 at 4:53 AM, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi Gerrit,
>
> First my congratulations to the German team
> (I admit they should have scored an extra goal...
> given I made a bet at the office for 2-0, that would have brought me some
> cash :-)
>
> Thanks very much for this solution.
> It is exactly what I was looking for.
> It seems robust and elegant, and I love patterns with a name ;-)
>
> Thanks a ton
>
> Geert
>
>
> At 20:20 4/07/2014, you wrote:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>   version="2.0">
>>
>>   <xsl:output indent="yes"/>
>>
>>   <xsl:template match="* | @*" mode="#default">
>>     <xsl:copy>
>>       <xsl:apply-templates select="@*, node()" mode="#current"/>
>>     </xsl:copy>
>>   </xsl:template>
>>
>>   <xsl:template match="book" mode="#default">
>>     <xsl:variable name="context" select="." as="element(book)" />
>>     <xsl:copy>
>>       <xsl:for-each-group select="descendant::node()[not(node())]"
>>         group-starting-with="pb">
>>         <xsl:copy-of select="self::pb"/>
>>         <xsl:apply-templates select="$context/*" mode="split">
>>           <xsl:with-param name="restricted-to"
>>             select="current-group()/ancestor-or-self::node()" tunnel="yes"/>
>>         </xsl:apply-templates>
>>       </xsl:for-each-group>
>>     </xsl:copy>
>>   </xsl:template>
>>
>>   <xsl:template match="node()" mode="split">
>>     <xsl:param name="restricted-to" as="node()+" tunnel="yes" />
>>     <xsl:if test="exists(. intersect $restricted-to)">
>>       <xsl:copy>
>>         <xsl:copy-of select="@*" />
>>         <xsl:apply-templates mode="#current" />
>>       </xsl:copy>
>>     </xsl:if>
>>   </xsl:template>
>>
>>   <xsl:template match="pb" mode="split"/>
>>
>> </xsl:stylesheet>
>>
>> On 04.07.2014 18:31, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx wrote:
>>>
>>> Thanks Gerrit,
>>> (I admit I need to read this twice to get it, but that might be caused
>>> by the 0-1 and me not trying to miss all of the fun in Rio)
>>> I will look into it after the match
>>>
>>>
>>> At 17:18 4/07/2014, you wrote:
>>>>
>>>> I tackle it by what I call “upward projection”:
>>>>
>>>> When processing the top-level element, do a for-each-group of all
>>>> descendants that are terminal nodes (those without children), with a
>>>> group-starting-with at the splitting points.
>>>>
>>>> For each group, process the book (or the HTML body, or whatever common
>>>> ancestor there is) once in another mode, with a tunneled parameter
>>>> 'restricted-to' that contains, for each group, the terminal nodes and
>>>> their ancestors.
>>>>
>>>> When processing each group, for each node that you encounter, test
>>>> whether the node is contained in the tunneled variable (using
>>>> intersect). If it is, reproduce the node and continue in this mode; if
>>>> it isn’t contained, do nothing.
>>>>
>>>> There may be an option to discard or to reproduce the splitting
>>>> elements.
>>>>
>>>> Examples for this technique are in
>>>> https://subversion.le-tex.de/common/evolve-hub/evolve-hub.xsl, modes
>>>> hub:split-at-tab and hub:split-at-br
>>>>
>>>> They are a bit more complex than your case because they split
>>>> paragraphs that may contain tables or footnotes that in turn can
>>>> contain other paragraphs.
>>>> I introduced the function
>>>> hub:same-scope($splitting-element, $containing-element) to split only
>>>> at splitting elements that are contained within the paragraph that
>>>> should be split, rather than in a paragraph that is contained in a
>>>> footnote or table cell that is somehow contained in the given paragraph.
>>>>
>>>> I might prepare a synthetic standalone example if anyone is
>>>> interested, and furthermore on the condition that interested parties
>>>> root for Germany instead of France today.
>>>>
>>>> Gerrit
>>>>
>>>> On 04.07.2014 16:43, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Here is a fun one I thought I could share
>>>>>
>>>>> I have a nicely nested XML (a bit TEI like)
>>>>> and markers for page breaks can happen everywhere in the document (as
>>>>> empty elements)
>>>>>
>>>>> Now I want to break the document per page, reconstructing the structure
>>>>> So in a first step, I want to isolate the pagebreak to the highest
>>>>> level
>>>>>
>>>>> <book>
>>>>>   <title>...</title>
>>>>>   <section>
>>>>>     <para>aaa<pb/>bbb</para>
>>>>>   </section>
>>>>> </book>
>>>>>
>>>>> to become
>>>>>
>>>>> <book>
>>>>>   <title>...</title>
>>>>>   <section>
>>>>>     <para>aaa</para>
>>>>>   </section>
>>>>>   <pb/>
>>>>>   <section>
>>>>>     <para>bbb</para>
>>>>>   </section>
>>>>> </book>
>>>>>
>>>>> Bearing in mind I need a generic solution
>>>>> and pagebreaks can happen at every level
>>>>>
>>>>> Any thoughts?
>>>>> I am not looking for code, just curious on how people would attack this
>>>>>
>>>>> Thanks
>>>>>
>>>>> Geert
>>>>
>>>>
>>>> --
>>>> Gerrit Imsieke
>>>> Geschäftsführer / Managing Director
>>>>
>>>> le-tex publishing services GmbH
>>>> Weissenfelser Str. 84, 04229 Leipzig, Germany
>>>> Phone +49 341 355356 110, Fax +49 341 355356 510
>>>> gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
>>>>
>>>> Registergericht / Commercial Register: Amtsgericht Leipzig
>>>> Registernummer / Registration Number: HRB 24930
>>>>
>>>> Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
>>>> Thomas Schmidt, Dr. Reinhard Vöckler
>>
>
--
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^
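For readers who want to see the mechanics of Gerrit's "upward projection" outside an XSLT 2.0 processor, here is a rough Python port of the stylesheet quoted in the thread, using only the standard library's xml.dom.minidom (chosen because, unlike ElementTree, it exposes text nodes as real nodes with parent links, which the XSLT relies on). This is a sketch under stated assumptions, not anyone's published implementation: the function names (split_at_breaks, terminal_nodes, ancestors_or_self, project) are hypothetical, the break element is matched by its plain tag name (default pb), namespaces get no special treatment, and the hub:same-scope refinement Gerrit mentions is not modelled.

```python
from xml.dom import minidom


def terminal_nodes(node):
    """Descendants with no children, in document order
    (the XSLT's descendant::node()[not(node())])."""
    result = []
    for child in node.childNodes:
        if child.hasChildNodes():
            result.extend(terminal_nodes(child))
        else:
            result.append(child)
    return result


def ancestors_or_self(node):
    """Yield node, its parent, grandparent, ... up to the Document node."""
    while node is not None:
        yield node
        node = node.parentNode


def is_break(node, break_name):
    # Assumption: a break is any element whose tag name equals break_name.
    return node.nodeType == node.ELEMENT_NODE and node.tagName == break_name


def project(node, keep, break_name):
    """Equivalent of mode="split": copy the node only if it belongs to the
    current group's kept set, then recurse. Break elements are dropped here
    because they are re-emitted at the top level instead."""
    if id(node) not in keep or is_break(node, break_name):
        return None
    copy = node.cloneNode(False)  # element name + attributes, or a text node's data
    for child in node.childNodes:
        child_copy = project(child, keep, break_name)
        if child_copy is not None:
            copy.appendChild(child_copy)
    return copy


def split_at_breaks(xml_text, break_name="pb"):
    """Hypothetical helper: promote every break element to a child of the root."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    out = root.cloneNode(False)

    # for-each-group ... group-starting-with="pb" over the terminal nodes
    groups = []
    for node in terminal_nodes(root):
        if not groups or is_break(node, break_name):
            groups.append([])
        groups[-1].append(node)

    for group in groups:
        # xsl:copy-of select="self::pb": keep the break at the top level
        if is_break(group[0], break_name):
            out.appendChild(group[0].cloneNode(True))
        # the tunneled 'restricted-to' set, keyed by node identity
        keep = {id(a) for n in group for a in ancestors_or_self(n)}
        # apply-templates select="$context/*" mode="split" (element children only)
        for child in root.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                child_copy = project(child, keep, break_name)
                if child_copy is not None:
                    out.appendChild(child_copy)
    return out.toxml()


print(split_at_breaks(
    "<book><title>T</title><section><para>aaa<pb/>bbb</para></section></book>"))
# prints <book><title>T</title><section><para>aaa</para></section><pb/><section><para>bbb</para></section></book>
```

Keying the kept set on id() stands in for XPath's identity-based intersect. Like the stylesheet, the sketch walks the whole tree once per page, so the cost grows with (number of pages) × (tree size). Also like the stylesheet, text sitting directly inside the root element is not reproduced, because $context/* selects only element children.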