Subject: Re: [xsl] Dividing documents based on size of contents
From: Chris von See <chris@xxxxxxxxxxxxx>
Date: Wed, 27 May 2009 12:09:49 -0700
Cheers,
Chris
I think this is a case for "sibling recursion" - in fact, it's the example I
use on training courses, if I think the group is capable of tackling the
problem (it tends to cause significant headaches, and people are typically
amazed that, after 3 hours of head-scratching, the answer turns out to be about
ten lines of code).
It's probably easiest to do this in two phases: the first phase copies the
documentDivision elements, inserting a <documentBreak/> element where
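appropriate, and the second phase uses xsl:for-each-group
group-ending-with="documentBreak" to create the document elements. Because
each break is written after the division that completes a batch, grouping
that ends with the break keeps it with its own batch and avoids a final group
that contains nothing but a break.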
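To make the second phase concrete, here is a small Python sketch (not XSLT) of the two grouping modes of xsl:for-each-group, applied to what phase 1 would produce for Chris's example further down. Assumptions: each division is represented only by its page count, the string "BREAK" stands in for <documentBreak/>, a break follows the division that brings a batch to 100 pages, and the function names are hypothetical. Ending-with grouping yields exactly the batches; starting-with grouping attaches each break to the next batch and leaves a final group holding only the break.

```python
# Assumption: "BREAK" stands in for <documentBreak/>; integers stand in
# for documentDivision elements (their page counts).
BREAK = "BREAK"

# Flattened phase-1 output for divisions of 50, 50, 127, 5, 23 and 78 pages,
# assuming a break is emitted once a batch reaches 100 pages.
STREAM = [50, 50, BREAK, 127, BREAK, 5, 23, 78, BREAK]


def group_ending_with(items):
    """Like group-ending-with: each BREAK closes the current group.
    The BREAK itself is dropped here for readability."""
    groups, current = [], []
    for item in items:
        if item == BREAK:
            groups.append(current)
            current = []
        else:
            current.append(item)
    if current:  # items after the last BREAK still form a group
        groups.append(current)
    return groups


def group_starting_with(items):
    """Like group-starting-with: each BREAK opens a new group, and the
    first item always opens the first group."""
    groups = []
    for item in items:
        if item == BREAK or not groups:
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups


print(group_ending_with(STREAM))    # [[50, 50], [127], [5, 23, 78]]
print(group_starting_with(STREAM))  # last group is ['BREAK'] on its own
```

In the XSLT itself the break element would still be part of current-group(), so each output document would copy only current-group()[self::documentDivision].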
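The sibling recursion works like this:
<!-- Phase 1: copy each documentDivision, adding <documentBreak/> after the
     division that brings the running page count to 100 or more.
     Assumes xmlns:xs="http://www.w3.org/2001/XMLSchema" is declared
     on the xsl:stylesheet element. -->
<xsl:template match="documentDivision">
  <!-- running total passed from the preceding sibling; 0 for the first -->
  <xsl:param name="size-so-far" as="xs:integer" select="0"/>
  <xsl:variable name="new-size-so-far" as="xs:integer"
      select="$size-so-far + count(pagebreak)"/>
  <xsl:variable name="start-new-document" as="xs:boolean"
      select="$new-size-so-far ge 100"/>
  <xsl:copy-of select="."/>
  <xsl:if test="$start-new-document">
    <documentBreak/>
  </xsl:if>
  <!-- move on to the next sibling, resetting the total after a break -->
  <xsl:apply-templates select="following-sibling::documentDivision[1]">
    <xsl:with-param name="size-so-far"
        select="if ($start-new-document) then 0 else $new-size-so-far"/>
  </xsl:apply-templates>
</xsl:template>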
and then you start the process off with:
<xsl:template match="document">
  <xsl:apply-templates select="documentDivision[1]"/>
</xsl:template>
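For readers who want to trace the recursion outside XSLT, here is a rough Python analogue of phase 1. It assumes each division is represented only by its page count (the value of count(pagebreak)), that the string "BREAK" stands in for <documentBreak/>, and that a batch closes once the running total reaches 100 (>=), the comparison that reproduces Chris's expected output. The names insert_breaks and limit are hypothetical.

```python
def insert_breaks(counts, i=0, size_so_far=0, limit=100):
    """Rough Python analogue of the sibling-recursive template.

    Assumption: counts[i] stands for count(pagebreak) of the i-th
    documentDivision, and "BREAK" stands for an inserted <documentBreak/>.
    """
    if i >= len(counts):
        # no following sibling, so the recursion ends here
        return []
    new_size_so_far = size_so_far + counts[i]
    start_new_document = new_size_so_far >= limit
    out = [counts[i]]  # like xsl:copy-of select="."
    if start_new_document:
        out.append("BREAK")
    # like applying templates to following-sibling::documentDivision[1],
    # passing a running total that resets after a break
    next_size = 0 if start_new_document else new_size_so_far
    return out + insert_breaks(counts, i + 1, next_size, limit)


print(insert_breaks([50, 50, 127, 5, 23, 78]))
# [50, 50, 'BREAK', 127, 'BREAK', 5, 23, 78, 'BREAK']
```

Python does not eliminate tail calls, so each division adds a stack frame; for thousands of divisions a plain loop would be the safer choice in Python.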
Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay
-----Original Message-----
From: Chris von See [mailto:chris@xxxxxxxxxxxxx]
Sent: 27 May 2009 02:54
To: xsl-list
Subject: [xsl] Dividing documents based on size of contents
Hi all -
I have what I think is a fairly simple problem, but I'm having trouble with the implementation in XSLT. Any help you could give would be greatly appreciated.
I have a document which is subdivided into multiple sections, with each section, in turn, divided into pages as shown below:
<document>
  <documentDivision>
    ... arbitrary content ...
    <pagebreak />
    ... arbitrary content ...
    <pagebreak />
  </documentDivision>
  ... arbitrary number of <documentDivision> elements ...
</document>
Each <documentDivision> section of the document can have an arbitrary number of <pagebreak> elements, and an arbitrary amount of content between <pagebreak>s.
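Counting the pages in each section is the easy part (Chris says below that he already has it). As a point of reference, this Python sketch uses the standard library's xml.etree.ElementTree; the sample document and the function name pages_per_division are made up for illustration. findall("pagebreak") matches only direct children, as count(pagebreak) does in XSLT; if page breaks can sit deeper inside the arbitrary content, ".//pagebreak" (like count(.//pagebreak)) is needed instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two divisions with 2 and 3 page breaks.
SAMPLE = """<document>
  <documentDivision>intro<pagebreak/>more text<pagebreak/></documentDivision>
  <documentDivision>a<pagebreak/>b<pagebreak/>c<pagebreak/></documentDivision>
</document>"""


def pages_per_division(xml_text, nested=False):
    """Return the number of <pagebreak> elements in each <documentDivision>.

    nested=False counts direct children only (like count(pagebreak));
    nested=True counts all descendants (like count(.//pagebreak)).
    """
    root = ET.fromstring(xml_text)
    path = ".//pagebreak" if nested else "pagebreak"
    return [len(div.findall(path)) for div in root.findall("documentDivision")]


print(pages_per_division(SAMPLE))  # [2, 3]
```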
I'd like to be able to break the input <document> into multiple <document>s, each of which has the minimum number of <documentDivision> sections needed to reach a <pagebreak> count of about 100 pages. I'd like to break the input at <documentDivision> boundaries, but I don't need the output documents to be equally sized or exactly 100 pages long, just as close to that size as I can reasonably get while preserving the <documentDivision> boundaries.
So for example if I have an input document that looks like this:
<document>
  <documentDivision>
    ... content containing 50 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 50 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 127 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 5 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 23 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 78 <pagebreak /> elements ...
  </documentDivision>
</document>
the output documents should look like this, with each output document being "close" to 100 pages in length:
<!-- This doc has enough <documentDivision> elements to give exactly 100 pages. -->
<document>
  <documentDivision>
    ... content containing 50 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 50 <pagebreak /> elements ...
  </documentDivision>
</document>

<!-- This doc has a single <documentDivision> element with 127 pages - close enough! -->
<document>
  <documentDivision>
    ... content containing 127 <pagebreak /> elements ...
  </documentDivision>
</document>

<!-- This doc has three <documentDivision> elements of 5, 23 and 78 pages respectively - close enough! -->
<document>
  <documentDivision>
    ... content containing 5 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 23 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 78 <pagebreak /> elements ...
  </documentDivision>
</document>
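Checking this example against a simple greedy pass helps pin down the exact rule. The Python sketch below (function and parameter names are hypothetical; each division is represented by its page count) closes a batch as soon as its running total reaches 100. That inclusive comparison reproduces the three documents above, because 50 + 50 = 100 closes the first batch; a strict "more than 100" test would keep adding divisions until 50 + 50 + 127 = 227.

```python
def batch_divisions(counts, limit=100, inclusive=True):
    """Greedy single pass over the divisions' page counts.

    A batch is closed once its total reaches the limit (inclusive=True)
    or exceeds it (inclusive=False); leftovers form a final, smaller batch.
    """
    batches, current, total = [], [], 0
    for pages in counts:
        current.append(pages)
        total += pages
        reached = (total >= limit) if inclusive else (total > limit)
        if reached:
            batches.append(current)
            current, total = [], 0
    if current:
        batches.append(current)
    return batches


EXAMPLE = [50, 50, 127, 5, 23, 78]
print(batch_divisions(EXAMPLE))                   # [[50, 50], [127], [5, 23, 78]]
print(batch_divisions(EXAMPLE, inclusive=False))  # [[50, 50, 127], [5, 23, 78]]
```

When the input runs out, the last batch may fall short of the limit, which fits the requirement that output documents need not be equally sized.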
I've been able to figure out how to get the number of <pagebreak>s per <documentDivision> and how to calculate the number of <pagebreak>s in any given group of <documentDivision> sections, but what I'm not sure of is how to keep track of the point at which I last created a new output document, so that I can determine which group of <documentDivision> elements has a page count of around 100 and should therefore be used to create a new output document. It seems that the best way to carry this context would be via params to xsl:apply-templates, but I'm not clear on how to set up the XSLT code so that the state is maintained as I iterate through the <documentDivision> elements. It also seems like there should be some XPath expression that I can use with xsl:for-each-group, but I can't quite figure out how to write it so that each group has only the minimum number of <documentDivision> elements needed to accumulate 100-ish pages.
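On the for-each-group question: the batch a division belongs to depends on a running total that resets at each batch boundary, so it cannot be computed from any single division in isolation; some pass has to carry that state forward, which is what the recursive parameter does. XPath 2.0 has no fold function (fold-left arrived with XPath 3.0), which is one reason recursion is the usual XSLT 2.0 answer. Once each division has a batch number, grouping adjacent divisions that share it is straightforward, in the spirit of group-adjacent. This Python sketch shows that split, using itertools.accumulate (its initial argument needs Python 3.8 or later) as the state-carrying scan and itertools.groupby, which groups consecutive equal keys, for the adjacent grouping. The function names are hypothetical, and a batch is assumed to close at 100 or more pages.

```python
from itertools import accumulate, groupby


def batch_numbers(counts, limit=100):
    """Assign a batch number to each division with a running scan.

    The carried state is (batch number, pages so far in that batch).
    Assumption: a batch closes once it holds `limit` pages or more.
    """
    def step(state, pages):
        batch, total = state
        if total >= limit:
            # the previous division closed its batch: start a new one
            return (batch + 1, pages)
        return (batch, total + pages)

    # drop the initial (0, 0) state so there is one entry per division
    states = list(accumulate(counts, step, initial=(0, 0)))[1:]
    return [batch for batch, _ in states]


def group_adjacent_batches(counts, limit=100):
    """Group consecutive divisions that share a batch number."""
    numbered = zip(batch_numbers(counts, limit), counts)
    return [[pages for _, pages in grp]
            for _, grp in groupby(numbered, key=lambda pair: pair[0])]


EXAMPLE = [50, 50, 127, 5, 23, 78]
print(batch_numbers(EXAMPLE))           # [0, 0, 1, 2, 2, 2]
print(group_adjacent_batches(EXAMPLE))  # [[50, 50], [127], [5, 23, 78]]
```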
Do you have any guidance on ways to do this? I think I'm just having a mental block, and a swift kick in the right direction should do the trick.
Thanks,
Chris
Chris von See Senior Geek TechAdapt, Inc. 2910 Heights Dr. Bellingham, WA 98226
E: chris@xxxxxxxxxxxxx P: +1 360 223 1514 F: +1 360 544 0112