Subject: RE: [xsl] Dividing documents based on size of contents
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 27 May 2009 09:12:51 +0100
I think this is a case for "sibling recursion" - in fact, it's the example I use on training courses, if I think the group is capable of tackling the problem (it tends to cause significant headaches, and people are typically amazed that after three hours of head-scratching, the answer turns out to be about ten lines of code).

It's probably easiest to do this in two phases: the first phase copies the documentDivision elements, inserting a <documentBreak/> element where appropriate, and the second phase uses for-each-group starting-with="documentBreak" to create the document elements.

The sibling recursion works like this:

    <!-- Assumes xmlns:xs="http://www.w3.org/2001/XMLSchema" on xsl:stylesheet -->
    <xsl:template match="documentDivision">
      <xsl:param name="size-so-far" as="xs:integer"/>
      <xsl:variable name="new-size-so-far" as="xs:integer"
                    select="$size-so-far + count(pagebreak)"/>
      <!-- ge, so that a running total of exactly 100 closes a document -->
      <xsl:variable name="start-new-document" as="xs:boolean"
                    select="$new-size-so-far ge 100"/>
      <xsl:copy-of select="."/>
      <xsl:if test="$start-new-document">
        <documentBreak/>
      </xsl:if>
      <xsl:apply-templates select="following-sibling::documentDivision[1]">
        <xsl:with-param name="size-so-far"
            select="if ($start-new-document) then 0 else $new-size-so-far"/>
      </xsl:apply-templates>
    </xsl:template>

and then you start the process off with a running total of zero:

    <xsl:template match="document">
      <!-- size-so-far has no default, so pass the initial total explicitly -->
      <xsl:apply-templates select="documentDivision[1]">
        <xsl:with-param name="size-so-far" select="0"/>
      </xsl:apply-templates>
    </xsl:template>

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay

> -----Original Message-----
> From: Chris von See [mailto:chris@xxxxxxxxxxxxx]
> Sent: 27 May 2009 02:54
> To: xsl-list
> Subject: [xsl] Dividing documents based on size of contents
>
> Hi all -
>
> I have what I think is a fairly simple problem, but I'm
> having trouble with the implementation in XSLT. Any help you
> could give would be greatly appreciated.
>
> I have a document which is subdivided into multiple sections,
> with each section, in turn, divided into pages as shown below:
>
> <document>
>   <documentDivision>
>     ... arbitrary content ...
>     <pagebreak />
>     ... arbitrary content ...
>     <pagebreak />
>   </documentDivision>
>
>   ... arbitrary number of <documentDivision> elements ...
>
> </document>
>
> Each <documentDivision> section of the document can have an
> arbitrary number of <pagebreak> elements, and an arbitrary
> amount of content between <pagebreak>s.
>
> I'd like to be able to break the input <document> into
> multiple <document>s, each of which has the minimum number of
> <documentDivision> sections that give it a <pagebreak> count
> of ~100 pages. I'd like to break the input at
> <documentDivision> boundaries, but I don't need the output
> documents to be equally sized or to be exactly 100 pages long
> - just as close to that size as I can reasonably get while
> maintaining the <documentDivision> boundaries.
>
> So, for example, if I have an input document that looks like this:
>
> <document>
>   <documentDivision>
>     ... content containing 50 <pagebreak /> elements ...
>   </documentDivision>
>   <documentDivision>
>     ... content containing 50 <pagebreak /> elements ...
>   </documentDivision>
>   <documentDivision>
>     ... content containing 127 <pagebreak /> elements ...
>   </documentDivision>
>   <documentDivision>
>     ... content containing 5 <pagebreak /> elements ...
>   </documentDivision>
>   <documentDivision>
>     ... content containing 23 <pagebreak /> elements ...
>   </documentDivision>
>   <documentDivision>
>     ... content containing 78 <pagebreak /> elements ...
>   </documentDivision>
> </document>
>
> the output documents should look like this, with each output
> document being "close" to 100 pages in length:
>
> <!-- This doc has enough <documentDivision> elements to give
>      exactly 100 pages. -->
> <document>
>   <documentDivision>
>     ... content containing 50 <pagebreak /> elements ...
>   </documentDivision>
>   <documentDivision>
>     ... content containing 50 <pagebreak /> elements ...
>   </documentDivision>
> </document>
>
> <!-- This doc has a single <documentDivision> element with
>      127 pages - close enough! -->
> <document>
>   <documentDivision>
>     ... content containing 127 <pagebreak /> elements ...
>   </documentDivision>
> </document>
>
> <!-- This doc has three <documentDivision> elements of 5,
>      23 and 78 pages each - close enough! -->
> <document>
>   <documentDivision>
>     ... content containing 5 <pagebreak /> elements ...
>   </documentDivision>
>   <documentDivision>
>     ... content containing 23 <pagebreak /> elements ...
>   </documentDivision>
>   <documentDivision>
>     ... content containing 78 <pagebreak /> elements ...
>   </documentDivision>
> </document>
>
> I've been able to figure out how to get the number of
> <pagebreak>s per <documentDivision> and how to calculate the
> number of <pagebreak>s in any given group of
> <documentDivision> sections, but what I'm not sure of is how
> to keep track of the point at which I last created a new
> output document, so that I can determine which group of
> <documentDivision> elements has a page count around 100 and
> should therefore be used to create a new output document.
> It seems that the best way to carry this context would be
> via params to xsl:apply-templates, but I'm not clear on how
> to set up the XSLT code so that the state gets maintained
> as I iterate through the <documentDivision> elements.
> It also seems like there should be some XPath expression that
> I can use with xsl:for-each-group, but I can't quite figure
> out how to write it so that each group has only the
> minimum number of <documentDivision> elements needed to
> accumulate 100-ish pages.
>
> Do you have any guidance on ways to do this? I think I'm
> just having a mental block, and a swift kick in the right
> direction should do the trick.
>
> Thanks
> Chris
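For reference, the two phases described in the reply can be combined into a single XSLT 2.0 stylesheet along the following lines. This is a sketch under stated assumptions, not Michael Kay's own code: the mode name `mark`, the `max-pages` parameter and the `<documents>` wrapper element are hypothetical, and it counts `pagebreak` elements at any depth with `.//pagebreak` in case they are nested inside the "arbitrary content". It compares with `ge`, so a running total of exactly 100 closes a document, as in the first example output. It also groups with `group-ending-with` rather than `group-starting-with`: because the recursion emits each `<documentBreak/>` after the division that reaches the threshold, ending each group at a break avoids an empty trailing group when the last division itself triggers a break.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <!-- Hypothetical parameter: page count at which a document is closed. -->
  <xsl:param name="max-pages" as="xs:integer" select="100"/>

  <xsl:template match="/document">
    <!-- Phase 1: a temporary tree holding copies of the divisions, with a
         documentBreak after each division that reaches max-pages. -->
    <xsl:variable name="marked">
      <xsl:apply-templates select="documentDivision[1]" mode="mark">
        <xsl:with-param name="size-so-far" select="0"/>
      </xsl:apply-templates>
    </xsl:variable>

    <!-- Phase 2: each group runs up to and including a documentBreak
         (the last group may end without one); breaks are not copied. -->
    <documents>
      <xsl:for-each-group select="$marked/*" group-ending-with="documentBreak">
        <document>
          <xsl:copy-of select="current-group()[self::documentDivision]"/>
        </document>
      </xsl:for-each-group>
    </documents>
  </xsl:template>

  <!-- Sibling recursion: handle one division, then pass the running page
       total on to the next sibling division. -->
  <xsl:template match="documentDivision" mode="mark">
    <xsl:param name="size-so-far" as="xs:integer"/>
    <!-- Assumption: pagebreaks may be nested, so count them at any depth. -->
    <xsl:variable name="new-size-so-far" as="xs:integer"
        select="$size-so-far + count(.//pagebreak)"/>
    <xsl:variable name="start-new-document" as="xs:boolean"
        select="$new-size-so-far ge $max-pages"/>
    <xsl:copy-of select="."/>
    <xsl:if test="$start-new-document">
      <documentBreak/>
    </xsl:if>
    <xsl:apply-templates select="following-sibling::documentDivision[1]"
        mode="mark">
      <xsl:with-param name="size-so-far"
          select="if ($start-new-document) then 0 else $new-size-so-far"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the six-division example in the question, this should produce three `<document>` elements holding 2, 1 and 3 divisions (50+50, 127, and 5+23+78 pages). To write each group to its own file instead of a wrapper element, the literal `<document>` could be placed inside `xsl:result-document` with an `href` such as `concat('part-', position(), '.xml')`; that file-naming scheme is only an illustration.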