Subject: Re: Printing XML + XSLT (2nd try)
From: Francis Norton <francis@xxxxxxxxxxx>
Date: Tue, 08 Feb 2000 05:28:37 +0000
Nikolai Grigoriev wrote:
> 
...
> 
> I am *really* interested in it, and have some ideas on how to express it
> in XSL FO; but XSL FO have become kinda offtopic here ;-). My idea
> is to implement something like DSSSL indirect sosofos using XPath
> to address formatting objects on the current/preceding/following pages.
> The current version of the XSL FO permits roles on elements; integrating
> XPath into FO has thus become relatively straightforward... However,
> if no one is interested in discussing it publicly, we can continue the
> thread privately.
> 
Right, that's three ayes and no nays, so <deepBreath /> here goes...

First, I should say that I don't know what is possible with FOs - I
really haven't been looking at them (though it may be time to start!).

From the top: -

Take a requirement for n-level reporting, where data may be grouped in
each level and printed with header, footer, carried-forward and
brought-forward sections, and each of these sections may have different
lengths from other sections in the same or any other level. An example
might be a sales report of titles grouped by author, grouped by
publisher. The report as a whole has a header and a footer, and any
running totals are printed before and after each intermediate page break
(i.e. carried-forward and brought-forward). Likewise for each publisher
and author, though since the title doesn't contain any aggregate
information it doesn't normally need anything other than a header
section.

The two main challenges are detecting when a page break is necessary,
and printing the right sections when a page break splits a group.

Let's take page-break detection. Obviously we don't want to print more
than there is room for. But if we are about to print an item at level n
then we need to have room for the level n-1 .. level 0 carried-forward
sections.

In other words we shouldn't print another title if that means we won't
have room for the author and publisher carried-forward sections.

Basically you have to use look-ahead:

	rule 1:	if you don't have room to page-break after the next item,
page-break before it. 
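
To pin rule 1 down, here is a minimal sketch in Python - the names are
mine and have nothing to do with XSLT or FOs, and heights are in
whatever abstract unit the formatter reports:

    def needs_break_before(item_height, carried_forward_heights, space_left):
        """Rule 1: if there is no room to page-break *after* the next
        item - the item itself plus the carried-forward sections of
        every enclosing level - then page-break *before* it."""
        return item_height + sum(carried_forward_heights) > space_left

    # e.g. a title of height 1 inside author and publisher groups whose
    # carried-forward sections take 2 units each, with 4 units left:
    print(needs_break_before(1, [2, 2], 4))   # True - break first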

So you need the ability to measure the length of a hypothetical
page break. This is easy for fixed-length sections and fixed-width
fonts. But if you have variable-length report sections (variable not
because of the varying number of items at the next level down, but
because maybe there's free text in there) you then need to be able to
estimate how much room an individual section will take on the page.
This is a problem in some environments, because either they don't
support that feature at all, or the only way to get the result is to
actually print something out and then see where the cursor is. (And I
don't know what the answer is for FOs...)
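
For the easy fixed-width case the estimate is just a matter of wrapping
the text at the column width and counting lines - a sketch (the widths
here are arbitrary assumptions):

    import textwrap

    def estimated_lines(text, chars_per_line=72):
        # Crude height estimate for a free-text section in a fixed-width
        # font; a proportional font needs the formatter's help instead.
        return len(textwrap.wrap(text, width=chars_per_line)) or 1

    print(estimated_lines("some free text in a notes section " * 10, 40))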

Since we may need to do a page break at any time it becomes much simpler
to use an event-oriented approach to coding this rather than a nested or
recursive loop approach. (Trust me on this - I wasted enough time trying
to get the nested loop approach to work...)

I found that the event-based approach became much simpler if I
de-normalised the data, so instead of viewing the data hierarchically,
like this (missing out the non-key properties for simplicity) -

	publisher: Faber and Faber
		|
		+- author: Ted Hughes
		|	|
		|	+- title: Tales from Ovid
		|	|
		|	+- title: Birthday Letters
		|
		+- author: Seamus Heaney
		|	|
		|	+- title: Beowulf
		|
		+- author: Wendy Cope
		|	|
		|	+- title: Making cocoa for Kingsley Amis
	...

you represent it as a single table, like this -

	publisher:	author:		title:
	Faber and Faber	Ted Hughes	Tales from Ovid
	Faber and Faber	Ted Hughes	Birthday Letters
	Faber and Faber	Seamus Heaney	Beowulf
	Faber and Faber	Wendy Cope	Making cocoa for Kingsley Amis
...

Then add in grouped information columns such as the running total of
sales by title, author and publisher and you have a data model that will
make your life *much* simpler. (Needless to say - in an XML group - this
is an abstract data model to be implemented with any convenient
combination of data structures or methods.)

	rule 2: de-normalise the data so that you have a simple data stream of
actual and calculated data
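
As a concrete (and entirely made-up) illustration of rule 2 in Python:
flatten the publisher/author/title tree into one row per title and
carry the running totals along as extra columns -

    publishers = [
        {"name": "Faber and Faber", "authors": [
            {"name": "Ted Hughes", "titles": [
                {"name": "Tales from Ovid", "sales": 120},
                {"name": "Birthday Letters", "sales": 250}]},
            {"name": "Seamus Heaney", "titles": [
                {"name": "Beowulf", "sales": 310}]},
        ]},
    ]

    def denormalise(publishers):
        # One output row per title, with the running totals of every
        # enclosing group calculated on the way through.
        for p in publishers:
            p_total = 0
            for a in p["authors"]:
                a_total = 0
                for t in a["titles"]:
                    a_total += t["sales"]
                    p_total += t["sales"]
                    yield {"publisher": p["name"], "author": a["name"],
                           "title": t["name"], "sales": t["sales"],
                           "author_total": a_total,
                           "publisher_total": p_total}

    for row in denormalise(publishers):
        print(row)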

OK, now it's getting a bit like writing an event-based parser, which is
a powerful model.

Instead of lexical analysis into symbol events, we step through the
de-normalised data stream and analyse it into end-of-page or
end-of-group events. From these events we can simply work out what
section or sections to print next, with all the section data coming from
the current row of the data stream.

	rule 3: process the data stream into page- and group-boundary events,
then map these events into section-printing rules
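
Rule 3 in the same sketchy Python - the event names and row fields are
only illustrative, and the rows are the ones the rule 2 sketch above
produces:

    def group_events(rows, keys=("publisher", "author")):
        # Turn changes in the grouping columns into boundary events,
        # closing inner groups before outer ones and opening outer
        # groups before inner ones.  Page-boundary events would be
        # interleaved in the same way whenever rule 1 fires.
        previous = None
        for row in rows:
            if previous is None:
                changed_from = 0                   # open every group
            else:
                changed = [k for k in keys if previous[k] != row[k]]
                changed_from = (keys.index(changed[0]) if changed
                                else len(keys))
                for k in reversed(keys[changed_from:]):
                    yield ("end-of-group", k, previous)
            for k in keys[changed_from:]:
                yield ("start-of-group", k, row)
            yield ("detail", "title", row)
            previous = row
        if previous is not None:
            for k in reversed(keys):
                yield ("end-of-group", k, previous)

Each start-of-group event maps to a header section, each end-of-group
to a footer, and a page-boundary event maps to the carried-forward
sections before the break and the brought-forward sections after it -
with every field a section needs sitting in the row that arrived with
the event.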

The data flow is something like this (with data in square brackets and
processes in round ones):


                   [raw data set]
                         |
                         v
           (data grouper / de-normaliser)
                         |
                         v
          [data stream with running totals]
                         |
                         v
      (page and group boundary event analyser)
                |                  ^
                v                  |
    [current-row based events]  [paper position]
                |                  ^
                v                  |
              (section print routines)
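
The feedback arrow on the right-hand side is what makes this work: the
section print routines have to report where they have got to on the
page so that the boundary analyser can apply rule 1. A much-simplified
sketch of that loop (the heights and page size are made up, and a real
check would add in the carried-forward heights as in the rule 1
sketch):

    def run_report(events, page_height, height_of, emit):
        space_left = page_height
        for event in events:
            if height_of(event) > space_left:      # simplified rule 1
                emit(("page-break",))
                space_left = page_height
            emit(event)
            space_left -= height_of(event)         # the feedback loop

    run_report([("detail", "title", {"title": t}) for t in
                ("Tales from Ovid", "Birthday Letters", "Beowulf")],
               page_height=2, height_of=lambda e: 1, emit=print)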


That's the brain dump on report printing. If this makes sense to anyone
and is useful I'd enjoy putting it into practice in an XML context. If
it doesn't make sense but looks like it might be useful, let me know and
I'll try to clarify my ramblings.

Francis.


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

