RE: [xsl] How to mark every 5th output record.

Subject: RE: [xsl] How to mark every 5th output record.
From: "Patrick Bergeron" <pbergeron@xxxxxxxxxxx>
Date: Tue, 11 Mar 2008 11:30:03 -0400
Hello.

Thank you for a long and considerate post. It's true that taming the spec in
any language would be a challenge. It's a myriad of special cases and
exceptions, but it's also unfortunately a standard.

In retrospect it would have been a lot easier to do it in C++, especially
since we have access to the source code of the application that exports the
XML. XSLT was chosen as a stress test to validate the XML
schema and to prove to third parties that they could use XSLT to implement
their own file converters.  In other words: "If we can export to *that*
format using XSLT, then our customers can export to any file format".  

Regarding your defense of XSLT, I'm not trying to force XSLT to do something
it wasn't designed to do. I'm simply trying to find the path of least
resistance to accomplish that last 0.05% to meet spec compliance.

Patrick Bergeron


-----Original Message-----
From: Wendell Piez [mailto:wapiez@xxxxxxxxxxxxxxxx] 
Sent: Tuesday, March 11, 2008 10:55 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: RE: [xsl] How to mark every 5th output record.

Patrick,

Just because your logic is currently at 2900 lines of code doesn't 
mean it has to be. In fact, if its approach to processing is as 
imperative as what you've suggested you "should" be able to do, 
chances are reasonably good that someone who's familiar and 
comfortable with the XSLT processing model could reduce it radically 
by refactoring.


Nor is pipelining (the term of art for processing your output as 
input) inherently such a bad thing. Indeed, in XSLT 2.0, it can be 
done transparently in one stylesheet. Depending on your architecture 
and implementation, it need not be inefficient.
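
A minimal sketch of the pipelining idea, written in Python rather than XSLT
so that it can be run as-is. It assumes the conversion can be expressed as a
stage that yields one output record at a time; `produce_records`, its skip
rule, and the `-----` marker are hypothetical stand-ins, not part of the
real stylesheet or file format. Because each stage is a generator, the
second stage sees the records in output order without a further copy of the
data set being built.

```python
from typing import Iterable, Iterator

MARKER = "-----"  # hypothetical marker line; the real format's marker is not given


def produce_records(items: Iterable[str]) -> Iterator[str]:
    """Stage 1: stand-in for the complex conversion, one line per output record."""
    for item in items:
        if item.startswith("#"):  # hypothetical rule: some inputs are skipped
            continue
        yield item.upper()  # hypothetical transformation


def mark_every(records: Iterable[str], n: int = 5, marker: str = MARKER) -> Iterator[str]:
    """Stage 2: pass records through unchanged, adding a marker after every n-th."""
    for count, record in enumerate(records, start=1):
        yield record
        if count % n == 0:
            yield marker


def pipeline(items: Iterable[str]) -> Iterator[str]:
    """Chain the two stages; records flow through one at a time."""
    return mark_every(produce_records(items))
```

The count is taken over the records that stage 1 actually emits, so skipped
or merged inputs do not disturb the position of the markers.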

As Mike said, the details of what you are trying to do are critical. 
For one thing, if your logic is complex, that's an indication that 
the process you are designing involves upconversion. If so, you 
should tell us right off whether you can use XSLT 2.0 or whether you 
are limited to 1.0. XSLT 1.0 wasn't designed for upconversion (its 
general assumption is that the dataset is clean and optimally 
structured and ordered going in, and transformations are geared 
mainly to presentation not data processing), which isn't to say that 
it can't be done. Rather, it's to say that when consulting the 
experts on how to do things, you will constantly hear the refrain 
"It's easier in 2.0".

As you have learned, XSLT is declarative and functional, not 
imperative. Variables are variables in the sense they are in algebra 
-- values defined in relation to other values in a processing context 
-- not just labels for memory registers, which you can reassign at 
will (a dangerous and destructive practice, since this means that any 
bug is at risk of infecting parts of the system far beyond where it 
does its immediate damage). While for you at this moment, this fact 
may present an impediment to using XSLT well, it's still not really a 
problem, as it offers numerous advantages at many layers of the 
system including yours (once you know how to take advantage of it), 
especially as complexity scales up.

I know this is a defense, not a solution. But if your platform 
resources are really so tight, maybe you need something with a 
different processing model than XSLT (maybe a SAX filter or series of 
them, or a Perl or Python script), at least for part of your problem. 
If things are that difficult, there's a reason. Either you are trying 
to use the language for something it wasn't designed for and doesn't 
do well, or you are approaching it wrong. Or both. My guess, from 
your description, is that the specification itself is a monster, and 
that taming it would be difficult in any language.
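
As an illustration of the SAX alternative, here is a rough sketch using
Python's standard `xml.sax` module. It assumes the earlier pass can emit its
results as a flat XML document whose `record` elements each hold one line of
output; the element name, the `-----` marker, and the `convert` helper are
all hypothetical. The handler keeps only a counter and the text of the
current record in memory, which is the property that matters on a
memory-constrained platform.

```python
import io
import xml.sax


class MarkEveryNth(xml.sax.ContentHandler):
    """Write each <record>'s text as one line, plus a marker after every n-th."""

    def __init__(self, out, n=5, marker="-----"):
        super().__init__()
        self.out = out
        self.n = n
        self.marker = marker
        self.count = 0
        self.buffer = None  # None while outside a <record> element

    def startElement(self, name, attrs):
        if name == "record":
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "record":
            self.out.write("".join(self.buffer) + "\n")
            self.buffer = None
            self.count += 1
            if self.count % self.n == 0:
                self.out.write(self.marker + "\n")


def convert(xml_text, n=5):
    """Run the handler over an in-memory document and return the text output."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), MarkEveryNth(out, n))
    return out.getvalue()
```

For a real data set the same handler could be passed to `xml.sax.parse` with
a file or stream, so the document is never held in memory as a tree.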

As far as that goes, in general, there's filtering, grouping and 
sorting. Sometimes any or all of these require additional processing 
to determine criteria for them. Also, sometimes sorting has to happen 
before grouping (that is, logically prior if not necessarily 
temporally), sometimes after -- that is, both are reordering or 
rearranging operations (as is filtering, strictly speaking).

In my experience, the sequence (1) data analysis followed by (2) 
filtering followed by (3) reordering has made sense. Often (1) and 
(2) can be collapsed. If (1) is done well, usually (3) can be done in 
one pass. Your requirement is tricky because you want grouping to 
occur after filtering and sorting, which is often (though not always) 
impractical in one pass.
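
The ordering problem can be sketched outside XSLT as well. The Python
fragment below assumes records are plain values and that the filter and sort
criteria are supplied by the caller; `filter_sort_group`, `keep`, and
`sort_key` are hypothetical names. It shows why grouping by position has to
wait: a record's place in a group of five is only known once filtering and
sorting have both finished.

```python
def filter_sort_group(records, keep, sort_key, size=5):
    """Filter, then sort, then cut the result into groups of `size` by position."""
    # Sorting needs every surviving record, so this step cannot be streamed.
    selected = sorted((r for r in records if keep(r)), key=sort_key)
    # Only now is each record's final position, and hence its group, known.
    return [selected[i:i + size] for i in range(0, len(selected), size)]
```

For example, with the records `[9, 3, 12, 7, 1, 8, 5, 10, 2, 11, 4, 6]`, a
filter that drops `7`, and the identity as sort key, the groups are
`[1, 2, 3, 4, 5]`, `[6, 8, 9, 10, 11]` and `[12]`.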

As Mike indicated before, XSLT 2.0 provides features that make the 
facilities needed for (1) available (in the general case) during 
later operations, which frequently reduces the need for pipelining 
since analysis can be done on the fly. On the other hand, when you 
need to pipeline, XSLT 2.0 makes that easier too.

Cheers,
Wendell

At 10:03 AM 3/11/2008, you wrote:
>As I said the rules under which I process my list are quite complex. So much
>so that my XSLT stylesheet is over 2900 lines of code (and yes, that's just
>nuts).
>
>Different records (and types of records) are processed using different
>rules, other records are deferred for later processing, others merged
>together to produce a final one, some are skipped altogether, some complex
>operations are performed on yet another set of records, etc. The output file
>format is crazy, and the spec for the file format is about as obscure and
>obtuse as I have ever seen in 20 years programming.
>
>But in the end, I end up with a text file that has 1 line per "output
>record", but these "output records" have almost nothing to do with the input
>records, and I need to separate them with a marker every 5th.
>
>I can't really do (position() mod 5) on my original input data because it
>has no correlation to the order of the output records, and it's impossible
>to create an expression that would select them properly in the order I need.
>
>Is my only option to create another tree that contains all of my output
>record results, and then iterate over that tree once again, and output the
>same data verbatim, only this time insert a marker every 5th?
>
>Gheesh, talk about using a tank to shoot a bird.
>
>I'm trying to avoid doing this for other reasons:
>
>1) My input data set is quite large.
>2) The XSLT processor is running on an embedded platform with limited
>memory.
>3) I'm already paying the price of doing a copy of the data in an earlier
>pass, I'd like to not pay the price again.
>
>Is there really, really, really _any_ other way of doing this without making
>a 3rd copy of my data set?


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
   Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
