Subject: Re: [xsl] Seek ways to make my streaming XSLT code run faster (My streaming XSLT program has been running 12 hours and is only a quarter of the way to completion)
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 10 Aug 2025 15:30:14 -0000
Hello,

To restate what Liam just said, more blatantly: this is an indexing
problem. Streaming mode is getting in the way. What Liam is suggesting is a
two-step solution: first use streaming to efficiently produce a document
that is more efficient to index, then index that document. (Right, Liam?)

That can be an effective approach to managing the complexity as well as the
scale, but the basic fact is that this is still an indexing problem. This is
why it could make more sense to load the data into an XQuery engine and
benefit from the indexing it front-loads when the XML is ingested.

But it could still be done straightforwardly in XSLT. I suggest Roger think
hard about Liam's suggestion and break it down like this:

1. Write XSLT that produces the result you want for a single value (the
'ABC' in the example) using xsl:key - no streaming. (If scale is an
impediment at this point, use a reduced sample and test over that.)
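As a minimal sketch of step 1 - no streaming, and with the element names
(record, VOR_identifier) borrowed from Liam's reply, so they may not match
the real vocabulary:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">

  <!-- Index every record by its identifier; built once, consulted cheaply. -->
  <xsl:key name="record-by-id" match="record" use="VOR_identifier"/>

  <xsl:template match="/">
    <!-- One call to key() replaces the inner five-million-record loop. -->
    <hits id="ABC">
      <xsl:copy-of select="key('record-by-id', 'ABC')"/>
    </hits>
  </xsl:template>

</xsl:stylesheet>
```

The point is that the processor scans the document once to build the key
table, after which each of the 1900 lookups is (near) constant time.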

2. Assess efficiency - is there a way to streamline the source data to make
this faster, by simplifying the XML source and hence the XSLT? Design a
source data format optimized for step 1, and demonstrate the improvement
with a new XSLT.

3. Once this works, turn to a new/different XSLT that can produce this
optimal source from your current (full-size) data set. This XSLT might
well use streaming. Streaming won't make it faster, but it will reduce
memory use so it doesn't bomb out. Your output should be smaller (maybe
much smaller) than your input.
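A sketch of what that streaming 'digester' might look like: one pass,
bounded memory, emitting only what the indexing step needs. Again, the
element names and the reduced output shape are assumptions:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">

  <xsl:mode streamable="yes" on-no-match="shallow-skip"/>

  <xsl:template match="/records">
    <records>
      <xsl:apply-templates/>
    </records>
  </xsl:template>

  <xsl:template match="record">
    <!-- copy-of(.) grounds this one record so we can navigate inside it
         freely, without the usual streaming restrictions -->
    <xsl:variable name="r" select="copy-of(.)"/>
    <!-- emit only what the indexer needs; the rest is discarded -->
    <record id="{$r/VOR_identifier}"/>
  </xsl:template>

</xsl:stylesheet>
```

Only one record at a time is ever held in memory, which is exactly the
grounded-snapshot technique Liam describes below.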

4. Produce this optimal source, test, and return to step 2 as necessary -
this time with your new XSLTs, both the 'digester' and the 'indexer'.

5. Scale up to your full 1900 identifiers.

Don't think about iteration, loops, or streams. Just think about how to
make it easy for XPath to see what it is doing at any given point.

The two steps (first homogenize, then index) could be done in a single
XSLT with phases, or in XProc or the pipelining framework of your choice.
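For the single-XSLT option, a sketch of how the two phases might be wired
together with xsl:source-document (the file name and mode names here are
illustrative, not from the original thread):

```xml
<xsl:template name="xsl:initial-template">
  <!-- Phase 1: stream the big file through the 'digest' mode -->
  <xsl:variable name="digest">
    <xsl:source-document href="records.xml" streamable="yes">
      <xsl:apply-templates mode="digest"/>
    </xsl:source-document>
  </xsl:variable>
  <!-- Phase 2: index the small in-memory digest; no streaming needed -->
  <xsl:apply-templates select="$digest" mode="index"/>
</xsl:template>
```

The digest variable holds the reduced document in memory, so the 'index'
mode can use xsl:key and arbitrary XPath over it.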

Of course there's a reasonable chance I am oversimplifying and getting it
wrong - but maybe not by much.

Cheers, Wendell


On Sat, Aug 9, 2025 at 7:32 PM Liam R. E. Quin liam@xxxxxxxxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> On Sat, 2025-08-09 at 23:00 +0000, Liam R. E. Quin
> liam@xxxxxxxxxxxxxxxx wrote:
> > On Sat, 2025-08-09 at 22:25 +0000, Roger L Costello
> > costello@xxxxxxxxx
> > >
> > > I want to iterate over all 1900 identifiers and for each of them,
> > > iterate over all 5 million records to see which records contain the
> > > identifier. There is a loop within a loop:
> > >
> > > For each 1900 identifiers do
> > >     For each 5 million records do
> > >          Check record against identifier
> >
> > Outside streaming, you could
> >       apply-templates select="/records/record"
> > and then have a template
> >     match="VOR_identifier[ancestor::record[
> >       not(Airport_SID_Primary_Records)
> >     ]]"
> >
> > and then process the record in a different mode?
>
> to be clear you can't do that in streaming mode. If the document is too
> large and you need to stream, you could have a template to match
> record, and take a grounded snapshot of it, and process that in a
> different (non-streaming) mode.
>
> Any time you start thinking in terms of loops I think it's time to take
> a step back, especially in streaming, and ask if you can use template
> match expressions to do more of the work, and also whether you can work
> back from the result a bit more.
>
> >
> > Otherwise yes, XQuery.
> >
>
> --
> Liam Quin, https://www.delightfulcomputing.com/
> Available for XML/Document/Information Architecture/XSLT/
> XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
> Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org
>
>
>

--
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...
