Subject: Re: [xsl] Seek ways to make my streaming XSLT code run faster (My streaming XSLT program has been running 12 hours and is only a quarter of the way to completion)
From: "David Birnbaum djbpitt@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 10 Aug 2025 18:19:54 -0000
Dear All,
OP didn't mention whether the task is imagined as a one-off or as, say,
a service. If it's a one-off, front-loading the indexing by using an
XQuery database doesn't sound like an automatic saving with respect to
efficiency (over indexing within XSLT; see below) because the code builds
an index once and then uses it once. At the same time, thinking in terms
of XQuery database indexing keeps the focus on indexing, which can often
pay off (massively) with nested loops. Streaming helps (potentially) with
large memory demands, but looping over the same data repeatedly may not
be that sort of task.
Whether XQuery database indexing is more efficient than indexing with
<xsl:key> in the case of a one-off is less clear.
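
For readers who want to see what indexing with <xsl:key> might look like here, the following is a minimal non-streaming sketch, not the OP's code. It assumes the /records/record structure and the VOR_identifier element name quoted further down the thread. The `identifiers` parameter, the key name `records-by-id`, and the output element names are hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: the identifiers to look up (1900 in the real task). -->
  <xsl:param name="identifiers" as="xs:string*" select="('ABC', 'DEF')"/>

  <!-- Index each record under every VOR_identifier value it contains.
       The index is built once, on first use of key(). -->
  <xsl:key name="records-by-id" match="record" use=".//VOR_identifier"/>

  <xsl:template match="/">
    <!-- Keep a handle on the source document: inside the for-each the
         context item is a string, so key() needs an explicit tree. -->
    <xsl:variable name="doc" select="."/>
    <matches>
      <xsl:for-each select="$identifiers">
        <!-- One index lookup per identifier replaces the inner loop over records. -->
        <identifier value="{.}"
                    count="{count(key('records-by-id', ., $doc))}"/>
      </xsl:for-each>
    </matches>
  </xsl:template>
</xsl:stylesheet>
```

The outer loop over identifiers remains, but each iteration becomes a single key() lookup rather than a scan of all 5 million records. The whole document has to be in memory for this, which is why it does not combine with streaming.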
Best,
David
On Aug 10, 2025, at 11:30 AM, Wendell Piez wapiez@xxxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
Hello,
To restate what Liam just said, more blatantly: this is an indexing
problem. Streaming mode is getting in the way. What Liam is
suggesting is a two-step solution: first use streaming to produce a
document that is cheaper to index, then index that document.
(Right, Liam?)
That can be an effective approach to managing the complexity as well
as the scale, but the basic fact is that this is still indexing. This
is why it could make more sense to load it into an XQuery engine to
benefit from its front-loading of indexing into the XML.
But it could still be done straightforwardly in XSLT. I suggest Roger
think hard about Liam's suggestion and break it down like this:
1. Write XSLT to provide the result you want for a single value (the
'ABC' in the example) using xsl:key - no streaming. (If scale is an
impediment at this point, use a reduced sample and test over that.)
2. Assess efficiency - is there a way to streamline the source data
to make this more efficient and faster, by simplifying the XML source
and hence the XSLT? Design a source data format optimized for step 1.
Demonstrate the improvement with a new XSLT.
3. Once this works, turn to a new/different XSLT that can produce
this optimal source from your current (full-size) data set. This
XSLT might well use streaming. Streaming won't make it faster but it
will reduce memory use so it doesn't bomb out. Your output should be
smaller (maybe much smaller) than your input.
4. Produce this optimal source, test, and return to step 2 as
necessary - except with your new XSLTs, both the 'digester' and the
'indexer'.
5. Scale up to your 1900 identifiers.
Don't think about iteration, loops, or streams. Just think about how
to make it easy for XPath to see what it is doing at any given point.
The two steps (first, homogenize, then index) could be done in a
single XSLT with phases, or in XProc or the pipelining framework of
your choice.
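
A minimal sketch of the single-stylesheet, two-phase variant, under stated assumptions: it is non-streaming for simplicity (on the full data set the first phase would more likely be a separate, streamed stylesheet, as step 3 suggests). The slim <r>/<v> format, the `digest` mode, the `slim-by-id` key, and the `identifiers` parameter are all hypothetical names; only /records/record and VOR_identifier come from the thread.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: the identifiers to look up. -->
  <xsl:param name="identifiers" as="xs:string*" select="('ABC', 'DEF')"/>

  <!-- Phase 2 index: matches the slim <r> elements built in phase 1. -->
  <xsl:key name="slim-by-id" match="r" use="v"/>

  <!-- Phase 1 ("digester"): reduce each record to its position
       and its distinct identifier values. -->
  <xsl:template match="record" mode="digest">
    <r n="{position()}">
      <xsl:for-each select="distinct-values(.//VOR_identifier)">
        <v><xsl:value-of select="."/></v>
      </xsl:for-each>
    </r>
  </xsl:template>

  <xsl:template match="/">
    <!-- Temporary tree holding the homogenized data. -->
    <xsl:variable name="slim">
      <xsl:apply-templates select="records/record" mode="digest"/>
    </xsl:variable>
    <!-- Phase 2 ("indexer"): one key lookup per identifier against the slim tree. -->
    <matches>
      <xsl:for-each select="$identifiers">
        <identifier value="{.}"
                    records="{key('slim-by-id', ., $slim)/@n}"/>
      </xsl:for-each>
    </matches>
  </xsl:template>
</xsl:stylesheet>
```

Each <identifier> in the output lists the positions of the records that mention it. Splitting the phases into two stylesheets run by XProc or another pipeline would keep the same two templates, with the slim document written to disk in between.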
Of course there's a reasonable chance I am oversimplifying and
getting it wrong - but maybe not by much.
Cheers, Wendell
On Sat, Aug 9, 2025 at 7:32 PM Liam R. E. Quin
liam@xxxxxxxxxxxxxxxx <xsl-list-service@lists.mulberrytech.com>
wrote:
On Sat, 2025-08-09 at 23:00 +0000, Liam R. E. Quin
liam@xxxxxxxxxxxxxxxx wrote:
> On Sat, 2025-08-09 at 22:25 +0000, Roger L Costello
> costello@xxxxxxxxx
> >
> > I want to iterate over all 1900 identifiers and for each of them,
> > iterate over all 5 million records to see which records contain the
> > identifier. There is a loop within a loop:
> >
> > For each 1900 identifiers do
> > For each 5 million records do
> > Check record against identifier
>
> Outside streaming, you could
> apply-templates select="/records/record"
> and then have a template
> match="VOR_identifier[ancestor::record[
> not(Airport_SID_Primary_Records)
> ]]"
>
> and then process the record in a different mode?
To be clear, you can't do that in streaming mode. If the document is too
large and you need to stream, you could have a template to match
record, and take a grounded snapshot of it, and process that in a
different (non-streaming) mode.
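
A minimal sketch of that pattern, assuming a processor that implements the XSLT 3.0 streaming feature. It uses copy-of(.) to ground each record; snapshot(.) is the alternative when ancestors above record are also needed. The mode name `in-memory` and the <hits>/<hit> output elements are hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The default (unnamed) mode streams the large input. -->
  <xsl:mode streamable="yes" on-no-match="shallow-skip"/>

  <!-- Hypothetical second mode; not streamable, so any axis may be used in it. -->
  <xsl:mode name="in-memory" on-no-match="shallow-skip"/>

  <xsl:template match="/records">
    <hits>
      <xsl:apply-templates select="record"/>
    </hits>
  </xsl:template>

  <!-- Streamed: copy one record into memory, then hand the copy to the other mode.
       Only one record is held in memory at a time. -->
  <xsl:template match="record">
    <xsl:apply-templates select="copy-of(.)" mode="in-memory"/>
  </xsl:template>

  <!-- Non-streamed: the ancestor axis and the predicate are fine on the grounded copy. -->
  <xsl:template mode="in-memory"
      match="VOR_identifier[ancestor::record[not(Airport_SID_Primary_Records)]]">
    <hit><xsl:value-of select="."/></hit>
  </xsl:template>
</xsl:stylesheet>
```

This keeps memory bounded but still visits every record once per run, so it addresses the memory problem rather than the nested-loop cost.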
Any time you start thinking in terms of loops I think it's time to take
a step back, especially in streaming, and ask if you can use template
match expressions to do more of the work, and also whether you can work
back from the result a bit more.
>
> Otherwise yes, XQuery.
>
--
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:
http://www.fromoldbooks.org
--
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org...
...pausepress.org...
...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...