Subject: Re: [xsl] Seek ways to make my streaming XSLT code run faster (My streaming XSLT program has been running 12 hours and is only a quarter of the way to completion)
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 10 Aug 2025 15:30:14 -0000

Hello,

To restate what Liam just said, more bluntly: this is an indexing problem. Streaming mode is getting in the way.

What Liam is suggesting is a two-step solution: first use streaming to produce, efficiently, a document that is easier to index, and then index that document. (Right, Liam?) That can be an effective approach to managing the complexity as well as the scale, but the basic fact is that this is still indexing. This is why it could make more sense to load the data into an XQuery engine, to benefit from its front-loading of indexing into the XML. But it could still be done straightforwardly in XSLT.

I suggest Roger think hard about Liam's suggestion and break it down like this:

1. Write XSLT to provide the result you want for a single value (the 'ABC' in the example) using xsl:key - no streaming. (If scale is an impediment at this point, use a reduced sample and test over that.)

2. Assess efficiency - is there a way to streamline the source data to make this more efficient and faster, by simplifying the XML source and hence the XSLT? Design a source data format optimized for step 1. Demonstrate the improvement with a new XSLT.

3. Once this works, turn to a new, different XSLT that can produce this optimal source from your current (full-size) data set. This XSLT might well use streaming. Streaming won't make it faster, but it will reduce memory use so it doesn't bomb out. Your output should be smaller (maybe much smaller) than your input.

4. Produce this optimal source, test, and return to step 2 as necessary - except now with your new XSLTs, both the 'digester' and the 'indexer'.

5. Scale up to your 1900 identifiers.

Don't think about iteration, loops, or streams. Just think about how to make it easy for XPath to see what it is doing at any given point.

The two steps (first homogenize, then index) could be done in a single XSLT with phases, or in XProc or the pipelining framework of your choice.
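
Step 1 above might be sketched roughly as follows. This is only a minimal, non-streaming sketch and has not been run against the actual data. The element names records, record and VOR_identifier are taken from the quoted messages in this thread; the key name records-by-vor, the parameter name id and the matches wrapper element are hypothetical names chosen for illustration.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: the single identifier to look up (step 1). -->
  <xsl:param name="id" select="'ABC'"/>

  <!-- Index each record under every VOR_identifier value found anywhere
       inside it. The processor builds this index once per document. -->
  <xsl:key name="records-by-vor" match="record" use=".//VOR_identifier"/>

  <xsl:template match="/">
    <!-- Hypothetical wrapper element for the result. -->
    <matches identifier="{$id}">
      <!-- One key lookup replaces a scan over all records. -->
      <xsl:copy-of select="key('records-by-vor', $id)"/>
    </matches>
  </xsl:template>

</xsl:stylesheet>
```

The point of the sketch is that the key is declared once and then consulted by value, so asking for a second or a 1900th identifier is another lookup against the same index and not another pass over the records. If the real identifiers carry stray whitespace, the use expression would presumably need to normalize them.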

Of course there's a reasonable chance I am oversimplifying and getting it wrong - but maybe not by much.

Cheers, Wendell

On Sat, Aug 9, 2025 at 7:32 PM Liam R. E. Quin liam@xxxxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> On Sat, 2025-08-09 at 23:00 +0000, Liam R. E. Quin
> liam@xxxxxxxxxxxxxxxx wrote:
> > On Sat, 2025-08-09 at 22:25 +0000, Roger L Costello
> > costello@xxxxxxxxx
> > >
> > > I want to iterate over all 1900 identifiers and for each of them,
> > > iterate over all 5 million records to see which records contain the
> > > identifier. There is a loop within a loop:
> > >
> > > For each 1900 identifiers do
> > >    For each 5 million records do
> > >       Check record against identifier
> >
> > Outside streaming, you could
> > apply-templates select="/records/record"
> > and then have a template
> > match="VOR_identifier[ancestor::record[
> >    not(Airport_SID_Primary_Records)
> > ]]"
> >
> > and then process the record in a different mode?
>
> to be clear you can't do that in streaming mode. If the document is too
> large and you need to stream, you could have a template to match
> record, and take a grounded snapshot of it, and process that in a
> different (non-streaming) mode.
>
> Any time you start thinking in terms of loops i think it's time to take
> a step back, especially in streaming, and ask if you can use template
> match expressions to do more of the work, and also whether you can work
> back from the result a bit more.
>
> >
> > Otherwise yes, XQuery.
> >
>
> --
> Liam Quin, https://www.delightfulcomputing.com/
> Available for XML/Document/Information Architecture/XSLT/
> XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
> Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org

--
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...