Re: [xsl] Seek ways to make my streaming XSLT code run faster (My streaming XSLT program has been running 12 hours and is only a quarter of the way to completion)

From: "Sheila Thomson coder@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 9 Aug 2025 22:47:52 -0000
Might it be quicker to load this document into an XML database and query it
with XQuery? Is that an option?

Sheila

On 9 August 2025 23:39:41 BST, "Martin Honnen martin.honnen@xxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>On 10/08/2025 00:25, Roger L Costello costello@xxxxxxxxx wrote:
>> Hi Folks,
>>
>> My XML document consists of 5 million <record> elements:
>>
>> <records>
>>        <record>...</record>
>>        <record>...</record>
>> </records>
>>
>> Each <record> element has a child element that indicates the type of (aviation) data in the record:
>>
>> <records>
>>      <record>
>>          <VHF_NAVAID_Primary_Records>...</VHF_NAVAID_Primary_Records>
>>      </record>
>>      <record>
>>          <Airport_SID_Primary_Records>...</Airport_SID_Primary_Records>
>>      </record>
>> </records>
>>
>> Each of the child elements contains elements appropriate to its type:
>>
>> <records>
>>      <record>
>>          <VHF_NAVAID_Primary_Records>
>>              <VOR_Identifier>ABC </VOR_Identifier>
>>              <DME_Ident>AND </DME_Ident>
>>          </VHF_NAVAID_Primary_Records>
>>      </record>
>>      <record>
>>          <Airport_SID_Primary_Records>
>>              <SID_Identifier>ABC </SID_Identifier>
>>          </Airport_SID_Primary_Records>
>>      </record>
>> </records>
>>
>> I want to find all <record> elements whose child is not an <Airport_SID_Primary_Records> element and whose child element contains an identifier element with the value "ABC ". Here's the output I want:
>>
>> <results>
>>      <result>
>>          <identifier>ABC </identifier>
>>          <record>VHF_NAVAID_Primary_Records</record>
>>          <field>
>>              <VOR_Identifier>ABC </VOR_Identifier>
>>          </field>
>>      </result>
>> </results>
>>
>> Identifier "ABC " is just one of 1900 identifiers. These identifiers are stored in an XML file, identifiers.xml:
>>
>> <identifiers>
>>     <identifier>ABC </identifier>
>>     <identifier>DEF </identifier>
>> </identifiers>
>>
>> I want to iterate over all 1900 identifiers and, for each of them, iterate over all 5 million records to see which records contain the identifier. There is a loop within a loop:
>>
>> For each of the 1900 identifiers do
>>      For each of the 5 million records do
>>           Check record against identifier
>>
>> I am using streaming XSLT to accomplish this task.
>>
>> My streaming program has been running for 12 hours and has processed only a quarter of the identifiers. I'd like your suggestions on ways to speed it up. I suspect this part of my program is the slow part:
>>
>> <xsl:for-each select="*[name(.) ne 'Airport_SID_Primary_Records']
>>                        [name(.) ne 'Airport_STAR_Primary_Records']
>>                        [name(.) ne 'Airport_Approach_Primary_Records']
>>                        [ends-with(name(.),'Primary_Records')]">
>>      <xsl:for-each select="*[(. eq $identifier) and (name(.) ne 'Recommended_Navaid')]">
>>          <result>
>>              <identifier><xsl:value-of select="$identifier"/></identifier>
>>              <record><xsl:value-of select="name(..)"/></record>
>>              <field><xsl:sequence select="."/></field>
>>          </result>
>>      </xsl:for-each>
>> </xsl:for-each>
>>
>> That code processes a single <record> element. It checks that the <record> element's child element is not an <Airport_SID_Primary_Records>, <Airport_STAR_Primary_Records>, or <Airport_Approach_Primary_Records> element, and that its name ends with "Primary_Records". If the child element meets all of those criteria, the code iterates over the elements inside it whose value matches the identifier and whose name is not "Recommended_Navaid".
>>
>> Is there a way to rewrite the code to make it execute faster?
>>
>> Here is my complete program:
>>
>> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>                 exclude-result-prefixes="#all"
>>                 version="3.0">
>>      <xsl:output method="xml" />
>>      <xsl:variable name="identifiers" select="doc('identifiers.xml')/*"/>
>>      <xsl:template name="main">
>>          <results>
>>              <xsl:for-each select="$identifiers/*">
>>                  <xsl:variable name="identifier" select="." as="xs:string"/>
>>                  <xsl:source-document href="records.xml" streamable="yes">
>
>
>So that approach streams through records.xml 1900 times.
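A quick back-of-the-envelope count shows where the time goes. It assumes the rate Roger reports stays constant (a quarter of the 1900 identifiers, i.e. 475 passes, in 12 hours):

```latex
\begin{aligned}
\text{record visits} &= 1900 \times 5\,000\,000 = 9.5 \times 10^{9} \\
\text{time per pass} &\approx \frac{12\ \text{h}}{1900/4} = \frac{43\,200\ \text{s}}{475} \approx 91\ \text{s} \\
\text{total time} &\approx 4 \times 12\ \text{h} = 48\ \text{h}
\end{aligned}
```

If most of each pass is spent parsing records.xml (an assumption, but plausible when each identifier check is a simple string comparison), a single pass avoids 1899 of the 1900 parses.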
>
>Is the order of the resulting elements important? If not, you could stream once and check all of your identifiers against each record, as I tried to indicate in my first answer:
>
>
><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>    xmlns:xs="http://www.w3.org/2001/XMLSchema"
>    exclude-result-prefixes="#all"
>    version="3.0">
>
>    <xsl:output method="xml" />
>
>    <xsl:variable name="identifiers" select="doc('identifiers.xml')/*"/>
>
>    <xsl:template name="main">
>        <xsl:source-document href="records.xml" streamable="yes">
>            <results>
>                <xsl:for-each select="records/record">
>                    <!-- in-memory copy of one record at a time -->
>                    <xsl:variable name="record" select="copy-of(.)"/>
>                    <xsl:for-each select="$identifiers/identifier">
>                        <xsl:variable name="ident" select="."/>
>                        <!-- ... (use $ident to process $record) -->
>                    </xsl:for-each>
>                </xsl:for-each>
>            </results>
>        </xsl:source-document>
>    </xsl:template>
>
></xsl:stylesheet>
>
>
>That should give you the same result elements you intended while streaming through the 5 million records only once. The order of elements in the result may differ, though; I'm not sure whether that matters.
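Here is a sketch of a complete single-pass stylesheet. It fills in the placeholder in the skeleton above with the filter from Roger's original code. It has not been run against the real data. It assumes an XSLT 3.0 processor with streaming support (for example Saxon-EE), the file names identifiers.xml and records.xml, and the element names shown in the thread. One deliberate substitution: instead of looping over all 1900 identifiers for every record, it builds an XPath 3.1 map once and checks each field value with a single map:contains() call. The variable names $ident-map and $excluded are illustrative, not from the thread.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="#all"
    version="3.0">

    <xsl:output method="xml" indent="yes"/>

    <!-- Hypothetical lookup table, built once: one entry per identifier string.
         string(.) keeps trailing blanks, so "ABC " stays "ABC ". -->
    <xsl:variable name="ident-map" as="map(xs:string, xs:boolean)"
        select="map:merge(doc('identifiers.xml')/identifiers/identifier
                          ! map:entry(string(.), true()))"/>

    <!-- Record types that the original code skips. -->
    <xsl:variable name="excluded" as="xs:string*"
        select="('Airport_SID_Primary_Records',
                 'Airport_STAR_Primary_Records',
                 'Airport_Approach_Primary_Records')"/>

    <xsl:template name="main">
        <xsl:source-document href="records.xml" streamable="yes">
            <results>
                <xsl:for-each select="records/record">
                    <!-- Materialize one record; everything below is ordinary
                         (non-streamed) XPath over this small tree. -->
                    <xsl:variable name="record" select="copy-of(.)"/>
                    <xsl:for-each select="$record/*[ends-with(name(), 'Primary_Records')]
                                                   [not(name() = $excluded)]">
                        <!-- One hash lookup per field instead of 1900 comparisons. -->
                        <xsl:for-each select="*[name() ne 'Recommended_Navaid']
                                               [map:contains($ident-map, string(.))]">
                            <result>
                                <identifier><xsl:value-of select="."/></identifier>
                                <record><xsl:value-of select="name(..)"/></record>
                                <field><xsl:sequence select="."/></field>
                            </result>
                        </xsl:for-each>
                    </xsl:for-each>
                </xsl:for-each>
            </results>
        </xsl:source-document>
    </xsl:template>
</xsl:stylesheet>
```

Start it at the named template main (with Saxon, for example, `-it:main`). As Martin notes, results come out in record order rather than grouped by identifier. If identifier order matters, the result (presumably far smaller than records.xml) could be re-sorted in a second, non-streamed step.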
