Re: [xsl] Seek ways to make my streaming XSLT code run faster (My streaming XSLT program has been running 12 hours and is only a quarter of the way to completion)

From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 9 Aug 2025 22:39:39 -0000

On 10/08/2025 00:25, Roger L Costello costello@xxxxxxxxx wrote:
Hi Folks,

My XML document consists of 5 million <record> elements:

<records>
       <record>...</record>
       <record>...</record>
</records>

Each <record> element has a child element that indicates the type of
(aviation) data in the record:

<records>
     <record>
         <VHF_NAVAID_Primary_Records>...</VHF_NAVAID_Primary_Records>
     </record>
     <record>
         <Airport_SID_Primary_Records>...</Airport_SID_Primary_Records>
     </record>
</records>

Each of the child elements contains elements appropriate to its type:

<records>
     <record>
         <VHF_NAVAID_Primary_Records>
             <VOR_Identifier>ABC </VOR_Identifier>
             <DME_Ident>AND </DME_Ident>
         </VHF_NAVAID_Primary_Records>
     </record>
     <record>
         <Airport_SID_Primary_Records>
             <SID_Identifier>ABC </SID_Identifier>
         </Airport_SID_Primary_Records>
     </record>
</records>

I want to find all <record> elements whose child is not an
<Airport_SID_Primary_Records> element and whose child element contains an
identifier element with value "ABC ". Here's the output I desire:

<results>
     <result>
         <identifier>ABC </identifier>
         <record>VHF_NAVAID_Primary_Records</record>
         <field>
             <VOR_Identifier>ABC </VOR_Identifier>
         </field>
     </result>
</results>

Identifier "ABC " is just one of 1900 identifiers. These identifiers are
stored in an XML file, identifiers.xml:

<identifiers>
     <identifier>ABC </identifier>
     <identifier>DEF </identifier>
</identifiers>

I want to iterate over all 1900 identifiers and for each of them, iterate
over all 5 million records to see which records contain the identifier. There
is a loop within a loop:

For each of the 1900 identifiers do
     For each of the 5 million records do
         Check record against identifier

I am using streaming XSLT to accomplish this task.

My streaming program has been running 12 hours and it has only processed a
quarter of the identifiers. I'd like to see if you have suggestions on ways to
speed up my streaming program. I am thinking that this part of my program is
probably slow:

<xsl:for-each select="*[name(.) ne 'Airport_SID_Primary_Records']
                       [name(.) ne 'Airport_STAR_Primary_Records']
                       [name(.) ne 'Airport_Approach_Primary_Records']
                       [ends-with(name(.),'Primary_Records')]">
     <xsl:for-each select="*[(. eq $identifier) and (name(.) ne 'Recommended_Navaid')]">
         <result>
             <identifier><xsl:value-of select="$identifier"/></identifier>
             <record><xsl:value-of select="name(..)"/></record>
             <field><xsl:sequence select="."/></field>
         </result>
     </xsl:for-each>
</xsl:for-each>

That code is for processing a <record> element. The code checks that the
<record> element's child element is not an <Airport_SID_Primary_Records>
element, not an <Airport_STAR_Primary_Records> element, not an
<Airport_Approach_Primary_Records> element, and its element name ends with
"Primary_Records". If the <record> element's child element satisfies all those
criteria, then the code iterates over all the elements inside the <record>
element's child element that contain a value matching the identifier and with
an element name not equal to "Recommended_Navaid".

Is there a way to rewrite the code to make it execute faster?


Here is my complete program:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="#all"
                version="3.0">

<xsl:output method="xml" />

<xsl:variable name="identifiers" select="doc('identifiers.xml')/*"/>

     <xsl:template name="main">
         <results>
             <xsl:for-each select="$identifiers/*">
                 <xsl:variable name="identifier" select="." as="xs:string"/>
                 <xsl:source-document href="records.xml" streamable="yes">


So that approach streams through records.xml 1900 times, once per identifier.

Is the order of the resulting elements important? Otherwise you could
stream once and check all your identifiers for each record, as I tried
to indicate in my first answer:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="#all"
                version="3.0">

    <xsl:output method="xml" />

    <xsl:variable name="identifiers" select="doc('idents.xml')/*"/>

    <xsl:template name="main">
        <xsl:source-document href="records.xml" streamable="yes">
            <results>
                <xsl:for-each select="records/record">
                    <!-- snapshot of the current record, so it can be checked repeatedly -->
                    <xsl:variable name="record" select="copy-of(.)"/>
                    <xsl:for-each select="$identifiers/identifier">
                        <xsl:variable name="ident" select="."/>
                        <!-- ... (use $ident to process $record) -->
                    </xsl:for-each>
                </xsl:for-each>
            </results>
        </xsl:source-document>
    </xsl:template>

</xsl:stylesheet>


That should give you the same result elements you are after, while streaming through the 5 million records only once. The order of the elements in the result may differ, though; I am not sure whether that matters.
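
To make the single-pass idea concrete, here is a minimal sketch of how the step that uses the identifier on the copied record might look. It is only an illustration under assumptions: it reuses the filter from your question unchanged, assumes the identifier file is named identifiers.xml as in your post, and assumes each <record> is small enough to copy into memory. The variable names $record and $ident are just the ones from the outline above.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="#all"
                version="3.0">

  <xsl:output method="xml"/>

  <!-- Assumption: the small identifier file (about 1900 entries) is read
       once as an ordinary in-memory document, not streamed -->
  <xsl:variable name="identifiers" select="doc('identifiers.xml')/*"/>

  <xsl:template name="main">
    <!-- records.xml is streamed exactly once -->
    <xsl:source-document href="records.xml" streamable="yes">
      <results>
        <xsl:for-each select="records/record">
          <!-- In-memory copy of one record; it can be navigated freely -->
          <xsl:variable name="record" select="copy-of(.)"/>
          <xsl:for-each select="$identifiers/identifier">
            <xsl:variable name="ident" select="." as="xs:string"/>
            <!-- Same tests as in the question, applied to the copied record -->
            <xsl:for-each select="$record/*[ends-with(name(), 'Primary_Records')]
                                           [not(name() = ('Airport_SID_Primary_Records',
                                                          'Airport_STAR_Primary_Records',
                                                          'Airport_Approach_Primary_Records'))]
                                  /*[. eq $ident and name() ne 'Recommended_Navaid']">
              <result>
                <identifier><xsl:value-of select="$ident"/></identifier>
                <record><xsl:value-of select="name(..)"/></record>
                <field><xsl:sequence select="."/></field>
              </result>
            </xsl:for-each>
          </xsl:for-each>
        </xsl:for-each>
      </results>
    </xsl:source-document>
  </xsl:template>

</xsl:stylesheet>
```

With this arrangement the <result> elements come out grouped by record (in document order of records.xml) and, within each record, in the order of the identifiers, rather than grouped by identifier as in your original program. If the original grouping is needed, the results could be sorted or grouped by <identifier> in a second, non-streamed step, since the result set is presumably much smaller than the input.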
