[xsl] Seek ways to make my streaming XSLT code run faster (My streaming XSLT program has been running 12 hours and is only a quarter of the way to completion)

From: "Roger L Costello costello@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 9 Aug 2025 22:25:27 -0000
Hi Folks,

My XML document consists of 5 million <record> elements:

<records>
      <record>...</record>
      <record>...</record>
</records>

Each <record> element has a child element that indicates the type of
(aviation) data in the record:

<records>
    <record>
        <VHF_NAVAID_Primary_Records>...</VHF_NAVAID_Primary_Records>
    </record>
    <record>
        <Airport_SID_Primary_Records>...</Airport_SID_Primary_Records>
    </record>
</records>

Each of the child elements contains elements appropriate to its type:

<records>
    <record>
        <VHF_NAVAID_Primary_Records>
            <VOR_Identifier>ABC </VOR_Identifier>
            <DME_Ident>AND </DME_Ident>
        </VHF_NAVAID_Primary_Records>
    </record>
    <record>
        <Airport_SID_Primary_Records>
            <SID_Identifier>ABC </SID_Identifier>
        </Airport_SID_Primary_Records>
    </record>
</records>

I want to find all <record> elements whose child is not an
<Airport_SID_Primary_Records> element and whose child element contains an
identifier element with the value "ABC ". Here's the output I desire:

<results>
    <result>
        <identifier>ABC </identifier>
        <record>VHF_NAVAID_Primary_Records</record>
        <field>
            <VOR_Identifier>ABC </VOR_Identifier>
        </field>
    </result>
</results>

Identifier "ABC " is just one of 1900 identifiers. These identifiers are
stored in an XML file, identifiers.xml:

<identifiers>
   <identifier>ABC </identifier>
   <identifier>DEF </identifier>
</identifiers>

I want to iterate over all 1900 identifiers and, for each of them, iterate over
all 5 million records to see which records contain the identifier. That is a
loop within a loop: records.xml is read once per identifier, for roughly
9.5 billion record checks in total (1900 x 5 million):

For each of the 1900 identifiers do
    For each of the 5 million records do
         Check record against identifier

I am using streaming XSLT to accomplish this task.

My streaming program has been running 12 hours and it has only processed a
quarter of the identifiers. I'd like to see if you have suggestions on ways to
speed up my streaming program. I am thinking that this part of my program is
probably slow:

<xsl:for-each select="*[name(.) ne 'Airport_SID_Primary_Records']
                       [name(.) ne 'Airport_STAR_Primary_Records']
                       [name(.) ne 'Airport_Approach_Primary_Records']
                       [ends-with(name(.),'Primary_Records')]">
    <xsl:for-each select="*[(. eq $identifier) and (name(.) ne 'Recommended_Navaid')]">
        <result>
            <identifier><xsl:value-of select="$identifier"/></identifier>
            <record><xsl:value-of select="name(..)"/></record>
            <field><xsl:sequence select="."/></field>
        </result>
    </xsl:for-each>
</xsl:for-each>

That code is for processing a <record> element. The code checks that the
<record> element's child element is not an <Airport_SID_Primary_Records>
element, not an <Airport_STAR_Primary_Records> element, not an
<Airport_Approach_Primary_Records> element, and its element name ends with
"Primary_Records". If the <record> element's child element satisfies all those
criteria, then the code iterates over all the elements inside the <record>
element's child element that contain a value matching the identifier and with
an element name not equal to "Recommended_Navaid".

Is there a way to rewrite the code to make it execute faster?

Here is my complete program:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="#all"
                version="3.0">

    <xsl:output method="xml" />

    <xsl:variable name="identifiers" select="doc('identifiers.xml')/*"/>

    <xsl:template name="main">
        <results>
            <xsl:for-each select="$identifiers/*">
                <xsl:variable name="identifier" select="." as="xs:string"/>
                <xsl:source-document href="records.xml" streamable="yes">
                    <xsl:for-each select="/*/record/copy-of(.)">
                        <xsl:for-each select="*[name(.) ne 'Airport_SID_Primary_Records']
                                               [name(.) ne 'Airport_STAR_Primary_Records']
                                               [name(.) ne 'Airport_Approach_Primary_Records']
                                               [ends-with(name(.),'Primary_Records')]">
                            <xsl:for-each select="*[(. eq $identifier) and (name(.) ne 'Recommended_Navaid')]">
                                <result>
                                    <identifier><xsl:value-of select="$identifier"/></identifier>
                                    <record><xsl:value-of select="name(..)"/></record>
                                    <field><xsl:sequence select="."/></field>
                                </result>
                            </xsl:for-each>
                        </xsl:for-each>
                    </xsl:for-each>
                </xsl:source-document>
            </xsl:for-each>
        </results>
    </xsl:template>

</xsl:stylesheet>
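
For comparison, the same check with the two loops inverted could be sketched
as follows, so that records.xml is streamed once rather than once per
identifier. This is only a sketch and has not been run against the full data.
It assumes that the order of the <result> elements does not matter (results
come out in record order, not grouped by identifier) and that identifiers.xml
contains no duplicate values (a duplicated identifier would produce one result
here, not several). The variable name identifier-set is hypothetical.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:map="http://www.w3.org/2005/xpath-functions/map"
                exclude-result-prefixes="#all"
                version="3.0">

    <xsl:output method="xml" />

    <!-- Hypothetical name: all identifier strings, loaded once into a map
         so that each field can be tested with a single key lookup -->
    <xsl:variable name="identifier-set" as="map(xs:string, xs:boolean)"
                  select="map:merge(doc('identifiers.xml')/*/* ! map:entry(string(.), true()))"/>

    <xsl:template name="main">
        <results>
            <!-- One streamed pass over records.xml -->
            <xsl:source-document href="records.xml" streamable="yes">
                <xsl:for-each select="/*/record/copy-of(.)">
                    <!-- Same record-type and field-name filters as above;
                         only the identifier test changes to a map lookup -->
                    <xsl:for-each select="*[ends-with(name(.),'Primary_Records')]
                                           [not(name(.) = ('Airport_SID_Primary_Records',
                                                           'Airport_STAR_Primary_Records',
                                                           'Airport_Approach_Primary_Records'))]
                                          /*[name(.) ne 'Recommended_Navaid']
                                            [map:contains($identifier-set, string(.))]">
                        <result>
                            <identifier><xsl:value-of select="."/></identifier>
                            <record><xsl:value-of select="name(..)"/></record>
                            <field><xsl:sequence select="."/></field>
                        </result>
                    </xsl:for-each>
                </xsl:for-each>
            </xsl:source-document>
        </results>
    </xsl:template>

</xsl:stylesheet>
```

If the output must stay grouped by identifier, the results of this single pass
would need to be sorted or grouped in a second step, which is not shown here.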

Here is (an abbreviated) identifiers.xml document:

<identifiers>
   <identifier>ABC </identifier>
   <identifier>DEF </identifier>
</identifiers>

Here is my (Windows) .bat file:

java -classpath %CLASSPATH% net.sf.saxon.Transform -t -it:main -xsl:test.xsl -o:out.xml
