Subject: Re: [xsl] Seek ways to make my streaming XSLT code run faster (My streaming XSLT program has been running 12 hours and is only a quarter of the way to completion)
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 9 Aug 2025 22:39:39 -0000
Hi Folks,
My XML document consists of 5 million <record> elements:
<records>
  <record>...</record>
  <record>...</record>
</records>
Each <record> element has a child element that indicates the type of (aviation) data in the record:
<records>
  <record>
    <VHF_NAVAID_Primary_Records>...</VHF_NAVAID_Primary_Records>
  </record>
  <record>
    <Airport_SID_Primary_Records>...</Airport_SID_Primary_Records>
  </record>
</records>
Each of the child elements contains elements appropriate to its type:
<records>
  <record>
    <VHF_NAVAID_Primary_Records>
      <VOR_Identifier>ABC </VOR_Identifier>
      <DME_Ident>AND </DME_Ident>
    </VHF_NAVAID_Primary_Records>
  </record>
  <record>
    <Airport_SID_Primary_Records>
      <SID_Identifier>ABC </SID_Identifier>
    </Airport_SID_Primary_Records>
  </record>
</records>
I want to find all <record> elements whose child is not an <Airport_SID_Primary_Records> element and whose child element contains an identifier element with value "ABC ". Here's the output I desire:
<results>
  <result>
    <identifier>ABC </identifier>
    <record>VHF_NAVAID_Primary_Records</record>
    <field>
      <VOR_Identifier>ABC </VOR_Identifier>
    </field>
  </result>
</results>
Identifier "ABC " is just one of 1900 identifiers. These identifiers are stored in an XML file, identifiers.xml
<identifiers>
  <identifier>ABC </identifier>
  <identifier>DEF </identifier>
</identifiers>
I want to iterate over all 1900 identifiers and, for each of them, iterate over all 5 million records to see which records contain the identifier. There is a loop within a loop:
For each of the 1900 identifiers do
  For each of the 5 million records do
    Check record against identifier
I am using streaming XSLT to accomplish this task.
My streaming program has been running 12 hours and it has only processed a
quarter of the identifiers. I'd like to see if you have suggestions on ways to speed up my streaming program. I am thinking that this part of my program is probably slow:
<xsl:for-each select="*[name(.) ne 'Airport_SID_Primary_Records']
                       [name(.) ne 'Airport_STAR_Primary_Records']
                       [name(.) ne 'Airport_Approach_Primary_Records']
                       [ends-with(name(.),'Primary_Records')]">
  <xsl:for-each select="*[(. eq $identifier) and (name(.) ne 'Recommended_Navaid')]">
    <result>
      <identifier><xsl:value-of select="$identifier"/></identifier>
      <record><xsl:value-of select="name(..)"/></record>
      <field><xsl:sequence select="."/></field>
    </result>
  </xsl:for-each>
</xsl:for-each>
That code is for processing a <record> element. The code checks that the <record> element's child element is not an <Airport_SID_Primary_Records> element, not an <Airport_STAR_Primary_Records> element, not an <Airport_Approach_Primary_Records> element, and that its element name ends with "Primary_Records". If the <record> element's child element satisfies all those criteria, then the code iterates over all the elements inside the <record> element's child element whose value matches the identifier and whose element name is not "Recommended_Navaid".
Is there a way to rewrite the code to make it execute faster?
Here is my complete program:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="#all"
                version="3.0">
  <xsl:output method="xml" />
  <xsl:variable name="identifiers" select="doc('identifiers.xml')/*"/>
  <xsl:template name="main">
    <results>
      <xsl:for-each select="$identifiers/*">
        <xsl:variable name="identifier" select="." as="xs:string"/>
        <xsl:source-document href="records.xml" streamable="yes">
Is the order of the resulting elements important? Otherwise you could stream once and check all your identifiers for each record, as I tried to indicate in my first answer:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">
  <xsl:template name="main">
    <xsl:source-document href="records.xml" streamable="yes">
      <results>
        <xsl:for-each select="records/record">
          <xsl:variable name="record" select="copy-of(.)"/>
          <xsl:for-each select="$identifiers/identifier">
            <xsl:variable name="ident" select="."/>
            ... (use $ident to process $record)
          </xsl:for-each>
        </xsl:for-each>
      </results>
    </xsl:source-document>
  </xsl:template>
</xsl:stylesheet>
That should give you the same result elements as your original code, but it streams only once through the 5 million records. The order of the elements in the result will perhaps be different; I am not sure whether that matters.
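For illustration only, here is a minimal sketch of how the outline above might look with the elided step filled in. It is an assumption, not tested against the real data: it reuses the filter from the posted for-each, applies it to the in-memory $record copy instead of the streamed context node, and takes the $identifiers declaration from the original program. The results come out grouped by record rather than by identifier.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:output method="xml"/>

  <!-- Same declaration as in the original program: the <identifiers> root element -->
  <xsl:variable name="identifiers" select="doc('identifiers.xml')/*"/>

  <xsl:template name="main">
    <!-- records.xml is read exactly once -->
    <xsl:source-document href="records.xml" streamable="yes">
      <results>
        <xsl:for-each select="records/record">
          <!-- Snapshot of one record; everything below works on this in-memory copy -->
          <xsl:variable name="record" select="copy-of(.)"/>
          <xsl:for-each select="$identifiers/identifier">
            <xsl:variable name="ident" select="." as="xs:string"/>
            <!-- Assumption: same predicates as the posted code, rooted at $record -->
            <xsl:for-each select="$record/*[name(.) ne 'Airport_SID_Primary_Records']
                                           [name(.) ne 'Airport_STAR_Primary_Records']
                                           [name(.) ne 'Airport_Approach_Primary_Records']
                                           [ends-with(name(.), 'Primary_Records')]
                                  /*[(. eq $ident) and (name(.) ne 'Recommended_Navaid')]">
              <result>
                <identifier><xsl:value-of select="$ident"/></identifier>
                <record><xsl:value-of select="name(..)"/></record>
                <field><xsl:sequence select="."/></field>
              </result>
            </xsl:for-each>
          </xsl:for-each>
        </xsl:for-each>
      </results>
    </xsl:source-document>
  </xsl:template>

</xsl:stylesheet>
```

This still compares every record against all 1900 identifiers, but those comparisons run against a small in-memory copy of a single record, so the large file is parsed once instead of 1900 times.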