Subject: Re: [xsl] Seek ways to make my streaming XSLT code run faster (My streaming XSLT program has been running 12 hours and is only a quarter of the way to completion)
From: "Sheila Thomson coder@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 9 Aug 2025 22:47:52 -0000

Might it be quicker to load this document into an XML DB and use XQuery? Is that an option?

Sheila

On 9 August 2025 23:39:41 BST, "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>On 10/08/2025 00:25, Roger L Costello costello@xxxxxxxxx wrote:
>> Hi Folks,
>>
>> My XML document consists of 5 million <record> elements:
>>
>> <records>
>>     <record>...</record>
>>     <record>...</record>
>> </records>
>>
>> Each <record> element has a child element that indicates the type of (aviation) data in the record:
>>
>> <records>
>>     <record>
>>         <VHF_NAVAID_Primary_Records>...</VHF_NAVAID_Primary_Records>
>>     </record>
>>     <record>
>>         <Airport_SID_Primary_Records>...</Airport_SID_Primary_Records>
>>     </record>
>> </records>
>>
>> Each of the child elements contains elements appropriate to its type:
>>
>> <records>
>>     <record>
>>         <VHF_NAVAID_Primary_Records>
>>             <VOR_Identifier>ABC </VOR_Identifier>
>>             <DME_Ident>AND </DME_Ident>
>>         </VHF_NAVAID_Primary_Records>
>>     </record>
>>     <record>
>>         <Airport_SID_Primary_Records>
>>             <SID_Identifier>ABC </SID_Identifier>
>>         </Airport_SID_Primary_Records>
>>     </record>
>> </records>
>>
>> I want to find all <record> elements whose child is not an <Airport_SID_Primary_Records> element and whose child element contains an identifier element with value "ABC ". Here's the output I desire:
>>
>> <results>
>>     <result>
>>         <identifier>ABC </identifier>
>>         <record>VHF_NAVAID_Primary_Records</record>
>>         <field>
>>             <VOR_Identifier>ABC </VOR_Identifier>
>>         </field>
>>     </result>
>> </results>
>>
>> Identifier "ABC " is just one of 1900 identifiers. These identifiers are stored in an XML file, identifiers.xml:
>>
>> <identifiers>
>>     <identifier>ABC </identifier>
>>     <identifier>DEF </identifier>
>> </identifiers>
>>
>> I want to iterate over all 1900 identifiers and, for each of them, iterate over all 5 million records to see which records contain the identifier.
>> There is a loop within a loop:
>>
>> For each of the 1900 identifiers do
>>     For each of the 5 million records do
>>         Check record against identifier
>>
>> I am using streaming XSLT to accomplish this task.
>>
>> My streaming program has been running 12 hours and it has only processed a quarter of the identifiers. I'd like to see if you have suggestions on ways to speed up my streaming program. I am thinking that this part of my program is probably slow:
>>
>> <xsl:for-each select="*[name(.) ne 'Airport_SID_Primary_Records']
>>                        [name(.) ne 'Airport_STAR_Primary_Records']
>>                        [name(.) ne 'Airport_Approach_Primary_Records']
>>                        [ends-with(name(.),'Primary_Records')]">
>>     <xsl:for-each select="*[(. eq $identifier) and (name(.) ne 'Recommended_Navaid')]">
>>         <result>
>>             <identifier><xsl:value-of select="$identifier"/></identifier>
>>             <record><xsl:value-of select="name(..)"/></record>
>>             <field><xsl:sequence select="."/></field>
>>         </result>
>>     </xsl:for-each>
>> </xsl:for-each>
>>
>> That code is for processing a <record> element. The code checks that the <record> element's child element is not an <Airport_SID_Primary_Records> element, not an <Airport_STAR_Primary_Records> element, not an <Airport_Approach_Primary_Records> element, and that its element name ends with "Primary_Records". If the <record> element's child element satisfies all those criteria, then the code iterates over all the elements inside that child element whose value matches the identifier and whose element name is not "Recommended_Navaid".
>>
>> Is there a way to rewrite the code to make it execute faster?
>>
>> Here is my complete program:
>>
>> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>     exclude-result-prefixes="#all"
>>     version="3.0">
>>     <xsl:output method="xml" />
>>     <xsl:variable name="identifiers" select="doc('identifiers.xml')/*"/>
>>     <xsl:template name="main">
>>         <results>
>>             <xsl:for-each select="$identifiers/*">
>>                 <xsl:variable name="identifier" select="." as="xs:string"/>
>>                 <xsl:source-document href="records.xml" streamable="yes">
>
>
>So that approach processes records.xml 1900 times with streaming.
>
>Is the order of the resulting elements important? If not, you could stream once and check all your identifiers for each record, as I tried to indicate in my first answer:
>
>
><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>  xmlns:xs="http://www.w3.org/2001/XMLSchema"
>  exclude-result-prefixes="#all"
>  version="3.0">
>
>  <xsl:output method="xml" />
>
>  <xsl:variable name="identifiers" select="doc('idents.xml')/*"/>
>
>  <xsl:template name="main">
>    <xsl:source-document href="records.xml" streamable="yes">
>      <results>
>
>        <xsl:for-each select="records/record">
>          <xsl:variable name="record" select="copy-of(.)"/>
>          <xsl:for-each select="$identifiers/identifier">
>            <xsl:variable name="ident" select="."/>
>            ... (use $ident to process $record)
>          </xsl:for-each>
>        </xsl:for-each>
>      </results>
>    </xsl:source-document>
>  </xsl:template>
>
></xsl:stylesheet>
>
>
>That should give you the same elements you intended, but streaming only once through the 5 million records. The order of elements in the result may be different; I am not sure whether that matters.
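
For anyone following the thread, here is a rough sketch of how the placeholder in the single-pass skeleton quoted above might be filled in. It simply moves the filter expressions from the original post into the inner loop and applies them to the in-memory copy of each record. It is untested against the real data and rests on some assumptions: the identifiers file is taken to be identifiers.xml as in the original post, identifier and field values are assumed to match exactly (including trailing spaces), and the variable names $record and $ident come from the skeleton rather than from the original program.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:output method="xml"/>

  <!-- Assumption: identifiers.xml, with the <identifiers>/<identifier> layout from the original post -->
  <xsl:variable name="identifiers" select="doc('identifiers.xml')/*"/>

  <xsl:template name="main">
    <!-- records.xml is streamed once, not once per identifier -->
    <xsl:source-document href="records.xml" streamable="yes">
      <results>
        <xsl:for-each select="records/record">
          <!-- Copy the current record into memory so it can be read repeatedly -->
          <xsl:variable name="record" select="copy-of(.)"/>
          <xsl:for-each select="$identifiers/identifier">
            <xsl:variable name="ident" select="string(.)" as="xs:string"/>
            <!-- Same filters as in the original post, applied to the in-memory copy -->
            <xsl:for-each select="$record/*[name(.) ne 'Airport_SID_Primary_Records']
                                           [name(.) ne 'Airport_STAR_Primary_Records']
                                           [name(.) ne 'Airport_Approach_Primary_Records']
                                           [ends-with(name(.), 'Primary_Records')]
                                          /*[(. eq $ident) and (name(.) ne 'Recommended_Navaid')]">
              <result>
                <identifier><xsl:value-of select="$ident"/></identifier>
                <record><xsl:value-of select="name(..)"/></record>
                <field><xsl:sequence select="."/></field>
              </result>
            </xsl:for-each>
          </xsl:for-each>
        </xsl:for-each>
      </results>
    </xsl:source-document>
  </xsl:template>

</xsl:stylesheet>
```

With this arrangement the results come out grouped by record rather than by identifier, which is the ordering difference mentioned above. The number of comparisons is unchanged (every record is still checked against every identifier), so any saving comes from parsing the 5 million records once instead of 1900 times.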