Re: [xsl] An efficient XSLT program that searches a large XML document for all occurrences of a string?

From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 2 May 2024 13:35:22 -0000
On 02/05/2024 15:08, Roger L Costello costello@xxxxxxxxx wrote:
> Hi Folks,
>
> I have an XSLT program that locates all leaf elements whose string
> value is 'DNKK'. My program outputs the element and the name of its parent:
>
> <xsl:template match="/">
>   <results>
>     <xsl:for-each select="//*[not(*)][. eq 'DNKK']">
>       <result>
>         <xsl:sequence select="."/>
>         <parent><xsl:value-of select="name(..)"/></parent>
>       </result>
>     </xsl:for-each>
>   </results>
> </xsl:template>
>
> The input XML document is large, nearly 5GB.
>
> When I run my program, SAXON throws an OutOfMemoryError.
>
> To solve the OutOfMemoryError I could increase the heap space (-Xmx) when I
> invoke Java. But is there a way to write my program so that it is more
> efficient (i.e., so that it doesn't require so much memory)?


Saxon EE is the only XSLT 3 processor that implements streaming, so with Saxon EE you could try:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <xsl:param name="search-term" as="xs:string" select="'DNKK'"/>

  <xsl:output indent="yes"/>

  <!-- streamed mode: nodes without a matching template produce no output,
       but their children are still processed -->
  <xsl:mode streamable="yes" on-no-match="shallow-skip"
    use-accumulators="#all"/>

  <!-- concatenates the text read since the most recent element start tag -->
  <xsl:accumulator name="string-value" as="xs:string?"
    initial-value="()" streamable="yes">
    <xsl:accumulator-rule match="*" select="()"/>
    <xsl:accumulator-rule match="text()" select="$value || ."/>
  </xsl:accumulator>

  <xsl:template match="*">
    <xsl:apply-templates/>
    <!-- once the children have been read, this holds the text of a leaf element -->
    <xsl:variable name="string-value"
      select="accumulator-after('string-value')"/>
    <xsl:if test="not(empty($string-value)) and $string-value = $search-term">
      <result>
        <xsl:copy>{$string-value}</xsl:copy>
        <parent>{node-name(..)}</parent>
      </result>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
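Two differences from the original program may matter. The template above can also report a non-leaf element whose text after its last child element equals the search term: for `<a>x<b>DNKK</b></a>` it reports both `b` and `a`. Also, the hits are output as a sequence of top-level `<result>` elements rather than inside `<results>`. A possible variant follows. It is a sketch that has not been run against a 5GB file, and it assumes Saxon EE accepts it as streamable. It adds a hypothetical `element-count` accumulator that counts element start tags. An element is a leaf, as in `not(*)`, exactly when the count at its start tag equals the count after its children have been read. A `match="/"` template restores the `<results>` wrapper.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <xsl:param name="search-term" as="xs:string" select="'DNKK'"/>

  <xsl:output indent="yes"/>

  <xsl:mode streamable="yes" on-no-match="shallow-skip"
    use-accumulators="#all"/>

  <!-- text read since the most recent element start tag -->
  <xsl:accumulator name="string-value" as="xs:string?"
    initial-value="()" streamable="yes">
    <xsl:accumulator-rule match="*" select="()"/>
    <xsl:accumulator-rule match="text()" select="$value || ."/>
  </xsl:accumulator>

  <!-- hypothetical helper: number of element start tags read so far -->
  <xsl:accumulator name="element-count" as="xs:integer"
    initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="*" select="$value + 1"/>
  </xsl:accumulator>

  <!-- wrap all hits in one root element, like the original program -->
  <xsl:template match="/">
    <results>
      <xsl:apply-templates/>
    </results>
  </xsl:template>

  <xsl:template match="*">
    <!-- count at this element's start tag (includes the element itself) -->
    <xsl:variable name="count-at-start"
      select="accumulator-before('element-count')"/>
    <xsl:apply-templates/>
    <!-- no start tags were read inside this element, so it has no child elements -->
    <xsl:variable name="is-leaf"
      select="accumulator-after('element-count') eq $count-at-start"/>
    <xsl:variable name="string-value"
      select="accumulator-after('string-value')"/>
    <xsl:if test="$is-leaf and $string-value = $search-term">
      <result>
        <xsl:copy>{$string-value}</xsl:copy>
        <parent>{node-name(..)}</parent>
      </result>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Because `$string-value = $search-term` is a general comparison, it is already false when the accumulator value is the empty sequence, so the `not(empty(...))` guard is not needed in this variant.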
