Re: [xsl] An efficient XSLT program that searches a large XML document for all occurrences of a string?

Subject: Re: [xsl] An efficient XSLT program that searches a large XML document for all occurrences of a string?
From: "Bauman, Syd s.bauman@xxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 2 May 2024 22:49:52 -0000
I do not know any answer to the question, and without the data (and perhaps
some further information about your system, like default heap space) I cannot
reproduce the problem.

But my first instinct, for no intelligent reason whatsoever, is to use
template application for flow of control:

  <xsl:template match="/">
    <xsl:apply-templates select="//*[ not(*) ][. eq 'DNKK']"/>
  </xsl:template>

  <xsl:template match="*">
    <result>
      <xsl:sequence select="."/>
      <parent><xsl:value-of select="name(..)"/></parent>
    </result>
  </xsl:template>

For all I know that is worse, not better; but it is what I would try first. I
also might try things like

  * using <xsl:copy> instead of <xsl:sequence>
  * putting the parent name on an attribute of <result> instead of as a child
    element
  * actually selecting text nodes, rather than elements
  * learning streaming and using EE (as already suggested)
  * divide-and-conquer: on a first pass knock out portions of the tree that are
    irrelevant or divide input file into several smaller pieces
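A minimal sketch combining three of the variations listed above (selecting text nodes, copying instead of <xsl:sequence>, and putting the parent name on an attribute). It is untested against the 5GB input, and because it still builds the whole source tree in memory it may not help with the OutOfMemoryError at all. Assumptions not in the original: <xsl:copy-of> stands in for <xsl:copy> so that the leaf element is copied with its attributes and text, and each leaf is assumed to hold its value in a single text node.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <results>
      <!-- Select the matching text nodes directly; not(../*) keeps only
           those whose parent element is a leaf (has no child elements) -->
      <xsl:apply-templates select="//text()[. eq 'DNKK'][not(../*)]"/>
    </results>
  </xsl:template>

  <xsl:template match="text()">
    <!-- ".." is the leaf element, "../.." is the leaf's parent -->
    <result parent="{name(../..)}">
      <!-- Deep copy of the leaf element (assumption: used in place of xsl:copy) -->
      <xsl:copy-of select=".."/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

A leaf whose text is split by a comment or processing instruction would be found by the original expression but missed by this one, so the two are not strictly equivalent.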

________________________________
> Hi Folks,
>
> I have an XSLT program that locates all leaf elements which have the string
> value 'DNKK'. My program outputs the element and the name of its parent:
>
>      <xsl:template match="/">
>          <results>
>              <xsl:for-each select="//*[not(*)][. eq 'DNKK']">
>                  <result>
>                      <xsl:sequence select="."/>
>                      <parent><xsl:value-of select="name(..)"/></parent>
>                  </result>
>              </xsl:for-each>
>          </results>
>      </xsl:template>
>
> The input XML document is large, nearly 5GB.
>
> When I run my program SAXON throws the OutOfMemoryError message shown
> below.
>
> To solve the OutOfMemoryError I could add to my heap space (-Xmx) when I
> invoke Java. But I wonder if there is a way to write my program so that it is
> more efficient (i.e., doesn't require so much memory)?
>

Can you use Saxon EE so that it is worth pondering XSLT 3 with streaming?
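For orientation, a streaming version might look roughly like the sketch below. It assumes an XSLT 3.0 processor with streaming support (Saxon-EE, as suggested) and has not been run against the data in question. Because a streamed pattern cannot look down at an element's children, the sketch matches text nodes equal to 'DNKK' rather than leaf elements, so it would also report such text inside mixed content. It rebuilds the enclosing element from its name and does not reproduce its attributes.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Process the input as a stream; nodes with no matching template
       are skipped while their children are still visited -->
  <xsl:mode streamable="yes" on-no-match="shallow-skip"/>

  <xsl:template match="/">
    <results>
      <xsl:apply-templates/>
    </results>
  </xsl:template>

  <!-- The predicate only inspects the text node itself, and the body only
       looks upward at ancestor names, which streaming permits -->
  <xsl:template match="text()[. eq 'DNKK']">
    <result>
      <!-- Rebuild the enclosing element from its name and namespace -->
      <xsl:element name="{name(..)}" namespace="{namespace-uri(..)}">
        <xsl:value-of select="."/>
      </xsl:element>
      <parent><xsl:value-of select="name(../..)"/></parent>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Whether this counts as guaranteed-streamable should be confirmed with the processor itself, which reports any construct it cannot stream.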
