[xsl] An efficient XSLT program that searches a large XML document for all occurrences of a string?

Subject: [xsl] An efficient XSLT program that searches a large XML document for all occurrences of a string?
From: "Roger L Costello costello@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 2 May 2024 13:07:39 -0000
Hi Folks,

I have an XSLT program that locates all leaf elements which have the string
value 'DNKK'. My program outputs the element and the name of its parent:

    <xsl:template match="/">
        <results>
            <xsl:for-each select="//*[not(*)][. eq 'DNKK']">
                <result>
                    <xsl:sequence select="."/>
                    <parent><xsl:value-of select="name(..)"/></parent>
                </result>
            </xsl:for-each>
        </results>
    </xsl:template>

The input XML document is large, nearly 5GB.

When I run my program SAXON throws the OutOfMemoryError message shown below.

To solve the OutOfMemoryError I could increase the Java heap space (-Xmx)
when I invoke Java. But I wonder if there is a way to write my program so
that it is more efficient (i.e., doesn't require so much memory)?
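
One direction I am wondering about is XSLT 3.0 streaming, so that the whole
document is never built as a tree in memory. Below is an untested sketch of
what I have in mind. It assumes Saxon-EE (the com.saxonica.ee frame in the
trace suggests that is what I am running) and that a pattern on a text node
counts as streamable. It matches text nodes rather than leaf elements, so it
is not an exact equivalent: it would also pick up a 'DNKK' text node inside
mixed content, it would miss a leaf whose text is split by a comment, and it
rebuilds the element without its attributes:

    <xsl:stylesheet version="3.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

        <!-- Stream the input; skip everything that has no matching rule -->
        <xsl:mode streamable="yes" on-no-match="shallow-skip"/>

        <xsl:template match="/">
            <results>
                <xsl:apply-templates/>
            </results>
        </xsl:template>

        <!-- A text node is childless, so testing its value needs no lookahead -->
        <xsl:template match="text()[. eq 'DNKK']">
            <result>
                <!-- Rebuild the containing element from its name and namespace -->
                <xsl:element name="{name(..)}" namespace="{namespace-uri(..)}">
                    <xsl:value-of select="."/>
                </xsl:element>
                <!-- The element's parent is the grandparent of the text node -->
                <parent><xsl:value-of select="name(../..)"/></parent>
            </result>
        </xsl:template>

    </xsl:stylesheet>

My understanding is that when the initial mode is declared streamable, the
document supplied with -s: is read in streaming mode, but I have not
confirmed that.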

/Roger

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Arrays.copyOf(Arrays.java:3841)
        at net.sf.saxon.tree.util.FastStringBuffer.ensureCapacity(FastStringBuffer.java:575)
        at net.sf.saxon.tree.tiny.CompressedWhitespace.uncompress(CompressedWhitespace.java:112)
        at net.sf.saxon.tree.tiny.WhitespaceTextImpl.appendStringValue(WhitespaceTextImpl.java:82)
        at net.sf.saxon.tree.tiny.TinyParentNodeImpl.getStringValueCS(TinyParentNodeImpl.java:99)
        at net.sf.saxon.tree.tiny.TinyTree.getTypedValueOfElement(TinyTree.java:530)
        at net.sf.saxon.tree.tiny.TinyElementImpl.atomize(TinyElementImpl.java:105)
        at net.sf.saxon.expr.Atomizer.evaluateItem(Atomizer.java:384)
        at net.sf.saxon.expr.Atomizer.evaluateItem(Atomizer.java:40)
        at net.sf.saxon.expr.ValueComparison.effectiveBooleanValue(ValueComparison.java:347)
        at com.saxonica.ee.bytecode.ByteCodeCandidate.effectiveBooleanValue(ByteCodeCandidate.java:132)
        at net.sf.saxon.expr.FilterIterator$NonNumeric.matches(FilterIterator.java:177)
        at net.sf.saxon.expr.FilterIterator.getNextMatchingItem(FilterIterator.java:76)
        at net.sf.saxon.expr.FilterIterator.next(FilterIterator.java:62)
        at net.sf.saxon.om.FocusTrackingIterator.next(FocusTrackingIterator.java:75)
        at net.sf.saxon.expr.FilterIterator.getNextMatchingItem(FilterIterator.java:75)
        at net.sf.saxon.expr.FilterIterator.next(FilterIterator.java:62)
        at net.sf.saxon.om.FocusTrackingIterator.next(FocusTrackingIterator.java:75)
        at net.sf.saxon.om.SequenceIterator.forEachOrFail(SequenceIterator.java:134)
        at net.sf.saxon.expr.instruct.ForEach.processLeavingTail(ForEach.java:489)
        at net.sf.saxon.expr.instruct.Instruction.process(Instruction.java:136)
        at net.sf.saxon.expr.instruct.ElementCreator.processLeavingTail(ElementCreator.java:346)
        at net.sf.saxon.expr.instruct.ElementCreator.processLeavingTail(ElementCreator.java:292)
        at net.sf.saxon.expr.instruct.TemplateRule.applyLeavingTail(TemplateRule.java:374)
        at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:555)
        at net.sf.saxon.trans.XsltController.applyTemplates(XsltController.java:659)
        at net.sf.saxon.s9api.AbstractXsltTransformer.applyTemplatesToSource(AbstractXsltTransformer.java:360)
        at net.sf.saxon.s9api.Xslt30Transformer.applyTemplates(Xslt30Transformer.java:285)
        at net.sf.saxon.Transform.processFile(Transform.java:1313)
        at net.sf.saxon.Transform.doTransform(Transform.java:853)
        at net.sf.saxon.Transform.main(Transform.java:82)
