Subject: [xsl] [Summary] An efficient XSLT program that searches a large XML document for all occurrences of a string
From: "Roger L Costello costello@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 3 May 2024 07:48:36 -0000

Hi Folks,
Thank you for your excellent responses!
I decided to go the XSLT streaming route. Below is my summary of how to do
it.
First, the problem statement:
I have an XSLT program that locates all leaf elements which have the string
value 'DNKK'. My program outputs the element and the name of its parent:
<xsl:template match="/">
  <results>
    <xsl:for-each select="//*[not(*)][. eq 'DNKK']">
      <result>
        <xsl:sequence select="."/>
        <parent><xsl:value-of select="name(..)"/></parent>
      </result>
    </xsl:for-each>
  </results>
</xsl:template>
The input XML document is large, nearly 5GB.
When I run my XSLT program, SAXON fails with an OutOfMemoryError.
The usual remedy for an OutOfMemoryError is to increase the Java heap size
(-Xmx) when invoking Java. I tried that, allowing as much as 10GB of heap
space, and I still got the OutOfMemoryError.
So I went with XSLT streaming. Here's how to do it.
The SAXON documentation [1] describes this way of invoking streaming: "Using
the xsl:source-document instruction, with the attribute streamable="yes". Here
the source document is identified within the stylesheet itself. Typically such
a stylesheet will have a named template as its entry point, and will not have
any principal source document supplied externally."
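So, I created an XSLT document containing just a named template. Michael Kay
showed how to reformulate my XSLT code to be streamable: instead of selecting
leaf elements, it selects the text nodes equal to 'DNKK' and rebuilds each
element from the name of the text node's parent. Here's the complete XSLT
program: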
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">
  <xsl:template name="test">
    <xsl:source-document href="Input.xml" streamable="yes">
      <results>
        <xsl:for-each select="//text()[. eq 'DNKK']">
          <result>
            <xsl:element name="{name(..)}">DNKK</xsl:element>
            <parent><xsl:value-of select="name(../..)"/></parent>
          </result>
        </xsl:for-each>
      </results>
    </xsl:source-document>
  </xsl:template>
</xsl:stylesheet>
I saved that to the file get-records.xsl
Then I opened a command window and typed this:
java -classpath %CLASSPATH% net.sf.saxon.Transform -it:test -xsl:get-records.xsl -o:results.xml
The SAXON documentation [2] says this about the -it (initial template) flag:
-it[:template-name] Selects the initial named template to be executed.
I ran it and it worked beautifully!
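To make the behavior concrete, here is a small made-up illustration. The element names airports, airport, icao, and name are hypothetical and are not taken from the real 5GB document. Suppose Input.xml contained:

```xml
<!-- Hypothetical sample input; element names are invented for illustration -->
<airports>
  <airport>
    <icao>DNKK</icao>
    <name>Kano</name>
  </airport>
  <airport>
    <icao>EGLL</icao>
    <name>Heathrow</name>
  </airport>
</airports>
```

With that input, the streaming stylesheet should write a results.xml along these lines. It is shown indented here for readability; the actual serialization (XML declaration, whitespace) may differ:

```xml
<!-- Expected shape of the output for the hypothetical input above -->
<results>
  <result>
    <icao>DNKK</icao>
    <parent>airport</parent>
  </result>
</results>
```

Only the text node that is exactly 'DNKK' matches, so the second airport produces nothing. Note one difference from the non-streaming version: because the element is rebuilt from its name rather than copied, any attributes on the original element do not appear in the output.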
/Roger
[1] https://www.saxonica.com/html/documentation10/sourcedocs/streaming/xslt-streaming.html
[2] https://www.saxonica.com/documentation12/index.html#!using-xsl/commandline