Subject: [xsl] [Summary] An efficient XSLT program that searches a large XML document for all occurrences of a string
From: "Roger L Costello costello@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 3 May 2024 07:48:36 -0000
Hi Folks,

Thank you for your excellent responses! I decided to go the XSLT streaming route. Below is my summary of how to do it.

First, the problem statement: I have an XSLT program that locates all leaf elements which have the string value 'DNKK'. My program outputs the element and the name of its parent:

    <xsl:template match="/">
        <results>
            <xsl:for-each select="//*[not(*)][. eq 'DNKK']">
                <result>
                    <xsl:sequence select="."/>
                    <parent><xsl:value-of select="name(..)"/></parent>
                </result>
            </xsl:for-each>
        </results>
    </xsl:template>

The input XML document is large, nearly 5GB. When I run my XSLT program, SAXON throws an OutOfMemoryError.

To solve the OutOfMemoryError I could increase the Java heap space (-Xmx) when I invoke Java. I tried that, adding as much as 10GB of heap space, and I still got the OutOfMemoryError.

So I went with XSLT streaming. Here's how to do it.

The SAXON documentation [1] says this:

"Using the xsl:source-document instruction, with the attribute streamable="yes". Here the source document is identified within the stylesheet itself. Typically such a stylesheet will have a named template as its entry point, and will not have any principal source document supplied externally."

So, I created an XSLT document containing just a named template. Michael Kay showed how to reformulate my XSLT code to be streamable. Here's the complete XSLT program:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
        <xsl:template name="test">
            <xsl:source-document href="Input.xml" streamable="yes">
                <results>
                    <xsl:for-each select="//text()[. eq 'DNKK']">
                        <result>
                            <xsl:element name="{name(..)}">DNKK</xsl:element>
                            <parent><xsl:value-of select="name(../..)"/></parent>
                        </result>
                    </xsl:for-each>
                </results>
            </xsl:source-document>
        </xsl:template>
    </xsl:stylesheet>

I saved that to the file get-records.xsl

Then I opened a command window and typed this:

    java -classpath %CLASSPATH% net.sf.saxon.Transform -it:test -xsl:get-records.xsl -o:results.xml

The SAXON documentation [2] says this about the -it (initial template) flag:

    -it[:template-name]    Selects the initial named template to be executed.

I ran it and it worked beautifully!

/Roger

[1] https://www.saxonica.com/html/documentation10/sourcedocs/streaming/xslt-streaming.html
[2] https://www.saxonica.com/documentation12/index.html#!using-xsl/commandline
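P.S. To make concrete what the streaming stylesheet does, here is a small illustration. The input is hypothetical: the element names airports, airport, icao and name are invented for this sketch and are not from the real 5GB document. Suppose Input.xml contained:

```xml
<!-- Hypothetical sample input; element names are assumptions for illustration -->
<airports>
    <airport>
        <icao>DNKK</icao>
        <name>Kaduna</name>
    </airport>
    <airport>
        <icao>DNMM</icao>
        <name>Lagos</name>
    </airport>
</airports>
```

Only the text node inside the first icao element equals 'DNKK', so name(..) gives icao and name(../..) gives airport. Setting aside the XML declaration and whitespace, results.xml should then look roughly like this:

```xml
<!-- Expected shape of the output for the hypothetical input above -->
<results>
    <result>
        <icao>DNKK</icao>
        <parent>airport</parent>
    </result>
</results>
```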