Subject: Re: [xsl] An efficient XSLT program that searches a large XML document for all occurrences of a string?
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 2 May 2024 23:25:06 -0000

While Martin, Michael Kay and others have provided valuable advice on streaming, this is probably a good moment to ask why, and whether, such a huge document should be created and probably continuously augmented with new data.

Practice has shown that horizontal scaling is much easier to implement than vertical scaling, and that the latter is quite limited. I believe that if a large XML document cannot be split (horizontally) into mutually non-overlapping subtrees that together comprise the whole document, then most likely the complexity of this document is unnecessarily high.

Imagine having all the data about the 100B+ stars in the Milky Way put into a single XML document...

If I were involved in any activity that collects and structures such large quantities of data, I would plan to split this data and keep it in smaller, manageable chunks wherever possible.

Thanks,
Dimitre

On Thu, May 2, 2024 at 6:07 AM Roger L Costello costello@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hi Folks,
>
> I have an XSLT program that locates all leaf elements which have the
> string value 'DNKK'. My program outputs the element and the name of its
> parent:
>
> <xsl:template match="/">
>   <results>
>     <xsl:for-each select="//*[not(*)][. eq 'DNKK']">
>       <result>
>         <xsl:sequence select="."/>
>         <parent><xsl:value-of select="name(..)"/></parent>
>       </result>
>     </xsl:for-each>
>   </results>
> </xsl:template>
>
> The input XML document is large, nearly 5GB.
>
> When I run my program SAXON throws the OutOfMemoryError message shown
> below.
>
> To solve the OutOfMemoryError I could add to my heap space (-Xmx) when I
> invoke Java. But I wonder if there is a way to write my program so that it
> is more efficient (i.e., doesn't require so much memory)?
>
> /Roger
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.base/java.util.Arrays.copyOf(Arrays.java:3841)
>     at net.sf.saxon.tree.util.FastStringBuffer.ensureCapacity(FastStringBuffer.java:575)
>     at net.sf.saxon.tree.tiny.CompressedWhitespace.uncompress(CompressedWhitespace.java:112)
>     at net.sf.saxon.tree.tiny.WhitespaceTextImpl.appendStringValue(WhitespaceTextImpl.java:82)
>     at net.sf.saxon.tree.tiny.TinyParentNodeImpl.getStringValueCS(TinyParentNodeImpl.java:99)
>     at net.sf.saxon.tree.tiny.TinyTree.getTypedValueOfElement(TinyTree.java:530)
>     at net.sf.saxon.tree.tiny.TinyElementImpl.atomize(TinyElementImpl.java:105)
>     at net.sf.saxon.expr.Atomizer.evaluateItem(Atomizer.java:384)
>     at net.sf.saxon.expr.Atomizer.evaluateItem(Atomizer.java:40)
>     at net.sf.saxon.expr.ValueComparison.effectiveBooleanValue(ValueComparison.java:347)
>     at com.saxonica.ee.bytecode.ByteCodeCandidate.effectiveBooleanValue(ByteCodeCandidate.java:132)
>     at net.sf.saxon.expr.FilterIterator$NonNumeric.matches(FilterIterator.java:177)
>     at net.sf.saxon.expr.FilterIterator.getNextMatchingItem(FilterIterator.java:76)
>     at net.sf.saxon.expr.FilterIterator.next(FilterIterator.java:62)
>     at net.sf.saxon.om.FocusTrackingIterator.next(FocusTrackingIterator.java:75)
>     at net.sf.saxon.expr.FilterIterator.getNextMatchingItem(FilterIterator.java:75)
>     at net.sf.saxon.expr.FilterIterator.next(FilterIterator.java:62)
>     at net.sf.saxon.om.FocusTrackingIterator.next(FocusTrackingIterator.java:75)
>     at net.sf.saxon.om.SequenceIterator.forEachOrFail(SequenceIterator.java:134)
>     at net.sf.saxon.expr.instruct.ForEach.processLeavingTail(ForEach.java:489)
>     at net.sf.saxon.expr.instruct.Instruction.process(Instruction.java:136)
>     at net.sf.saxon.expr.instruct.ElementCreator.processLeavingTail(ElementCreator.java:346)
>     at net.sf.saxon.expr.instruct.ElementCreator.processLeavingTail(ElementCreator.java:292)
>     at net.sf.saxon.expr.instruct.TemplateRule.applyLeavingTail(TemplateRule.java:374)
>     at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:555)
>     at net.sf.saxon.trans.XsltController.applyTemplates(XsltController.java:659)
>     at net.sf.saxon.s9api.AbstractXsltTransformer.applyTemplatesToSource(AbstractXsltTransformer.java:360)
>     at net.sf.saxon.s9api.Xslt30Transformer.applyTemplates(Xslt30Transformer.java:285)
>     at net.sf.saxon.Transform.processFile(Transform.java:1313)
>     at net.sf.saxon.Transform.doTransform(Transform.java:853)
>     at net.sf.saxon.Transform.main(Transform.java:82)
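
To illustrate the splitting idea from the reply, here is a minimal, untested sketch. It assumes the large document has already been split into separate files, each holding one self-contained subtree, all in one directory. The `chunk-dir` parameter is hypothetical and not part of the original program. The same search is then applied to one chunk at a time with `xsl:source-document` (XSLT 3.0). The `?select=*.xml` collection syntax is Saxon-specific, and whether a chunk's tree is released before the next one is loaded depends on the processor.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: URI of the directory holding the chunk files,
       e.g. an absolute file: URI ending in "/". -->
  <xsl:param name="chunk-dir" as="xs:string" required="yes"/>

  <xsl:template name="xsl:initial-template">
    <results>
      <!-- Assumption: Saxon's directory-collection syntax (?select=glob). -->
      <xsl:for-each select="uri-collection($chunk-dir || '?select=*.xml')">
        <!-- Load one chunk at a time; the context item inside the
             instruction body is that chunk's document node. -->
        <xsl:source-document href="{.}">
          <!-- Same search as in the original program, per chunk. -->
          <xsl:for-each select="//*[not(*)][. eq 'DNKK']">
            <result>
              <xsl:sequence select="."/>
              <parent><xsl:value-of select="name(..)"/></parent>
            </result>
          </xsl:for-each>
        </xsl:source-document>
      </xsl:for-each>
    </results>
  </xsl:template>

</xsl:stylesheet>
```

Since there is no single principal input, a stylesheet like this would be started from the initial template (with Saxon, the `-it` option). Peak memory is then governed by the largest chunk rather than by the total size of the data, provided the processor does not keep earlier chunks around.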