Subject: Re: [xsl] Does <xsl:copy> use a lot of memory? Is there an alternative that is more efficient?
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Sun, 2 Sep 2012 08:34:34 -0700

In case the input document is parsed successfully, wouldn't the use of
<xsl:sequence> (instead of <xsl:copy-of>) result in using less memory?

Cheers,
Dimitre

On Sun, Sep 2, 2012 at 7:31 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> Memory is used for the source document and for intermediate variables. In
> Saxon, and I suspect in most processors, no memory is used for the result
> tree provided that the transformation is writing directly to a serializer.
>
> Intrinsically, all xsl:copy has to do is to send two events - startElement
> and endElement - to the serializer.
>
> I would strongly suspect that the out of memory error occurs during building
> of the source tree, and will happen whatever transformation you run. For a
> 370Mb input document, you should probably allocate at least 2Gb of memory,
> preferably more.
>
> Michael Kay
> Saxonica
>
>
> On 02/09/2012 13:47, Costello, Roger L. wrote:
>>
>> Hi Folks,
>>
>> Does <xsl:copy> use a lot of memory?
>>
>> Is there an alternative that is more efficient?
>>
>> Consider this problem. I have an XML document in which some elements have
>> an id attribute and others have an idref attribute. If an element A
>> references element B, then I want to embed B inside A.
>>
>> Example: I want to convert this:
>>
>> <Test>
>>     <A idref="b" />
>>     <B id="b" />
>> </Test>
>>
>> to this:
>>
>> <Test>
>>     <A>
>>         <B id="b" />
>>     </A>
>>     <B id="b" />
>> </Test>
>>
>> Notice that A references B, and after processing B is nested inside A.
>>
>> Here's a template that handles elements with a reference:
>>
>>     <xsl:key name="ids" match="*[@id]" use="@id"/>
>>
>>     <xsl:template match="*[@idref]">
>>         <xsl:variable name="refed-element" select="key('ids', @idref)"/>
>>         <xsl:copy>
>>             <xsl:copy-of select="@* except @idref" />
>>             <xsl:sequence select="$refed-element" />
>>         </xsl:copy>
>>     </xsl:template>
>>
>> The complete program is below.
>>
>> It works fine if:
>>
>> (a) The XML document is small.
>> (b) I don't have to repeat this embedding process too many times.
>>
>> However, such is not the case. I am dealing with an XML document that is
>> 370 MB in size and has tens of thousands of references. And I have to repeat
>> the embedding process multiple times.
>>
>> Saxon gives me an "out of memory error."
>>
>> I suspect the reason for this is due to the <xsl:copy> command. I believe
>> it is making new copies, thereby consuming lots of memory. True?
>>
>> So, is there an alternative to <xsl:copy> that is more efficient?
>>
>> Is there a way to express the above template rule that is more efficient?
>>
>> /Roger
>>
>> -----------------------------------------------------------------------------------------
>> <?xml version="1.0" encoding="UTF-8"?>
>> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>                 exclude-result-prefixes="#all"
>>                 version="2.0">
>>
>>     <xsl:output method="xml" />
>>     <xsl:key name="ids" match="*[@id]" use="@id"/>
>>     <xsl:template match="*[@idref]">
>>         <xsl:variable name="refed-element" select="key('ids', @idref)"/>
>>         <xsl:copy>
>>             <xsl:copy-of select="@* except @idref" />
>>             <xsl:sequence select="$refed-element" />
>>         </xsl:copy>
>>     </xsl:template>
>>     <xsl:template match="node()">
>>         <xsl:copy>
>>             <xsl:copy-of select="@*"/>
>>             <xsl:apply-templates />
>>         </xsl:copy>
>>     </xsl:template>
>>
>> </xsl:stylesheet>
>

--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.
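
For illustration only, the substitution asked about at the top of this message (using <xsl:sequence> where the quoted stylesheet uses <xsl:copy-of>) might look like the sketch below. It is Roger's stylesheet with only those two instructions changed; nothing in the thread confirms that this reduces memory use. Per Michael Kay's reply, the out-of-memory error more likely arises while the source tree is being built, in which case this change would not help. The output should be the same either way, because nodes placed in the content of <xsl:copy> are copied into the result.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the stylesheet quoted above, with xsl:copy-of replaced by
     xsl:sequence. Whether this saves memory is an open question here. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                exclude-result-prefixes="#all"
                version="2.0">

    <xsl:output method="xml" />

    <!-- Index every element that carries an id attribute. -->
    <xsl:key name="ids" match="*[@id]" use="@id"/>

    <!-- An element with idref: drop the idref attribute and embed the
         element it refers to. -->
    <xsl:template match="*[@idref]">
        <xsl:variable name="refed-element" select="key('ids', @idref)"/>
        <xsl:copy>
            <!-- was: xsl:copy-of -->
            <xsl:sequence select="@* except @idref" />
            <xsl:sequence select="$refed-element" />
        </xsl:copy>
    </xsl:template>

    <!-- Everything else: shallow copy, keep attributes, recurse. -->
    <xsl:template match="node()">
        <xsl:copy>
            <!-- was: xsl:copy-of -->
            <xsl:sequence select="@*"/>
            <xsl:apply-templates />
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

Since the reply above recommends at least 2Gb for a 370Mb input, raising the Java heap (for example with the JVM's -Xmx option) is probably the first thing to try before changing the stylesheet.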