Re: [xsl] Does <xsl:copy> use a lot of memory? Is there an alternative that is more efficient?

From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Sun, 2 Sep 2012 08:34:34 -0700
Provided the input document is parsed successfully, wouldn't using
<xsl:sequence> (instead of <xsl:copy-of>) take less memory?
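
For illustration only, here is a minimal sketch of Roger's stylesheet with
<xsl:sequence> in place of each <xsl:copy-of>. It has not been tried on the
370 MB input. One caveat that is certain: inside <xsl:copy> the selected
nodes still end up copied into the new element's content, so whether this
saves any memory depends on how the processor handles it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                exclude-result-prefixes="#all"
                version="2.0">

    <xsl:output method="xml"/>

    <!-- Index every element that carries an id attribute. -->
    <xsl:key name="ids" match="*[@id]" use="@id"/>

    <!-- An element with idref: keep its other attributes and
         embed the element it refers to. -->
    <xsl:template match="*[@idref]">
        <xsl:copy>
            <xsl:sequence select="@* except @idref"/>
            <xsl:sequence select="key('ids', @idref)"/>
        </xsl:copy>
    </xsl:template>

    <!-- Everything else: shallow copy, then recurse into children. -->
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:sequence select="@*"/>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

The $refed-element variable is dropped here only to shorten the sketch; the
key() call is the same one Roger uses.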

Cheers,
Dimitre

On Sun, Sep 2, 2012 at 7:31 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> Memory is used for the source document and for intermediate variables. In
> Saxon, and I suspect in most processors, no memory is used for the result
> tree provided that the transformation is writing directly to a serializer.
>
> Intrinsically, all xsl:copy has to do is to send two events - startElement
> and endElement - to the serializer.
>
> I strongly suspect that the out-of-memory error occurs while building
> the source tree, and will happen whatever transformation you run. For a
> 370 MB input document, you should probably allocate at least 2 GB of memory,
> preferably more.
>
> Michael Kay
> Saxonica
>
>
> On 02/09/2012 13:47, Costello, Roger L. wrote:
>>
>> Hi Folks,
>>
>> Does <xsl:copy> use a lot of memory?
>>
>> Is there an alternative that is more efficient?
>>
>> Consider this problem. I have an XML document in which some elements have
>> an id attribute and others have an idref attribute. If an element A
>> references element B, then I want to embed B inside A.
>>
>> Example: I want to convert this:
>>
>> <Test>
>>      <A idref="b" />
>>      <B id="b" />
>> </Test>
>>
>> to this:
>>
>> <Test>
>>      <A>
>>          <B id="b" />
>>      </A>
>>      <B id="b" />
>> </Test>
>>
>> Notice that A references B, and after processing B is nested inside A.
>>
>> Here's a template that handles elements with a reference:
>>
>>      <xsl:key name="ids" match="*[@id]" use="@id"/>
>>
>>      <xsl:template match="*[@idref]">
>>          <xsl:variable name="refed-element" select="key('ids', @idref)"/>
>>          <xsl:copy>
>>              <xsl:copy-of select="@* except @idref" />
>>              <xsl:sequence select="$refed-element" />
>>          </xsl:copy>
>>      </xsl:template>
>>
>> The complete program is below.
>>
>> It works fine if:
>>
>> (a) The XML document is small.
>> (b) I don't have to repeat this embedding process too many times.
>>
>> However, such is not the case. I am dealing with an XML document that is
>> 370 MB in size and has tens of thousands of references. And I have to repeat
>> the embedding process multiple times.
>>
>> Saxon gives me an "out of memory error."
>>
>> I suspect this is caused by the <xsl:copy> instruction. I believe
>> it is making new copies, thereby consuming lots of memory. True?
>>
>> So, is there an alternative to <xsl:copy> that is more efficient?
>>
>> Is there a way to express the above template rule that is more efficient?
>>
>> /Roger
>>
>> -----------------------------------------------------------------------------------------
>> <?xml version="1.0" encoding="UTF-8"?>
>> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>                  exclude-result-prefixes="#all"
>>                  version="2.0">
>>
>>      <xsl:output method="xml" />
>>
>>      <xsl:key name="ids" match="*[@id]" use="@id"/>
>>
>>      <xsl:template match="*[@idref]">
>>          <xsl:variable name="refed-element" select="key('ids', @idref)"/>
>>          <xsl:copy>
>>              <xsl:copy-of select="@* except @idref" />
>>              <xsl:sequence select="$refed-element" />
>>          </xsl:copy>
>>      </xsl:template>
>>
>>      <xsl:template match="node()">
>>          <xsl:copy>
>>              <xsl:copy-of select="@*"/>
>>              <xsl:apply-templates />
>>          </xsl:copy>
>>      </xsl:template>
>>
>> </xsl:stylesheet>
>



-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.
