RE: [xsl] Does <xsl:copy> use a lot of memory? Is there an alternative that is more efficient?

From: "Costello, Roger L." <costello@xxxxxxxxx>
Date: Mon, 3 Sep 2012 13:57:13 +0000
Michael Kay wrote:

> In Saxon, and I suspect in most processors, no memory is used for the
> result tree provided that the transformation is writing directly to a
> serializer.

So if I use xsl:copy and the results of the copy are immediately output, then
there is little or no memory consumption. Yes?

However, in my situation I need to store the results of xsl:copy into a
variable. Then I process the variable. That processing also uses xsl:copy. I
put those results into another variable. And again and again.

So in my situation is xsl:copy consuming lots of memory?

In other words, if I don't immediately output the results of xsl:copy, then
memory consumption grows and grows. Yes?
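
A minimal sketch of the multi-pass pattern described above, assuming two passes over the data. The mode name "embed" and the variable name "pass1" are illustrative only and do not come from the original program. The first pass is captured in a variable, so it is held in memory as a temporary tree. The last pass is not captured, so its result can go straight to the serializer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

    <xsl:output method="xml"/>

    <xsl:key name="ids" match="*[@id]" use="@id"/>

    <xsl:template match="/">
        <!-- Pass 1: captured in a variable, so the result is built as a
             temporary tree in memory instead of going to the serializer.
             ("pass1" and the "embed" mode are hypothetical names.) -->
        <xsl:variable name="pass1">
            <xsl:apply-templates mode="embed"/>
        </xsl:variable>

        <!-- Pass 2: reads the temporary tree. Its result is not captured,
             so it can be written directly to the output. -->
        <xsl:apply-templates select="$pass1/node()" mode="embed"/>
    </xsl:template>

    <!-- Replace an idref with a copy of the referenced element. The key
         lookup searches whichever tree contains the current node. -->
    <xsl:template match="*[@idref]" mode="embed">
        <xsl:copy>
            <xsl:copy-of select="@* except @idref"/>
            <xsl:sequence select="key('ids', @idref)"/>
        </xsl:copy>
    </xsl:template>

    <!-- Copy everything else unchanged and keep descending. -->
    <xsl:template match="node()" mode="embed">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates mode="embed"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

With more passes, each additional variable would hold another full temporary tree. Whether earlier trees are released once they are no longer referenced depends on the processor.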

/Roger

-----Original Message-----
From: Michael Kay [mailto:mike@xxxxxxxxxxxx]
Sent: Sunday, September 02, 2012 10:31 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Does <xsl:copy> use a lot of memory? Is there an
alternative that is more efficient?

Memory is used for the source document and for intermediate variables.
In Saxon, and I suspect in most processors, no memory is used for the
result tree provided that the transformation is writing directly to a
serializer.

Intrinsically, all xsl:copy has to do is to send two events -
startElement and endElement - to the serializer.
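
As an illustration of that point, here is a minimal identity-transform sketch. It assumes the result is written directly to a serializer rather than captured in a variable. It is not taken from the original program.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

    <!-- xsl:copy is shallow: for an element it copies only the element
         node itself (its start and end tags). Attributes and children
         are added by the instructions inside xsl:copy. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

Nothing here is stored in a variable, so the only tree that has to be held in memory is the source document.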

I would strongly suspect that the out of memory error occurs during
building of the source tree, and will happen whatever transformation
you run. For a 370 MB input document, you should probably allocate at
least 2 GB of memory, preferably more.

Michael Kay
Saxonica

On 02/09/2012 13:47, Costello, Roger L. wrote:
> Hi Folks,
>
> Does <xsl:copy> use a lot of memory?
>
> Is there an alternative that is more efficient?
>
> Consider this problem. I have an XML document in which some elements have an
> id attribute and others have an idref attribute. If an element A references
> element B, then I want to embed B inside A.
>
> Example: I want to convert this:
>
> <Test>
>      <A idref="b" />
>      <B id="b" />
> </Test>
>
> to this:
>
> <Test>
>      <A>
>          <B id="b" />
>      </A>
>      <B id="b" />
> </Test>
>
> Notice that A references B, and after processing B is nested inside A.
>
> Here's a template that handles elements with a reference:
>
>      <xsl:key name="ids" match="*[@id]" use="@id"/>
>
>      <xsl:template match="*[@idref]">
>
>          <xsl:variable name="refed-element" select="key('ids', @idref)"/>
>
>          <xsl:copy>
>              <xsl:copy-of select="@* except @idref" />
>              <xsl:sequence select="$refed-element" />
>          </xsl:copy>
>
>      </xsl:template>
>
> The complete program is below.
>
> It works fine if:
>
> (a) The XML document is small.
> (b) I don't have to repeat this embedding process too many times.
>
> However, such is not the case. I am dealing with an XML document that is 370
> MB in size and has tens of thousands of references. And I have to repeat the
> embedding process multiple times.
>
> Saxon gives me an "out of memory error."
>
> I suspect the reason for this is due to the <xsl:copy> command. I believe it
> is making new copies, thereby consuming lots of memory. True?
>
> So, is there an alternative to <xsl:copy> that is more efficient?
>
> Is there a way to express the above template rule that is more efficient?
>
> /Roger
>
> -----------------------------------------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                  exclude-result-prefixes="#all"
>                  version="2.0">
>
>      <xsl:output method="xml" />
>
>      <xsl:key name="ids" match="*[@id]" use="@id"/>
>
>      <xsl:template match="*[@idref]">
>
>          <xsl:variable name="refed-element" select="key('ids', @idref)"/>
>
>          <xsl:copy>
>              <xsl:copy-of select="@* except @idref" />
>              <xsl:sequence select="$refed-element" />
>          </xsl:copy>
>
>      </xsl:template>
>
>
>      <xsl:template match="node()">
>
>          <xsl:copy>
>              <xsl:copy-of select="@*"/>
>              <xsl:apply-templates />
>          </xsl:copy>
>
>      </xsl:template>
>
> </xsl:stylesheet>
