RE: [xsl] String conversion problem when string is large

Subject: RE: [xsl] String conversion problem when string is large
From: "Bulgrien, Kevin" <Kevin.Bulgrien@xxxxxxxxxxxx>
Date: Wed, 21 Mar 2012 13:31:48 -0500
> -----Original Message-----
> From: Michael Kay [mailto:mike@xxxxxxxxxxxx]
> Sent: Tuesday, March 20, 2012 3:50 PM
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] String conversion problem when string is large
>
> Try changing this:
>
>            <xsl:with-param name="HexData">
>              <xsl:value-of select="substring-after($HexData, ',')" />
>            </xsl:with-param>
>
> to this:
>
>            <xsl:with-param name="HexData"
> select="substring-after($HexData, ',')" />
>
>
> Passing the parameter as a string will be MUCH more efficient
> than passing it as a TinyTree.
>
> Even better, though probably not necessary, would be to pass
> the original unchanged string plus an offset.
>
> Michael Kay
> Saxonica

This problem is proving to be quite educational.

So far, of the five engines I have (xsltproc, sablotron, AltovaXML.exe,
msxsl.exe, and saxonhe.jar), only saxonhe.jar works with this data set and the
recursive implementations I have tried.  It would appear that for this
conversion to work in most engines, it must be implemented in a completely
different way.

The following implementation is an attempt to follow the suggestion to pass
the unchanged string plus an index.  Count is passed in to avoid a repeated and
expensive computation of the terminating case.

  <xsl:call-template name="HexToDec">
    <xsl:with-param name="HexData" select="." />
    <xsl:with-param name="Count" select="@Count" />
  </xsl:call-template>

  <!--
=======================================================================
   !
   ! Convert data in the form "0xhh,0xhh,...", to a comma-separated list of
   ! decimal numbers in the form "n,n,...".  Count is the number of items
   ! in HexData.  Index is the item currently being converted.
   !
   ! --><xsl:template name="HexToDec">
  <xsl:param name="HexData" />
  <xsl:param name="Count" />
  <xsl:param name="Index" select="0" />
  <xsl:if test="$Index &lt; $Count">
    <xsl:variable name="Hex" select="'0123456789ABCDEF'" />
    <xsl:text>,</xsl:text>
    <xsl:value-of
      select="string-length(
                substring-before(
                  $Hex, substring($HexData, $Index * 5 + 3, 1))) * 16 +
              string-length(
                substring-before(
                   $Hex,substring($HexData, $Index * 5 + 4, 1)))"
      />
    <xsl:call-template name="HexToDec">
      <xsl:with-param name="Count"   select="$Count" />
      <xsl:with-param name="HexData" select="$HexData" />
      <xsl:with-param name="Index"   select="$Index + 1" />
    </xsl:call-template>
  </xsl:if>
</xsl:template>
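
To make the index arithmetic concrete, here is a small standalone sketch.  The
sample string '0x1F,0xA0,0x07' and the fixed Index of 1 are made up for
illustration; they are not taken from the real data set.  Each item occupies
five characters ("0xhh,"), so item Index has its two hex digits at 1-based
positions Index * 5 + 3 and Index * 5 + 4.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <!-- Runs against any input document; the input itself is ignored. -->
  <xsl:template match="/">
    <!-- Hypothetical sample data, not from the real data set. -->
    <xsl:variable name="HexData" select="'0x1F,0xA0,0x07'" />
    <xsl:variable name="Hex" select="'0123456789ABCDEF'" />
    <xsl:variable name="Index" select="1" />
    <!-- High digit: position 1 * 5 + 3 = 8 is 'A'; ten characters precede
         'A' in $Hex, so it contributes 10 * 16.
         Low digit: position 1 * 5 + 4 = 9 is '0'; nothing precedes '0' in
         $Hex, so it contributes 0. -->
    <xsl:value-of
      select="string-length(
                substring-before(
                  $Hex, substring($HexData, $Index * 5 + 3, 1))) * 16 +
              string-length(
                substring-before(
                  $Hex, substring($HexData, $Index * 5 + 4, 1)))"
      />
  </xsl:template>
</xsl:stylesheet>
```

For Index 1 the digits are 'A' and '0', so this should output 160.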

It appears that simply using a plain variable reference in the parameter
constitutes a pass by reference, and that part of the original implementation
problem was that the sub-string operations were allocating new copies of the
string in memory at each level of the recursion.

This implementation halves Saxon's memory usage at the cost of a much longer
execution time, though the last example below almost halves memory usage again.

  W/Index & Count
  Execution time: 1m 41.223s (101223ms)
  Memory used: 57863448

vs.

  Original after collapsing the xsl:with-param to use select attribute:
  Execution time: 854ms
  Memory used: 110204152

vs.

  Like W/Index & Count except that Count is computed.
  Execution time: 11m 9.027s (669027ms)
  Memory used: 32725384

vs.

  Like W/Index & Count except that Hex is a template parameter with a default:
  Execution time: 1m 40.987s (100987ms)
  Memory used: 60866736

  <xsl:call-template name="HexToDec">
    <xsl:with-param name="HexData" select="." />
    <xsl:with-param name="Count" select="@Count" />
  </xsl:call-template>

  <xsl:template name="HexToDec">
    <xsl:param name="HexData" />
    <xsl:param name="Count" />
    <xsl:param name="Index" select="0" />
    <xsl:param name="Hex" select="'0123456789ABCDEF'" />
    <xsl:if test="$Index &lt; $Count">
      <xsl:text>,</xsl:text>
      <xsl:value-of
        select="string-length(
                  substring-before(
                    $Hex, substring($HexData, $Index * 5 + 3, 1))) * 16 +
                string-length(
                  substring-before(
                     $Hex,substring($HexData, $Index * 5 + 4, 1)))"
        />
      <xsl:call-template name="HexToDec">
        <xsl:with-param name="Count"   select="$Count" />
        <xsl:with-param name="HexData" select="$HexData" />
        <xsl:with-param name="Index"   select="$Index + 1" />
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

vs.

  As above except that Hex is passed explicitly on every call:
  Execution time: 1m 40.994s (100994ms)
  Memory used: 29581720

  <xsl:call-template name="HexToDec">
    <xsl:with-param  name="HexData" select="." />
    <xsl:with-param  name="Count" select="@Count" />
    <xsl:with-param  name="Hex" select="'0123456789ABCDEF'" />
  </xsl:call-template>

  <xsl:template name="HexToDec">
    <xsl:param name="HexData" />
    <xsl:param name="Count" />
    <xsl:param name="Index" select="0" />
    <xsl:param name="Hex" />
    <xsl:if test="$Index &lt; $Count">
      <xsl:text>,</xsl:text>
      <xsl:value-of
        select="string-length(
                  substring-before(
                    $Hex, substring($HexData, $Index * 5 + 3, 1))) * 16 +
                string-length(
                  substring-before(
                     $Hex,substring($HexData, $Index * 5 + 4, 1)))"
        />
      <xsl:call-template name="HexToDec">
        <xsl:with-param name="HexData" select="$HexData" />
        <xsl:with-param name="Index"   select="$Index + 1" />
        <xsl:with-param name="Count"   select="$Count" />
        <xsl:with-param name="Hex"     select="$Hex" />
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

I had no idea what all was going on under the hood.

Based on memory usage, perhaps I am getting more of an idea how to pass by
reference.
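
For engines that do not optimize tail recursion, the divide-and-conquer
recursion Michael mentioned might be combined with the index approach.  The
following is a sketch only: it has not been run against this data set or timed,
and the template name HexToDecDVC and the First parameter are names made up
here.  It splits the range of items in half at each level, so the recursion
depth grows with the logarithm of Count rather than with Count itself.

```xml
  <xsl:call-template name="HexToDecDVC">
    <xsl:with-param name="HexData" select="." />
    <xsl:with-param name="Count" select="@Count" />
  </xsl:call-template>

  <!-- Hypothetical divide-and-conquer variant of HexToDec.
       First is the zero-based index of the first item in the range;
       Count is the number of items in the range. -->
  <xsl:template name="HexToDecDVC">
    <xsl:param name="HexData" />
    <xsl:param name="Count" />
    <xsl:param name="First" select="0" />
    <xsl:param name="Hex" select="'0123456789ABCDEF'" />
    <xsl:choose>
      <!-- Two or more items: convert each half of the range in turn. -->
      <xsl:when test="$Count &gt;= 2">
        <xsl:variable name="Half" select="floor($Count div 2)" />
        <xsl:call-template name="HexToDecDVC">
          <xsl:with-param name="HexData" select="$HexData" />
          <xsl:with-param name="Count"   select="$Half" />
          <xsl:with-param name="First"   select="$First" />
          <xsl:with-param name="Hex"     select="$Hex" />
        </xsl:call-template>
        <xsl:call-template name="HexToDecDVC">
          <xsl:with-param name="HexData" select="$HexData" />
          <xsl:with-param name="Count"   select="$Count - $Half" />
          <xsl:with-param name="First"   select="$First + $Half" />
          <xsl:with-param name="Hex"     select="$Hex" />
        </xsl:call-template>
      </xsl:when>
      <!-- Exactly one item: emit it the same way HexToDec does. -->
      <xsl:when test="$Count = 1">
        <xsl:text>,</xsl:text>
        <xsl:value-of
          select="string-length(
                    substring-before(
                      $Hex, substring($HexData, $First * 5 + 3, 1))) * 16 +
                  string-length(
                    substring-before(
                      $Hex, substring($HexData, $First * 5 + 4, 1)))"
          />
      </xsl:when>
      <!-- Zero, negative, or non-numeric Count: emit nothing. -->
    </xsl:choose>
  </xsl:template>
```

HexData is still passed as a plain variable reference at every call, so the
string itself should not need to be copied; only the range bounds change.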

> On 20/03/2012 19:58, Bulgrien, Kevin wrote:
> > -----Original Message-----
> > From: Bulgrien, Kevin [mailto:Kevin.Bulgrien@xxxxxxxxxxxx]
> > Sent: Tuesday, March 20, 2012 2:06 PM
> > To: 'xsl-list@xxxxxxxxxxxxxxxxxxxxxx'
> > Subject: RE: [xsl] String conversion problem when string is large
> >
> > -----Original Message-----
> > From: Michael Kay [mailto:mike@xxxxxxxxxxxx]
> > Sent: Tuesday, March 20, 2012 1:39 PM
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: [xsl] String conversion problem when string is large
> >
> > The simplest solution is to just find a different XSLT
> processor, one that implements tail recursion optimization.
> Saxon, for example.
> >
> > You could rewrite the code either to use XSLT 2.0 string
> handling or to use divide-and-conquer recursion, but unless
> there is something that ties you to your current XSLT
> processor there is no need to change the code.
> >
> > Michael Kay
> > Saxonica
> > -----
> >
> > I didn't expect that answer... I guess that's encouraging.
> >
> > I have tried the Java version of SaxonB 9-1-0-8j, but some links appeared
> > to be broken (or else something on my company proxy choked) on SourceForge
> > for the most recent .zip of SaxonHE9-4, so I didn't try it before today.
> > Since your reply, I tried some creative Googling and turned up a download
> > link that works.  I'll give SaxonHE9-4-0-3J.zip a try.
> >
> > -----
> >
> > Well, I tried SaxonHE9-4 and got:
> >
> > $ java -Xms1g -Xmx2g -jar ~/bin/saxon9he.jar -t -s:develop/idiffout.xml -xsl:idiffout.xsl -o:idiffout.csv
> > Saxon-HE 9.4.0.3J from Saxonica
> > Java version 1.6.0_22
> > Warning: at xsl:stylesheet on line 2 column 80 of idiffout.xsl:
> >    Running an XSLT 1 stylesheet with an XSLT 2 processor
> > Stylesheet compilation time: 437 milliseconds
> > Processing file:/home/kbulgrien/cvs/r8000/update/IDiff2DUA/develop/idiffout.xml
> > Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
> > Building tree for file:/home/kbulgrien/cvs/r8000/update/IDiff2DUA/develop/idiffout.xml using class net.sf.saxon.tree.tiny.TinyBuilder
> > Tree built in 162 milliseconds
> > Tree size: 6616 nodes, 570130 characters, 10303 attributes
> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> >          at net.sf.saxon.tree.util.FastStringBuffer.condense(FastStringBuffer.java:485)
> >          at net.sf.saxon.expr.instruct.DocumentInstr.evaluateItem(DocumentInstr.java:308)
> >          at net.sf.saxon.expr.parser.ExpressionTool.evaluate(ExpressionTool.java:320)
> >          at net.sf.saxon.expr.instruct.GeneralVariable.getSelectValue(GeneralVariable.java:529)
> >          at net.sf.saxon.expr.instruct.Instruction.assembleParams(Instruction.java:187)
> >          at net.sf.saxon.expr.instruct.CallTemplate.processLeavingTail(CallTemplate.java:369)
> >          at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:615)
> >          at net.sf.saxon.expr.instruct.Choose.processLeavingTail(Choose.java:794)
> >          at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:615)
> >          at net.sf.saxon.expr.instruct.Template.expand(Template.java:231)
> >          at net.sf.saxon.expr.instruct.CallTemplate$CallTemplatePackage.processLeavingTail(CallTemplate.java:526)
> >          at net.sf.saxon.expr.instruct.ApplyTemplates.apply(ApplyTemplates.java:239)
> >          at net.sf.saxon.expr.instruct.ApplyTemplates.processLeavingTail(ApplyTemplates.java:199)
> >          at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:615)
> >          at net.sf.saxon.expr.instruct.Choose.processLeavingTail(Choose.java:794)
> >          at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:615)
> >          at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212)
> >          at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1034)
> >          at net.sf.saxon.expr.instruct.ApplyTemplates$ApplyTemplatesPackage.processLeavingTail(ApplyTemplates.java:476)
> >          at net.sf.saxon.expr.instruct.ApplyTemplates.apply(ApplyTemplates.java:239)
> >          at net.sf.saxon.expr.instruct.ApplyTemplates.processLeavingTail(ApplyTemplates.java:199)
> >          at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:615)
> >          at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212)
> >          at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1034)
> >          at net.sf.saxon.expr.instruct.ApplyTemplates.apply(ApplyTemplates.java:237)
> >          at net.sf.saxon.expr.instruct.ApplyTemplates.processLeavingTail(ApplyTemplates.java:199)
> >          at net.sf.saxon.expr.instruct.Choose.processLeavingTail(Choose.java:794)
> >          at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:615)
> >          at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212)
> >          at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1034)
> >          at net.sf.saxon.expr.instruct.ApplyTemplates.apply(ApplyTemplates.java:237)
> >          at net.sf.saxon.expr.instruct.ApplyTemplates.processLeavingTail(ApplyTemplates.java:199)
> >
> > $ tail -1 idiffout.csv | awk 'BEGIN { FS=","; } { print NF
> " vs " $9 "
> > }' -
> > 3954 vs 53392
> >
> > I don't know if there is a better way to invoke the
> processor or not, nor if I should try the .NET version instead.
> > I suppose it is possible that something else in the overall
> transform is to blame, but the transform exploded in the same spot.
> >
> > Kevin Bulgrien
> >
> >
> > This message and/or attachments may include information
> subject to GD Corporate Policy 07-105 and is intended to be
> accessed only by authorized personnel of General Dynamics and
> approved service providers.  Use, storage and transmission
> are governed by General Dynamics and its policies.
> Contractual restrictions apply to third parties.  Recipients
> should refer to the policies or contract to determine proper
> handling.  Unauthorized review, use, disclosure or
> distribution is prohibited.  If you are not an intended
> recipient, please contact the sender and destroy all copies
> of the original message.
>
>

