Subject: Re: [xsl] String conversion problem when string is large
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Tue, 20 Mar 2012 18:39:16 +0000
Michael Kay Saxonica
I have an XML file produced by a third-party tool. An XSL Transform is used to transform data in this file to a CSV that is fed into another third-party tool. The XSL Transform has worked fine for a very long time. A recent use, however, exposed a weakness in the implementation that causes most processors to crash as a result of recursion depth.
The recursion depth problem occurs in an element that contains hex-encoded binary data in the form "0xhh,0xhh,...".
The CSV file needs to be generated with decimal values instead of hexadecimal values.
The XML data that needs to be translated has not usually been very large, so no particular problems have been observed for a number of years, but a problem arose when a data set came up where a container has ~50,000 hex-encoded bytes.
The translator converts the first value, and recursively calls the translation on the remaining string. This becomes a problem when 50,000 values need to be translated.
Perhaps this data set means that XSLT is no longer the right tool for this particular job, and possibly the XSL transform should simply dump the hex data for post-processing by a different tool. As my experience with XSL is limited, I'm interested in opinions on whether there are translation methods that might not be so expensive.
At risk of exposing my ignorance, this was the original code:
<!-- =======================================================================
 !
 ! Convert REG_BINARY data stream from a comma-separated list of hex numbers
 ! in the form "0xhh,0xhh,...", to a comma-separated list of decimal numbers
 ! in the form "n,n,...".
 !
 ! TODO: Pass in @count and verify that the data stream length matches the
 !       actual length of the data stream.
 !
 -->
<xsl:template name="HexToDec">
  <xsl:param name="HexData" />

  <!-- Convert first number of HexData -->
  <xsl:variable name="Base" select="'0123456789ABCDEF'" />
  <xsl:variable name="Num1" select="string-length(substring-before($Base,substring($HexData,3,1)))" />
  <xsl:variable name="Num2" select="string-length(substring-before($Base,substring($HexData,4,1)))" />

  <xsl:choose>
    <xsl:when test="string-length($HexData) = 0">
      <!-- Do nothing -->
    </xsl:when>
    <xsl:otherwise>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="$Num1 * 16 + $Num2" />
      <xsl:call-template name="HexToDec">
        <xsl:with-param name="HexData">
          <xsl:value-of select="substring-after($HexData, ',')" />
        </xsl:with-param>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
A rewrite using a different algorithm proved even less efficient, and was a tutorial on arcane and probably unwise use of xsl:for-each:
<xsl:template name="HexToDec">
  <xsl:param name="HexData" />

  <!-- Find the length of the data stream to convert. -->
  <xsl:variable name="HexSize" select="(string-length($HexData)+1) div 5" />

  <!-- String used to convert each byte of HexData to decimal. -->
  <xsl:variable name="Hex" select="'0123456789ABCDEF'" />

  <!--
   ! All but the final byte has a comma, so when using the string length
   ! to determine how many bytes there are, add 1 to the string length
   ! before dividing by the normal length of a hex-encoded byte with
   ! separator.
   -->
  <xsl:for-each select="(//*)[position() &lt;= $HexSize]">
    <!--
     ! Emulated "for" loop repeated content here. Use position() to get
     ! loop index. position() is one-based.
     !
     -->
    <xsl:text>,</xsl:text>
    <xsl:value-of select="string-length ( substring-before ( $Hex, substring ( $HexData, (position() - 1) * 5 + 3, 1 ) ) ) * 16
                        + string-length ( substring-before ( $Hex, substring ( $HexData, (position() - 1) * 5 + 4, 1 ) ) ) * 1" />
  </xsl:for-each>
</xsl:template>
It probably will not help much to regurgitate yet another rewrite based on use of replace(), as it blows the stack too. I confess I have a hard time picturing in my head what happens when the XSL transform runs, so I appear to be repeating the same mistakes over and over as the code is respun in different ways.
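For comparison, one general way to keep recursion shallow in XSLT 1.0 is to split the input in half at a token boundary and recurse on each half, so the depth grows with log2(n) rather than n (roughly 16 levels for 50,000 bytes). The following is only a sketch of that divide-and-conquer idea, not a tested replacement: it assumes every token is exactly four characters ("0xhh") with upper-case hex digits, and the sample string passed in from the match="/" template is made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" />

  <!-- Hypothetical driver: feeds a made-up three-byte sample to HexToDec. -->
  <xsl:template match="/">
    <xsl:call-template name="HexToDec">
      <xsl:with-param name="HexData" select="'0x1F,0x00,0xAB'" />
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="HexToDec">
    <xsl:param name="HexData" />
    <xsl:variable name="Base" select="'0123456789ABCDEF'" />
    <!-- Each token plus its comma is 5 characters; the last has no comma. -->
    <xsl:variable name="Count" select="(string-length($HexData) + 1) div 5" />
    <xsl:choose>
      <!-- Empty input: emit nothing. -->
      <xsl:when test="string-length($HexData) = 0" />
      <!-- Fewer than two tokens: convert the single value directly. -->
      <xsl:when test="$Count &lt; 2">
        <xsl:text>,</xsl:text>
        <xsl:value-of select="string-length(substring-before($Base, substring($HexData, 3, 1))) * 16
                            + string-length(substring-before($Base, substring($HexData, 4, 1)))" />
      </xsl:when>
      <!-- Otherwise split at a token boundary and recurse on both halves. -->
      <xsl:otherwise>
        <xsl:variable name="Half" select="floor($Count div 2)" />
        <xsl:call-template name="HexToDec">
          <xsl:with-param name="HexData" select="substring($HexData, 1, $Half * 5 - 1)" />
        </xsl:call-template>
        <xsl:call-template name="HexToDec">
          <xsl:with-param name="HexData" select="substring($HexData, $Half * 5 + 1)" />
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Like the original, the sketch emits a comma before each value, so the sample input should come out as ",31,0,171". Malformed input (tokens that are not four characters wide) would give wrong numbers, but the recursion still terminates because each half is strictly shorter than its parent.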
If it was possible to read to this point without screaming in horror, or falling off your chair laughing, perhaps a comment on whether to use or avoid XSL in this scenario could help me raise my level of XSL understanding.
I have both XSLT 1.0 and 2.0 engines at my disposal. Apologies in advance if I have failed to identify a prior response to a similar problem in the archives or at the most helpful http://www.dpawson.co.uk/xsl site.
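With a 2.0 engine available, the conversion can in principle be written without any recursion by tokenizing the string and mapping each token to its decimal value. The sketch below is one possible shape for that, under the same assumption that each token looks like "0xhh"; the function name f:hex-digit, the urn:example:hex namespace, and the sample input are all invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:hex"
    exclude-result-prefixes="xs f">

  <xsl:output method="text" />

  <!-- Hypothetical helper: value (0..15) of a single hex digit. -->
  <xsl:function name="f:hex-digit" as="xs:integer">
    <xsl:param name="digit" as="xs:string" />
    <xsl:sequence select="string-length(substring-before('0123456789ABCDEF', upper-case($digit)))" />
  </xsl:function>

  <!-- Hypothetical driver: feeds a made-up three-byte sample to HexToDec. -->
  <xsl:template match="/">
    <xsl:call-template name="HexToDec">
      <xsl:with-param name="HexData" select="'0x1F,0x00,0xAB'" />
    </xsl:call-template>
  </xsl:template>

  <!-- Iterates over the tokens; no template recursion is involved. -->
  <xsl:template name="HexToDec">
    <xsl:param name="HexData" as="xs:string" />
    <xsl:value-of separator=""
        select="for $t in tokenize($HexData, ',')[. != '']
                return concat(',', 16 * f:hex-digit(substring($t, 3, 1))
                                   + f:hex-digit(substring($t, 4, 1)))" />
  </xsl:template>

</xsl:stylesheet>
```

The leading comma before each value is kept to match the output of the original template, so the sample should again produce ",31,0,171". If the leading comma is not wanted, dropping the concat() and setting separator="," would be the more natural 2.0 idiom.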