Subject: Re: [xsl] Finding first difference between 2 text strings
From: <mlcook@xxxxxxxxxx>
Date: Mon, 14 Sep 2009 13:57:26 -0400
What a clever/impressive/compact solution! David's solution is the one I decided to use, because it avoids potential stack-overflow problems from deep recursion. I don't understand all the details of the function, but that's one advantage of reusable code! With our data, the regexp processing didn't seem to be stressed much: I got reasonable results for strings up to 1000 characters long. Our text strings also contain '(', ')', and '?', so I had to add them to the list of special characters to be processed. I suppose the use of the ')' in the function could be replaced by a character that does not occur in the text data. Since we're processing only ASCII text, not Unicode, I replaced the hex codes in the translation with one space for each special character. The ordering of the special characters doesn't matter (to me), so a blank seemed to work fine. The hex codes also seemed to throw off the reported position of the mismatch, although I didn't investigate thoroughly.
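For readers less familiar with XSLT, the idea behind David's f:mismatch2 (quoted at the end of this message) can be sketched in Python. This is a hypothetical translation, not David's code: the name `mismatch` is made up, and Python's `re.escape()` stands in for the translate() step, so '(', ')', '?' and the other metacharacters need no special handling. The XSLT version presumably needs its leading ':' because XPath's replace() rejects a pattern that can match a zero-length string; `re.match()` has no such restriction, so the sketch drops it. As in the original, the groups nest one level per character of b, so a very long b may run into the regex engine's nesting or recursion limits.

```python
import re


def mismatch(a: str, b: str) -> int:
    """1-based position of the first character where a and b differ.

    Mirrors f:mismatch2: if the strings are equal, or one is a prefix of
    the other, the result is len(common prefix) + 1.
    """
    # For b = "abc" this builds "(a(b(c)?)?)?": one optional group per
    # character, each nested inside the previous one. re.escape() neutralises
    # regex metacharacters such as '.', '(', ')', '?' and '*'.
    pattern = "".join("(" + re.escape(ch) for ch in b) + ")?" * len(b)
    # re.match anchors at the start of a (like '^'); because every group is
    # optional, the pattern always matches, possibly with zero length.
    m = re.match(pattern, a)
    # Group 1 is the longest prefix of a that is also a prefix of b. It is
    # None when even the first characters differ, and absent when b is empty.
    prefix = m.group(1) if b and m.group(1) is not None else ""
    return len(prefix) + 1
```

For example, `mismatch("f(x).y?", "f(x).z?")` returns 6, with no special treatment of the parentheses, dot or question mark.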
My changes to the function amount to the following (with similar changes for $b):

<xsl:variable name="single-quote">'</xsl:variable>

<xsl:param name="a" as="xs:string" />
<xsl:variable name="aa-pattern" select="concat('.,+*\{}[]()?', $single-quote)" />
<!-- one space per special character (13 in all), so translate() deletes none of them -->
<xsl:variable name="aa" select="translate($a, $aa-pattern, '             ')"/>

Then invoke the function as:

<xsl:variable name="pos1" select="f:mismatch2($a, $b)" />

I also reversed the strings so that I could find the last character of the difference, and then extract the whole section that was different:

<xsl:variable name="rev-a" select="codepoints-to-string(reverse(string-to-codepoints($a)))" />
<xsl:variable name="rev-b" select="codepoints-to-string(reverse(string-to-codepoints($b)))" />
<xsl:variable name="pos2" select="f:mismatch2($rev-a, $rev-b)" />

Then output this string:

substring($a, $pos1, string-length($a) - $pos2 - $pos1 + 2)

or this one, depending on which section the user wants (in practice I output both, for a "from"/"to" comparison):

substring($b, $pos1, string-length($b) - $pos2 - $pos1 + 2)

Processing time was not excessive, and I got some useful output from our data.

Thanks again to David and the others who supplied working solutions!

-- 
Mike Cook

> An alternative definition, that appears to give the same results is:
>
> <xsl:function name="f:mismatch2" as="xs:integer?">
>   <xsl:param name="a" as="xs:string" />
>   <xsl:param name="b" as="xs:string" />
>   <xsl:variable name="aa"
>     select="translate($a,'.+*\{}[]',' ;
> 007;')"/>
>   <xsl:variable name="bb"
>     select="translate($b,'.+*\{}[]',' ;
> 007;')"/>
>   <xsl:variable name="r" select="concat('^:',replace($bb,'.','($0'),replace($bb,'.',')?'),'.*')"/>
>
>   <xsl:sequence select="1+string-length(replace(concat(':',$aa),$r,'$1'))"/>
> </xsl:function>
>
> If $b is long, this might stretch the capabilities of the regexp engine
> though....
>
> David
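The reverse-and-compare step and the substring() arithmetic above can be checked with a small Python sketch. The names `first_mismatch` and `changed_sections` are hypothetical, and a plain loop replaces the regex (which also sidesteps any recursion-depth concern). One addition that is not in the XSLT: the common suffix is clamped so it cannot overlap the common prefix (as in "aa" vs "a"); without the clamp, the two slices are exactly the two substring() calls.

```python
from typing import Tuple


def first_mismatch(a: str, b: str) -> int:
    """1-based position of the first differing character: len(common prefix) + 1."""
    i = 0
    n = min(len(a), len(b))
    while i < n and a[i] == b[i]:
        i += 1
    return i + 1


def changed_sections(a: str, b: str) -> Tuple[str, str]:
    """Return the ("from", "to") sections in which a and b differ."""
    pos1 = first_mismatch(a, b)              # first difference, from the front
    pos2 = first_mismatch(a[::-1], b[::-1])  # first difference, from the back
    prefix_len = pos1 - 1
    suffix_len = pos2 - 1
    # Addition not in the XSLT: stop the common suffix from overlapping the
    # common prefix, which would otherwise give empty or truncated sections.
    suffix_len = min(suffix_len, min(len(a), len(b)) - prefix_len)
    # Without the clamp these slices equal
    # substring($a, $pos1, string-length($a) - $pos2 - $pos1 + 2) and the $b analogue.
    return a[prefix_len:len(a) - suffix_len], b[prefix_len:len(b) - suffix_len]


print(changed_sections("The quick brown fox", "The quick red fox"))  # prints ('brown', 'red')
```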