Subject: Re: [xsl] Finding first difference between 2 text strings
From: Hermann Stamm-Wilbrandt <STAMMW@xxxxxxxxxx>
Date: Mon, 14 Sep 2009 20:21:32 +0200

> ... I don't understand all
> the details of the function, but that's one advantage of reusable code! ...

Inserting debugging statements in David's stylesheet helps:

a:  abcdefghijklmnopqrstuvwxyz
b:  abcdefghijklmnopqrstuvw1y0
aa: abcdefghijklmnopqrstuvwxyz
bb: abcdefghijklmnopqrstuvw1y0
r:  ^:(a(b(c(d(e(f(g(h(i(j(k(l(m(n(o(p(q(r(s(t(u(v(w(1(y(0)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?)?.*

It is probably a good idea to understand and verify code one wants to rely on ...

Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter
Management: Erich Baier
Registered office: Boeblingen
Court of registration: Amtsgericht Stuttgart, HRB 243294

From: <mlcook@xxxxxxxxxm>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Date: 09/14/2009 07:57 PM
Subject: Re: [xsl] Finding first difference between 2 text strings
Please respond to: xsl-list@xxxxxxxxlberrytech.com

What a clever/impressive/compact solution!

David's solution is the one I decided to use because it avoids potential problems with stack overflow during recursion. I don't understand all the details of the function, but that's one advantage of reusable code!

With our data, the regexp processing didn't seem to be stressed too much, since I got reasonable results for strings up to 1000 characters in length.

Our text strings also contain '(', ')', and '?', so they had to be added to the list of special characters to be processed. I suppose the use of the ')' in the function could be replaced by a character not occurring in the text data.

Since we're also processing just ASCII text, and not Unicode, I replaced the hex codes in the translation with just a space for each special character. The ordering of special characters doesn't matter (to me), so a blank seemed to work fine.
The hex codes also seemed to throw off the resulting position of the mismatch, although I didn't investigate thoroughly.

My changes to the function amount to the following (with similar changes for $b):

  <xsl:variable name="single-quote">'</xsl:variable>
  <xsl:param name="a" as="xs:string" />
  <xsl:variable name="aa-pattern" select="concat('.,+*\{}[]()?', $single-quote)" />
  <xsl:variable name="aa" select="translate($a, $aa-pattern, '             ')"/>

Invoke the function as, say:

  <xsl:variable name="pos1" select="f:mismatch2($a, $b)" />

I also went ahead and reversed the strings so that I could find the last character in the string difference, and then extract the whole section that was different:

  <xsl:variable name="rev-a" select="codepoints-to-string(reverse(string-to-codepoints($a)))" />
  <xsl:variable name="rev-b" select="codepoints-to-string(reverse(string-to-codepoints($b)))" />
  <xsl:variable name="pos2" select="f:mismatch2($rev-a, $rev-b)" />

Then output this string:

  substring($a, $pos1, string-length($a) - $pos2 - $pos1 + 2)

or this string, depending on which sub-section is desired for the user (and, actually, I output both for a "from"/"to" comparison):

  substring($b, $pos1, string-length($b) - $pos2 - $pos1 + 2)

Processing time was not excessive, and I got some useful output from our data.

Thanks again to David and the others who supplied working solutions!
-- Mike Cook

> An alternative definition, that appears to give the same results is:
>
> <xsl:function name="f:mismatch2" as="xs:integer?">
>  <xsl:param name="a" as="xs:string" />
>  <xsl:param name="b" as="xs:string" />
>  <xsl:variable name="aa" select="translate($a,'.+*\{}[]',' ; 007;')"/>
>  <xsl:variable name="bb" select="translate($b,'.+*\{}[]',' ; 007;')"/>
>  <xsl:variable name="r" select="concat('^:',replace($bb,'.','($0'),replace($bb,'.',')?'),'.*')"/>
>  <xsl:sequence select="1+string-length(replace(concat(':',$aa),$r,'$1'))"/>
> </xsl:function>
>
> If $b is long, this might stretch the capabilities of the regexp engine
> though....
>
> David
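
For anyone who wants to run the pieces above end to end, here is a minimal, self-contained sketch that combines David's regex-based f:mismatch2 with the changes Mike describes (blanking out special characters, then reversing both strings to find where the difference ends). Several details are assumptions rather than anything stated in the messages: the f namespace URI, the template name main, the sample strings (borrowed from Hermann's debug trace), the extra '^', '$' and '|' in the list of blanked characters, and building the blank string with string-join so that it always has one space per special character.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:mismatch"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Characters blanked out before the regex is built: Mike's list plus
       ^ $ | (an assumption added here, since they are regex syntax too). -->
  <xsl:variable name="specials" as="xs:string">.,+*\{}[]()?^$|'</xsl:variable>

  <!-- One space per special character, so translate() never deletes a
       character and reported positions stay aligned with the input. -->
  <xsl:variable name="blanks" as="xs:string"
      select="string-join(for $i in 1 to string-length($specials) return ' ', '')"/>

  <!-- 1-based position of the first character at which $a and $b differ. -->
  <xsl:function name="f:mismatch2" as="xs:integer">
    <xsl:param name="a" as="xs:string"/>
    <xsl:param name="b" as="xs:string"/>
    <xsl:variable name="aa" select="translate($a, $specials, $blanks)"/>
    <xsl:variable name="bb" select="translate($b, $specials, $blanks)"/>
    <!-- For bb = 'abc' this builds the pattern ^:(a(b(c)?)?)?.* -->
    <xsl:variable name="r"
        select="concat('^:', replace($bb, '.', '($0'), replace($bb, '.', ')?'), '.*')"/>
    <!-- $1 is the longest prefix of aa that is also a prefix of bb. -->
    <xsl:sequence select="1 + string-length(replace(concat(':', $aa), $r, '$1'))"/>
  </xsl:function>

  <xsl:template name="main">
    <!-- Sample inputs taken from the debug trace earlier in the thread. -->
    <xsl:variable name="a" select="'abcdefghijklmnopqrstuvwxyz'"/>
    <xsl:variable name="b" select="'abcdefghijklmnopqrstuvw1y0'"/>
    <xsl:variable name="rev-a"
        select="codepoints-to-string(reverse(string-to-codepoints($a)))"/>
    <xsl:variable name="rev-b"
        select="codepoints-to-string(reverse(string-to-codepoints($b)))"/>
    <xsl:variable name="pos1" select="f:mismatch2($a, $b)"/>
    <xsl:variable name="pos2" select="f:mismatch2($rev-a, $rev-b)"/>
    <!-- The last differing position in $a is string-length($a) - $pos2 + 1,
         which gives the substring length used below. -->
    <xsl:value-of select="concat('from: ',
        substring($a, $pos1, string-length($a) - $pos2 - $pos1 + 2), '&#10;')"/>
    <xsl:value-of select="concat('to: ',
        substring($b, $pos1, string-length($b) - $pos2 - $pos1 + 2), '&#10;')"/>
  </xsl:template>

</xsl:stylesheet>
```

With an XSLT 2.0 processor such as Saxon, running the stylesheet with main as the initial template (for example via Saxon's -it:main option) should print "from: xyz" and "to: 1y0" for these inputs: pos1 is 24 and pos2 is 1, so the substring length is 26 - 1 - 24 + 2 = 3. Because every special character is mapped to the same blank, two different special characters at the same position are not reported as a difference.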