Re: [xsl] Finding first difference between 2 text strings

Subject: Re: [xsl] Finding first difference between 2 text strings
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Fri, 11 Sep 2009 13:46:02 -0700
On Thu, Sep 10, 2009 at 7:10 AM,  <mlcook@xxxxxxxxxx> wrote:
> The XSLT function "compare" will compare 2 strings and indicate which is
> alphabetically first.


Here is a binary search solution in XPath 2.0 :)


The transformation below:

<xsl:stylesheet version="2.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
	xmlns:xs="http://www.w3.org/2001/XMLSchema";
	xmlns:f="http://fxsl.sf.net";
	>
	<xsl:output method="text"/>

	<xsl:template match="/">
	   <xsl:sequence select="f:fstDiff('abcde', 'abceefg')"/>
	</xsl:template>

  <xsl:function name="f:fstDiff" as="xs:double">
    <xsl:param name="pS1" as="xs:string"/>
    <xsl:param name="pS2" as="xs:string"/>

    <xsl:sequence select=
     "for $len in min((string-length($pS1), string-length($pS2))),
          $comleteMatchResult in $len +1,
          $leftResult in f:auxFstDiff(substring($pS1, 1, $len),
                                      substring($pS2, 1, $len)
                                      )
        return
           min(($leftResult, $comleteMatchResult))
     "/>
  </xsl:function>

  <xsl:function name="f:auxFstDiff" as="xs:double">
    <xsl:param name="pS1" as="xs:string"/>
    <xsl:param name="pS2" as="xs:string"/>

    <xsl:sequence select=
     "for $len in string-length($pS1)
       return
          if($len eq 1)
             then 1 + number($pS1 eq $pS2)
             else
               for $halfLen in $len idiv 2,
                   $leftDiffPos in f:auxFstDiff(substring($pS1,1,$halfLen),
                                                substring($pS2,1,$halfLen)
                                                )
                return
                  if($leftDiffPos le $halfLen)
                    then $leftDiffPos
                    else $leftDiffPos +
f:auxFstDiff(substring($pS1,$halfLen+1),
                                                     substring($pS2,$halfLen+
1)
                                                     )
                                      - 1

     "/>
  </xsl:function>
</xsl:stylesheet>

when applied on any XML source document (not used), returns the correct
answer:

4

Because this is theoretically O(log2(N)), expect this transformation
to perform (hopefully) dramatically faster than the linear search,
given long enough strings.



--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
I enjoy the massacre of ads. This sentence will slaughter ads without
a messy bloodbath.





>
> Is there any function, or simple transformation, that will tell me the
> position of the first character where the two strings differ?
>
> "compare" might determine this, but it isn't telling. B I suppose strings
> can be compared in various ways internally, so the position of the first
> difference might not be a side-effect of "compare".
>
> Comparing strings can also be time consuming for long strings, so I'd
> like an "inexpensive" solution, if there is one.
>
> Any suggestions for finding the first difference in 2 strings (besides a
> linear march through the strings in my XSL transformation)?
>
> Thanks, Mike
>
> This email and any attachments are only for use by the intended recipient(s)
and may contain legally privileged, confidential, proprietary or otherwise
private information. B Any unauthorized use, reproduction, dissemination,
distribution or other disclosure of the contents of this e-mail or its
attachments is strictly prohibited. B If you have received this email in
error, please notify the sender immediately and delete the original.

Current Thread