Re: [xsl] Finding first difference between 2 text strings

Subject: Re: [xsl] Finding first difference between 2 text strings
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Fri, 11 Sep 2009 14:04:04 -0700
Ooopsss...

This is DVC (Divide and Conquer) -- and not binary search. The
complexity is still linear!

The only advantage of this solution is that it will not cause stack
overflow for any practical uses (for example the maximum depth of the
call stack when comparing two 1MB strings is just 19).

The same could probably be achieved with a clever tail-recursive
solution, but not all processors support tail-recursion-optimization,
and those that do, don't have the same definition for
"tail-recursive".

DVC, on the other side, is not dependent on the implementation of any
XSLT processor.

About efficiency: I suspect that if the strings are converted to
codepoints first, then the similar solution for integer sequences
could potentially be faster -- would our dear implementors of XSLT
processors comment on this, please?

--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
I enjoy the massacre of ads. This sentence will slaughter ads without
a messy bloodbath.


On Fri, Sep 11, 2009 at 1:46 PM, Dimitre Novatchev <dnovatchev@xxxxxxxxx>
wrote:
> On Thu, Sep 10, 2009 at 7:10 AM, B <mlcook@xxxxxxxxxx> wrote:
>> The XSLT function "compare" will compare 2 strings and indicate which is
>> alphabetically first.
>
>
> Here is a binary search solution in XPath 2.0 :)
>
>
> The transformation below:
>
> <xsl:stylesheet version="2.0"
> B  B  B  B xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
> B  B  B  B xmlns:xs="http://www.w3.org/2001/XMLSchema";
> B  B  B  B xmlns:f="http://fxsl.sf.net";
> B  B  B  B >
> B  B  B  B <xsl:output method="text"/>
>
> B  B  B  B <xsl:template match="/">
> B  B  B  B  B  <xsl:sequence select="f:fstDiff('abcde', 'abceefg')"/>
> B  B  B  B </xsl:template>
>
> B <xsl:function name="f:fstDiff" as="xs:double">
> B  B <xsl:param name="pS1" as="xs:string"/>
> B  B <xsl:param name="pS2" as="xs:string"/>
>
> B  B <xsl:sequence select=
> B  B  "for $len in min((string-length($pS1), string-length($pS2))),
> B  B  B  B  B $comleteMatchResult in $len +1,
> B  B  B  B  B $leftResult in f:auxFstDiff(substring($pS1, 1, $len),
> B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B substring($pS2, 1,
$len)
> B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B )
> B  B  B  B return
> B  B  B  B  B  min(($leftResult, $comleteMatchResult))
> B  B  "/>
> B </xsl:function>
>
> B <xsl:function name="f:auxFstDiff" as="xs:double">
> B  B <xsl:param name="pS1" as="xs:string"/>
> B  B <xsl:param name="pS2" as="xs:string"/>
>
> B  B <xsl:sequence select=
> B  B  "for $len in string-length($pS1)
> B  B  B  return
> B  B  B  B  B if($len eq 1)
> B  B  B  B  B  B  then 1 + number($pS1 eq $pS2)
> B  B  B  B  B  B  else
> B  B  B  B  B  B  B  for $halfLen in $len idiv 2,
> B  B  B  B  B  B  B  B  B  $leftDiffPos in
f:auxFstDiff(substring($pS1,1,$halfLen),
> B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B 
B substring($pS2,1,$halfLen)
> B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B )
> B  B  B  B  B  B  B  B return
> B  B  B  B  B  B  B  B  B if($leftDiffPos le $halfLen)
> B  B  B  B  B  B  B  B  B  B then $leftDiffPos
> B  B  B  B  B  B  B  B  B  B else $leftDiffPos +
f:auxFstDiff(substring($pS1,$halfLen+1),
> B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B 
B  substring($pS2,$halfLen+1)
> B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B 
B  )
> B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B - 1
>
> B  B  "/>
> B </xsl:function>
> </xsl:stylesheet>
>
> when applied on any XML source document (not used), returns the correct
answer:
>
> 4
>
> Because this is theoretically O(log2(N)), expect this transformation
> to perform (hopefully) dramatically faster than the linear search,
> given long enough strings.
>
>
>
> --
> Cheers,
> Dimitre Novatchev
> ---------------------------------------
> Truly great madness cannot be achieved without significant intelligence.
> ---------------------------------------
> To invent, you need a good imagination and a pile of junk
> -------------------------------------
> Never fight an inanimate object
> -------------------------------------
> You've achieved success in your field when you don't know whether what
> you're doing is work or play
> -------------------------------------
> I enjoy the massacre of ads. This sentence will slaughter ads without
> a messy bloodbath.
>
>
>
>
>
>>
>> Is there any function, or simple transformation, that will tell me the
>> position of the first character where the two strings differ?
>>
>> "compare" might determine this, but it isn't telling. B I suppose strings
>> can be compared in various ways internally, so the position of the first
>> difference might not be a side-effect of "compare".
>>
>> Comparing strings can also be time consuming for long strings, so I'd
>> like an "inexpensive" solution, if there is one.
>>
>> Any suggestions for finding the first difference in 2 strings (besides a
>> linear march through the strings in my XSL transformation)?
>>
>> Thanks, Mike
>>
>> This email and any attachments are only for use by the intended
recipient(s) and may contain legally privileged, confidential, proprietary or
otherwise private information. B Any unauthorized use, reproduction,
dissemination, distribution or other disclosure of the contents of this e-mail
or its attachments is strictly prohibited. B If you have received this email
in error, please notify the sender immediately and delete the original.

Current Thread