Re: [xsl] Similarity metric in XSLT 2?

Subject: Re: [xsl] Similarity metric in XSLT 2?
From: Markus Flatscher <markus.flatscher@xxxxxxxxx>
Date: Sat, 31 Mar 2012 13:27:22 -0400
Imsieke,
all,

Commons Lang [1] has an implementation of Levenshtein [2], and it
seems like calling it from XSLT with Saxon-PE works nicely.
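
For anyone who wants to see what such a distance function computes, here is a minimal sketch in plain Java. It is not the Commons Lang code and I have not wired it up to Saxon. The class name SimilaritySketch, the methods levenshtein and closest, and the sample file names are all made up for illustration. It computes the Levenshtein edit distance with a two-row table and then picks the candidate with the smallest distance to the input, which is roughly the matching problem Martin describes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not the Commons Lang implementation.
public class SimilaritySketch {

    /** Levenshtein edit distance using two rows of the dynamic-programming table. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting i characters of a yields the empty string
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the arrays: the row just filled becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the candidate with the smallest distance to input (first one wins on ties). */
    static String closest(String input, List<String> candidates) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int d = levenshtein(input, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up file names standing in for the two sets of files.
        List<String> candidates =
                Arrays.asList("chapter-one.xml", "chapter-two.xml", "appendix.xml");
        System.out.println(closest("chaptr_one.xml", candidates));
    }
}
```

A raw edit distance favours short strings, so for candidates of very different lengths it may be worth dividing by the length of the longer string before comparing.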

Secondstring (http://secondstring.sourceforge.net/) is another Java
library that implements many, many more approximate string matching
algorithms, and is part of Simile Vicino
(http://code.google.com/p/simile-vicino/), which in turn is part of
Google's Freebase/Gridworks code base
(https://github.com/lbjay/gridworks).

I haven't had any luck calling any Secondstring or Vicino methods
using Saxon yet. I'd love to hear from anyone who has.

[1] http://commons.apache.org/lang/
[2] http://commons.apache.org/lang/api-release/src-html/org/apache/commons/lang3/StringUtils.html#line.6061

On Fri, Mar 30, 2012 at 4:24 PM, Imsieke, Gerrit, le-tex
<gerrit.imsieke@xxxxxxxxx> wrote:
> I can only affirm that I'd be interested in such a library, too.
>
> The last time that I needed string similarity metrics (4 yrs ago), I used
> Perl with XML::LibXML and String::Similarity.
>
> If there were such a module / extension function for XPath / XSLT, I'd
> probably use it more often. If you find a Java library that is easy to
> interface with from Java-based XSLT processors, please let me know. I think
> that Levenshtein or more advanced algorithms will be too slow when
> implemented in XSLT, but may be readily available as an extension function.
>
> Gerrit
>
>
> On 2012-03-30 20:18, Martin Holmes wrote:
>>
>> Hi all,
>>
>> I'm faced with a situation in which I have to match an input string
>> against a set of possible candidates, and I need to find the match which
>> is most similar to it (I'm trying to identify correspondences between
>> two sets of files which have similar, but not identical, content).
>>
>> Has anyone done anything like measuring string similarity in XSLT 2.0?
>> If so, how did you approach it?
>>
>> All help appreciated,
>> Martin
>>
>
> --
> Gerrit Imsieke
> Geschäftsführer / Managing Director
> le-tex publishing services GmbH
> Weissenfelser Str. 84, 04229 Leipzig, Germany
> Phone +49 341 355356 110, Fax +49 341 355356 510
> gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
>
> Registergericht / Commercial Register: Amtsgericht Leipzig
> Registernummer / Registration Number: HRB 24930
>
> Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
> Thomas Schmidt, Dr. Reinhard Vöckler
