[xsl] Re: Similarity metric in XSLT 2?

Subject: [xsl] Re: Similarity metric in XSLT 2?
From: Martin Holmes <mholmes@xxxxxxx>
Date: Fri, 30 Mar 2012 13:48:54 -0700

On 12-03-30 01:24 PM, Imsieke, Gerrit, le-tex wrote:
I can only affirm that I'd be interested in such a library, too.

The last time that I needed string similarity metrics (4 yrs ago), I
used Perl with XML::LibXML and String::Similarity.

If there were such a module / extension function for XPath / XSLT, I'd
probably use it more often. If you find a Java library that is easy to
interface with from Java-based XSLT processors, please let me know. I
think that Levenshtein or more advanced algorithms will be too slow when
implemented in XSLT, but may be readily available as an extension function.

I once implemented the Universal Similarity Metric (Normalized Compression Distance) in Pascal and Java:


<http://dh2010.cch.kcl.ac.uk/academic-programme/abstracts/papers/html/ab-693.html>

and found that it was surprisingly effective for short strings, as well as being very fast. I might look at figuring out how to call the Java library from Saxon. Implementing the metric was trivial.
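
For readers unfamiliar with the metric, here is a minimal Java sketch of Normalized Compression Distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed size. This is not the implementation referred to above: the class name Ncd, the method names, and the choice of java.util.zip.Deflater as the compressor are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical helper class; the names and the compressor choice are assumptions.
final class Ncd {
    private Ncd() {}

    // Size in bytes of the zlib-compressed form of data.
    static int compressedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Normalized Compression Distance: lower values mean more similar strings.
    static double distance(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        byte[] xy = new byte[x.length + y.length];
        System.arraycopy(x, 0, xy, 0, x.length);
        System.arraycopy(y, 0, xy, x.length, y.length);
        int cx = compressedSize(x);
        int cy = compressedSize(y);
        int cxy = compressedSize(xy);
        // The zlib wrapper always emits a few bytes, so the divisor is never zero.
        return (double) (cxy - Math.min(cx, cy)) / Math.max(cx, cy);
    }

    // Returns the candidate with the smallest distance to input, or null if there are none.
    static String closest(String input, List<String> candidates) {
        String best = null;
        double bestDistance = Double.MAX_VALUE;
        for (String candidate : candidates) {
            double d = distance(input, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }
}
```

A call such as Ncd.closest(input, candidates) would pick the most similar candidate. Exposing a static method like this to a Java-based XSLT processor as an extension function should be possible, but how to do so depends on the processor and edition, so that part is left open here.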

Cheers,
Martin

Gerrit

On 2012-03-30 20:18, Martin Holmes wrote:
Hi all,

I'm faced with a situation in which I have to match an input string
against a set of possible candidates, and I need to find the match which
is most similar to it (I'm trying to identify correspondences between
two sets of files which have similar, but not identical, content).

Has anyone done anything like measuring string similarity in XSLT 2.0?
If so, how did you approach it?

All help appreciated,
Martin
