Subject: Re: [xsl] Similarity metric in XSLT 2?
From: Markus Flatscher <markus.flatscher@xxxxxxxxx>
Date: Sat, 31 Mar 2012 13:27:22 -0400
Imsieke, all,

Commons Lang [1] has an implementation of the Levenshtein distance [2], and calling it from XSLT with Saxon-PE seems to work nicely.

Secondstring (http://secondstring.sourceforge.net/) is another Java library. It implements many more approximate string-matching algorithms and is part of Simile Vicino (http://code.google.com/p/simile-vicino/), which in turn is part of Google's Freebase/Gridworks code base (https://github.com/lbjay/gridworks). I haven't had any luck calling Secondstring or Vicino methods from Saxon yet, and I'd love to hear from anyone who has.

[1] http://commons.apache.org/lang/
[2] http://commons.apache.org/lang/api-release/src-html/org/apache/commons/lang3/StringUtils.html#line.6061

On Fri, Mar 30, 2012 at 4:24 PM, Imsieke, Gerrit, le-tex <gerrit.imsieke@xxxxxxxxx> wrote:
> I can only affirm that I'd be interested in such a library, too.
>
> The last time that I needed string similarity metrics (4 yrs ago), I used
> Perl with XML::LibXML and String::Similarity.
>
> If there were such a module / extension function for XPath / XSLT, I'd
> probably have used it more often. If you find a Java library that is easy
> to interface with from Java-based XSLT processors, please let me know. I
> think that Levenshtein or more advanced algorithms will be too slow when
> implemented in XSLT, but may be readily available as an extension function.
>
> Gerrit
>
>
> On 2012-03-30 20:18, Martin Holmes wrote:
>>
>> Hi all,
>>
>> I'm faced with a situation in which I have to match an input string
>> against a set of possible candidates, and I need to find the match which
>> is most similar to it (I'm trying to identify correspondences between
>> two sets of files which have similar, but not identical, content).
>>
>> Has anyone done anything like measuring string similarity in XSLT 2.0?
>> If so, how did you approach it?
>>
>> All help appreciated,
>> Martin
>>
>
> --
> Gerrit Imsieke
> Geschäftsführer / Managing Director
> le-tex publishing services GmbH
> Weissenfelser Str. 84, 04229 Leipzig, Germany
> Phone +49 341 355356 110, Fax +49 341 355356 510
> gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
>
> Registergericht / Commercial Register: Amtsgericht Leipzig
> Registernummer / Registration Number: HRB 24930
>
> Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
> Thomas Schmidt, Dr. Reinhard Vöckler
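For anyone who cannot put Commons Lang on the classpath, the metric is small enough to write by hand. The following is a minimal sketch, not the Commons Lang code: a standalone Java class (the names `Similarity`, `levenshtein` and `closest` are hypothetical) that computes the Levenshtein edit distance with the usual two-row dynamic-programming table, then picks the candidate with the smallest distance, which is the selection step Martin describes. How such a static method is exposed to XSLT (for example as a Saxon extension function) depends on the processor and edition and is not shown here.

```java
import java.util.List;

/**
 * Minimal sketch of a Levenshtein-based "closest match" helper.
 * Class and method names are hypothetical, not from any library.
 */
public final class Similarity {

    private Similarity() {}

    /**
     * Levenshtein edit distance (insertions, deletions, substitutions, cost 1 each),
     * computed with two rolling rows of the DP table.
     * Note: compares UTF-16 code units, so a character outside the BMP counts as two.
     */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Row 0: turning the empty prefix of a into each prefix of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a's first i characters into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,   // insertion
                                 prev[j] + 1),      // deletion
                        prev[j - 1] + cost);        // substitution or match
            }
            // Swap rows: the row just filled becomes "previous" for the next i.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the candidate with the smallest edit distance to input
     * (the first one wins ties), or null if there are no candidates.
     */
    public static String closest(String input, List<String> candidates) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int d = levenshtein(input, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
        // List.of needs Java 9 or later.
        System.out.println(closest("colour", List.of("color", "collar", "cooler"))); // prints color
    }
}
```

Scanning every candidate costs one distance computation per candidate, each proportional to the product of the two string lengths, which should be fine for modest candidate sets. When candidates differ a lot in length, dividing the distance by the longer string's length gives scores that are easier to compare; that normalization is an assumption on my part, not something proposed in this thread.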