[xsl] Efficient dictionary lookup

Subject: [xsl] Efficient dictionary lookup
From: Martin Holmes <mholmes@xxxxxxx>
Date: Thu, 22 Mar 2012 14:39:01 -0700
Hi all,

As part of a small pilot project, I'm implementing a set of spelling normalization rules applied through XSLT 2.0 using Saxon 9. One operation that happens extremely frequently is a dictionary lookup; basically I'm checking a word form to see if it appears in a spell-checker dictionary.

The dictionary is currently a single whitespace-separated text string (although I could format it any way I choose), and I've tried both fn:contains() and fn:matches() to check whether or not the form appears in the dictionary:

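<xsl:function name="f:wordExists" as="xs:boolean">
<xsl:param name="inString" as="xs:string"/>
<!-- xsl:sequence returns the boolean itself rather than a text node that has to be converted back -->
<xsl:sequence select="contains($dictModern, concat(' ', lower-case($inString), ' '))"/>
</xsl:function>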


<xsl:function name="f:wordExists" as="xs:boolean">
<xsl:param name="inString" as="xs:string"/>
<!-- The trailing '\s' belongs inside concat(); 'i' is the flags argument of matches() -->
<xsl:sequence select="matches($dictModern, concat('\s', $inString, '\s'), 'i')"/>
</xsl:function>
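
To make the setup concrete, here is a minimal self-contained sketch of the fn:contains() variant. The three-word dictionary, the f namespace URI, and the "main" template are placeholders made up for illustration; the real $dictModern is of course much larger. Note that this approach only finds the first and last entries if the dictionary string is padded with a leading and trailing space.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical stand-in for the real dictionary. It is padded with a
       leading and a trailing space so that every entry, including the
       first and last, is surrounded by spaces. -->
  <xsl:variable name="dictModern" as="xs:string"
                select="' apple banana cherry '"/>

  <!-- True if the lower-cased form appears as a whole word in $dictModern. -->
  <xsl:function name="f:wordExists" as="xs:boolean">
    <xsl:param name="inString" as="xs:string"/>
    <xsl:sequence select="contains($dictModern,
                                   concat(' ', lower-case($inString), ' '))"/>
  </xsl:function>

  <!-- Hypothetical entry point, so no source document is needed. -->
  <xsl:template name="main">
    <!-- 'Banana' is a whole entry once lower-cased; 'ban' is only a prefix. -->
    <xsl:value-of select="f:wordExists('Banana'), f:wordExists('ban')"
                  separator=" "/>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon 9 this should be runnable from the command line by naming the initial template (for example with -it:main), and I would expect it to report true for 'Banana' and false for 'ban'.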


Both options appear to be very costly in terms of time, and I'm wondering what the most efficient way to do this might be. Is there a faster way to do text lookups like this?

Ultimately I guess I'll implement this as an external Java process, but for the moment I'm working with XSLT, and I'd like to get some speed improvement if I can.

All help appreciated,
Martin
