RE: [xsl] XSLT Solution for hyphenation

Subject: RE: [xsl] XSLT Solution for hyphenation
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 22 Dec 2006 09:26:43 -0000
You seem to be doing exact matching on the words in your dictionary, not
regular expression matching as your use of matches() would suggest. With
exact matching you can use a key for the lookup which will be dramatically
faster.
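
As a rough illustration of the key-based lookup, here is a minimal XSLT 2.0 sketch, not a tested drop-in replacement. It assumes the look-up table lives in a separate file (the name wordlist.xml is hypothetical), that the key name "hyphenate" is free to use, and that a "word" is any run of Unicode letters. Words containing apostrophes or hyphens would need a different regex.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the word list is stored in wordlist.xml next to the stylesheet -->
  <xsl:variable name="wordlist" select="document('wordlist.xml')"/>

  <!-- Index each entry by the exact text of its search child (key name is hypothetical) -->
  <xsl:key name="hyphenate" match="entry" use="search"/>

  <!-- Copy everything except text nodes unchanged -->
  <xsl:template match="@*|element()|comment()|processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Split each text node into words and look each word up once via the key -->
  <xsl:template match="text()">
    <!-- regex is an attribute value template, so the braces are doubled -->
    <xsl:analyze-string select="." regex="\p{{L}}+">
      <xsl:matching-substring>
        <xsl:variable name="hit" select="key('hyphenate', ., $wordlist)"/>
        <!-- Emit the hyphenated form if the word is listed, otherwise the word itself -->
        <xsl:value-of select="if ($hit) then $hit[1]/replace else ."/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With this approach each word costs one indexed lookup rather than a scan of the whole list, so the work grows with the length of the text rather than with text size times dictionary size. Note that it only replaces whole words, whereas matches() without anchors also finds dictionary words embedded inside longer words.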

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Jeff Sese [mailto:jsese@xxxxxxxxxxxx]
> Sent: 22 December 2006 06:10
> To: Xsl-List
> Subject: [xsl] XSLT Solution for hyphenation
>
> Hi list,
>
> I have this project that applies hyphenation to an XML
> document using a list of words as a reference. The list of
> words can reach up to a million entries.
> My XSLT solution uses a template that matches text() nodes
> and inserts hyphens into any words that appear in the list.
> However, the transformation takes too long to finish, even
> for a relatively small file (around 1 MB). Is there any way
> to speed this up, or is there a better solution?
>
> Here's my stylesheet:
>
> <xsl:template match="/">
>     <xsl:apply-templates/>
> </xsl:template>
> <xsl:template match="@*|element()|comment()|processing-instruction()">
>     <xsl:copy>
>         <xsl:apply-templates select="@*|node()"/>
>     </xsl:copy>
> </xsl:template>
> <xsl:template match="text()">
>     <xsl:variable name="str" select="."/>
>     <xsl:variable name="searchStrs" as="xs:string*"
>         select="$search-words[matches($str,.)]/replace(., '[.\\?*+{}()\[\]\^\$&#x007C;]', '\\$0')"/>
>     <xsl:value-of select="ati:replace-all($str,$searchStrs)"/>
> </xsl:template>
> <xsl:function name="ati:replace-all">
>     <xsl:param name="input" as="xs:string"/>
>     <xsl:param name="words-to-replace" as="xs:string*"/>
>     <xsl:sequence select="if (exists($words-to-replace))
>         then ati:replace-all(
>                  replace($input, $words-to-replace[1],
>                          key('replace', $words-to-replace[1], $search-words)),
>                  remove($words-to-replace, 1))
>         else $input"/>
> </xsl:function>
>
> Here's a sample of the look-up table:
>
> <root>
>     <wordlist>
>         <entry>
>             <search>abaissassent</search>
>             <replace>abais&#x00AD;sassent</replace>
>         </entry>
>         <entry>
>             <search>abaissèrent</search>
>             <replace>abais&#x00AD;sèrent</replace>
>         </entry>
>         <entry>
>             <search>abandonnent</search>
>             <replace>aban&#x00AD;donnent</replace>
>         </entry>
>     </wordlist>
> </root>
>
> So if I have "abaissassent" in a text() node, it will be
> replaced with "abais&#x00AD;sassent".
>
> --
> *Jeff*
