Subject: RE: [xsl] XSLT Solution for hyphenation
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 22 Dec 2006 09:26:43 -0000

You seem to be doing exact matching on the words in your dictionary, not
regular expression matching as your use of matches() would suggest. With
exact matching you can use a key for the lookup, which will be dramatically
faster.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Jeff Sese [mailto:jsese@xxxxxxxxxxxx]
> Sent: 22 December 2006 06:10
> To: Xsl-List
> Subject: [xsl] XSLT Solution for hyphenation
>
> Hi list,
>
> I have a project that applies hyphenation to an XML document using a
> list of words as a reference. The list of words can reach up to a
> million entries.
>
> My XSLT solution uses a template that matches text() nodes and then
> inserts hyphens into the words that appear in the list. However, the
> transformation takes too long to finish, even for a relatively small
> file (around 1 MB). Is there any way to speed this up, or is there a
> better solution?
>
> Here's my stylesheet:
>
>   <xsl:template match="/">
>     <xsl:apply-templates/>
>   </xsl:template>
>
>   <xsl:template match="@*|element()|comment()|processing-instruction()">
>     <xsl:copy>
>       <xsl:apply-templates select="@*|node()"/>
>     </xsl:copy>
>   </xsl:template>
>
>   <xsl:template match="text()">
>     <xsl:variable name="str" select="."/>
>     <xsl:variable name="searchStrs" as="xs:string*"
>         select="$search-words[matches($str,.)]/replace(., '[.\\?*+{}()\[\]\^\$|]', '\\$0')"/>
>     <xsl:value-of select="ati:replace-all($str, $searchStrs)"/>
>   </xsl:template>
>
>   <xsl:function name="ati:replace-all">
>     <xsl:param name="input" as="xs:string"/>
>     <xsl:param name="words-to-replace" as="xs:string*"/>
>     <xsl:sequence select="if (exists($words-to-replace))
>         then ati:replace-all(
>                replace($input, $words-to-replace[1],
>                        key('replace', $words-to-replace[1], $search-words)),
>                remove($words-to-replace, 1))
>         else $input"/>
>   </xsl:function>
>
> Here's a sample of the lookup table:
>
>   <root>
>     <wordlist>
>       <entry>
>         <search>abaissassent</search>
>         <replace>abais­sassent</replace>
>       </entry>
>       <entry>
>         <search>abaissèrent</search>
>         <replace>abais­sèrent</replace>
>       </entry>
>       <entry>
>         <search>abandonnent</search>
>         <replace>aban­donnent</replace>
>       </entry>
>     </wordlist>
>   </root>
>
> So if I have "abaissassent" in a text() node, it will be replaced with
> "abais­sassent".
>
> --
> Jeff
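A minimal sketch of the key-based lookup suggested above, not a tested drop-in replacement. It assumes XSLT 2.0, that the word list shown in the quoted message is stored in a separate document (the file name `wordlist.xml` and the parameter `wordlist-uri` are hypothetical), and that a "word" is a run of letters matched by `\p{L}+`. Each word in a text node is looked up once, by exact value, through `key()`, instead of every dictionary entry being tested against the text with `matches()`:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical location of the lookup table (root/wordlist/entry) -->
  <xsl:param name="wordlist-uri" as="xs:string" select="'wordlist.xml'"/>

  <!-- The word list as a document node: key() needs a tree to search -->
  <xsl:variable name="wordlist" select="doc($wordlist-uri)"/>

  <!-- Index each entry by the exact string value of its search child -->
  <xsl:key name="hyph" match="entry" use="search"/>

  <!-- Copy everything except text nodes unchanged (same pattern as the
       original stylesheet, so it does not compete with match="text()") -->
  <xsl:template match="@*|element()|comment()|processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <!-- Assumption: a word is a run of letters. The regex attribute is an
         attribute value template, so the braces are doubled. -->
    <xsl:analyze-string select="." regex="\p{{L}}+">
      <xsl:matching-substring>
        <xsl:variable name="word" select="."/>
        <!-- Three-argument key(): the context item here is a string, not a
             node, so the tree to search must be given explicitly -->
        <xsl:variable name="entry" select="key('hyph', $word, $wordlist)[1]"/>
        <!-- Use the hyphenated form if the word is listed, else keep it -->
        <xsl:value-of select="if ($entry) then $entry/replace else $word"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Spaces, punctuation and digits pass through unchanged -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

The design choice is to let the processor index the list once and then do one exact lookup per word, rather than scanning up to a million entries for every text node. Because matching is exact, capitalized forms such as "Abaissassent" are left alone unless they appear in the list themselves.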