Subject: [xsl] Efficient way to check sequence membership From: ht@xxxxxxxxxxxx (Henry S. Thompson) Date: Wed, 02 Mar 2011 21:23:53 +0000 |
A common requirement for me is to check if a particular string is a member of a set of other strings. I often need this in an inner loop, doing it hundreds if not thousands of times. So, what's the most efficient way to do this? Consider the example (a real one) of checking to see if a word is what the IR people call a 'stop' word -- a short common word of little substance. Here's a function which checks this in a straightforward way: <xsl:variable name="stopPat">it|its|itself|they|them|their|what|which|who|whom|this|that|these|those|am|is|are|was|were|be|been|being|have|has|had|having|do|does|did|doing|a|an|the|and|but|if|or|because|as|until|while|of|at|by|for|with|about|against|between|into|through|during|before|after|above|below|to|from|up|down|in|out|on|off|over|under|again|further|then|once|here|there|when|where|why|how|all|any|both|each|few|more|most|other|some|such|no|nor|not|only|own|same|so|than|too|very|s|t|can|will|just|don|should|now</xsl:variable> <xsl:variable name="stops" select="tokenize($stopPat,'\|')"/> <xsl:function name="my:stop1" as="xs:boolean"> <xsl:param name="w" as="xs:string"/> <xsl:sequence select="some $s in $stops satisfies ($s eq $w)"/> </xsl:function> For obvious reasons this seems unlikely to be very efficient. I've come up with 4 other ways to do this, and the best of them is over 6 times faster than that one, using a realistic set of test data. The worst is more than twice as _slow_ as that one. That's a factor of 12 between best and worst! Timing was done using saxon91 on a Windows box, using -repeat:100, subtracting a baseline function which always returns true w/o checking anything. I called the stop function 289 times. On 95 occasions the argument was a stop word (drawn from only 23 word types). There are 104 stop words in $stopPat. I'll follow this message with another containing the details: skip it if you don't care, or have a think about what _you_ think is the best answer before reading. ht -- Henry S. Thompson, School of Informatics, University of Edinburgh 10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440 Fax: (44) 131 651-1426, e-mail: ht@xxxxxxxxxxxx URL: http://www.ltg.ed.ac.uk/~ht/ [mail from me _always_ has a .sig like this -- mail without it is forged spam]
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
[xsl] [job] Web Developer, 1 year, , James Cummings | Thread | Re: [xsl] Efficient way to check se, Dimitre Novatchev |
Re: [xsl] Saxon 9.3 warnings on mul, Michael Kay | Date | Re: [xsl] Efficient way to check se, Dimitre Novatchev |
Month |