Subject: [xsl] HST's answers Re: [xsl] Efficient way to check sequence membership
From: ht@xxxxxxxxxxxx (Henry S. Thompson)
Date: Wed, 02 Mar 2011 22:00:36 +0000
I've thought of five ways to do this:

 1) Tokenise and use "some ...", as in the previous message;
 2) Add '|' at the beginning of both $stopPat and the word to be
    checked, and use contains;
 3) Put a sequence of elements with a 'w' attribute whose value is a
    stop in $stops, then do boolean($stops/*[@w=$w]);
 4) As above, but then define an appropriate key and use
    boolean($stops/key('stop',$w));
 5) Build a regexp and use matches: concat('^(',$stopPat,')$')

For (1) and (2), I tried both having $stopPat as in the previous
message, and a variant (1a, 2a) in which the list was sorted in
descending order of frequency in English.

Look away now if you want to guess what the order of performance is.

. . . . . . . . . . . . . . . . .

 Version   raw time   time - baseline
    0          5
    4          7            2
    2          8            3
    2a         8            3
    1a        14            9
    1         15           10
    3         28           23
    5         30           25

where 0 is the baseline, in which the stop function does no actual
work, and the time is the average over 100 iterations, in milliseconds.

I'm really interested if anyone has a better approach. Of course, I'm
also interested to find out if other implementations show a similar
pattern.

I've put up a gzipped tar file [1] of all the files you need to
reproduce the experiment -- one .xsl for each version, and q.xml for
input. The stopss.xsl file is there so you can test that you are
getting the right answer! Replace my:stop1 with your version in that
file, and check that the output is

 243367200142031010020120103000130001022001513610014414440

ht

[1] http://www.ltg.ed.ac.uk/~ht/memberCheck.tar.gz
-- 
Henry S. Thompson, School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 651-1426, e-mail: ht@xxxxxxxxxxxx
URL: http://www.ltg.ed.ac.uk/~ht/
[mail from me _always_ has a .sig like this -- mail without it is forged spam]
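
For readers without the tar file to hand, the five variants listed above might look roughly like this in XSLT 2.0. This is a sketch, not the code from [1]: the short stop list, the element name 's', the namespace URI, and the function names my:stop2 to my:stop5 are all invented for illustration. Variant (2) here delimits both ends of the word, which goes slightly beyond what the post describes, so that a prefix such as 'th' does not match 'the'.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.org/my"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Assumption: a short '|'-separated stop list standing in for the real one -->
  <xsl:variable name="stopPat" as="xs:string" select="'the|of|and|a|to|in'"/>

  <!-- (1) tokenise once, then test membership with "some" -->
  <xsl:variable name="stopSeq" as="xs:string*" select="tokenize($stopPat, '\|')"/>
  <xsl:function name="my:stop1" as="xs:boolean">
    <xsl:param name="w" as="xs:string"/>
    <xsl:sequence select="some $s in $stopSeq satisfies $s eq $w"/>
  </xsl:function>

  <!-- (2) delimit the list and the word with '|', then use contains -->
  <xsl:variable name="stopStr" as="xs:string" select="concat('|', $stopPat, '|')"/>
  <xsl:function name="my:stop2" as="xs:boolean">
    <xsl:param name="w" as="xs:string"/>
    <xsl:sequence select="contains($stopStr, concat('|', $w, '|'))"/>
  </xsl:function>

  <!-- (3) temporary tree of elements, one per stop word, filtered on @w -->
  <xsl:variable name="stops">
    <xsl:for-each select="$stopSeq">
      <s w="{.}"/>
    </xsl:for-each>
  </xsl:variable>
  <xsl:function name="my:stop3" as="xs:boolean">
    <xsl:param name="w" as="xs:string"/>
    <xsl:sequence select="boolean($stops/*[@w = $w])"/>
  </xsl:function>

  <!-- (4) same tree, looked up through a key -->
  <xsl:key name="stop" match="s" use="@w"/>
  <xsl:function name="my:stop4" as="xs:boolean">
    <xsl:param name="w" as="xs:string"/>
    <xsl:sequence select="boolean($stops/key('stop', $w))"/>
  </xsl:function>

  <!-- (5) anchored regular expression built from the list -->
  <xsl:variable name="stopRe" as="xs:string" select="concat('^(', $stopPat, ')$')"/>
  <xsl:function name="my:stop5" as="xs:boolean">
    <xsl:param name="w" as="xs:string"/>
    <xsl:sequence select="matches($w, $stopRe)"/>
  </xsl:function>

  <!-- Print each test word followed by the five results, one word per line -->
  <xsl:template name="main" match="/">
    <xsl:for-each select="('the', 'th', 'cat')">
      <xsl:variable name="w" as="xs:string" select="."/>
      <xsl:value-of select="$w, my:stop1($w), my:stop2($w), my:stop3($w),
                            my:stop4($w), my:stop5($w)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

All five functions should agree on every word. The lookup structures ($stopSeq, $stopStr, $stops, $stopRe) are built once as global variables, so the functions themselves only do the membership test; whether the tar file's versions are arranged the same way is not stated in the post.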