Subject: [xsl] HST's answers Re: [xsl] Efficient way to check sequence membership
From: ht@xxxxxxxxxxxx (Henry S. Thompson)
Date: Wed, 02 Mar 2011 22:00:36 +0000
I've thought of five ways to do this:
1) tokenise and use "some ...", as in the previous message;
2) Add '|' at the beginning of both $stopPat and the word to be
checked, and use contains;
3) Put a sequence of elements, each with a 'w' attribute whose value is a
stop word, in $stops, then do boolean($stops/*[@w=$w]);
4) As above, but then define an appropriate key and use
boolean($stops/key('stop',$w));
5) Build a regexp and test the word with matches():
concat('^(',$stopPat,')$')
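For readers less familiar with XSLT, the five strategies can be mimicked in Python. This is a minimal sketch, not the stylesheets from the tar file: the stop list `STOP_PAT` and the functions `stop1` to `stop5` are hypothetical, a list of dicts stands in for the `$stops` elements, and a Python `frozenset` stands in for the index an XSLT processor typically builds for `xsl:key`. It shows how each test works, not how fast it runs inside an XSLT processor.

```python
import re

# Hypothetical stop list in the '|'-separated form used for $stopPat.
STOP_PAT = "a|an|and|in|of|the|to"
STOP_WORDS = STOP_PAT.split("|")

# (1) Tokenise and compare each token in turn, like
#     some $s in tokenize($stopPat, '\|') satisfies $s = $w
def stop1(w: str) -> bool:
    return any(s == w for s in STOP_PAT.split("|"))

# (2) Substring test. Delimiting both ends (an assumption of this sketch)
#     keeps "th" from matching inside "|the|".
def stop2(w: str) -> bool:
    return ("|" + w + "|") in ("|" + STOP_PAT + "|")

# (3) Linear scan over element-like records, like boolean($stops/*[@w=$w]).
STOP_ELEMS = [{"w": s} for s in STOP_WORDS]

def stop3(w: str) -> bool:
    return any(e["w"] == w for e in STOP_ELEMS)

# (4) Build an index once, then do a hashed lookup: stands in for key('stop', $w).
STOP_KEY = frozenset(STOP_WORDS)

def stop4(w: str) -> bool:
    return w in STOP_KEY

# (5) Alternation of the stop words; fullmatch() plays the role of '^(' ... ')$'.
#     re.escape guards against stop words containing regex metacharacters.
STOP_RE = re.compile("(" + "|".join(map(re.escape, STOP_WORDS)) + ")")

def stop5(w: str) -> bool:
    return STOP_RE.fullmatch(w) is not None

if __name__ == "__main__":
    for word in ["the", "th", "then", "cat", "a"]:
        print(word, [f(word) for f in (stop1, stop2, stop3, stop4, stop5)])
```

In this Python analogue, (4) is the only version whose cost on a miss does not grow with the length of the stop list.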
For (1) and (2), I tried both $stopPat as in the previous message and
a variant (1a, 2a) in which the list was sorted in descending order of
frequency in English.
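Frequency order matters for (1) because a left-to-right scan can stop at the first hit. A hypothetical Python sketch, counting the equality tests an early-exit scan makes over a toy sample (the sample and stop list are illustrative, not real English frequencies), shows the effect:

```python
from collections import Counter

def comparisons(stop_list: list[str], w: str) -> int:
    """Equality tests made by a left-to-right scan that stops at the first hit."""
    for i, s in enumerate(stop_list, start=1):
        if s == w:
            return i
    return len(stop_list)  # a miss always scans the whole list

def total_cost(stop_list: list[str], words: list[str]) -> int:
    return sum(comparisons(stop_list, w) for w in words)

# Toy sample and stop list: illustrative only.
sample = "the cat sat on the mat and the dog ate the food of the cat".split()
alphabetical = ["a", "and", "of", "on", "the", "to"]

# Count how often each stop word occurs in the sample, then put the most
# frequent first (sorted() is stable, so ties keep alphabetical order).
freq = Counter(w for w in sample if w in alphabetical)
by_freq = sorted(alphabetical, key=lambda s: -freq[s])

print(total_cost(alphabetical, sample), total_cost(by_freq, sample))
```

Here the seven non-stop tokens cost 42 comparisons under either ordering; only the hits get cheaper. Reordering can never shorten a miss, which may be why the gain of 1a over 1 is modest.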
Look away now if you want to guess the order of performance before
reading on. . .
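| Version | Raw time (ms) | Time - baseline (ms) |
|---|---|---|
| 0 | 5 | (baseline) |
| 4 | 7 | 2 |
| 2 | 8 | 3 |
| 2a | 8 | 3 |
| 1a | 14 | 9 |
| 1 | 15 | 10 |
| 3 | 28 | 23 |
| 5 | 30 | 25 |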
Version 0 is the baseline, in which the stop function does no actual
work; each time is the average over 100 iterations, in milliseconds.
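Subtracting the baseline isolates the cost of the membership test itself. The arithmetic behind the last column of the timing table, plus each version's net cost relative to the fastest, fits in a few lines of Python (the variable names are illustrative; the numbers are the ones reported in the table):

```python
# Raw timings (ms, mean of 100 iterations) copied from the timing table.
RAW_MS = {"0": 5, "4": 7, "2": 8, "2a": 8, "1a": 14, "1": 15, "3": 28, "5": 30}

BASELINE_MS = RAW_MS["0"]  # version 0: the stop function does no work

# Net cost of the membership test = raw time minus baseline.
net_ms = {v: t - BASELINE_MS for v, t in RAW_MS.items() if v != "0"}

# Fastest first; ties keep table order because sorted() is stable.
ranking = sorted(net_ms, key=net_ms.get)

fastest = ranking[0]
for v in ranking:
    ratio = net_ms[v] / net_ms[fastest]
    print(f"{v:>2}: {net_ms[v]:>2} ms net, {ratio:.1f}x version {fastest}")
```

On these numbers the key-based version (4) spends roughly a twelfth as long on the test as the regexp version (5).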
I'd be very interested to hear if anyone has a better approach. Of
course, I'm also interested to find out whether other implementations
show a similar pattern.
I've put up a gzipped tar file [1] of all the files you need to
reproduce the experiment -- one .xsl for each version, and q.xml for
input.
The stopss.xsl file is there so you can test that you are getting the
right answer! Replace my:stop1 with your version in that file, and
check that the output is
243367200142031010020120103000130001022001513610014414440
ht
[1] http://www.ltg.ed.ac.uk/~ht/memberCheck.tar.gz
--
Henry S. Thompson, School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 651-1426, e-mail: ht@xxxxxxxxxxxx
URL: http://www.ltg.ed.ac.uk/~ht/
[mail from me _always_ has a .sig like this -- mail without it is forged spam]