Subject: Re: [xsl] why matches($title,'.*?(\.|,)\s*$')) can perform so much worse than matches($title,'(\.|,)\s*$')) From: Alex Muir <alex.g.muir@xxxxxxxxx> Date: Wed, 13 Jul 2011 12:55:16 +0000 |
Thanks Oliver, that makes a lot of sense. Regards On Wed, Jul 13, 2011 at 12:33 PM, Oliver Hallam <oliver@xxxxxxxxxxx> wrote: > The way that fn:matches usually works is that it attempts to apply the > regular expression starting from each character in the string. > > So, looking at the regular expression ".*?Expr" where Expr is another > regular expression: > > .*? is a reluctant match and so we try to match the shortest expression > possible first. > > Starting from the first character: > Try match the empty string for .*? and then try to mach Expr starting from > the first character. > Try to match a single character for .*? and then try to match Expr starting > from the second character. > Try to match two characters for .*? and then try to match Expr starting > from the third character. > ... > > Thus in trying to find a match for ".*?Expr" starting at a particular > character we try to match Expr starting at every other caharcter in the > string, until a match is found. > > fn:matches attempts to find a match for the regular expression starting from > each character in the string. If a match isnt found starting at the first > character then it will try again starting at the second and so on. So if > there are no matches then Expr is tested n^2 times where n is the number of > characters in the string, wheras matches applied to Expr would just test it > n times. > > Now, if your query was (note the '^'): > > matches($title,'^.*?(\.|,)\s*$') > > then I would expect perfomance to be very similar to: > > matches($title,'(\.|,)\s*$') > > > It would be perfectly valid (and sensible) for a query processor to realise > that the two expressions you gave were equivalent and so not perform n^2 > tests, but I am unaware of a processor that makes these kinds of > optimizations to regular expressions. > > > Oliver Hallam > XQSharp > > > On 12/07/2011 17:26, Alex Muir wrote: >> >> Hi, >> >> I'm wondering why matches($title,'.*?(\.|,)\s*$')) can perform so much >> worse than matches($title,'(\.|,)\s*$')) >> >> I found at least in one file out of thousands that I process the first >> one can take a good 30 minutes to complete and the second is quick. >> >> No doubt the .*? causes some problem but what exactly is the problem >> that is causes? >> >> Regards >> >> -- >> Alex Muir >> Instructor | Program Organizer - University Technology Student Work >> Experience Building >> University of the Gambia >> http://sites.utg.edu.gm/alex/ >> >> Low budget software development benefiting development in the Gambia, >> West Africa >> Experience of a lifetime, come to Gambia and Join UTSWEB - >> http://sites.utg.edu.gm/utsweb/ > > -- Alex Muir Instructor | Program Organizer - University Technology Student Work Experience Building University of the Gambia http://sites.utg.edu.gm/alex/ Low budget software development benefiting development in the Gambia, West Africa Experience of a lifetime, come to Gambia and Join UTSWEB - http://sites.utg.edu.gm/utsweb/
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
Re: [xsl] why matches($title,'.*?(\, Oliver Hallam | Thread | Re: [xsl] why matches($title,'.*?(\, Michael Kay |
Re: [xsl] why matches($title,'.*?(\, Oliver Hallam | Date | Re: [xsl] why matches($title,'.*?(\, Michael Kay |
Month |