Subject: Re: [xsl] Re: backticks in regex - tales of the unexpected part II
From: "Abel Braaksma (Exselt)" <abel@xxxxxxxxxx>
Date: Mon, 07 Apr 2014 18:58:33 +0200
On 7-4-2014 18:35, Ihe Onwuka wrote:
> Just going by the definition of the \w class in MK's XPath 2.0
> reference - \w -> a character considered to form part of a word

Essentially, it is the Unicode standard that decides: \w matches any character that is not in \p{C} (other), \p{Z} (separators) or \p{P} (punctuation). And I would find it rather strange if the grave accent ("accent grave") were _not_ considered a possible part of a word, just like the diaeresis, breve, cedilla etc. Its counterpart, the acute accent, is categorized the same way. Not so the apostrophe, which is often mistaken for an acute accent but really isn't one.

I understand the confusion. Consider the math and currency symbols: the same XSLT book you are quoting tells you that they match \w as well. How is $, + or > a word character? I don't know. I guess the Unicode consortium just had to draw the line somewhere.

> So it's TS if backtick isn't a word character in your vocabulary.
> Probably neither the first or the last to get caught by that one.

Not sure what TS means, but I'm sure you won't be the last to get caught by that one. Personally, I hardly ever use \w, because I find it very hard to understand what it does and does not match. Is the following a word?

    Tell`>me$45)

I find it easiest to define the subranges myself, or to use the \p{Category} syntax, which I find clearer.

Cheers,
Abel
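For illustration only, here is a minimal Python sketch (not XPath itself) of the rule described above, assuming the XML Schema definition that \w is every character outside the Unicode general categories P, Z and C. The helpers `is_xpath_word_char` and `xpath_word_run` are hypothetical names made up for this sketch. It checks categories with `unicodedata` directly, because Python's own `\w` uses a different definition and would give different answers.

```python
import re
import unicodedata


def is_xpath_word_char(ch: str) -> bool:
    """Hypothetical helper emulating XPath/XSD \\w: anything not in P, Z or C."""
    return unicodedata.category(ch)[0] not in ("P", "Z", "C")


def xpath_word_run(text: str) -> str:
    """Hypothetical helper: the leading run of text that XPath's \\w+ would match."""
    end = 0
    while end < len(text) and is_xpath_word_char(text[end]):
        end += 1
    return text[:end]


samples = {
    "`": "grave accent (backtick)",
    "\u00b4": "acute accent",
    "'": "apostrophe",
    "$": "dollar sign",
    "+": "plus sign",
    ">": "greater-than sign",
}

for ch, name in samples.items():
    # Print code point, name, general category and whether \w would match it.
    print(f"U+{ord(ch):04X} {name:24} {unicodedata.category(ch)}  "
          f"word char: {is_xpath_word_char(ch)}")

# Under the XPath rule, \w+ swallows the backtick, '>' and '$'; only ')' stops it.
print(xpath_word_run("Tell`>me$45)"))

# The alternative suggested in the post: spell out the ranges yourself.
# Python's re has no \p{...}; in XPath one could write [\p{L}\p{Nd}]+ instead.
explicit = re.match(r"[A-Za-z0-9]+", "Tell`>me$45)")
print(explicit.group())
```

The backtick and acute accent come out as Sk (modifier symbol), the apostrophe as Po (punctuation), and $, + and > as Sc/Sm, which is why only the apostrophe is excluded from \w.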