Subject: Re: [xsl] analyze-string help?
From: Graydon <graydon@xxxxxxxxx>
Date: Sun, 10 Jun 2012 12:20:40 -0400

On Sun, Jun 10, 2012 at 12:04:58PM -0400, Syd Bauman scripsit:
> > I think maybe it worked because I had it at the end of the pattern
> > and then later added additional characters. So I think I went from
> > [A-Za-z0-9 -] to this [A-Za-z0-9 -,./]
>
> It was accidental? And here I thought it was a clever way to catch
> gnarly characters. The hyphen in the 2nd regexp means "from space
> (U+0020) to comma (U+002C)", i.e. expresses a range that matches the
> same characters [ !"#$%&'()*+,] matches. Many of these characters are
> a pain to type into an XSLT regexp, and thus a range like this seemed
> like a nice way to catch them.

Well, except that it's both subtle and clever, those banes of
maintainability.

One of the things I am very glad went into XSLT regular expressions is
the Unicode character categories; if you want (for example) punctuation,
it's "\p{P}", so I might write the provided atom definition as:

[\p{L}\p{Nd}\p{P}]

("Unicode character category letters", "Unicode character category
numbers, subcategory digits", "Unicode character category punctuation".)

Upper-case P means "everything not", so you can neatly express things
like "\P{Pd}", "any character that is not some kind of dash".

In my ideal world the syntax would evolve so you could constrain the
categories (for example "\p{Pd except '-'}", "any character that is some
kind of dash except for U+002D hyphen-minus"), since that would make
this even more useful for functions that take regular expressions, such
as tokenize().

-- Graydon

-- 
Graydon Saunders
XML tools and processes for information delivery.
graydon@xxxxxxxxx
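
The two points in the message above (the accidental range and the Unicode category escapes) can be checked outside an XSLT processor. What follows is a rough sketch in Python, not XSLT: character-class ranges behave the same way in Python's re module, but re has no \p{...} escapes, so the category tests are approximated with unicodedata.category. All function and constant names here (ACCIDENTAL, EXPLICIT, matched_ascii, in_category, is_atom_char, is_not_dash) are made up for illustration.

```python
import re
import unicodedata

# The class from the thread. Inside it, " -," is a range from space
# (U+0020) to comma (U+002C), not three literal characters.
ACCIDENTAL = re.compile(r"[A-Za-z0-9 -,./]")

# The same set with that range written out one character at a time.
EXPLICIT = re.compile(r"""[A-Za-z0-9 !"#$%&'()*+,./]""")


def matched_ascii(pattern):
    """Return the set of ASCII characters that the pattern matches."""
    return {chr(cp) for cp in range(0x80) if pattern.fullmatch(chr(cp))}


def in_category(ch, prefix):
    # Stand-in for \p{...}: true when the character's Unicode general
    # category (e.g. "Lu", "Nd", "Pd") starts with the given prefix.
    return unicodedata.category(ch).startswith(prefix)


def is_atom_char(ch):
    # Stand-in for the class [\p{L}\p{Nd}\p{P}] suggested in the message.
    return any(in_category(ch, p) for p in ("L", "Nd", "P"))


def is_not_dash(ch):
    # Stand-in for \P{Pd}: any character that is not some kind of dash.
    return not in_category(ch, "Pd")


if __name__ == "__main__":
    print(sorted(matched_ascii(re.compile(r"[ -,]"))))
    print(matched_ascii(ACCIDENTAL) == matched_ascii(EXPLICIT))
    print("-" in matched_ascii(ACCIDENTAL))
    print([c for c in "a7-+ $" if is_atom_char(c)])
```

Two things stand out when running this. The accidental class no longer matches the hyphen itself, since U+002D falls just past the end of the range. And the category-based class is not a drop-in replacement for the range: space (Zs), "$" (Sc) and "+" (Sm) are not letters, digits or punctuation, so they would need \p{Zs} and \p{S} (or similar) added if they are wanted.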