Subject: Re: [xsl] analyze-string gotcha/reminder
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Mon, 19 Nov 2012 07:15:10 -0800

> It's a case where even in retrospect, it's hard to see how we could have
> avoided this problem in the language design. Perhaps two separate
> attributes, regex and regex-avt. But that feels very heavy-handed. Most
> languages have a few quirks like this where people just have to learn the
> hard way.

It would be helpful if an XSLT processor issued a warning message when
a single `{` and `}` are used in a regex -- this would immediately
explain to the user the issue and the correction to be made.

Cheers,
Dimitre

On Mon, Nov 19, 2012 at 1:12 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> I feel your pain. Many of us have lost a few hairs over this one. The good
> news is that you probably won't make the same mistake again, or if you do,
> you will spot it far more quickly.
>
> It's a case where even in retrospect, it's hard to see how we could have
> avoided this problem in the language design. Perhaps two separate
> attributes, regex and regex-avt. But that feels very heavy-handed. Most
> languages have a few quirks like this where people just have to learn the
> hard way.
>
> Michael Kay
> Saxonica
>
>
> On 18/11/2012 18:18, Ihe Onwuka wrote:
>>
>> Below is a multiple match meant to extract 4 digit numbers from text
>>
>> <xsl:analyze-string select="$line"
>>                     regex="(\D|^)(\d{4})(\D|$)">
>>   <xsl:matching-substring>
>>     <year><xsl:value-of
>>       select="regex-group(2)"/></year>
>>   </xsl:matching-substring>
>> </xsl:analyze-string>
>>
>> It doesn't work. I tried exactly the same regex in XQuery using replace
>>
>> xquery version "1.0";
>> replace('Accounting Items Dec.31,2005
>> Dec.31,2006 Dec.31,2007
>> Dec.31,2008','(\D|^)\d{4}(\D|$)','xxxx')
>>
>> it worked and I got
>>
>> Accounting Items Dec.31xxxx
>> Dec.31xxxx Dec.31xxxx Dec.31xxxx
>>
>> I thought maybe there was special syntax for the multiple match case - but
>> no.
>> Eventually I turned to the specification and found this.
>>
>> Note:
>> Because the regex attribute is an attribute value template, curly
>> brackets within the regular expression must be doubled. For example,
>> to match a sequence of one to five characters, write regex=".{{1,5}}".
>> For regular expressions containing many curly brackets it may be more
>> convenient to use a notation such as
>> regex="{'[0-9]{1,5}[a-z]{3}[0-9]{1,2}'}", or to use a variable.
>>
>> So I had to double up my curly braces.
>>
>> There's an hour of my life that I won't get back.
>

--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200 yrs. Will they
write all patents, too? :)
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.
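
For anyone who finds this thread later, the three spellings mentioned in the spec note can be sketched roughly as below. This is only an illustration under some assumptions: the named template `main`, the `years` wrapper element, and the variables `line` and `year-regex` are made up for the example, the sample text is the string from the XQuery test in the original post with its line breaks collapsed to single spaces, and an XSLT 2.0 processor is assumed.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical sample input, taken from the XQuery test in the thread -->
  <xsl:variable name="line"
    select="'Accounting Items Dec.31,2005 Dec.31,2006 Dec.31,2007 Dec.31,2008'"/>

  <!-- select is an XPath expression, not an attribute value template,
       so the braces inside this string literal are left alone -->
  <xsl:variable name="year-regex" select="'(\D|^)(\d{4})(\D|$)'"/>

  <!-- Hypothetical entry point; start the transform at this named template -->
  <xsl:template name="main">
    <years>
      <!-- 1. Double the braces so the AVT yields a literal { and } -->
      <xsl:analyze-string select="$line" regex="(\D|^)(\d{{4}})(\D|$)">
        <xsl:matching-substring>
          <year><xsl:value-of select="regex-group(2)"/></year>
        </xsl:matching-substring>
      </xsl:analyze-string>

      <!-- 2. Wrap the whole regex in one AVT expression holding a string literal -->
      <xsl:analyze-string select="$line" regex="{'(\D|^)(\d{4})(\D|$)'}">
        <xsl:matching-substring>
          <year><xsl:value-of select="regex-group(2)"/></year>
        </xsl:matching-substring>
      </xsl:analyze-string>

      <!-- 3. Keep the regex in a variable and reference it from the AVT -->
      <xsl:analyze-string select="$line" regex="{$year-regex}">
        <xsl:matching-substring>
          <year><xsl:value-of select="regex-group(2)"/></year>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </years>
  </xsl:template>

</xsl:stylesheet>
```

All three forms are meant to hand the same regular expression to the processor. The variable form is probably the easiest to read once a regex contains more than one or two quantifiers in braces.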