Re: [xsl] analyze-string gotcha/reminder

Subject: Re: [xsl] analyze-string gotcha/reminder
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Mon, 19 Nov 2012 07:15:10 -0800
> It's a case where even in retrospect, it's hard to see how we could have
> avoided this problem in the language design. Perhaps two separate
> attributes, regex and regex-avt. But that feels very heavy-handed. Most
> languages have a few quirks like this where people just have to learn the
> hard way.


It would be helpful if an XSLT processor issued a warning message when
a single `{` or `}` is used in a regex attribute -- this would
immediately explain to the user the issue and the correction to be made.

Cheers,
Dimitre

On Mon, Nov 19, 2012 at 1:12 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> I feel your pain. Many of us have lost a few hairs over this one. The good
> news is that you probably won't make the same mistake again, or if you do,
> you will spot it far more quickly.
>
> It's a case where even in retrospect, it's hard to see how we could have
> avoided this problem in the language design. Perhaps two separate
> attributes, regex and regex-avt. But that feels very heavy-handed. Most
> languages have a few quirks like this where people just have to learn the
> hard way.
>
> Michael Kay
> Saxonica
>
>
> On 18/11/2012 18:18, Ihe Onwuka wrote:
>>
>> Below is a multiple match meant to extract 4 digit numbers from text
>>
>>                  <xsl:analyze-string select="$line" regex="(\D|^)(\d{4})(\D|$)">
>>                    <xsl:matching-substring>
>>                      <year><xsl:value-of select="regex-group(2)"/></year>
>>                    </xsl:matching-substring>
>>                  </xsl:analyze-string>
>>
>> It doesn't work. I tried the same regex in XQuery using replace
>>
>> xquery version "1.0";
>> replace('Accounting Items                                Dec.31,2005
>>   Dec.31,2006    Dec.31,2007
>> Dec.31,2008','(\D|^)\d{4}(\D|$)','xxxx')
>>
>> it worked and I got
>>
>> Accounting Items                                Dec.31xxxx
>> Dec.31xxxx   Dec.31xxxx   Dec.31xxxx
>>
>> I thought maybe there was special syntax for the multiple match case - but
>> no.
>> Eventually I turned to the specification and found this.
>>
>> Note:
>> Because the regex attribute is an attribute value template, curly
>> brackets within the regular expression must be doubled. For example,
>> to match a sequence of one to five characters, write regex=".{{1,5}}".
>> For regular expressions containing many curly brackets it may be more
>> convenient to use a notation such as
>> regex="{'[0-9]{1,5}[a-z]{3}[0-9]{1,2}'}", or to use a variable.
>>
>> So I had to double up my curly braces.
>>
>> There's an hour of my life that I won't get back.
>
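
For anyone finding this thread later, here is a minimal sketch of the
correction described in the quoted spec note, assuming an XSLT 2.0
processor. The sample value of $line and the name $year-regex are
hypothetical, made up for illustration; the regex is the one from the
original post. The first form doubles the braces so the attribute value
template yields a literal {4}; the second keeps the regex in a variable,
whose select attribute is a plain XPath expression and not an AVT, so
the braces need no doubling there.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical sample input; the original post takes $line from its own data. -->
  <xsl:param name="line"
      select="'Dec.31,2005 Dec.31,2006 Dec.31,2007 Dec.31,2008'"/>

  <!-- select is an XPath expression, not an AVT: single braces are fine here. -->
  <xsl:variable name="year-regex" select="'(\D|^)(\d{4})(\D|$)'"/>

  <xsl:template match="/">
    <years>
      <!-- Form 1: braces doubled inside the regex attribute (an AVT). -->
      <xsl:analyze-string select="$line" regex="(\D|^)(\d{{4}})(\D|$)">
        <xsl:matching-substring>
          <year><xsl:value-of select="regex-group(2)"/></year>
        </xsl:matching-substring>
      </xsl:analyze-string>

      <!-- Form 2: the AVT evaluates the variable, which already holds the regex. -->
      <xsl:analyze-string select="$line" regex="{$year-regex}">
        <xsl:matching-substring>
          <year><xsl:value-of select="regex-group(2)"/></year>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </years>
  </xsl:template>

</xsl:stylesheet>
```

Both instructions should hand the same regular expression to the
processor. The variable form tends to be easier to read once a regex
contains more than one or two quantifiers.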



-- 
Cheers,
Dimitre Novatchev
