Re: [xsl] tokenize a string with escaped spaces

Subject: Re: [xsl] tokenize a string with escaped spaces
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 4 Apr 2020 08:24:02 -0000
Double-quotes in an XML attribute value should be written as `&quot;`. Also
remember that this is an attribute value template, so curly braces need
special treatment.
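
To illustrate both points together, here is a minimal sketch of an inline regex attribute. The sample string and the pattern are hypothetical, chosen only because the pattern needs both a double quote and a curly-brace quantifier; they are not from the original question. Each `"` is written as `&quot;`, and each literal brace is doubled so the attribute value template does not treat it as an embedded expression.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical sample value. After XML parsing the XPath literal is
       'id="2020" ref="42"', so &quot; is needed here as well. -->
  <xsl:variable name="attr"
      select="'id=&quot;2020&quot; ref=&quot;42&quot;'"/>

  <xsl:template match="/">
    <!-- After XML parsing and AVT expansion the effective regex is "\d{4}"
         (a double-quoted run of exactly four digits). -->
    <xsl:analyze-string select="$attr" regex="&quot;\d{{4}}&quot;">
      <xsl:matching-substring>
        <!-- Emit each match on its own line -->
        <xsl:value-of select="."/>
        <xsl:text>&#10;</xsl:text>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Apostrophes need no escaping in this form because the attribute itself is delimited by double quotes.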

I often write such things as

<xsl:variable name="regex">\S*('[^']*')?("[^"]*")?</xsl:variable>
<xsl:analyze-string select="$attr" regex="{$regex}">....

which reduces these problems (fortunately & and < aren't metacharacters in
regular expressions).
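
As a fuller sketch of the variable approach, the stylesheet below tokenizes an attribute into whitespace-separated items while keeping quoted values whole. Several things here are assumptions rather than anything stated in the thread: the element and attribute matched (`*[@myattr]`), the text output, and above all the pattern itself. It is a variant, not the regex posted above: the unquoted run excludes quote characters, so a greedy `\S*` cannot swallow an opening quote, and it requires at least one character, so it never matches a zero-length string (which XSLT 2.0 treats as an error in xsl:analyze-string).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Regex held as element content, so neither ' nor " needs escaping.
       Assumed variant: an unquoted run, optionally followed by one
       single-quoted or double-quoted value. -->
  <xsl:variable name="regex">[^\s'"]+('[^']*'|"[^"]*")?</xsl:variable>

  <!-- Hypothetical input: any element that carries a myattr attribute -->
  <xsl:template match="*[@myattr]">
    <!-- {$regex} is an attribute value template; the variable's string
         value is inserted as-is, so no brace doubling is needed inside it. -->
    <xsl:analyze-string select="@myattr" regex="{$regex}">
      <xsl:matching-substring>
        <!-- One token per line -->
        <xsl:value-of select="."/>
        <xsl:text>&#10;</xsl:text>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Suppress stray text from the built-in template rules -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Run against an input such as `<e myattr="ng-model=mymodel ng-show='Radio button 1'"/>`, this is intended to emit `ng-model=mymodel` and `ng-show='Radio button 1'` as two separate tokens. It is worth testing against real data before relying on it.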

Michael Kay
Saxonica

> On 4 Apr 2020, at 02:17, Mark Giffin m1879@xxxxxxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Thanks Michael. The double quotes " in the regex give errors in this
> context:
>
> <xsl:analyze-string select="$attr" regex="\S*('[^']*')?("[^"]*")?">
>
> Should those be single quotes instead? Or should I put the regex in a
> variable?
>
> On 4/3/2020 4:38 PM, Michael Kay mike@xxxxxxxxxxxx wrote:
>> Try using xsl:analyze-string with a regex of
>>
>> \S*('[^']*')?("[^"]*")?
>>
>> I've had to guess at your specification from your single example, but you
>> should be able to adapt it if the spec is different.
>>
>> You could also extend the regex to pick up the keyword (before '=') and
>> value (after '=') as captured substrings:
>>
>> (\S+)=(\S+|('[^']*')|("[^"]*"))
>>
>> and then regex-group(1) gives you the keyword, and regex-group(2) the
>> value.
>>
>> Michael Kay
>> Saxonica
>>
>>> On 4 Apr 2020, at 00:17, Mark Giffin m1879@xxxxxxxxxxxxx
>>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> I am tokenizing an XML attribute that has info I need in it. Example:
>>>
>>> myattr="ng-model=mymodel ng-show-mymodel=='Radio button 1'"
>>>
>>> So I want to tokenize into these two values:
>>>
>>> ng-model=mymodel
>>> ng-show='Radio button 1'
>>>
>>> Using white space like tokenize($attr, '\s') gives me this, not what I
>>> want:
>>>
>>> ng-model=mymodel
>>> ng-show='Radio
>>> button
>>> 1'
>>>
>>> Do you have a suggestion on how to do this? Doesn't have to use
>>> tokenize().
>>>
>>> Thanks,
>>> Mark
>>>
>>
>> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
>> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/805141>
>