Re: [xsl] Tokenize followed by compare or satisfies using contains()?

Subject: Re: [xsl] Tokenize followed by compare or satisfies using contains()?
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 2 Sep 2016 21:38:00 -0000
In the specific DITA case you're searching for strings reliably bounded by
blanks, so contains() is correct in this case.

Your statement "Intrinsically, tokenizing is more complex than just
searching for a substring." is, I think, what I was looking for: it
suggests that, as a general policy, preferring contains() over tokenize()
plus sequence comparison is the better choice if performance is the only
concern (and assuming that it actually produces a meaningful performance
difference, which it very well may not).
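
To make the trade-off concrete, here is a minimal sketch comparing the two
tests on the "green" / "pale-green" case raised in the quoted reply below,
plus a blank-padded contains() variant that avoids the false match without
tokenizing. The sample values, the template name "main", and the padded
variant are illustrative assumptions, not taken from the DITA code.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <!-- Hypothetical sample values standing in for $link-classes and @class -->
  <xsl:variable name="link-classes" as="xs:string*" select="('green', 'blue')"/>
  <xsl:variable name="class" as="xs:string" select="'pale-green bold'"/>

  <xsl:template name="main">
    <!-- Substring test: true, because 'pale-green' contains 'green' -->
    <xsl:value-of
        select="some $c in $link-classes satisfies contains($class, $c)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Token test: false, because no whole token equals 'green' or 'blue' -->
    <xsl:value-of select="$link-classes = tokenize($class, '\s+')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Blank-padded substring test: false, matches whole tokens only -->
    <xsl:value-of
        select="some $c in $link-classes satisfies
                contains(concat(' ', normalize-space($class), ' '),
                         concat(' ', $c, ' '))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With an XSLT 2.0 processor that allows a named initial template (Saxon's
-it:main option, for instance), this should print true, false, false. When
the search strings already carry their own delimiting blanks, the plain
contains() behaves like the padded variant.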

Cheers,

E.

--
Eliot Kimber
http://contrext.com

On 9/2/16, 2:02 PM, "Michael Kay mike@xxxxxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>Well, the tokenize() seems more correct: presumably if $link-classes is
>"green" and $node/@class is "pale-green" you want the answer to be false,
>which it will be with the tokenize() approach but not with the contains()
>approach.
>
>But shouldn't the regex for the tokenize case be '\s+' rather than ' '?
>
>Performance of course is product dependent and you just have to measure
>it. Intrinsically, tokenizing is more complex than just searching for a
>substring.
>
>Michael Kay
>Saxonica
>
>
>> On 2 Sep 2016, at 17:04, Eliot Kimber ekimber@xxxxxxxxxxxx
>><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> 
>> In the DITA processing code, where we are using XSLT 2 and checking for
>> string matches in attribute values, I have the requirement to see if any
>> of a number of strings might match.
>> 
>> The current code is:
>> 
>> some $c in $link-classes satisfies contains($node/@class, $c)
>> 
>> Where $link-classes is a sequence of strings and @class is a
>> blank-delimited sequence of strings.
>> 
>> 
>> Another way to do this check would be:
>> 
>> $link-classes = tokenize($node/@class, ' ')
>> 
>> This is a check that will be made a lot so performance may be important (or
>> it may not be). The tokenize version seems simpler and clearer to me but
>> the satisfies approach has a certain elegance that I also like.
>> 
>> My question: is there any reason to prefer one or the other of these? I
>> realize that XSLT 3 provides a new way to do token matching in strings,
>> but for now we're stuck with XSLT 2.
>> 
>> Cheers,
>> 
>> Eliot
>> --
>> Eliot Kimber
>> http://contrext.com
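
For reference, the "new way to do token matching" mentioned in the original
question is presumably the contains-token() function defined in XPath 3.1
Functions and Operators; that identification is an assumption, since the
thread does not name it. A minimal sketch, assuming a processor that
supports that function and using hypothetical sample values:

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <!-- Hypothetical sample values standing in for $link-classes and @class -->
  <xsl:variable name="link-classes" as="xs:string*" select="('green', 'blue')"/>
  <xsl:variable name="class" as="xs:string" select="'pale-green bold'"/>

  <xsl:template name="main">
    <!-- contains-token() splits its input on whitespace and compares
         whole tokens, so 'pale-green' does not match 'green': false -->
    <xsl:value-of
        select="some $c in $link-classes satisfies contains-token($class, $c)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

This gives the whole-token semantics of the tokenize() version while
keeping the shape of the original satisfies expression.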
