Re: [xsl] Implementation Advice: Grouping Strings by Character Range in XSLT 2

Subject: Re: [xsl] Implementation Advice: Grouping Strings by Character Range in XSLT 2
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 29 Apr 2016 18:38:05 -0000
I have my generated analyze-string approach working in general. However,
some of my regular expressions are not matching when I would expect them to.

For example, given this @regex value:

        
regex="'([&#xa9;&#xae;&#x2120;&#x2122;]+)|([&#xa6;&#xb2;&#xb3;&#xb9;&#xbc;&#xbd;&#xbe;&#xd0;&#xd7;&#xdd;&#xde;&#xf0;&#xfd;&#xfe;&#x160;&#x161;&#x2202;&#x220f;&#x2211;&#x2212;&#x222b;&#x2260;&#x2264;&#x2265;]+)|([&#x27a4;]+)'">

And this text:

"&#x00A9;&#x00AE;"

The regular expression does not match, even though the first group should
clearly match U+00A9 and U+00AE.


However, this text:

"&#x00DD;&#x00DE;" 

does match (second group).

If I copy the entire regex or any group from the @regex value and try it
in Oxygen against the same text I get the expected matches.

Have I made a stupid syntax mistake in my regular expression? Is there
some subtlety to matching groups that makes XSLT different from what
Oxygen is doing? I can't see any obvious syntax error in the regular
expression.
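
For reference, here is a minimal sketch of the kind of template I am
generating, cut down to three short ranges. The mode name "apply-ranges"
and the class names R1 to R3 are made up for illustration, and the second
range is abbreviated to two characters. Note that @regex is an attribute
value template, so the sketch writes the pattern directly, with no
surrounding apostrophes; whether that matters for the failure above is an
assumption on my part, not something I have confirmed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Driver for the sketch: wrap all text in a single p element. -->
  <xsl:template match="/">
    <p><xsl:apply-templates select="//text()" mode="apply-ranges"/></p>
  </xsl:template>

  <!-- Hypothetical mode name. One capturing group per character range;
       the pattern is written as a literal attribute value, unquoted. -->
  <xsl:template match="text()" mode="apply-ranges">
    <xsl:analyze-string select="."
        regex="([&#xa9;&#xae;&#x2120;&#x2122;]+)|([&#xdd;&#xde;]+)|([&#x27a4;]+)">
      <xsl:matching-substring>
        <!-- regex-group(n) is the zero-length string for any group
             that did not take part in the match. -->
        <xsl:choose>
          <xsl:when test="regex-group(1) != ''">
            <span class="R1"><xsl:value-of select="."/></span>
          </xsl:when>
          <xsl:when test="regex-group(2) != ''">
            <span class="R2"><xsl:value-of select="."/></span>
          </xsl:when>
          <xsl:otherwise>
            <span class="R3"><xsl:value-of select="."/></span>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Text outside every range is copied through unchanged. -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With the pattern in this form, a text node containing only
"&#x00A9;&#x00AE;" should come out wrapped in the R1 span.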

Thanks,

Eliot


----
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com




On 4/29/16, 11:54 AM, "Eliot Kimber ekimber@xxxxxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>Dimitre,
>
>I see how that can work.
>
>Cheers,
>
>E.
>----
>Eliot Kimber, Owner
>Contrext, LLC
>http://contrext.com
>
>
>
>
>On 4/29/16, 11:38 AM, "Dimitre Novatchev dnovatchev@xxxxxxxxx"
><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>>I am at work and don't have the time for a complete/tested
>>implementation, but one can use the function string-to-codepoints()
>>and then perform on the result:
>>
>><xsl:for-each-group select="$theCodepoints"
>>    group-adjacent="f:codepointToRange(.)">
>>
>> . . . . . . . .
>></xsl:for-each-group>
>>
>>Cheers,
>>Dimitre
>>
>>On Fri, Apr 29, 2016 at 8:04 AM, Eliot Kimber ekimber@xxxxxxxxxxxx
>><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> Using XSLT 2, I have a requirement to take text and group contiguous
>>> sequences of characters in markup according to the character range the
>>> characters are in. This is to support the application of range-specific
>>> fonts to text in HTML.
>>>
>>> I have a static definition of the character ranges for a given national
>>> language and there shouldn't be any overlap between ranges. Given this
>>> static definition, I'm generating XSLT code to operate on text nodes in
>>> order to apply the range markup. The
>>>
>>> For example, given the text string "abcdefg" where range "R1" is "cde"
>>> and R2 is "g", the marked-up result should be:
>>> ab<span class="R1">cde</span>f<span class="R2">g</span>
>>>
>>> My initial approach is to generate a template that takes the current
>>> language and the text node and then applies templates in a
>>> language-specific mode.
>>>
>>> For each language I'm then generating a template to do the range
>>> matching.
>>>
>>> My question: once I'm in a language-specific template for a text node,
>>> what is the most efficient and/or easiest-to-code way to map the string
>>> to ranges? Since I'm generating the code, it doesn't have to be concise.
>>>
>>> I'm thinking along the lines of using analyze-string to match on any of
>>> the groups and then within the matching-substring clause have a choice
>>> group to determine which range actually matched. But it feels like I'm
>>> missing a more elegant way to determine the actual range.
>>>
>>> Or maybe there's a clearer/simpler/more efficient way using tail
>>> recursion?
>>>
>>> Thanks,
>>>
>>> Eliot
>>> ----
>>> Eliot Kimber, Owner
>>> Contrext, LLC
>>> http://contrext.com
>>>
>>> 
>>
>>
>>
>>-- 
>>Cheers,
>>Dimitre Novatchev

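For completeness, the codepoint-grouping idea quoted above could be fleshed
out roughly as follows. This is only a sketch under stated assumptions: the
namespace URI bound to the f prefix, the mode name, and the body of
f:codepointToRange are all invented for illustration, and the ranges are the
toy "cde" and "g" ranges from my original example, not the real
language-specific definitions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:ranges"
    exclude-result-prefixes="xs f">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Hypothetical mapping from a codepoint to a range name.
       Returns the empty string for characters in no range. -->
  <xsl:function name="f:codepointToRange" as="xs:string">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:sequence select="
      if ($cp = (99, 100, 101)) then 'R1'
      else if ($cp = 103) then 'R2'
      else ''"/>
  </xsl:function>

  <!-- Driver for the sketch: wrap all text in a single p element. -->
  <xsl:template match="/">
    <p><xsl:apply-templates select="//text()" mode="apply-ranges"/></p>
  </xsl:template>

  <!-- Group adjacent codepoints that map to the same range name. -->
  <xsl:template match="text()" mode="apply-ranges">
    <xsl:for-each-group select="string-to-codepoints(.)"
        group-adjacent="f:codepointToRange(.)">
      <xsl:choose>
        <!-- Characters outside every range are emitted as plain text. -->
        <xsl:when test="current-grouping-key() = ''">
          <xsl:value-of select="codepoints-to-string(current-group())"/>
        </xsl:when>
        <xsl:otherwise>
          <span class="{current-grouping-key()}">
            <xsl:value-of select="codepoints-to-string(current-group())"/>
          </span>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Applied to a text node containing "abcdefg", this groups the codepoints as
"ab", "cde", "f", "g" and wraps only the second and fourth groups in spans.
Since the mapping function is generated per language, it could just as well
test numeric codepoint intervals instead of listing individual values.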