Re: [xsl] How to escape the normal interpretation of parentheses when utilizing regex-group()?

Subject: Re: [xsl] How to escape the normal interpretation of parentheses when utilizing regex-group()?
From: "David Carlisle d.p.carlisle@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 24 May 2023 11:49:08 -0000
On Wed, 24 May 2023 at 12:30, Martin Honnen martin.honnen@xxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>
> On 5/24/2023 1:21 PM, David Carlisle d.p.carlisle@xxxxxxxxx wrote:
>
>
>
> On Wed, 24 May 2023 at 12:00, Roger L Costello costello@xxxxxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>> Hi Folks,
>>
>> My input consists of lines of text. Here is a sample input:
>>
>> 1 Record Type
>>
>> The XSLT program is to transform that line into this XML:
>>
>> <column>1</column>
>> <field-name>Record Type</field-name>
>>
>> Here is another sample input:
>>
>> 2 thru 4 Customer/Area Code
>>
>> The XSLT program is to transform that line into this XML:
>>
>> <column>2 thru 4</column>
>> <field-name>Customer/Area Code</field-name>
>>
>> I figured that <xsl:analyze-string> and regex-group() would be suitable
>> for breaking apart each line.
>>
>> What regex to use for column? The value of column is an integer followed
>> optionally by "thru" and another integer. I figured this regex should do
>> the job:
>>
>> ([0-9]+(\s+thru\s+[0-9]+)?)
>>
>> But, but, but, ....
>>
>> regex-group(1) corresponds to the whole regex. So with this input:
>>
>> 2 thru 4 Customer/Area Code
>>
>> regex-group(1) matches:
>>
>> 2 thru 4
>>
>> Perfect.
>>
>> Unfortunately, regex-group(2) matches the inner, optional part. So I end
>> up with this:
>>
>> <column>2 thru 4</column>
>> <field-name> thru 4</field-name>
>>
>> Eek!
>>
>> Wrong.
>>
>> How to solve this problem? Is there a way to specify that the inner,
>> optional part:
>>
>> (\s+thru\s+[0-9]+)?
>>
>> belongs only to regex-group(1), not to regex-group(2)?
>>
>> Stated another way, is there a way to indicate that the parentheses in
>> the inner, optional part are not to be considered as regex-group() syntax?
>>
>> Stated still another way, is there a way to "escape" the normal
>> interpretation of parentheses when utilizing regex-group()?
>>
>
> You could use a non-capturing group, but there is no real need here. You
> haven't shown any regex matching the second column; something like
>
> ^\s*([0-9]+(\s+thru\s+[0-9]+)?)\s*(.*)$
>
> then your columns are groups 1 and 3
>
>
> Are they?
>

yes:-)


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">

 <xsl:template name="m">
  <xsl:variable name="x">
   1 aaa
   2 thru 4 bbb
  </xsl:variable>
  <xsl:analyze-string select="$x"
regex="^\s*([0-9]+(\s+thru\s+[0-9]+)?)\s*(.*)$" flags="m">
   <xsl:matching-substring>
    <xsl:text>&#10;</xsl:text>
    <col1><xsl:value-of select="regex-group(1)"/></col1>
    <col2><xsl:value-of select="regex-group(3)"/></col2>
    <xsl:text>&#10;</xsl:text>
   </xsl:matching-substring>
  </xsl:analyze-string>
 </xsl:template>
</xsl:stylesheet>




$ saxon9 -it:m rg.xsl
<?xml version="1.0" encoding="UTF-8"?>
<col1>1</col1><col2>aaa</col2>

<col1>2 thru 4</col1><col2>bbb</col2>
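
For completeness, here is a sketch of the non-capturing variant mentioned earlier. It assumes a processor that implements the XPath 3.0 regex syntax, where (?:...) groups without capturing; XSLT 2.0 regexes do not have it. With the inner group written that way, the trailing (.*) becomes group 2 rather than group 3. Everything else is the same test stylesheet as above.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">

 <xsl:template name="m">
  <xsl:variable name="x">
   1 aaa
   2 thru 4 bbb
  </xsl:variable>
  <!-- (?:...) is non-capturing, so it is not counted by regex-group();
       the column is still group 1 and the field name is now group 2 -->
  <xsl:analyze-string select="$x"
      regex="^\s*([0-9]+(?:\s+thru\s+[0-9]+)?)\s*(.*)$" flags="m">
   <xsl:matching-substring>
    <xsl:text>&#10;</xsl:text>
    <col1><xsl:value-of select="regex-group(1)"/></col1>
    <col2><xsl:value-of select="regex-group(2)"/></col2>
    <xsl:text>&#10;</xsl:text>
   </xsl:matching-substring>
  </xsl:analyze-string>
 </xsl:template>
</xsl:stylesheet>
```

Run the same way with -it:m, this is intended to give the same col1/col2 values as the capturing version; the only change is which group number holds the field name.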



> I think you would need
>
>   <field-name>{(regex-group(3), regex-group(2))[1]}</field-name>
>
> as no regex-group is created if no match occurs.
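
As far as I read the XSLT spec, group numbers are fixed by the position of the opening parenthesis in the regex, and regex-group(N) returns a zero-length string when group N took no part in the match. A minimal sketch to check that on a line without "thru" (the element names g2-length and g3 are made up for this test):

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">

 <xsl:template name="m">
  <!-- input has no "thru" part, so the optional group 2 does not participate -->
  <xsl:analyze-string select="'1 aaa'"
      regex="^\s*([0-9]+(\s+thru\s+[0-9]+)?)\s*(.*)$">
   <xsl:matching-substring>
    <!-- length of the unmatched group's value -->
    <g2-length><xsl:value-of select="string-length(regex-group(2))"/></g2-length>
    <!-- the field name stays in group 3 -->
    <g3><xsl:value-of select="regex-group(3)"/></g3>
   </xsl:matching-substring>
  </xsl:analyze-string>
 </xsl:template>
</xsl:stylesheet>
```

I'd expect g2-length to be 0 and g3 to be aaa, which matches the first row of the output above, where group 3 already held "aaa" for the line "1 aaa".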