Re: [xsl] Complex Regex takes 201 steps in regex buddy but runs forever in Analyze-String

Subject: Re: [xsl] Complex Regex takes 201 steps in regex buddy but runs forever in Analyze-String
From: Alex Muir <alex.g.muir@xxxxxxxxx>
Date: Mon, 31 Jan 2011 19:15:56 +0000
Okay, this one seems to work. It is based on your suggestion, plus a
small tweak so that it surrounds all the LISTITEMs:

((¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)\s*(((«[^»¤]+»\s*|§[^§¤]+§\s*){0,255})(¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)){0,200})
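
A rough illustration of why moving the whitespace out of the alternation seems to matter, as a minimal Python sketch rather than XSLT. The function name ways_to_split and the ASCII "<...>" stand-ins for the tag markers are made up for this example, and Python's regex engine is not the one oXygen/Saxon uses, so this only shows the general idea: when \s+ is its own alternative inside {0,255}, a single run of n whitespace characters can be covered in 2^(n-1) different ways, and a backtracking matcher may have to try many of them before it gives up or succeeds. Attaching \s* to each tag alternative leaves only one way to consume the run.

```python
import re
from functools import lru_cache


def ways_to_split(n, max_groups=255):
    # Count the ways a run of n whitespace characters can be cut into at
    # most max_groups consecutive non-empty chunks, i.e. the distinct ways
    # a pattern shaped like (\s+){0,255} could cover that run.
    @lru_cache(maxsize=None)
    def count(remaining, groups_left):
        if remaining == 0:
            return 1  # the run is fully covered
        if groups_left == 0:
            return 0  # whitespace left over but no repetitions left
        return sum(count(remaining - k, groups_left - 1)
                   for k in range(1, remaining + 1))
    return count(n, max_groups)


# ASCII stand-ins (assumption): "<...>" replaces the thread's tag markers.
LOOSE = re.compile(r"(?:<[^<>]+>|\s+){0,255}")  # whitespace is its own alternative
TIGHT = re.compile(r"(?:<[^<>]+>\s*){0,255}")   # whitespace attached to each tag

sample = "<TD id='1'>   <TD id='2'> <TR>"

print(ways_to_split(9))  # 256
print(bool(LOOSE.fullmatch(sample)), bool(TIGHT.fullmatch(sample)))  # True True
```

Both patterns accept the same sample text; the difference is only in how many ways the whitespace can be divided up along the way.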

Also, I should note that the input I posted was actually working. While
trying to reduce the input text, I ended up using a project scenario
rather than a global scenario of the same name, and after restarting
oXygen I apparently switched to a scenario that ran different input
than I intended.

Thanks much


On Mon, Jan 31, 2011 at 6:59 PM, Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
wrote:
> The parentheses '(' and ')' do not match well in <xsl:variable
> name="CompleteListIdentificationRegex" >. Please check.
>
> But one evil subpattern is this (with spaces inserted for readability):
>
>   ( ( «[^»¤]+» | \s+ | §[^§¤]+§ ){0,255})
>
> This will try many combinations of zero to 255 repetitions of "one or
> more whitespace characters".
>
> Cleaner is
>   (\s+|( «[^»¤]+»|§[^§¤]+§){0,255})
>
> -W
>
> On 31 January 2011 19:40, Alex Muir <alex.g.muir@xxxxxxxxx> wrote:
>> Hi,
>>
>> With the following code:
>> ------------------------------
>>
>> <?xml version="1.0"?>
>> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>   xmlns:saxon="http://saxon.sf.net/"
>>   xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>   version="2.0" exclude-result-prefixes="#all">
>>   <xsl:output method="xml" indent="no"/>
>>
>>
>>   <xsl:template match="unknown[exists(text())]">
>>     <xsl:copy>
>>       <xsl:copy-of select="@*"/>
>>
>>       <xsl:call-template name="CompleteListAnalyze">
>>         <xsl:with-param name="content" select="text()"/>
>>       </xsl:call-template>
>>
>>     </xsl:copy>
>>   </xsl:template>
>>
>>
>>   <xsl:template name="CompleteListAnalyze">
>>     <xsl:param name="content"/>
>>
>>     <xsl:variable name="CompleteListIdentificationRegex" >
>>       <xsl:text>((¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)(((«[^»¤]+»|\s+|§[^§¤]+§){0,255})(¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)){0,200})</xsl:text>
>>     </xsl:variable>
>>
>>     <xsl:analyze-string select="$content"
>>       regex="{$CompleteListIdentificationRegex}">
>>       <xsl:matching-substring>
>>         <xsl:text>¤COMPLETELIST POSITION="</xsl:text>
>>         <xsl:value-of select="position()"/>
>>         <xsl:text>" PLACEMENT=""¤</xsl:text>
>>         <xsl:value-of select="regex-group(1)"/>
>>         <xsl:text>¤/COMPLETELIST¤</xsl:text>
>>       </xsl:matching-substring>
>>       <xsl:non-matching-substring>
>>         <xsl:value-of select="."/>
>>       </xsl:non-matching-substring>
>>     </xsl:analyze-string>
>>   </xsl:template>
>>
>> </xsl:stylesheet>
>>
>>
>> And the following input file:
>> ----------------------------------
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <doc>
>>    <unknown>¤LISTITEM BULLET="15" TITLE="TEXT TEXT TEXT TEXT"
>> TYPE="SNLI"¤«§HL§FONT size="2"
>> id="H13211"»15«/§HL§FONT»«/§HL§TD»
>>   «§HL§TD id="H13213"»«/§HL§TD» «/§HL§TR» «§HL§TR
>> id="H13215"»«§HL§TD
>> id="H13216"» «/§HL§TD»«/§HL§TR» «§HL§TR valign="bottom"
>> id="H13218"»
>>      «§HL§TD id="H13220"»«/§HL§TD»         «§HL§TD
>> colspan="2"
>> id="H13222"»«§HL§FONT size="2" id="H13223"»TEXT TEXT TEXT
>> TEXT«/§HL§FONT»¤/LISTITEM¤«/TD»         «TD
>> id="H13225"»«/TD»
>> «TD id="H13227"»«/TD»         «TD id="H13229"»«/TD»
>> «TD
>> id="H13231"»«/TD»         «TD align="right"
>> id="H13233"»¤LISTITEM
>> BULLET="16" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"¤«§HL§FONT size="2"
>> id="H13234"»16«/§HL§FONT»«/§HL§TD»         «§HL§TD
>> id="H13236"»«/§HL§TD» «/§HL§TR» «§HL§TR
>> id="H13238"»«§HL§TD
>> id="H13239"» «/§HL§TD»«/§HL§TR» «§HL§TR valign="bottom"
>> id="H13241"»
>>      «§HL§TD id="H13243"»«/§HL§TD»         «§HL§TD
>> colspan="2"
>> id="H13245"»«§HL§FONT size="2" id="H13246"»TEXT TEXT TEXT TEXT TEXT
>> «/§HL§FONT»¤/LISTITEM¤«/TD»         «TD id="H13248"»«/TD»
>> «TD
>> id="H13250"»«/TD»         «TD id="H13252"»«/TD»
>> «TD
>> id="H13254"»«/TD»         «TD align="right"
>> id="H13256"»¤LISTITEM
>> BULLET="17" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"¤«§HL§FONT size="2"
>> id="H13257"»17«/§HL§FONT»«/§HL§TD»         «§HL§TD
>> id="H13259"»«/§HL§TD» «/§HL§TR» «§HL§TR
>> id="H13261"»«§HL§TD
>> id="H13262"» «/§HL§TD»«/§HL§TR» «§HL§TR valign="bottom"
>> id="H13264"»
>>      «§HL§TD id="H13266"»«/§HL§TD»         «§HL§TD
>> colspan="2"
>> id="H13268"»«§HL§FONT size="2" id="H13269"»TEXT TEXT TEXT TEXT TEXT
>> «/§HL§FONT»¤/LISTITEM¤</unknown>
>> </doc>
>>
>> The regex held in the variable CompleteListIdentificationRegex runs
>> fine in RegexBuddy on the same input, completing in 201 steps. It
>> essentially just identifies all the content within the above <unknown>
>> element.
>>
>> However, the equivalent xsl:analyze-string, run in oXygen 12.1, never
>> finishes on the same input.
>>
>> Any ideas?
>>
>> I've been working on it for 4 hours without much progress, other than
>> reducing the number of execution steps in RegexBuddy by 40.
>>
>> Thanks Much



--
Alex
-----
Currently:
Freelance Software Engineer 6+ yrs exp

Previously:
https://sites.google.com/a/utg.edu.gm/alex/


A Bafila, is two rivers flowing together as one:
http://www.facebook.com/pages/Bafila/125611807494851
