Subject: Re: [xsl] Complex Regex takes 201 steps in regex buddy but runs forever in Analyze-String
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Mon, 31 Jan 2011 20:20:03 +0100

Yes, meanwhile I had changed the middle part to

  (B+[^B;B$]+B;\s*|B'[^B'B$]+B'\s*){0,255}

so we agree :)

-W

On 31 January 2011 20:15, Alex Muir <alex.g.muir@xxxxxxxxx> wrote:
> Okay this one seems to work based on your suggestion and a little
> tweak to get it to surround all the LISTITEM's
>
> ((B$LISTITEM[^B$]+B$[^B$]+B$/LISTITEMB$)\s*(((B+[^B;B$]+B;\s*|B'[^B'B$]+B'\s*){0,255})(B$LISTITEM[^B$]+B$[^B$]+B$/LISTITEMB$)){0,200})
>
> Also I note that the input I posted there was working. I was trying
> to reduce the input text and then ended up using a project scenario
> rather than a global scenario with the same name and after restarting
> oxygen I guess I switched to using a different scenario running
> different input than I wanted.
>
> Thanks much
>
>
> On Mon, Jan 31, 2011 at 6:59 PM, Wolfgang Laun <wolfgang.laun@xxxxxxxxx> wrote:
>> The parentheses '(' and ')' do not match well in <xsl:variable
>> name="CompleteListIdentificationRegex" >. Please check.
>>
>> But one evil subpattern is this (with spaces inserted for readability):
>>
>>   ( ( B+[^B;B$]+B; | \s+ | B'[^B'B$]+B' ){0,255})
>>
>> This will try many combinations of zero to 255 repetitions of "any
>> number > 0 of spaces"
>>
>> Cleaner is
>>   (\s+|( B+[^B;B$]+B;|B'[^B'B$]+B'){0,255})
>>
>> -W
>>
>> On 31 January 2011 19:40, Alex Muir <alex.g.muir@xxxxxxxxx> wrote:
>>> Hi,
>>>
>>> With the following code:
>>> ------------------------------
>>>
>>> <?xml version="1.0"?>
>>> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>>   xmlns:saxon="http://saxon.sf.net/" xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>>   version="2.0" exclude-result-prefixes="#all">
>>>   <xsl:output method="xml" indent="no"/>
>>>
>>>
>>>   <xsl:template match="unknown[exists(text())]">
>>>     <xsl:copy>
>>>       <xsl:copy-of select="@*"/>
>>>
>>>       <xsl:call-template name="CompleteListAnalyze">
>>>         <xsl:with-param name="content" select="text()"/>
>>>       </xsl:call-template>
>>>
>>>     </xsl:copy>
>>>   </xsl:template>
>>>
>>>
>>>   <xsl:template name="CompleteListAnalyze">
>>>     <xsl:param name="content"/>
>>>
>>>     <xsl:variable name="CompleteListIdentificationRegex" >
>>>       <xsl:text>((B$LISTITEM[^B$]+B$[^B$]+B$/LISTITEMB$)(((B+[^B;B$]+B;|\s+|B'[^B'B$]+B'){0,255})(B$LISTITEM[^B$]+B$[^B$]+B$/LISTITEMB$)){0,200})</xsl:text>
>>>     </xsl:variable>
>>>
>>>     <xsl:analyze-string select="$content" regex="{$CompleteListIdentificationRegex}">
>>>       <xsl:matching-substring>
>>>         <xsl:text>B$COMPLETELIST POSITION="</xsl:text>
>>>         <xsl:value-of select="position()"/>
>>>         <xsl:text>" PLACEMENT=""B$</xsl:text>
>>>         <xsl:value-of select="regex-group(1)"/>
>>>         <xsl:text>B$b /COMPLETELISTB$</xsl:text>
>>>       </xsl:matching-substring>
>>>       <xsl:non-matching-substring>
>>>         <xsl:value-of select="."/>
>>>       </xsl:non-matching-substring>
>>>     </xsl:analyze-string>
>>>   </xsl:template>
>>>
>>> </xsl:stylesheet>
>>>
>>>
>>> And the following input file:
>>> ----------------------------------
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <doc>
>>>   <unknown>B$LISTITEM BULLET="15" TITLE="TEXT TEXT TEXT TEXT"
>>> TYPE="SNLI"B$B+B'HLB'FONT size="2" id="H13211"B;15B+/B'HLB'FONTB;B+/B'HLB'TDB;
>>>  B+B'HLB'TD id="H13213"B;B+/B'HLB'TDB; B+/B'HLB'TRB; B+B'HLB'TR id="H13215"B;B+B'HLB'TD
>>> id="H13216"B; B+/B'HLB'TDB;B+/B'HLB'TRB; B+B'HLB'TR valign="bottom" id="H13218"B;
>>>    B+B'HLB'TD id="H13220"B;B+/B'HLB'TDB;     B+B'HLB'TD colspan="2"
>>> id="H13222"B;B+B'HLB'FONT size="2" id="H13223"B;TEXT TEXT TEXT
>>> TEXTB+/B'HLB'FONTB;B$/LISTITEMB$B+/TDB;     B+TD id="H13225"B;B+/TDB;
>>> B+TD id="H13227"B;B+/TDB;     B+TD id="H13229"B;B+/TDB;     B+TD
>>> id="H13231"B;B+/TDB;     B+TD align="right" id="H13233"B;B$LISTITEM
>>> BULLET="16" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"B$B+B'HLB'FONT size="2"
>>> id="H13234"B;16B+/B'HLB'FONTB;B+/B'HLB'TDB;     B+B'HLB'TD
>>> id="H13236"B;B+/B'HLB'TDB; B+/B'HLB'TRB; B+B'HLB'TR id="H13238"B;B+B'HLB'TD
>>> id="H13239"B; B+/B'HLB'TDB;B+/B'HLB'TRB; B+B'HLB'TR valign="bottom" id="H13241"B;
>>>    B+B'HLB'TD id="H13243"B;B+/B'HLB'TDB;     B+B'HLB'TD colspan="2"
>>> id="H13245"B;B+B'HLB'FONT size="2" id="H13246"B;TEXT TEXT TEXT TEXT TEXT
>>> B+/B'HLB'FONTB;B$/LISTITEMB$B+/TDB;     B+TD id="H13248"B;B+/TDB;     B+TD
>>> id="H13250"B;B+/TDB;     B+TD id="H13252"B;B+/TDB;     B+TD
>>> id="H13254"B;B+/TDB;     B+TD align="right" id="H13256"B;B$LISTITEM
>>> BULLET="17" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"B$B+B'HLB'FONT size="2"
>>> id="H13257"B;17B+/B'HLB'FONTB;B+/B'HLB'TDB;     B+B'HLB'TD
>>> id="H13259"B;B+/B'HLB'TDB; B+/B'HLB'TRB; B+B'HLB'TR id="H13261"B;B+B'HLB'TD
>>> id="H13262"B; B+/B'HLB'TDB;B+/B'HLB'TRB; B+B'HLB'TR valign="bottom" id="H13264"B;
>>>    B+B'HLB'TD id="H13266"B;B+/B'HLB'TDB;     B+B'HLB'TD colspan="2"
>>> id="H13268"B;B+B'HLB'FONT size="2" id="H13269"B;TEXT TEXT TEXT TEXT TEXT
>>> B+/B'HLB'FONTB;B$/LISTITEMB$</unknown>
>>> </doc>
>>>
>>> The regex held in the variable CompleteListIdentificationRegex runs
>>> fine on the same input executing to completion in 201 steps. It
>>> essentially just identifies all the content within the above <unknown>
>>> element.
>>>
>>> However the equivalent Analyze-String running in oxygen 12.1 will
>>> continue running and not stop on the same input.
>>>
>>> Any ideas?
>>>
>>> Been working on it for 4 hours without much progress other than
>>> reducing the number of execution steps in regex buddy by 40.
>>>
>>> Thanks Much
>>>
>>>
>>> --
>>> Alex
>>> -----
>>> Currently:
>>> Freelance Software Engineer 6+ yrs exp
>>>
>>> Previously:
>>> https://sites.google.com/a/utg.edu.gm/alex/
>>>
>>>
>>> A Bafila, is two rivers flowing together as one:
>>> http://www.facebook.com/pages/Bafila/125611807494851
>>
>>
>
>
>
> --
> Alex
> -----
> Currently:
> Freelance Software Engineer 6+ yrs exp
>
> Previously:
> https://sites.google.com/a/utg.edu.gm/alex/
>
>
> A Bafila, is two rivers flowing together as one:
> http://www.facebook.com/pages/Bafila/125611807494851
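
A rough illustration of the point made in the thread about the `\s+` alternative sitting inside the `{0,255}` group: a run of n whitespace characters can be divided among repeated `\s+` iterations in many different ways, and a backtracking engine may have to try them all before giving up. The sketch below is in Python rather than XSLT, and it rests on several assumptions: the delimiters `<`, `>`, `#` and `$` are ASCII stand-ins for the special characters that the archive shows in mangled form, the names `count_space_splits`, `AMBIGUOUS` and `TIGHT` are made up for this example, and Python's `re` module is not the engine Saxon or oXygen uses, so actual step counts and timings will differ.

```python
import re


def count_space_splits(n: int) -> int:
    # Number of ways a run of n whitespace characters can be divided
    # among successive iterations of a repeated \s+ alternative, where
    # each iteration must consume at least one character.
    ways = [0] * (n + 1)
    ways[0] = 1  # an empty run is consumed by zero iterations
    for length in range(1, n + 1):
        # The last iteration takes k characters; the rest is split freely.
        ways[length] = sum(ways[length - k] for k in range(1, length + 1))
    return ways[n]


# Stand-in for the original middle part: whitespace is its own alternative,
# so adjacent spaces can be grouped in many equivalent ways.
AMBIGUOUS = re.compile(r"(<[^>$]+>|\s+|#[^#$]+#){0,255}")

# Stand-in for the revised middle part: trailing whitespace is attached to
# the preceding token, so each stretch of input has one way to be consumed.
TIGHT = re.compile(r"(<[^>$]+>\s*|#[^#$]+#\s*){0,255}")

if __name__ == "__main__":
    for n in (1, 2, 3, 4, 10, 20):
        print(n, count_space_splits(n))

    sample = "<a>  <b> #c# "
    # Both forms accept the same well-formed sample.
    print(bool(AMBIGUOUS.fullmatch(sample)), bool(TIGHT.fullmatch(sample)))
```

The count doubles with every extra whitespace character (2^(n-1) splits for a run of n), which is one plausible reading of why the input with long runs of padding between the table cells never finished, while the form with `\s*` attached to each token did.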