Subject: [xsl] Complex Regex takes 201 steps in regex buddy but runs forever in Analyze-String
From: Alex Muir <alex.g.muir@xxxxxxxxx>
Date: Mon, 31 Jan 2011 18:40:18 +0000

Hi,

With the following code:
------------------------------
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="2.0"
    exclude-result-prefixes="#all">

  <xsl:output method="xml" indent="no"/>

  <xsl:template match="unknown[exists(text())]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:call-template name="CompleteListAnalyze">
        <xsl:with-param name="content" select="text()"/>
      </xsl:call-template>
    </xsl:copy>
  </xsl:template>

  <xsl:template name="CompleteListAnalyze">
    <xsl:param name="content"/>
    <xsl:variable name="CompleteListIdentificationRegex">
      <xsl:text>((B$LISTITEM[^B$]+B$[^B$]+B$/LISTITEMB$)(((B+[^B;B$]+B;|\s+|B'[^B'B$]+B'){0,255})(B$LISTITEM[^B$]+B$[^B$]+B$/LISTITEMB$)){0,200})</xsl:text>
    </xsl:variable>
    <xsl:analyze-string select="$content" regex="{$CompleteListIdentificationRegex}">
      <xsl:matching-substring>
        <xsl:text>B$COMPLETELIST POSITION="</xsl:text>
        <xsl:value-of select="position()"/>
        <xsl:text>" PLACEMENT=""B$</xsl:text>
        <xsl:value-of select="regex-group(1)"/>
        <xsl:text>B$b /COMPLETELISTB$</xsl:text>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>

And the following input file:
----------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<doc>
<unknown>B$LISTITEM BULLET="15" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"B$B+B'HLB'FONT size="2" id="H13211"B;15B+/B'HLB'FONTB;B+/B'HLB'TDB; B+B'HLB'TD id="H13213"B;B+/B'HLB'TDB; B+/B'HLB'TRB; B+B'HLB'TR id="H13215"B;B+B'HLB'TD id="H13216"B; B+/B'HLB'TDB;B+/B'HLB'TRB; B+B'HLB'TR valign="bottom" id="H13218"B; B+B'HLB'TD id="H13220"B;B+/B'HLB'TDB; B+B'HLB'TD colspan="2" id="H13222"B;B+B'HLB'FONT size="2" id="H13223"B;TEXT TEXT TEXT TEXTB+/B'HLB'FONTB;B$/LISTITEMB$B+/TDB; B+TD id="H13225"B;B+/TDB; B+TD id="H13227"B;B+/TDB; B+TD id="H13229"B;B+/TDB; B+TD id="H13231"B;B+/TDB; 
B+TD align="right" id="H13233"B;B$LISTITEM BULLET="16" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"B$B+B'HLB'FONT size="2" id="H13234"B;16B+/B'HLB'FONTB;B+/B'HLB'TDB; B+B'HLB'TD id="H13236"B;B+/B'HLB'TDB; B+/B'HLB'TRB; B+B'HLB'TR id="H13238"B;B+B'HLB'TD id="H13239"B; B+/B'HLB'TDB;B+/B'HLB'TRB; B+B'HLB'TR valign="bottom" id="H13241"B; B+B'HLB'TD id="H13243"B;B+/B'HLB'TDB; B+B'HLB'TD colspan="2" id="H13245"B;B+B'HLB'FONT size="2" id="H13246"B;TEXT TEXT TEXT TEXT TEXT B+/B'HLB'FONTB;B$/LISTITEMB$B+/TDB; B+TD id="H13248"B;B+/TDB; B+TD id="H13250"B;B+/TDB; B+TD id="H13252"B;B+/TDB; B+TD id="H13254"B;B+/TDB; B+TD align="right" id="H13256"B;B$LISTITEM BULLET="17" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"B$B+B'HLB'FONT size="2" id="H13257"B;17B+/B'HLB'FONTB;B+/B'HLB'TDB; B+B'HLB'TD id="H13259"B;B+/B'HLB'TDB; B+/B'HLB'TRB; B+B'HLB'TR id="H13261"B;B+B'HLB'TD id="H13262"B; B+/B'HLB'TDB;B+/B'HLB'TRB; B+B'HLB'TR valign="bottom" id="H13264"B; B+B'HLB'TD id="H13266"B;B+/B'HLB'TDB; B+B'HLB'TD colspan="2" id="H13268"B;B+B'HLB'FONT size="2" id="H13269"B;TEXT TEXT TEXT TEXT TEXT B+/B'HLB'FONTB;B$/LISTITEMB$</unknown>
</doc>

The regex held in the variable CompleteListIdentificationRegex runs fine in RegexBuddy on the same input, executing to completion in 201 steps. It essentially just identifies all the content within the above <unknown> element.

However, the equivalent xsl:analyze-string, run in oXygen 12.1, keeps running and never stops on the same input.

Any ideas? I have been working on it for 4 hours without much progress, other than reducing the number of execution steps in RegexBuddy by 40.

Thanks much

--
Alex
-----
Currently: Freelance Software Engineer 6+ yrs exp
Previously: https://sites.google.com/a/utg.edu.gm/alex/
A Bafila, is two rivers flowing together as one:
http://www.facebook.com/pages/Bafila/125611807494851