Subject: Re: [xsl] analyze-string regex
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Fri, 28 Mar 2014 10:15:21 +0000
I reel with horror when I see complex regular expressions like this. Anything that relies on regex-group(9) or regex-group(11) is a nightmare. I've usually found it's possible to split the processing into a number of phases, and this is the only way I can preserve my sanity.

However, another approach I have seen is to build the regular expression methodically, for example with a sequence of variables:

<xsl:variable name="number">\d+</xsl:variable>
<xsl:variable name="string">"[^"]*"</xsl:variable>
<xsl:variable name="number-or-string" select="concat($number, '|', $string)"/>

or even with function calls:

<xsl:variable name="number-or-string" select="regex:choice($number, $string)"/>

Unfortunately neither of these approaches really helps much with getting the group numbers right, but they can make a very large regex much more comprehensible to the reader, and more likely to be bug-free.

Michael Kay
Saxonica

On 28 Mar 2014, at 09:58, Tony Graham <tgraham@xxxxxxxxxx> wrote:

> On Thu, March 27, 2014 7:19 pm, Liam R E Quin wrote:
>> On Thu, 2014-03-27 at 17:06 +0000, Rushforth, Peter wrote:
>> [...]
>>
>>> What I came up with seems to work ok:
>
> The test for regex-group(9) is redundant since if regex-group(11) is not
> an empty string, then regex-group(9) won't be an empty string:
>
> ---
> <xsl:if test="regex-group(11)"><!-- if a bbox exists we've got an option -->
> <xsl:element name="option">
> <xsl:if test="regex-group(9)">
> ---
>
>>> <xsl:function name="ex:locationJson2Options">
>>> <xsl:param name="json"/><!-- 1 2
>>> 3 4 5 6
>>> 7 8 9 10 11 12
>>> 13 14 15 16
>>> 17 -->
>>> <xsl:variable name="regexps"
>>> select="'(\{.*?(&quot;title&quot;:.*?&quot;(.*?)&quot;).*?(&quot;qualifier&quot;:.*?&quot;(.*?)&quot;).*?(&quot;type&quot;:.*?&quot;(.*?)&quot;).*?((((&quot;bbox&quot;:.*?\[(.*?)\]).*?(&quot;geometry&quot;:.*?(\{.*?\})).*?\}{1,}))|((&quot;geometry&quot;:.*?(\{.*?\})).*?\}{1,})))'"/>
>>> <xsl:analyze-string select="$json" regex="{$regexps}" flags="s">
> ...
>> I'd also note you use &quot; a lot, so change them to " and use '....'
>> and &apos; instead.
>>
>> You can also build up a complex expression by making
>
> Or put the regex as the content of the xsl:variable so you don't have to
> worry about either '"' or "'".
>
> If you include 'x' in the @flags value, you can add white space for
> readability (and more easily see where you've put in the redundant
> parentheses) as in the example below.
>
> I also suggest making variables for the positions of the significant regex
> groups and using those in regex-group() to make the code more readable.
> If the positions are calculated relative to the previous groups, your code
> is more resilient to changes in the regex (and for bunches of related
> parentheses, e.g., rBBoxGeometry (below), I'd often add a variable, e.g.,
> $rBBoxGeometryLast, for the last parentheses in the bunch and set the next
> variable relative to that to make it resilient to changes in the bunch).
>
>> smaller variables (with comments) and using concat() at the end.
>
> If you do it as content of xsl:variable, you can use xsl:value-of to refer
> to other regex variables.
>
> Regards,
>
>
> Tony Graham tgraham@xxxxxxxxxx
> Consultant http://www.mentea.net
> Chair, Print and Page Layout Community Group @ W3C XML Guild member
> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> Mentea XML, XSL-FO and XSLT consulting, training and programming
>
>
> <xsl:variable name="regexps" as="xs:string">
> (  <!-- 1 -->
>   \{\s*
>   (  <!-- rTitle -->
>     "title":\s*"
>     (.*?)
>     "
>   )
>   \s*
>   (  <!-- rQualifier -->
>     "qualifier":\s*"
>     (.*?)  <!-- rQualifierData -->
>     "
>   )
>   \s*
>   (  <!-- rType -->
>     "type":\s*"
>     (.*?)
>     "
>   )
>   \s*
>   (  <!-- rBBoxGeometry -->
>     (
>       (
>         (  <!-- rBBox -->
>           "bbox":\s*\[
>           (.*?)  <!-- rBBoxData -->
>           \]
>         )
>         \s*
>         (
>           "geometry":\s*
>           (
>             \{.*?\}
>           )
>         )
>         \s*\}{1,}
>       )
>     )
>     |
>     (  <!-- rGeometry -->
>       (
>         "geometry":\s*
>         (
>           \{.*?\}
>         )
>       )
>       \s*\}{1,}
>     )
>   )
> )
> </xsl:variable>
>
> <xsl:variable name="rTitle" select="2" as="xs:integer" />
> <xsl:variable name="rQualifier" select="$rTitle + 2" as="xs:integer" />
> <xsl:variable name="rQualifierData" select="$rQualifier + 1"
>               as="xs:integer" />
> <xsl:variable name="rType" select="$rQualifierData + 1" as="xs:integer" />
> <xsl:variable name="rBBoxGeometry" select="$rType + 2" as="xs:integer" />
> <xsl:variable name="rBBox" select="$rBBoxGeometry + 3" as="xs:integer" />
> <xsl:variable name="rBBoxData" select="$rBBox + 1" as="xs:integer" />
>
> <xsl:function name="ex:locationJson2Options">
>   <xsl:param name="json"/>
>
>   <xsl:analyze-string select="$json" regex="{$regexps}" flags="sx">
>     <xsl:matching-substring>
>       <xsl:if test="regex-group($rBBox)">
>         <!-- if a bbox exists we've got an option -->
>         <xsl:element name="option">
>           <xsl:if test="regex-group($rBBoxGeometry)">
>             <xsl:attribute name="data-bbox"
>                            select="translate(regex-group($rBBoxData),
>                                              '&#xA;&#xD;', '')"/>
>           </xsl:if>
>           <xsl:value-of select="regex-group($rQualifierData)"/>
>         </xsl:element>
>       </xsl:if>
>     </xsl:matching-substring>
>   </xsl:analyze-string>
> </xsl:function>
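Michael Kay notes that composing the regex from variables does not help much with getting the group numbers right, and Tony Graham's relative offsets still have to be counted by hand. The sketch below automates that bookkeeping: each fragment's capturing groups are counted as the regex is assembled, so the number of every named group is derived rather than hard-coded. It is written with Python's `re` module rather than XSLT, purely to show the idea; Python's regex dialect is close to, but not the same as, the XPath dialect used by `xsl:analyze-string`. `RegexBuilder`, `field()` and the sample JSON string are hypothetical names and data invented for this illustration.

```python
import re


class RegexBuilder:
    """Hypothetical helper: joins regex fragments and records the absolute
    number of the first capturing group of each named fragment."""

    def __init__(self):
        self.parts = []
        self.offsets = {}  # fragment name -> number of its first capturing group
        self.groups = 0    # capturing groups contributed so far

    def add(self, fragment, name=None):
        # Compiling the fragment on its own reports how many capturing
        # groups it contains, so later offsets never need hand-editing.
        count = re.compile(fragment).groups
        if name is not None:
            if count == 0:
                raise ValueError(f"fragment {name!r} has no capturing group")
            self.offsets[name] = self.groups + 1
        self.parts.append(fragment)
        self.groups += count
        return self

    def compile(self, flags=0):
        return re.compile("".join(self.parts), flags)


def field(key):
    # ("key":\s*"(.*?)") -> group n is the whole field, group n+1 its value
    return r'("' + re.escape(key) + r'":\s*"(.*?)")'


SEP = r"\s*,\s*"  # hypothetical separator between JSON members

builder = (RegexBuilder()
           .add(r"\{\s*")
           .add(field("title"), "title")
           .add(SEP)
           .add(field("qualifier"), "qualifier")
           .add(SEP)
           .add(r'("bbox":\s*\[(.*?)\])', "bbox"))
rx = builder.compile(re.S)

sample = '{"title": "Ottawa", "qualifier": "City", "bbox": [-76.35, 45.0, -75.2, 45.5]}'
m = rx.search(sample)
qualifier = m.group(builder.offsets["qualifier"] + 1)  # the value, not the whole field
bbox = m.group(builder.offsets["bbox"] + 1)
print(builder.offsets, qualifier, bbox)
```

Adding a `field("type")` fragment before the bbox fragment would move `builder.offsets["bbox"]` from 5 to 7 with no other change, which is the resilience Tony's `$rBBoxGeometryLast` idea achieves by hand. Because groups are counted by compiling each fragment separately, every fragment must be a complete regex on its own (parentheses cannot be split across fragments), and an alternation should be wrapped in a group so its `|` does not leak into neighbouring fragments.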