Subject: Re: [xsl] analyze-string regex
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Fri, 28 Mar 2014 10:15:21 +0000

I reel with horror when I see complex regular expressions like this. Anything
that relies on regex-group(9) or regex-group(11) is a nightmare.
I've usually found it's possible to split the processing into a number of
phases, and this is the only way I can preserve my sanity.
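
As a rough illustration of what splitting into phases might look like (a sketch only, not taken from the code under discussion): each field gets its own small regex, so nothing ever needs more than regex-group(1). The function names ex:string-field, ex:bbox and ex:option, the ex namespace URI, and the sample input are all assumptions made for the example, and it assumes $json holds a single JSON object whose field names contain no regex metacharacters.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="http://example.com/ns"
    exclude-result-prefixes="xs ex">

  <!-- Hypothetical helper: value of one string-valued field, or the empty sequence -->
  <xsl:function name="ex:string-field" as="xs:string?">
    <xsl:param name="json" as="xs:string"/>
    <xsl:param name="name" as="xs:string"/>
    <xsl:variable name="regex"
        select="concat('&quot;', $name, '&quot;\s*:\s*&quot;([^&quot;]*)&quot;')"/>
    <xsl:variable name="hits" as="xs:string*">
      <xsl:analyze-string select="$json" regex="{$regex}">
        <xsl:matching-substring>
          <xsl:sequence select="regex-group(1)"/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:sequence select="$hits[1]"/>
  </xsl:function>

  <!-- Hypothetical helper: the text between [ and ] of the bbox field, if any -->
  <xsl:function name="ex:bbox" as="xs:string?">
    <xsl:param name="json" as="xs:string"/>
    <xsl:variable name="hits" as="xs:string*">
      <xsl:analyze-string select="$json" regex="&quot;bbox&quot;\s*:\s*\[([^\]]*)\]">
        <xsl:matching-substring>
          <xsl:sequence select="regex-group(1)"/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:sequence select="$hits[1]"/>
  </xsl:function>

  <!-- Final phase: combine the pieces; an option is produced only if a bbox exists -->
  <xsl:function name="ex:option" as="element(option)?">
    <xsl:param name="json" as="xs:string"/>
    <xsl:variable name="bbox" select="ex:bbox($json)"/>
    <xsl:if test="exists($bbox)">
      <option data-bbox="{replace($bbox, '\s', '')}">
        <xsl:value-of select="ex:string-field($json, 'qualifier')"/>
      </option>
    </xsl:if>
  </xsl:function>

  <!-- Invented sample input, for illustration only -->
  <xsl:template name="main">
    <xsl:sequence select="ex:option(
      '{&quot;title&quot;: &quot;T&quot;, &quot;qualifier&quot;: &quot;Q&quot;, &quot;bbox&quot;: [1, 2, 3, 4]}')"/>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the single large regex, this does not require the fields to appear in a fixed order, which may or may not be what is wanted.
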
However, another approach I have seen is to build the regular expression
methodically, for example with a sequence of variables:
<xsl:variable name="number">\d+</xsl:variable>
<xsl:variable name="string">"[^"]*"</xsl:variable>
<xsl:variable name="number-or-string" select="concat($number, '|', $string)"/>
or even with function calls
<xsl:variable name="number-or-string" select="regex:choice($number,
$string)"/>
Unfortunately neither of these approaches really helps much with getting the
group numbers right, but they can make a very large regex much more
comprehensible to the reader, and more likely to be bug-free.
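
To make the function-call variant concrete, here is a minimal sketch. regex:choice is not a built-in function: the version below, its namespace URI, and the demo template are assumptions made for illustration. It wraps the alternatives in ordinary capturing parentheses so the result can be embedded safely in a larger regex under XSLT 2.0, which means every call adds one group to the numbering; with an XPath 3.0 regex engine a non-capturing (?: ... ) group could be used instead.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:regex="http://example.com/regex"
    exclude-result-prefixes="xs regex">

  <!-- Small building blocks, each readable on its own -->
  <xsl:variable name="number" as="xs:string">\d+</xsl:variable>
  <xsl:variable name="string" as="xs:string">"[^"]*"</xsl:variable>

  <!-- Hypothetical helper: (a|b). Note that it introduces a capturing group. -->
  <xsl:function name="regex:choice" as="xs:string">
    <xsl:param name="a" as="xs:string"/>
    <xsl:param name="b" as="xs:string"/>
    <xsl:sequence select="concat('(', $a, '|', $b, ')')"/>
  </xsl:function>

  <xsl:variable name="number-or-string" as="xs:string"
      select="regex:choice($number, $string)"/>

  <!-- Demo: emit one token element per number or quoted string in the input -->
  <xsl:template name="main">
    <xsl:analyze-string select="'12 &quot;abc&quot; x'" regex="{$number-or-string}">
      <xsl:matching-substring>
        <token><xsl:value-of select="."/></token>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Because the regex attribute is an attribute value template, passing the whole expression through a variable reference also avoids having to double any curly braces in the regex itself.
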
Michael Kay
Saxonica
On 28 Mar 2014, at 09:58, Tony Graham <tgraham@xxxxxxxxxx> wrote:
> On Thu, March 27, 2014 7:19 pm, Liam R E Quin wrote:
>> On Thu, 2014-03-27 at 17:06 +0000, Rushforth, Peter wrote:
>> [...]
>>
>>> What I came up seems to work ok:
>
> The test for regex-group(9) is redundant since if regex-group(11) is not
> an empty string, then regex-group(9) won't be an empty string:
>
> ---
> <xsl:if test="regex-group(11)"><!-- if a bbox exists we've got an option -->
> <xsl:element name="option">
> <xsl:if test="regex-group(9)">
> ---
>
>>> <xsl:function name="ex:locationJson2Options">
>>> <xsl:param name="json"/><!-- 1 2
>>> 3 4 5 6
>>> 7 8 9 10 11 12
>>> 13 14 15 16
>>> 17 -->
>>> <xsl:variable name="regexps"
>>> select="'(\{.*?(&quot;title&quot;:.*?&quot;(.*?)&quot;).*?(&quot;qualifier&quot;:.*?&quot;(.*?)&quot;).*?(&quot;type&quot;:.*?&quot;(.*?)&quot;).*?((((&quot;bbox&quot;:.*?\[(.*?)\]).*?(&quot;geometry&quot;:.*?(\{.*?\})).*?\}{1,}))|((&quot;geometry&quot;:.*?(\{.*?\})).*?\}{1,})))'"/>
>>> <xsl:analyze-string select="$json" regex="{$regexps}" flags="s">
> ...
>> I'd also note you use &quot; a lot, so change them to " and use '....'
>> and &apos; instead. You can also build up a complex expression by making
>
> Or put the regex as the content of the xsl:variable so you don't have to
> worry about either '"' or "'".
>
> If you include 'x' in the @flags value, you can add white-space for
> readability (and more easily see where you've put in the redundant
> parentheses) as in the example below.
>
> I also suggest making variables for the positions of the significant regex
> groups and using those in regex-group() to make the code more readable.
> If the positions are calculated relative to the previous groups, your code
> is more resilient to changes in the regex (and for bunches of related
> parentheses, e.g., rBBoxGeometry (below), I'd often add a variable, e.g.,
> $rBBoxGeometryLast, for the last parentheses in the bunch and set the next
> variable relative to that to make it resilient to changes in the bunch).
>
>> smaller variables (with comments) and using concat() at the end.
>
> If you do it as content of xsl:variable, you can use xsl:value-of to refer
> to other regex variables.
>
> Regards,
>
>
> Tony Graham tgraham@xxxxxxxxxx
> Consultant http://www.mentea.net
> Chair, Print and Page Layout Community Group @ W3C XML Guild member
> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> Mentea XML, XSL-FO and XSLT consulting, training and programming
>
>
> <xsl:variable name="regexps" as="xs:string">
> ( <!-- 1 -->
> \{\s*
> ( <!-- rTitle -->
> "title":\s*"
> (.*?)
> "
> )
> \s*
> ( <!-- rQualifier -->
> "qualifier":\s*"
> (.*?) <!-- rQualifierData -->
> "
> )
> \s*
> ( <!-- rType -->
> "type":\s*"
> (.*?)
> "
> )
> \s*
> ( <!-- rBBoxGeometry -->
> (
> (
> ( <!-- rBBox -->
> "bbox":\s*\[
> (.*?) <!-- rBBoxData -->
> \]
> )
> \s*
> (
> "geometry":\s*
> (
> \{.*?\}
> )
> )
> \s*\}{1,}
> )
> )
> |
> ( <!-- rGeometry -->
> (
> "geometry":\s*
> (
> \{.*?\}
> )
> )
> \s*\}{1,}
> )
> )
> )
> </xsl:variable>
>
> <xsl:variable name="rTitle" select="2" as="xs:integer" />
> <xsl:variable name="rQualifier" select="$rTitle + 2" as="xs:integer" />
> <xsl:variable name="rQualifierData" select="$rQualifier + 1"
> as="xs:integer" />
> <xsl:variable name="rType" select="$rQualifierData + 1" as="xs:integer" />
> <xsl:variable name="rBBoxGeometry" select="$rType + 2" as="xs:integer" />
> <xsl:variable name="rBBox" select="$rBBoxGeometry + 3" as="xs:integer" />
> <xsl:variable name="rBBoxData" select="$rBBox + 1" as="xs:integer" />
>
> <xsl:function name="ex:locationJson2Options">
> <xsl:param name="json"/>
>
> <xsl:analyze-string select="$json" regex="{$regexps}" flags="sx">
> <xsl:matching-substring>
> <xsl:if test="regex-group($rBBox)">
> <!-- if a bbox exists we've got an option -->
> <xsl:element name="option">
> <xsl:if test="regex-group($rBBoxGeometry)">
> <xsl:attribute name="data-bbox"
> select="translate(regex-group($rBBoxData),
> '
|
|
',
> '')"/>
> </xsl:if>
> <xsl:value-of select="regex-group($rQualifierData)"/>
> </xsl:element>
> </xsl:if>
> </xsl:matching-substring>
> </xsl:analyze-string>
> </xsl:function>