Re: [xsl] analyze-string regex

Subject: Re: [xsl] analyze-string regex
From: "Tony Graham" <tgraham@xxxxxxxxxx>
Date: Fri, 28 Mar 2014 09:58:38 -0000 (GMT)
On Thu, March 27, 2014 7:19 pm, Liam R E Quin wrote:
> On Thu, 2014-03-27 at 17:06 +0000, Rushforth, Peter wrote:
> [...]
>
>> What I came up seems to work ok:

The test for regex-group(9) is redundant since if regex-group(11) is not
an empty string, then regex-group(9) won't be an empty string:

---
<xsl:if test="regex-group(11)"><!-- if a bbox exists we've got an option -->
  <xsl:element name="option">
    <xsl:if test="regex-group(9)">
---

>>   <xsl:function name="ex:locationJson2Options">
>>     <xsl:param name="json"/><!--           1    2
>>     3             4                              5                6
>>                        7              8 9 10 11                  12
>>        13                       14                     15 16
>>              17                                     -->
>>     <xsl:variable name="regexps"
>> select="'(\{.*?(&quot;title&quot;:.*?&quot;(.*?)&quot;).*?(&quot;qualifier&quot;:.*?&quot;(.*?)&quot;).*?(&quot;type&quot;:.*?&quot;(.*?)&quot;).*?((((&quot;bbox&quot;:.*?\[(.*?)\]).*?(&quot;geometry&quot;:.*?(\{.*?\})).*?\}{1,}))|((&quot;geometry&quot;:.*?(\{.*?\})).*?\}{1,})))'"/>
>>     <xsl:analyze-string select="$json" regex="{$regexps}" flags="s">
...
> I'd also note you use &quot; a lot, so change them to " and use '....'
> and &apos; instead. You can also build up a complex expression by making

Or put the regex as the content of the xsl:variable so you don't have to
worry about either '"' or "'".

If you include 'x' in the @flags value, you can add whitespace for
readability (and more easily see where you've put in the redundant
parentheses), as in the example below.

I also suggest making variables for the positions of the significant regex
groups and using those in regex-group() to make the code more readable.
If each position is calculated relative to the previous group's, your code
is more resilient to changes in the regex. For bunches of related
parentheses, e.g., rBBoxGeometry (below), I'd often add a variable, e.g.,
$rBBoxGeometryLast, for the last parenthesis in the bunch and set the next
variable relative to that, so that a change inside the bunch needs only
one adjustment.
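
A sketch of that last idea, using the group numbering of the example
below ($rAfterBBoxGeometry is hypothetical, since the example regex has
no group after the rBBoxGeometry bunch):

---
<!-- rBBoxGeometry opens the bunch, which holds 9 further groups -->
<xsl:variable name="rBBoxGeometryLast" select="$rBBoxGeometry + 9"
              as="xs:integer" />
<!-- hypothetical next group: unaffected by changes inside the bunch
     as long as $rBBoxGeometryLast is kept up to date -->
<xsl:variable name="rAfterBBoxGeometry" select="$rBBoxGeometryLast + 1"
              as="xs:integer" />
---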

> smaller variables (with comments) and using concat() at the end.

If you do it as content of xsl:variable, you can use xsl:value-of to refer
to other regex variables.
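
A rough sketch of that, with variable names ($rxQuoted, $rxTitle) made up
for illustration. One caveat, as far as I know: content that mixes text
with xsl:value-of evaluates to several text nodes, which would fail an
as="xs:string" type check, so the composite variable here omits @as:

---
<!-- a double-quoted value, with the data captured in one group -->
<xsl:variable name="rxQuoted" as="xs:string">" (.*?) "</xsl:variable>

<!-- no @as, so the content becomes a temporary tree whose string
     value is the concatenated regex -->
<xsl:variable name="rxTitle">
  (
   "title": \s*
   <xsl:value-of select="$rxQuoted"/>
  )
</xsl:variable>
---

Used as regex="{$rxTitle}" with 'x' in @flags, this should be equivalent
to ("title":\s*"(.*?)").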

Regards,


Tony Graham                                         tgraham@xxxxxxxxxx
Consultant                                       http://www.mentea.net
Chair, Print and Page Layout Community Group @ W3C    XML Guild member
  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
Mentea       XML, XSL-FO and XSLT consulting, training and programming


<xsl:variable name="regexps" as="xs:string">
(                         <!-- 1 -->
 \{\s*
 (                        <!-- rTitle -->
  "title":\s*"
  (.*?)
  "
 )
 \s*
 (                        <!-- rQualifier -->
  "qualifier":\s*"
  (.*?)                   <!-- rQualifierData -->
  "
 )
 \s*
 (                        <!-- rType -->
  "type":\s*"
  (.*?)
  "
 )
 \s*
 (                        <!-- rBBoxGeometry -->
  (
   (
    (                     <!-- rBBox -->
     "bbox":\s*\[
     (.*?)                <!-- rBBoxData -->
     \]
    )
    \s*
    (
     "geometry":\s*
     (
      \{.*?\}
     )
    )
    \s*\}{1,}
   )
  )
  |
  (                        <!-- rGeometry -->
   (
    "geometry":\s*
    (
     \{.*?\}
    )
   )
   \s*\}{1,}
  )
 )
)
</xsl:variable>

<xsl:variable name="rTitle" select="2" as="xs:integer" />
<xsl:variable name="rQualifier" select="$rTitle + 2" as="xs:integer" />
<xsl:variable name="rQualifierData" select="$rQualifier + 1"
              as="xs:integer" />
<xsl:variable name="rType" select="$rQualifierData + 1" as="xs:integer" />
<xsl:variable name="rBBoxGeometry" select="$rType + 2" as="xs:integer" />
<xsl:variable name="rBBox" select="$rBBoxGeometry + 3" as="xs:integer" />
<xsl:variable name="rBBoxData" select="$rBBox + 1" as="xs:integer" />

<xsl:function name="ex:locationJson2Options">
  <xsl:param name="json"/>

  <xsl:analyze-string select="$json" regex="{$regexps}" flags="sx">
    <xsl:matching-substring>
      <xsl:if test="regex-group($rBBox)">
        <!-- if a bbox exists we've got an option -->
        <xsl:element name="option">
          <xsl:if test="regex-group($rBBoxGeometry)">
            <!-- translate() takes a set of characters, not a regex,
                 so list just CR and LF to strip line breaks -->
            <xsl:attribute name="data-bbox"
                           select="translate(regex-group($rBBoxData),
                                             '&#xD;&#xA;',
                                             '')"/>
          </xsl:if>
          <xsl:value-of select="regex-group($rQualifierData)"/>
        </xsl:element>
      </xsl:if>
    </xsl:matching-substring>
  </xsl:analyze-string>
</xsl:function>
