Re: [xsl] analyze-string regex

From: Graydon <graydon@xxxxxxxxx>
Date: Wed, 26 Mar 2014 17:28:36 -0400
On Wed, Mar 26, 2014 at 08:30:15PM +0000, Rushforth, Peter scripsit:
> I am trying to use xsl:analyze-string to process a json response, the
> model of which is fixed.

[snip]

> I was under the impression that the regex would perform multiple
> matches, 

It does.

> allowing me to generate a sequence of elements from a string
> containing multiple objects that I match.  If that is true (the spec
> says it should be, I think), then my regex is faulty.  Could you offer
> any suggestions about the following, please?

Don't get yourself stuck writing a regex like that?

There might be a JSON-to-XML library out there, which'd be my first choice.
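If an XSLT 3.0 processor is an option, the spec itself supplies that library. fn:json-to-xml() parses JSON text into a fixed XML vocabulary in the http://www.w3.org/2005/xpath-functions namespace: fn:map, fn:array, fn:string, fn:number and so on, with each object member's name in @key. Here is a minimal sketch. It assumes the response is valid JSON, held as the text of a <json> element and shaped as a top-level array of objects. The <bucket>/<option>/<name>/<value> output names are borrowed from the template below and are placeholders, not anything json-to-xml() requires.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    exclude-result-prefixes="fn">

  <!-- Assumption: the JSON text is the string value of a <json> element,
       and its top level is an array of objects. -->
  <xsl:template match="/json">
    <bucket>
      <!-- json-to-xml() returns a document node; a top-level JSON array
           becomes fn:array, and each object inside it becomes fn:map -->
      <xsl:for-each select="json-to-xml(string(.))/fn:array/fn:map">
        <option>
          <!-- each member of the object carries its name in @key -->
          <xsl:for-each select="*">
            <name><xsl:value-of select="@key"/></name>
            <value>
              <xsl:choose>
                <!-- nested arrays/objects (e.g. a geometry array):
                     keep their fn: element structure -->
                <xsl:when test="self::fn:array or self::fn:map">
                  <xsl:copy-of select="*"/>
                </xsl:when>
                <!-- strings, numbers, booleans: just the text -->
                <xsl:otherwise>
                  <xsl:value-of select="."/>
                </xsl:otherwise>
              </xsl:choose>
            </value>
          </xsl:for-each>
        </option>
      </xsl:for-each>
    </bucket>
  </xsl:template>
</xsl:stylesheet>
```

Nested values come through as copies of fn:array/fn:number elements. Walking those is much easier than carving a substring out of the original text.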

If no such library turns up: I couldn't work out precisely what you were
trying to do (I would have had to figure out which regex sub-match was
which), but the following is the sort of approach I think works a lot
better. Saw up the input with tokenize() wherever you can (the "we don't
need quotes when we've got braces" bit of JSON isn't a help!), get some
use out of the regular structure, and restrict regex work to replace()
and to relatively short, simple patterns you (or at least I!) can
comprehend.

Once you're getting the right values, replace the diagnostic container
elements (<in>, <name>, <value>) with whatever you actually want as output.

<xsl:template match="/json">
    <bucket>
        <!-- get rid of the initial and final square brackets -->
        <xsl:variable name="desquared" select="replace(normalize-space(.),'^\[|\]$','')" />
        <!-- split into one string per object, at each "}, {" boundary -->
        <xsl:for-each select="tokenize($desquared,'\},\p{Zs}*\{')">
            <option>
                <in>
                    <!-- what did we start with? -->
                    <xsl:sequence select="." />
                </in>
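                <!-- strip the leading brace (first object) and trailing brace (last object), then split after each quoted value -->
                <xsl:for-each select="tokenize(replace(normalize-space(.),'^\{\p{Zs}*|\p{Zs}*\}$',''),'&quot;, ')">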
                    <!-- how do we think it slices up? -->
                    <name>
                        <xsl:sequence select="replace((tokenize(.,':\p{Zs}'))[1],'^&quot;|&quot;$','')" />
                    </name>
                    <value>
                        <xsl:choose>
                            <xsl:when test="matches(tokenize(.,':\p{Zs}')[2],'^\[')">
                                <xsl:sequence select="substring-after(.,'geometry&quot;:')" />
                            </xsl:when>
                            <xsl:otherwise>
                                <xsl:sequence select="substring-after(.,': &quot;')" />
                            </xsl:otherwise>
                        </xsl:choose>
                    </value>
                </xsl:for-each>
            </option>
        </xsl:for-each>
    </bucket>
</xsl:template>
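On the "It does." point above: xsl:analyze-string runs xsl:matching-substring once for every non-overlapping match. So if only one match comes back, the regex is often consuming more than intended, and a greedy .* is the classic culprit. Here is a minimal XSLT 2.0 sketch against a hypothetical flat string. The variable name, the input, and the <pairs>/<pair> output are made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Run with "main" as the initial named template; no source document is needed. -->
  <xsl:template name="main">
    <!-- Hypothetical input: a flat run of "key": "value" pairs -->
    <xsl:variable name="pairs"
        select="'&quot;id&quot;: &quot;1&quot;, &quot;name&quot;: &quot;two&quot;'"/>
    <pairs>
      <!-- [^"]* cannot run past a closing quote, so each pair is a separate match;
           with (.*) instead, the first group would swallow everything up to the
           last possible quote and there would be a single match -->
      <xsl:analyze-string select="$pairs"
          regex="&quot;([^&quot;]*)&quot;:\s*&quot;([^&quot;]*)&quot;">
        <!-- fires once per match, in order -->
        <xsl:matching-substring>
          <pair name="{regex-group(1)}" value="{regex-group(2)}"/>
        </xsl:matching-substring>
        <!-- the ", " between pairs is non-matching text; drop it -->
        <xsl:non-matching-substring/>
      </xsl:analyze-string>
    </pairs>
  </xsl:template>
</xsl:stylesheet>
```

Run with main as the initial template (e.g. Saxon's -it:main). This should give <pairs><pair name="id" value="1"/><pair name="name" value="two"/></pairs>, with one <pair> per match. Note that the regex attribute is an attribute value template, so any literal { or } in the pattern has to be doubled.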

-- Graydon
