Re: [xsl] parsing parens in the park

Subject: Re: [xsl] parsing parens in the park
From: "Dimitre Novatchev" <dnovatchev@xxxxxxxxx>
Date: Sun, 28 Sep 2008 10:01:08 -0700
> Every name is
> guaranteed to have an associated value, which could be a string of
> most anything, including the null string or one that contains
> *matched pairs of parentheses*. (A previous validation stage has, in
> theory, guaranteed that there are no unmatched single parentheses in
> a value.)
>

This description is somewhat vague. Given a precise syntax for "value"
(for example, a BNF grammar), I can produce a parser in just minutes
using the LR parsing framework of FXSL:

   http://fxsl.cvs.sourceforge.net/fxsl/fxsl-xslt2/f/func-lrParse.xsl?view=markup&sortby=date

It is quite straightforward to implement parsers in XSLT 2.0 using the
FXSL generic LR(1) table-driven parser and the parsing tables
generated by YACCX.


Two examples are the JSON parser (the basis of the
f:json-document() function) and the XPath 2.0 parser, both
built this way.

The JSON parser took about 1-2 calendar days, working only when I had
free time, so probably just several hours in total. Most of the effort
went into the lexer and the semantic evaluation, which the developer
has to provide anyway with *any* such tool.
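
For a format this small, a hand-written scanner may also be enough. What
follows is only a sketch and does not use the FXSL LR framework. It
assumes the grammar given in the comment, assumes the parentheses are
already balanced (as the earlier validation stage is said to guarantee),
and computes the nesting depth at every character so that only top-level
parentheses delimit a value. The function name my:parse-park, its
namespace URI, and the <pair> element are hypothetical names chosen for
illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.org/park"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Assumed grammar (an assumption, not taken from the thread):
         park  ::= ( S? name S? '(' value ')' )* S?
         value ::= ( [^()] | '(' value ')' )*
       Input with unbalanced parentheses is not handled. -->

  <!-- Hypothetical helper: one <pair name=".." value=".."/> per entry. -->
  <xsl:function name="my:parse-park" as="element(pair)*">
    <xsl:param name="park" as="xs:string"/>
    <xsl:variable name="cps" as="xs:integer*"
        select="string-to-codepoints($park)"/>
    <!-- Depth just after each character: opens so far minus closes so far.
         40 is '(' and 41 is ')'. Quadratic in the string length. -->
    <xsl:variable name="depth" as="xs:integer*"
        select="for $i in 1 to count($cps) return
                  count(subsequence($cps, 1, $i)[. eq 40])
                  - count(subsequence($cps, 1, $i)[. eq 41])"/>
    <!-- A top-level '(' leaves the depth at 1. -->
    <xsl:variable name="opens" as="xs:integer*"
        select="for $i in 1 to count($cps) return
                  if ($cps[$i] eq 40 and $depth[$i] eq 1) then $i else ()"/>
    <!-- A top-level ')' returns the depth to 0. -->
    <xsl:variable name="closes" as="xs:integer*"
        select="for $i in 1 to count($cps) return
                  if ($cps[$i] eq 41 and $depth[$i] eq 0) then $i else ()"/>
    <xsl:for-each select="1 to count($closes)">
      <xsl:variable name="k" as="xs:integer" select="."/>
      <!-- The name starts right after the previous top-level ')'. -->
      <xsl:variable name="start" as="xs:integer"
          select="if ($k eq 1) then 1 else $closes[$k - 1] + 1"/>
      <pair name="{normalize-space(
                     substring($park, $start, $opens[$k] - $start))}"
            value="{substring($park, $opens[$k] + 1,
                              $closes[$k] - $opens[$k] - 1)}"/>
    </xsl:for-each>
  </xsl:function>

  <!-- Usage: drop the pair named 'b' and rebuild the string. -->
  <xsl:template name="main">
    <xsl:variable name="park" select="'a (1) b (x (y) z) c ()'"/>
    <xsl:value-of separator=" "
        select="for $p in my:parse-park($park)[@name ne 'b']
                return concat($p/@name, ' (', $p/@value, ')')"/>
  </xsl:template>

</xsl:stylesheet>
```

Run with "main" as the initial template, this should produce
"a (1) c ()". Filtering on the value instead would be a predicate such as
[matches(@value, $regex)], where $regex is whatever pattern applies. The
depth computation trades speed for simplicity, so for long strings a
table-driven parser or a recursive scan is likely the better choice.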



--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play



On Sun, Sep 28, 2008 at 9:29 AM, Syd Bauman <Syd_Bauman@xxxxxxxxx> wrote:
>
> In an XSLT 2.0 stylesheet I have a string (call it $park) that is of
> the format
>       name1 (value1) name2 (value2) name3 (value3) ...
> where the whitespace is optional, but the parentheses are not. Every
> name is guaranteed to be an unqualified XML Name. Every name is
> guaranteed to have an associated value, which could be a string of
> most anything, including the null string or one that contains
> *matched pairs of parentheses*. (A previous validation stage has, in
> theory, guaranteed that there are no unmatched single parentheses in
> a value.)
>
> I need to produce a copy of this string with certain adjustments,
> e.g., dropping any name-value pair where the name is a particular
> string, or selecting only name-value pairs for which the value
> matches a given regexp (which may include the local name of the
> context node).
>
> So it seems like a good first step would be to parse the string into
> what Perl calls a hash table, or lisp calls an alist. I think that
> means what I wish I could do is generate a key à la the following
> (presuming that my get functions return a sequence).
>  <xsl:key name="park-alist"
>           match="syd:get-values( $park )"
>           use="syd:get-names( $park )"/>
> If I understand correctly, this isn't possible because you can't call
> a function from the match= attribute, as it is a pattern, not an
> expression.
>
> So I suppose I can do this without a key by just indexing into two
> separate sequences. E.g., something like
>   <xsl:for-each select="syd:get-names($park)">
>     <xsl:variable name="i" select="position()"/>
>     <xsl:message>
>       name # <xsl:value-of select="$i"/> = <xsl:value-of select="."/>
>       value # <xsl:value-of select="$i"/> = <xsl:value-of select="syd:get-values($park)[$i]"/>
>     </xsl:message>
>   </xsl:for-each>
>
> However, while I'm confident I could tackle the parsing those two
> functions have to do in lisp or Perl, where I could scan the value
> and count open- and close-parens as I go, I have no idea how to parse
> them in XPath 2.0. I think maybe I could come up with a complex set
> of recursive parsing templates that used substring-before() and
> substring-after(), but
> a) I'm not sure, and
> b) there's gotta be a better way.
>
> So, any thoughts on how to parse that string, or a better method of
> approaching the whole problem, are appreciated.
