Re: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)

From: David Carlisle <davidc@xxxxxxxxx>
Date: Fri, 11 Jan 2002 15:46:15 GMT

> What I don't think we've thoroughly discussed yet is the idea of
> regexp matching templates (as David first suggested) vs. regexp
> matching instructions (which you need, I think, to cover the whole
> spectrum of requirements). Hopefully David's coming up with some kind
> of proposal that summarises it all ;)

One of my main problems is that currently I can't see a way to specify
things that would actually address the main use case I have for this.
(Which is a real use case for the day job, not something I just made
up:-)


Suppose you had a document which was marked up as XML but in which the
mathematics was marked up as 

<maths>
\frac{-b \pm \sqrt{b^2 -4ac}}{2a}
</maths>

and you had 96000 of these math expressions in the document collection
(which can be either one document linked via external entities or 1200
separate documents, according to taste).

So you knock up some XSL that transforms the XML bits into XHTML, but
what do you do with the mathematics (which, as always, is the most
interesting part)?

Basically you want to turn the nested tree structure of TeX {} brace
pairs into MathML (or some similar XML structure).

It was pointed out somewhere near the beginning of this thread that
regexps are not sufficient to parse a text stream into a tree.
This is true but not really the point, as that just means you can't do it
with a single regexp. (Essentially, regular expression languages can not
count: you can not design a regexp that matches from a { to its
matching }. You can do special cases like {[^{}]*} which matches the
innermost groups, but to match arbitrary groups you need arithmetic.)
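
A minimal sketch of that limitation, in Python rather than anything
XSLT-shaped (the name INNERMOST is made up for the illustration). The
pattern only finds brace groups that contain no further braces, so the
first argument of \frac is never matched as a whole:

```python
import re

# Matches a brace group whose contents contain no braces at all,
# i.e. only the innermost groups.
INNERMOST = re.compile(r"\{[^{}]*\}")

tex = r"\frac{-b \pm \sqrt{b^2 -4ac}}{2a}"

# The outer group "{-b \pm \sqrt{...}}" is skipped, because the
# pattern cannot pair its opening brace with the right closing brace.
innermost = INNERMOST.findall(tex)
print(innermost)
```

Only the argument of \sqrt and the second argument of \frac come back.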

But here the language is not "regular expressions" but "an existing
language that includes arithmetic (xpath or xslt or xquery) extended
with regular expressions". Such a language ought to be able to do this
job.

Actually XSLT 1.0 can do this (without extensions): I have templates
just using the existing string functions which trawl through a string
finding all TeX commands like \gamma and replacing them by their Unicode
characters, and matching braces so the two arguments of \frac and the
argument of \sqrt are correctly matched. It really isn't a lot of fun
doing this in XSLT (but it means I can browse the XML files in IE
and see the sums looking like sums are supposed to look).

Basically this was also what the omnimark script I posted earlier was
doing (the tools change but the same jobs keep coming back:-).

In omnimark, you can set up two rules matching { and } and a counter
that you increment when you see a { and decrement when you see a }.
If the counter ever gets back to 0 then you have found the closing brace
that matches the first opening brace you saw, at which point you stick
in your XML markup and carry on.
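
The counter idea can be sketched as follows. This is Python, not
omnimark, and find_matching_brace is a hypothetical helper name. It
also assumes that every { and } is a grouping brace, so escaped braces
such as \{ are not handled:

```python
def find_matching_brace(text, start):
    """Return the index of the '}' that closes the '{' at text[start]."""
    if text[start] != "{":
        raise ValueError("expected '{' at the start position")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1          # one level deeper
        elif text[i] == "}":
            depth -= 1          # one level back out
            if depth == 0:      # back to 0: this closes the first brace
                return i
    raise ValueError("unbalanced braces")

tex = r"\frac{-b \pm \sqrt{b^2 -4ac}}{2a}"
open_at = tex.index("{")
close_at = find_matching_brace(tex, open_at)
first_arg = tex[open_at + 1:close_at]
print(first_arg)
```

Here first_arg is the whole first argument of \frac, nested \sqrt group
included.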

In XSLT's tree model it's a bit harder to see how you'd do that kind of
thing. If it were nodes rather than regexp that you were matching then
in XSLT 1 you'd find the first child (only) and then work along the
sibling axis to collect up all the nodes in the current group, or use
one of Jeni's other favourite grouping mechanisms. In XSLT2 there are the
explicit grouping constructs.

I'm not sure if any of the things we've sketched so far really helps
this use case. Clearly you could work as I work now: grab the whole
string into a parameter, explicitly pass bits of the string to
subsequent templates, and do all the brace pair matching by hand.
But in this case it will be exactly what I have now, as you don't need
regexp to match on { and } (matching TeX command names would be easier
with regexp, but it's the brace matching that is the hard bit).
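
To make the division of labour concrete, here is a rough sketch in
Python (not XSLT, and not the templates described above). A regexp picks
out the command names and a depth counter does the brace matching. The
element names <frac>, <sqrt> and <arg> are placeholders rather than
real MathML, and the SYMBOLS and ARITY tables are assumptions covering
only the commands mentioned in this message:

```python
import re
from xml.sax.saxutils import escape

COMMAND = re.compile(r"\\([A-Za-z]+)")        # a TeX command name such as \frac
SYMBOLS = {"gamma": "\u03b3", "pm": "\u00b1"}  # commands replaced by one character
ARITY = {"frac": 2, "sqrt": 1}                 # commands taking brace arguments

def read_group(text, pos):
    """Return (contents, next_pos) for the brace group opening at text[pos]."""
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[pos + 1:i], i + 1
    raise ValueError("unbalanced braces")

def convert(text):
    """Turn a TeX fragment into a nested XML-like string."""
    out, pos = [], 0
    while pos < len(text):
        m = COMMAND.match(text, pos)
        if m is None:
            out.append(escape(text[pos]))      # ordinary character
            pos += 1
            continue
        name, pos = m.group(1), m.end()
        if name in SYMBOLS:
            out.append(SYMBOLS[name])
        elif name in ARITY:
            out.append(f"<{name}>")
            for _ in range(ARITY[name]):
                if pos >= len(text) or text[pos] != "{":
                    raise ValueError(f"missing argument for \\{name}")
                body, pos = read_group(text, pos)
                out.append(f"<arg>{convert(body)}</arg>")  # recurse into the group
            out.append(f"</{name}>")
        else:
            out.append(escape(m.group(0)))     # unknown command: pass through
    return "".join(out)

result = convert(r"\frac{-b \pm \sqrt{b^2 -4ac}}{2a}")
print(result)
```

The regexp saves some work on the command names, but the nesting is
still handled entirely by the counter and the recursion.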

No solutions here, just something for Jeni to think about at the weekend
(you weren't planning on doing anything else, were you?)

David

