Subject: Re: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)
From: David Carlisle <davidc@xxxxxxxxx>
Date: Fri, 11 Jan 2002 15:46:15 GMT
> What I don't think we've thoroughly discussed yet is the idea of
> regexp matching templates (as David first suggested) vs. regexp
> matching instructions (which you need, I think, to cover the whole
> spectrum of requirements). Hopefully David's coming up with some kind
> of proposal that summarises it all ;)

One of my main problems is that currently I can't see a way to specify things that would actually address the main use case I have for this. (It's a real use case from the day job, not something I just made up :-)

Suppose you had a document that was marked up as XML, but in which the mathematics was marked up as

<maths> \frac{-b \pm \sqrt{b^2 -4ac}}{2a} </maths>

and you had 96000 of these math expressions in the document collection (which can be either one document linked via external entities or 1200 separate documents, according to taste). So you knock up some XSL that transforms the XML bits into XHTML, but what do you do with the mathematics (which, as always, is the most interesting part)? Basically you want to turn the nested tree structure of TeX {} brace pairs into MathML (or some similar XML structure).

It was pointed out somewhere near the beginning of this thread that regexps are not sufficient to parse a text stream into a tree. This is true but not really the point, as it just means you can't do it with a single regexp. Essentially, regular expression languages cannot count: you cannot design a regexp that matches from a { to its matching }. (You can do special cases like {[^{}]*}, which matches the innermost groups, but to match arbitrary groups you need arithmetic.) But here the language is not "regular expressions" but "an existing language that includes arithmetic (XPath, XSLT or XQuery) extended with regular expressions". Such a language ought to be able to do this job.
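To make the point about counting concrete, here is a minimal sketch in Python rather than XSLT (Python is used only because it pairs a regex engine with ordinary arithmetic; none of this code comes from the thread). A pattern for brace groups that contain no braces finds only the innermost groups, while using a regexp merely to split the string into tokens and a stack to keep count recovers the whole tree. The names `TOKEN`, `INNERMOST` and `parse_groups` are hypothetical.

```python
import re

# Hypothetical tokenizer: a TeX control word (\frac, \sqrt, \pm, ...),
# a single brace, or a run of other text. A backslash not followed by
# letters matches none of these and is silently dropped in this sketch.
TOKEN = re.compile(r"\\[A-Za-z]+|[{}]|[^\\{}]+")

# What a regular expression can do on its own: find brace groups
# that contain no further braces (the innermost groups).
INNERMOST = re.compile(r"\{[^{}]*\}")


def parse_groups(tex):
    """Turn brace-delimited TeX into nested Python lists.

    The regexp only splits the string into tokens; the stack supplies
    the counting that a regular expression cannot do by itself.
    """
    root = []
    stack = [root]
    for tok in TOKEN.findall(tex):
        if tok == "{":
            group = []
            stack[-1].append(group)  # new group belongs to the current one
            stack.append(group)      # and becomes the current group
        elif tok == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()              # back to the enclosing group
        elif tok.strip():
            stack[-1].append(tok.strip())
    if len(stack) != 1:
        raise ValueError("unbalanced '{'")
    return root
```

On `\frac{-b \pm \sqrt{b^2 -4ac}}{2a}`, `INNERMOST.findall` returns only `{b^2 -4ac}` and `{2a}` and can never return the numerator of `\frac`, whereas `parse_groups` returns `['\\frac', ['-b', '\\pm', '\\sqrt', ['b^2 -4ac']], ['2a']]`.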
Actually, XSLT 1.0 can do this without extensions. I have templates, using just the existing string functions, that trawl through a string finding all TeX commands like \gamma and replacing them by their Unicode characters, and matching braces so that the two arguments of \frac and the argument of \sqrt are correctly matched. It really isn't a lot of fun doing this in XSLT (but it means I can browse the XML files in IE and see the sums looking like sums are supposed to look).

Basically this was also what the OmniMark script I posted earlier was doing (the tools change but the same jobs keep coming back :-). In OmniMark you can set up two rules matching { and }, and a counter that you increment when you see a { and decrement when you see a }. When the counter gets back to 0 you have found the closing brace that matches the first opening brace you saw, at which point you stick in your XML markup and carry on.

In XSLT's tree model it's a bit harder to see how you'd do that kind of thing. If it were nodes rather than regexps that you were matching, then in XSLT 1 you'd find the first child (only) and then work along the sibling axis to collect up all the nodes in the current group, or use one of Jeni's other favourite grouping mechanisms. In XSLT 2 there are the explicit grouping constructs. I'm not sure if any of the things we've sketched so far really helps this use case. Clearly you could work as I work now: grab the whole string into a parameter, explicitly pass bits of the string to subsequent templates, and do all the brace-pair matching by hand. But in that case it will be exactly what I have now, as you don't need regexps to match on { and }. (Matching TeX command names would be easier with regexps, but it's the brace matching that is the hard bit.)

No solutions here, just something for Jeni to think about at the weekend (you weren't planning on doing anything else, were you?)
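The counter technique described above can be sketched the same way. This is a minimal Python illustration, not David's XSLT templates or his OmniMark script: `matching_brace` is the increment/decrement counter, a regexp is used only to recognise command names, and `\frac`/`\sqrt` pick up their brace-delimited arguments. The output is MathML-flavoured markup only (token elements such as mi and mo are omitted, so it is not valid MathML), and the symbol table `SYMBOLS` and all function names are hypothetical.

```python
import re

# Hypothetical, deliberately tiny table of TeX command names -> Unicode.
SYMBOLS = {"alpha": "\u03b1", "gamma": "\u03b3", "pm": "\u00b1"}

# A regexp is handy for recognising command names like \gamma.
COMMAND = re.compile(r"\\([A-Za-z]+)")


def matching_brace(s, start):
    """Return the index of the '}' that closes the '{' at s[start].

    Add one for each '{', subtract one for each '}', and stop when the
    counter gets back to zero.
    """
    depth = 0
    for i in range(start, len(s)):
        if s[i] == "{":
            depth += 1
        elif s[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("no matching '}'")


def read_arg(s, i):
    """Skip spaces, then return (text inside the group, index after it)."""
    while i < len(s) and s[i].isspace():
        i += 1
    if i >= len(s) or s[i] != "{":
        raise ValueError("expected '{' at position %d" % i)
    end = matching_brace(s, i)
    return s[i + 1:end], end + 1


def tex_to_markup(s):
    """Rough TeX-to-markup conversion for \\frac, \\sqrt and simple symbols."""
    out = []
    i = 0
    while i < len(s):
        m = COMMAND.match(s, i)
        if m:
            name = m.group(1)
            i = m.end()
            if name == "frac":  # two brace-delimited arguments
                num, i = read_arg(s, i)
                den, i = read_arg(s, i)
                out.append("<mfrac><mrow>%s</mrow><mrow>%s</mrow></mfrac>"
                           % (tex_to_markup(num), tex_to_markup(den)))
            elif name == "sqrt":  # one brace-delimited argument
                arg, i = read_arg(s, i)
                out.append("<msqrt>%s</msqrt>" % tex_to_markup(arg))
            else:  # known symbols become characters; others pass through
                out.append(SYMBOLS.get(name, m.group(0)))
        elif s[i] == "{":  # a bare group becomes an mrow
            end = matching_brace(s, i)
            out.append("<mrow>%s</mrow>" % tex_to_markup(s[i + 1:end]))
            i = end + 1
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```

For example, `tex_to_markup(r"\frac{-b \pm \sqrt{b^2 -4ac}}{2a}")` yields `<mfrac><mrow>-b ± <msqrt>b^2 -4ac</msqrt></mrow><mrow>2a</mrow></mfrac>`. As in the message, the regexp only helps with command names; the brace matching is all done by the counter.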
David

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list