Subject: RE: [xsl] XSLT match with regex what's the best current solution?
From: "Michael Kay" <michael.h.kay@xxxxxxxxxxxx>
Date: Wed, 16 Jan 2002 09:37:32 -0000
> I am working on a suite of scripts that induce structure in
> free text and eventually capture fine grained medical information.
> I have been using AWK so far, but I am thinking about making
> this a process largely of XML transformations. However, since I
> must induce XML structure from semi-structured free text I need
> some more parsing support. First, regular expressions. I know
> there is EXSLT but are regex matches and replaces supported
> in SAXON (I love SAXON, so I would prefer using it.)

Saxon doesn't currently have any regex support (not even the limited facilities described in EXSLT, nor those in the draft XSLT 2.0 WD, let alone the more sophisticated facilities being discussed on this list).

But it shouldn't be too difficult to write some Saxon extension functions that call functions in a regular expression library. (There are a number of such libraries around, and I haven't done a detailed evaluation or comparison. I believe there's one in apache jakarta, one in IBM alphaworks, one in the JDK 1.4 beta.) You might be able to call these libraries directly, or you may find it's easier if you write some wrapper code around them.

> Also, any ideas of additional parsing tools and their integration
> into XSLT would be appreciated. Is there a way of running XSLT
> in line-mode and have every line matched against regular
> expressions? Well, I suppose so, with a simple sed script I could
> first wrap each line into a <line>...</line> tag and then use regex
> match on the text node of each <line> element.

You could break the text into lines using the saxon:tokenize() extension function.

> Is SAXON easy to extend? I suppose there is some documentation
> of SAXON that tells me how to write extensions in Java, right?
> Any reason why it would be better to use something other than
> SAXON if my platform is Java and I'm not interested in Web stuff
> (in which case I would look into the Apache work.)
If you know Java, writing Saxon extension functions isn't difficult. It's described in the extensibility.html file that comes with the download. (If you have any specific problems, please raise them on the Saxon help list at saxon.sf.net)

Mike Kay

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
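
As a rough illustration of the "wrapper code" idea in the reply, here is a minimal sketch of a Java class that exposes regex matching and replacement as plain static methods. It assumes the java.util.regex package from JDK 1.4 is the library being wrapped. The class name RegexExt, its method names, and the sample input lines are all hypothetical; none of them come from Saxon or EXSLT, and how such a class is bound to a namespace in a stylesheet is left to extensibility.html.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class: the name RegexExt and its methods are
// illustrative only and are not part of Saxon or EXSLT.
// For real use from a stylesheet it would normally be declared public
// in its own RegexExt.java file.
final class RegexExt {

    private RegexExt() {}

    // True if the pattern matches anywhere in the input string.
    public static boolean matches(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Replaces every match of the pattern; $1, $2, ... in the replacement
    // refer to capture groups, as defined by java.util.regex.
    public static String replaceAll(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    // Returns the given capture group of the first match, or "" if there is
    // no match or the group did not take part in it. The group number must
    // exist in the pattern.
    public static String group(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "";
        }
        String value = m.group(group);
        return value == null ? "" : value;
    }

    // Stand-alone demonstration: split made-up sample text into lines and
    // apply a regex to each line, roughly the "line mode" asked about.
    public static void main(String[] args) {
        String text = "BP: 120/80\nPulse: 72\nno vitals here";
        String[] lines = text.split("\r?\n");
        for (int i = 0; i < lines.length; i++) {
            if (matches(lines[i], "\\d+/\\d+")) {
                System.out.println(
                    replaceAll(lines[i], "(\\d+)/(\\d+)", "<sys>$1</sys><dia>$2</dia>"));
            } else {
                System.out.println("<line>" + lines[i] + "</line>");
            }
        }
    }
}
```

Keeping the arguments and return values to strings, booleans and ints is a deliberate choice in this sketch, on the assumption that simple types are the easiest to pass between XPath expressions and Java. Compiling a Pattern on every call is wasteful for large inputs; caching compiled patterns would be a natural refinement.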