RE: [xsl] XSLT match with regex what's the best current solution?

From: "Michael Kay" <michael.h.kay@xxxxxxxxxxxx>
Date: Wed, 16 Jan 2002 09:37:32 -0000
> I am working on a suite of scripts that induce structure in
> free text and eventually capture fine grained medical information.
> I have been using AWK so far, but I am thinking about making
> this a process largely of XML transformations. However, since I
> must induce XML structure from semi-structured free text I need
> some more parsing support. First, regular expressions. I know
> there is EXSLT, but are regex matches and replaces supported
> in SAXON? (I love SAXON, so I would prefer using it.)

Saxon doesn't currently have any regex support (not even the limited
facilities described in EXSLT, nor those in the draft XSLT 2.0 WD, let alone
the more sophisticated facilities being discussed on this list).

But it shouldn't be too difficult to write some Saxon extension functions
that call functions in a regular expression library. (There are a number of
such libraries around, and I haven't done a detailed evaluation or
comparison. I believe there's one in Apache Jakarta, one in IBM alphaWorks,
and one in the JDK 1.4 beta.) You might be able to call these libraries
directly, or you may find it easier to write some wrapper code around
them.
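
A minimal sketch of what such wrapper code might look like, assuming the
java.util.regex package from JDK 1.4 is the library chosen. The class name
RegexExt and its two method names are hypothetical; they are not part of
Saxon or of any regex library.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class (illustrative name). It is package-private here
// so the snippet compiles on its own; a class intended for use from a
// stylesheet would normally be declared public in its own file.
final class RegexExt {

    private RegexExt() {}

    // Returns true if any substring of input matches the regular expression.
    public static boolean matches(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Replaces every match of regex in input with replacement.
    // In replacement, $1, $2, ... refer to captured groups.
    public static String replace(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }
}
```

Both methods are static and take and return simple types, which keeps the
wrapper easy to call. How such a class is bound to a namespace prefix in the
stylesheet is not shown here; that is covered by the Saxon extensibility
documentation.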
>
> Also, any ideas of additional parsing tools and their integration
> into XSLT would be appreciated. Is there a way of running XSLT
> in line-mode and have every line matched against regular
> expressions? Well, I suppose so, with a simple sed script I could
> first wrap each line into a <line>...</line> tag and then use regex
> match on the text node of each <line> element.

You could break the text into lines using the saxon:tokenize() extension
function.
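
For comparison, the same line-by-line matching can be sketched in plain Java.
This is not saxon:tokenize() itself, only an illustration of the idea of
splitting text into lines and testing each line against a regular expression.
It assumes java.util.regex is available, and the class and method names
(LineScanner, matchingLines) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (illustrative names): splits text into lines and
// keeps the lines that contain a match for the given regular expression.
final class LineScanner {

    private LineScanner() {}

    public static List<String> matchingLines(String text, String regex) {
        Pattern pattern = Pattern.compile(regex);   // compile once, reuse for every line
        List<String> hits = new ArrayList<String>();
        // Split on Unix (\n) or Windows (\r\n) line endings.
        for (String line : text.split("\\r?\\n")) {
            if (pattern.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

For example, calling matchingLines on a three-line string with the pattern
\d+ returns only the lines that contain at least one digit.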
>
> Is SAXON easy to extend? I suppose there is some documentation
> of SAXON that tells me how to write extensions in Java, right?
> Is there any reason why it would be better to use something other than
> SAXON if my platform is Java and I'm not interested in Web stuff
> (in which case I would look into the Apache work)?
>
If you know Java, writing Saxon extension functions isn't difficult. It's
described in the extensibility.html file that comes with the download.

(If you have any specific problems, please raise them on the Saxon help list
at saxon.sf.net.)

Mike Kay


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list