Re: [xsl]Identifying patterns within texts

Subject: Re: [xsl]Identifying patterns within texts
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Thu, 29 Nov 2007 17:41:05 -0500
Jim,

At 05:04 PM 11/29/2007, you wrote:
The two responses I got (thank you....) reiterated the problem that I
identified at the beginning of this project.  How do you identify
"math"?  Since I am working on an educational tool where I am taking an
old format (strings within XML tags) and converting to a new
format (strings within new XML tags), it is tough to identify
what is considered math.  1/2/99: is that math or a date?  Basically, I
am to decide what represents a mathematical expression and place it
within its own element/tag, so that the software that processes it
will be able to display it in a 2-dimensional format.  So the algorithm
I come up with must be flexible and expandable.  And it may not be
perfect.... Spaces between text will be a killer....  I guess what is
acceptable will be up to the Systems guys......

This sort of squishiness isn't unusual for this kind of problem, if that's any comfort.
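
To make the ambiguity concrete, here is a minimal sketch of the kind of rule you would have to commit to. It is only an illustration: the names (TokenKind, classify_token) and the rule itself ("digits/digits/digits is a date, digits/digits is a fraction") are my assumptions, not anything from your format, and it uses std::regex from the C++ standard library rather than a specific third-party package.

```cpp
#include <regex>
#include <string>

// Hypothetical categories for a single whitespace-delimited token.
enum class TokenKind { Date, Fraction, Other };

// Assumed heuristic (not from the original data format):
//   d/d/d  -> date      e.g. 1/2/99
//   d/d    -> fraction  e.g. 1/2
// Anything else is left unclassified.
TokenKind classify_token(const std::string& token) {
    static const std::regex date_re(R"(\d{1,2}/\d{1,2}/\d{2,4})");
    static const std::regex fraction_re(R"(\d+/\d+)");

    // regex_match requires the whole token to match, not just a prefix.
    if (std::regex_match(token, date_re)) return TokenKind::Date;
    if (std::regex_match(token, fraction_re)) return TokenKind::Fraction;
    return TokenKind::Other;
}
```

Whether 1/2/99 should really come out as a date is exactly the policy question the Systems guys would have to settle; the point is only that the rule ends up as an explicit, ordered list of patterns that can be extended later.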


However, I have gone down the path of choosing the XSLT implementation that
was used on the GNOME project (xmlsoft.org), which implements XSLT 1.0.
The engine I chose must be easily added to an existing dll and later
ported to a Mac library (now that Mac is very much Unix :) ).  I needed
something that was free, something whose licensing the lawyers would
approve, and something that would be portable between those two
platforms.  I have seen some Java and C++ (Xalan with Xerces)
implementations, but I did not want the added tasks of integration (JNI
and C++ bindings).  Please comment on my logic if you see flaws.

If Java isn't a realistic option for you, the reasoning seems sound enough.


If Java is conceivable, you should at least consider Saxon 8, which will give you all the 2.0 features and more. But I'm certainly not qualified to say whether Java should be an option for you.

Therefore, the idea of using xsl:analyze-string element or regular
expressions in XSLT 2.0 is not an option right now.

If it isn't, then you basically have two choices:


1. Embrace the "fun" of setting out to be a killer XSLT 1.0 programmer, devising various sorts of amazing trickiness in a language not designed for the task at hand. In this case I'd recommend getting a copy of Jeni Tennison's "XSLT and XPath On the Edge", which covers this sort of thing along with much else. You will become skilled in recursive templates, arcane tricks with the translate() function, and other sorts of madness.

If you want to stay sane, however ...

I guess I could use a package like Boost/regex to post-process my
converted XML.  I assume I can generate the XML from the result tree in
memory and then parse that looking for math using C.

This sounds like a worthwhile option.
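
As a rough sketch of what that post-processing pass might look like, assuming you apply it to text content you have already pulled out of the result tree (not to raw markup, where it would also hit attribute values): the function name wrap_math and the <math> element name are hypothetical, the pattern is deliberately naive, and std::regex is used here as a stand-in for whichever regex package you settle on.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: wraps simple arithmetic runs such as "3 + 4" or "1/2"
// in <math>...</math>, while leaving date-like d/d/d tokens untouched.
// The element name and both patterns are illustrative assumptions.
std::string wrap_math(const std::string& text) {
    // Alternative 1 (tried first): a date-like token, e.g. 1/2/99.
    // Alternative 2: digits joined by one or more operators.
    static const std::regex token_re(
        R"((\d{1,2}/\d{1,2}/\d{2,4})|(\d+(?:\s*[-+*/^=]\s*\d+)+))");

    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), token_re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);          // copy text before the match
        if (m[1].matched) {
            out += m.str(0);                   // date-like: copy unchanged
        } else {
            out += "<math>" + m.str(0) + "</math>";
        }
        last = m[0].second;
    }
    out.append(last, text.cend());             // copy the trailing text
    return out;
}
```

For example, wrap_math("Add 3 + 4 on 1/2/99") gives "Add <math>3 + 4</math> on 1/2/99". It will certainly misfire on some inputs (spaces being the killer you mention), but the rules live in one place and can be refined as cases turn up.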


Just to make sure there isn't an intermediate course, you might investigate what sort of extensibility your processor of choice offers. Maybe you could manage the requirement by writing your own function library, and if you're lucky, maybe some of it has already been done for you.

Cheers,
Wendell


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
