Re: [xsl] grouping + global variable (?) (was re: regexs, grouping (?) and XSLT2?)

Subject: Re: [xsl] grouping + global variable (?) (was re: regexs, grouping (?) and XSLT2?)
From: Deirdre Saoirse Moen <deirdre@xxxxxxxxxxx>
Date: Fri, 13 Aug 2004 20:05:26 -0700 (PDT)
On Fri, 13 Aug 2004, Wendell Piez wrote:

> Your project sounds very ambitious. Up-conversion is a challenging and
> fascinating business, which we're all going to learn much more about.
> You have several conference papers' worth of material here, I bet.

I'm hoping so.

Quite frankly, I hadn't realized we were so cutting edge. :)

Ultimately, my goal is to provide an application that offers integration
between the text file (written using the user's text processor of choice).

When a user wants to submit a manuscript, the application performs all the
necessary generation of the document (including the cover letter) using
user-specific information about how they want the document to appear,
including any market- or genre-specific styles. Press a button, out pops
the PDF or RTF. For now, I'll settle for PDF. :)

I'd already written the submission manager and am trying to work to
integrate the work of another person into the project. Thus my struggle to

> At 08:15 PM 8/12/2004, you wrote:

>> But I've been thinking, based on the comments from the list, that a
>> better process might be eliminating the perl script entirely.
> Maybe: but you'll need something at least as good to do the work it's
> doing, and Perl is really good at regular-expressions and string processing
> generally.
> (Personally I might have tried it in Python, but that's mainly because I
> can count the lines of Perl I've written in my life on one hand. Of course,
> I can count in binary on my hands, which gets me higher than five.)

I didn't write the perl script, thus my frustration (as a Python person).
My partner-in-crime and I have come at the problem from entirely different

> Now it has some regexp support, XSLT 2.0 should be at least a credible
> option here, but its features have yet to be stress-tested TMK and
> tools support is still somewhat up in the air. (I believe Mike Kay is
> speaking on this very topic at XML 2004 this November in Washington
> DC.)

OK, that's what I'd been beginning to understand based on list comments. I
wasn't aware of the tool support problem.

> A split-down-the-middle option could be to write a little function
> library in the language of your choice to do the upconversion
> string-processing, and call out to it from your XSLT using extension
> functions. (This is what I kind of imagined would happen five years
> ago, but it turns out processor-dependent extension functions are
> unfashionable these days.)

This is an intriguing option.
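
To make the "little function library" idea concrete, here is a minimal
sketch in Python of what the string-processing half might look like. The
conventions it assumes (blank-line-separated paragraphs, _underscores_ for
emphasis) and the element names manuscript, p and emph are hypothetical,
not taken from the actual project, and the processor-specific binding that
would expose upconvert() to a stylesheet as an extension function is not
shown.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical convention: _word or phrase_ marks emphasis in the text file.
_EMPH = re.compile(r"_([^_\n]+)_")


def upconvert(text):
    """Turn plain manuscript text into a small XML string.

    Paragraphs are assumed to be separated by one or more blank lines.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    out = ["<manuscript>"]
    for para in paragraphs:
        # Collapse hard line wraps, then escape &, < and > before adding tags.
        body = escape(" ".join(para.split()))
        body = _EMPH.sub(r"<emph>\1</emph>", body)
        out.append("  <p>" + body + "</p>")
    out.append("</manuscript>")
    return "\n".join(out)
```

Calling upconvert("It was _very_ dark & cold.\n\nThe end.") yields a
manuscript element with two p children, the first containing an emph
element and an escaped ampersand.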

99% of the problem comes from documents saved in the native platform that
aren't correctly tagged. I'm not quite certain what to do about this so
that the editing is transparent. Yet.

I feel moderately confident that this might make it a more contiguous
process, which would also require fewer installed pieces in order to work.

> >I'm not sure I'd
> >want to eliminate the intermediate XML file, though.

> I think having the intermediate format will prove to be good design in
> any case.


> >Option 3 seems to be ruled out based on my current toolchain
> >(apache-FOP), which probably eliminates #2 as well. (I could easily be
> >wrong on this)
> Apache Xalan-J has support for a node-set function, so you could use
> option 2 if you wanted. It will even recognize it in the
> namespace, which is nice.


> >So, my question (you knew there was one): can someone give me a
> >description of how to accomplish #4, given the workflow I've got, using
> >something like Saxon? I see that it's an XSLT processor, but I don't get
> >the map of how all the pieces fit together. Right now, I know (after
> >having looked) that I'm using xalan for the simple reason that it came
> >with my apache-fop install.
> Saxon is well-liked by developers (it runs well, it's conformant, and
> it has good error messages), and can be switched in for Xalan in your
> toolchain if you prefer it. Saxon also supports exslt:node-set, so you
> can use option #2 with it as well.

Well, I can see if it offers me more options. I know enough to figure out
how to wrest it into the toolchain.

> As I mentioned, it has an extension attribute, saxon:next-in-chain, that
> can be invoked for pipelining. IIRC it passes SAX events between processor
> invocations (Mike?), so it's much faster than writing a file and reparsing,
> though perhaps not quite as fast as passing unserialized trees, as options
> 2 and 3 would do.
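
The general idea of passing SAX events from one stage to the next, rather
than writing a file and reparsing it, can be sketched with Python's
standard xml.sax module. This is only an analogy for the pipelining
described above, not how saxon:next-in-chain is implemented; the
RenameElement filter and the element names para, p, em and emphasis are
made up for illustration.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class RenameElement(XMLFilterBase):
    """A pipeline stage that renames one element and forwards all events."""

    def __init__(self, parent, old, new):
        super().__init__(parent)
        self._old = old
        self._new = new

    def _map(self, name):
        return self._new if name == self._old else name

    def startElement(self, name, attrs):
        super().startElement(self._map(name), attrs)

    def endElement(self, name):
        super().endElement(self._map(name))


def run_pipeline(xml_text):
    """Parse once, push events through two stages, serialize once."""
    parser = xml.sax.make_parser()
    stage1 = RenameElement(parser, "para", "p")
    stage2 = RenameElement(stage1, "em", "emphasis")
    out = io.StringIO()
    stage2.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    stage2.parse(io.StringIO(xml_text))
    return out.getvalue()
```

The input is parsed only once and serialized only once; the two stages in
between see events rather than text, which is where the saving over
write-and-reparse comes from.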

Right now, I'm running a script daily that re-generates XML files from any
changed text files in a given directory tree. The generation of a PDF is
upon-request, with re-generation of XML if it's needed. So part A
(txt->xml) doesn't necessarily happen when part B (xml->pdf) does.
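
A minimal sketch of that "regenerate only what changed" check, in Python.
It assumes each text file foo.txt has its XML written alongside it as
foo.xml and that file modification times are a good enough signal; the
function name stale_sources and both extensions are assumptions, not
details of the real script.

```python
import os


def stale_sources(root, src_ext=".txt", out_ext=".xml"):
    """Return text files under root whose XML is missing or older."""
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(src_ext):
                continue
            src = os.path.join(dirpath, name)
            out = src[: -len(src_ext)] + out_ext
            # Regenerate when there is no XML yet or the text is newer.
            if not os.path.exists(out) or os.path.getmtime(src) > os.path.getmtime(out):
                stale.append(src)
    return stale
```

The same function could serve both the daily run and the on-request PDF
step, which would only need to check the single file being rendered.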

Nevertheless, you've given me another idea, which I'll try over this

> I am reasonably sure Xalan offers similar features, however, or the Cocoon
> framework does.

Cocoon seems very interesting, but I don't quite get where it fits into
the overall picture of things, though I am reading up on it.

> >I'd also eventually like to get a decent RTF output. Standard manuscript
> >prose is not terribly complex, so something that supported basic features
> >should suffice for that. Unfortunately, the commercial options are too
> >expensive for the intended audience. Is jfor likely to be my best
> >available option?
> I'd be interested to hear myself from the list on this question. I haven't
> yet myself seen a really nice route to RTF. I think two passes to this
> (analogous to the way IBM deployed a "TeXML" which could be targeted as a
> route to TeX) might be the best way to do it: have yet another tag set that
> describes only the formatting primitives supported by RTF and a utility
> stylesheet to make RTF out of that. Or use XSL-FO, if any of the formatters
> can make decent RTF yet.

jfor hasn't been updated at all in over a year, so it seems like a dead
project. And is down.

I should add that I *do* need API access rather than a standalone

_Deirdre  web:        blog:
yarn:    cat's blog:
"Memes are a hoax! Pass it on!"
