Re: [xsl] grouping + global variable (?) (was re: regexs, grouping (?) and XSLT2?)

From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Sat, 14 Aug 2004 12:01:48 -0400
Hi Deirdre,

I do hope this thread provokes others to contribute.

At 11:05 PM 8/13/2004, you wrote:
Quite frankly, I hadn't realized we were so cutting edge. :)

Well, up-conversion has been going on as long as e-text has. But XML is only now beginning to catch up to where Omnimark and even Perl were years ago. And to do it in public and share is a new thing, since it's both difficult and profitable enough that it is best understood by the data-conversion vendors, who know a lot about it but whose methods and technologies tend to be proprietary for understandable reasons (it's their bread and butter).


Also, it is a tough enough business in the general case that it's often dealt with by throwing people at it rather than machines, or both together. In many cases, when input is underspecified or unconstrainable, this cannot be avoided.

Ultimately, my goal is to provide an application that offers integration
between the text file (written using the user's text processor of choice).

Yes: understood.


User wants to submit a manuscript, then the application performs all the
necessary generation of the document (including cover letter) using
user-specific information about how they want the document to appear,
including any market- or genre-specific styles. Press a button, out pops
the PDF or RTF. For now, I'll settle for PDF. :)

Good choice. The long-term challenge of these systems is "round-tripping" but you may want to avoid that for now; an opaque format like PDF helps force users to edit their original input, not the system's output (which then has to become input again).


I didn't write the perl script, thus my frustration (as a Python person).
My partner-in-crime and I have come at the problem from entirely different
directions.

This can be useful.


> Now it has some regexp support, XSLT 2.0 should be at least a credible
> option here, but its features have yet to be stress-tested TMK and
> tools support is still somewhat up in the air. (I believe Mike Kay is
> speaking on this very topic at XML 2004 this November in Washington
> DC.)

OK, that's what I'd been beginning to understand based on list comments. I
wasn't aware of the tool support problem.

Saxon 8 is available but other vendors are standing in the wings (where they're hard to see). Only when we have a range of tools will it become clear (IMHO) how well the spec is designed. (For example the fact that W3C XML Schema implementations differ on details of implementation compromises the use of Schema generally, since its portability is impaired. This is a shame, though getting the spec right the first time in every detail on something like Schema is near-impossible; over time we can hope this situation will improve.)


> A split-down-the-middle option could be to write a little function
> library in the language of your choice to do the upconversion
> string-processing, and call out to it from your XSLT using extension
> functions. (This is what I kind of imagined would happen five years
> ago, but it turns out processor-dependent extension functions are
> unfashionable these days.)

This is an intriguing option.

For this to work the text has to start life as some kind of XML, though that could be nothing but a dumb wrapper. Then you'd need a processor whose API allows you to return node-sets from functions.
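
A rough sketch of the string-processing half, in Python since that is your language. Everything named here is an assumption made up for illustration: the `doc` and `para` wrapper elements, the `emph` element, and the `_underscore_` convention for emphasis are not taken from your actual input. How such a function gets registered as an extension function depends entirely on the processor, so that part is not shown.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical convention: _word_ marks emphasis in the plain-text manuscript.
EMPH = re.compile(r"_([^_\n]+)_")


def wrap_plain_text(text):
    """Build a 'dumb' XML wrapper: one <para> per blank-line-separated block."""
    doc = ET.Element("doc")
    for block in re.split(r"\n\s*\n", text.strip()):
        para = ET.SubElement(doc, "para")
        # Collapse line breaks and runs of whitespace inside the block.
        para.text = " ".join(block.split())
    return doc


def upconvert_emphasis(para):
    """Turn _..._ runs in a text-only <para> into <emph> child elements."""
    text = para.text or ""
    para.text = None
    last = None  # most recent <emph>; following text goes into its tail
    pos = 0
    for match in EMPH.finditer(text):
        before = text[pos:match.start()]
        if last is None:
            para.text = (para.text or "") + before
        else:
            last.tail = (last.tail or "") + before
        last = ET.SubElement(para, "emph")
        last.text = match.group(1)
        pos = match.end()
    rest = text[pos:]
    if last is None:
        para.text = (para.text or "") + rest
    else:
        last.tail = (last.tail or "") + rest
    return para
```

Called on each `para` of the wrapped document, this returns real element structure rather than a string, which is the part that matters if the result is to be handed back to a stylesheet as nodes.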


Also, don't forget that XSLT 2 gives user-defined functions, so for many things it may be possible to avoid the external language altogether.

99% of the problem comes from documents saved in the native platform that
aren't correctly tagged. I'm not quite certain what to do about this so
that the editing is transparent. Yet.

I think this is the most difficult problem. This is why XML's well-formedness rules constitute its secret weapon. (Felt only when they chafe, this set of rules makes all downstream issues much easier to deal with, so XML developers can be quite unconscious of how much we don't have to think about.)


You need a way to trap and fix bad incoming tagging before it gets into your system, where it's expensive to deal with.
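
The cheapest trap is a well-formedness check at the door. A minimal sketch in Python, using only the standard library parser; the function name `check_well_formed` is hypothetical, and a real system would presumably also validate against whatever tag set you settle on:

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, (line, column, message))."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError carries the position of the first problem the parser hit.
        line, column = err.position
        return False, (line, column, str(err))
    return True, None
```

Rejecting a submission at this point, with the line and column reported back to the writer, keeps the repair in the hands of the person who knows what the text was supposed to say.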

A plain-text editing window is appealing (many writers like their keyboards), but you're going to need at least a "galley" preview on input, before commit, or you're going to go insane. A real grammar for your syntax would be even better.

I feel moderately confident that this might make it a more contiguous
process, which would also require fewer installed pieces in order to work.

Yes.


> I'd be interested myself to hear from the list on this question. I haven't
> yet seen a really nice route to RTF. I think two passes to this
> (analogous to the way IBM deployed a "TeXML" which could be targeted as a
> route to TeX) might be the best way to do it: have yet another tag set that
> describes only the formatting primitives supported by RTF and a utility
> stylesheet to make RTF out of that. Or use XSL-FO, if any of the formatters
> can make decent RTF yet.

jfor hasn't been updated at all in over a year, so it seems like a dead
project. And jfor.org is down.

An indication that the problem of generating nice RTF is harder than it may first appear.


I should add that I *do* need API access rather than a standalone
application.

If it were me I'd be inclined to see how far I could go with XSLT 2. But then, I like XSLT. I am actually fairly hopeful that XSLT 2 processors will be strong contenders in this space.


Cheers,
Wendell

___&&__&_&___&_&__&&&__&_&__&__&&____&&_&___&__&_&&_____&__&__&&_____&_&&_
"Thus I make my own use of the telegraph, without consulting
the directors, like the sparrows, which I perceive use it
extensively for a perch." -- Thoreau

