Subject: RE: jade with multiple input files / ESIS data revisited
From: "James.W Wilson" <James.W.Wilson@xxxxxxxxxxxxx>
Date: Thu, 18 Dec 1997 10:57:00 -0600

I've gotten a number of (very helpful and quick, btw) responses
to my email asking about multiple input files and the problem of the
dtd size. Rather than wasting bandwidth by quoting them, let me just
summarize what we're trying to do and we can go from there.
We have gigantic (300 meg) chunks of sgml with all sorts of complex
coding in them. They conform to a big dtd (which, as someone
suggested, is the union of many specializations of a single generic
model) which our particular group has no control over. However, we
don't want to convert everything to rtf, just certain pieces which
occur within certain tags. These pieces are usually small (average
size is < 2k, max size is maybe 300k) and are extracted from the main
chunk and placed into umpteen thousand files (44,000 for one
particular chunk) by our current process.
Since dsssl is side-effect-free, I presume that we can't parse the one
gigantic chunk and have it output one rtf file for each little piece
we're interested in (this would be ideal). We could generate one
gigantic rtf file, of course, but how to split that into many little
rtf files?
Failing that, we can run jade on each little piece; however, we need
entities and presumably element declarations from the dtd so jade can
correctly parse the input. The problem with including the dtd is, as
mentioned before, its size; per-file overhead swamps everything else
in our situation.
The data has a lot of end-tag minimization, and without the dtd jade
apparently has to guess where tags begin and end, and is often wrong.
I say this because if we go back and put in explicit end tags,
everything works fine.
I think we might be able to work around the guessing problem by
writing the style sheet carefully. If we just include the entities
file and the style sheet doesn't demand 100% accuracy from jade's
hierarchy guesses, maybe things will work out.
Another option would be to write our own mini-dtd which would contain
only the tags we need. However, since the chunks in question are not
really a totally discrete part of the main dtd, it's not clear that
this mini-dtd would actually be so mini. Also, since the main dtd is
constantly under revision we'd be constantly playing catch-up to keep
the mini-dtd in sync.
As for ESIS input, we already convert the big chunks to ESIS in the
course of our process; if jade accepted ESIS data, we could just split
out the chunks we need and they'd already be fully parsed so the dtd
would be irrelevant. Presumably this would be pretty fast. How much
work would it be to add this functionality? If it's not too
unreasonable, we could do it ourselves.
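To make the splitting step concrete, here is a minimal sketch in Python, assuming the ESIS is in the line-oriented form that nsgmls writes: "(" plus the element name for a start tag, ")" plus the name for an end tag, "-" for data, and "A" attribute lines that come just before the start-tag line they belong to. The function name split_esis and the element name NOTE are made up for illustration, and this only covers pulling the pieces out; it says nothing about how jade would read them.

```python
from typing import Iterable, Iterator, List


def split_esis(lines: Iterable[str], target: str) -> Iterator[List[str]]:
    """Yield one list of ESIS lines per outermost occurrence of `target`.

    Assumes nsgmls-style output, where attribute ('A') lines appear
    immediately before the '(' line of the element they belong to.
    """
    pending_attrs: List[str] = []  # 'A' lines waiting for their start tag
    chunk: List[str] = []          # lines of the piece being collected
    depth = 0                      # nesting depth of `target` elements

    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        cmd, arg = line[0], line[1:]

        if depth == 0:
            if cmd == "A":
                pending_attrs.append(line)
            elif cmd == "(" and arg == target:
                # Start of a piece: keep the attributes that preceded it.
                chunk = pending_attrs + [line]
                pending_attrs = []
                depth = 1
            else:
                # Anything else means the buffered attributes belonged
                # to an element we are not interested in.
                pending_attrs = []
            continue

        chunk.append(line)
        if cmd == "(" and arg == target:
            depth += 1
        elif cmd == ")" and arg == target:
            depth -= 1
            if depth == 0:
                yield chunk
                chunk = []
```

Each yielded list could then be written to its own file, or fed straight to whatever reads the ESIS, without the dtd being involved again. Nested occurrences of the target element stay inside the outer piece rather than being split out separately.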
James
DSSSList info and archive: http://www.mulberrytech.com/dsssl/dssslist