Subject: RE: jade with multiple input files / ESIS data revisited
From: "James.W Wilson" <James.W.Wilson@xxxxxxxxxxxxx>
Date: Thu, 18 Dec 1997 10:57:00 -0600
I've gotten a number of (very helpful and quick, btw) responses to my email asking about multiple input files and the problem of the dtd size. Rather than wasting bandwidth by quoting them, let me just summarize what we're trying to do and we can go from there.

We have gigantic (300 meg) chunks of sgml with all sorts of complex coding in them. They conform to a big dtd (which, as someone suggested, is the union of many specializations of a single generic model) which our particular group has no control over. However, we don't want to convert everything to rtf, just certain pieces which occur within certain tags. These pieces are usually small (average size is < 2k, max size is maybe 300k) and are extracted from the main chunk and placed into umpteen thousand files (44,000 for one particular chunk) by our current process.

Since dsssl is side-effect-free, I presume that we can't parse the one gigantic chunk and have it output one rtf file for each little piece we're interested in (this would be ideal). We could generate one gigantic rtf file, of course, but how would we split that into many little rtf files?

Failing that, we can run jade on each little piece; however, we need entities and presumably element declarations from the dtd so jade can correctly parse the input. The problem with including the dtd is, as mentioned before, its size; per-file overhead swamps everything else in our situation. The data has a lot of end-tag minimization, and without the dtd jade apparently has to guess where tags begin and end, and is often wrong. I say this because if we go back and put in explicit end tags, everything works fine.

I think we might be able to work around the guessing problem by writing the style sheet carefully. If we just include the entities file and the style sheet doesn't demand 100% accuracy from jade's hierarchy guesses, maybe things will work out.

Another option would be to write our own mini-dtd which would contain only the tags we need. However, since the chunks in question are not really a totally discrete part of the main dtd, it's not clear that this mini-dtd would actually be so mini. Also, since the main dtd is constantly under revision, we'd be constantly playing catch-up to keep the mini-dtd in sync.

As for ESIS input, we already convert the big chunks to ESIS in the course of our process; if jade accepted ESIS data, we could just split out the chunks we need and they'd already be fully parsed, so the dtd would be irrelevant. Presumably this would be pretty fast. How much work would it be to add this functionality? If it's not too unreasonable, we could do it ourselves.

James

DSSSList info and archive: http://www.mulberrytech.com/dsssl/dssslist
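
To make the "split out the chunks we need" step from the message above a little more concrete, here is a rough sketch of how an already-parsed ESIS stream might be cut into one chunk per element of interest. It assumes the ESIS is in the line-oriented form that sgmls/nsgmls print (`(GI` for a start tag, `)GI` for an end tag, `A...` attribute lines preceding the start tag they belong to, `-` for data). The function name `split_esis`, the element name `PIECE`, and the sample data are made up for illustration; nothing here implies that jade can read ESIS, which is exactly the open question in the message.

```python
def split_esis(lines, target_gi):
    """Yield one list of ESIS lines per outermost occurrence of target_gi.

    Assumes sgmls/nsgmls-style output: '(' GI opens an element, ')' GI
    closes it, and 'A' lines come just before the '(' they describe.
    """
    pending_attrs = []  # attribute lines waiting for their start tag
    chunk = None        # lines of the chunk being collected, if any
    depth = 0           # nesting depth of target_gi inside the chunk

    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        code, rest = line[0], line[1:]

        if code == "A":
            # Hold attributes until we know which element they belong to.
            pending_attrs.append(line)
            continue

        if code == "(" and rest == target_gi:
            if chunk is None:
                chunk = []
            depth += 1

        if chunk is not None:
            if code == "(":
                chunk.extend(pending_attrs)
            chunk.append(line)
        pending_attrs = []

        if code == ")" and rest == target_gi and chunk is not None:
            depth -= 1
            if depth == 0:
                yield chunk
                chunk = None


# Hypothetical sample: two PIECE elements inside a larger document.
SAMPLE = """\
(DOC
(TITLE
-Big chunk
)TITLE
AID TOKEN P1
(PIECE
(PARA
-first
)PARA
)PIECE
-ignored
AID TOKEN P2
(PIECE
-second
)PIECE
)DOC
C
"""

if __name__ == "__main__":
    for n, piece in enumerate(split_esis(SAMPLE.splitlines(), "PIECE"), 1):
        print("chunk", n, "has", len(piece), "lines")
```

Because the function is a generator that works line by line, it should be able to run over a 300 meg stream without holding more than one piece in memory at a time; each yielded chunk could then be written to its own file in place of the 44,000 sgml fragments.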