Re: [xsl] parsing post script

Subject: Re: [xsl] parsing post script
From: Larry Kollar <kollar@xxxxxxxxxx>
Date: Tue, 25 Nov 2003 09:38:13 -0500

Ghostscript includes a pstext utility to extract text: it does a
reasonable but not 100% accurate job (and includes the full Ghostscript
PostScript interpreter).

If you turn off ps2ascii's simple mode (remove the "-dSIMPLE" argument),
Ghostscript outputs font and positioning information for each string. You
can use that information to eliminate headers and footers, identify elements
to tag, and so forth.
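
As a rough illustration of that filtering step, here is a minimal Python
sketch. It assumes the non-simple output consists of lines shaped like
"F <height> <width> (<fontname>)" for a font change,
"S <x> <y> (<string>) <width>" for a string, and a bare "P" at the end of
each page, with the origin at the lower left. Check the comments at the top
of ps2ascii.ps in your own installation before relying on that. The names
TextRun, parse_runs, and drop_margins, and the y thresholds, are made up for
this example.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple


class TextRun(NamedTuple):
    page: int   # 1-based page number, counted from "P" lines
    font: str   # name from the most recent "F" line ("" if none seen yet)
    x: int      # position as reported by ps2ascii (units assumed, not checked)
    y: int
    text: str   # raw string body; PostScript escapes are left untouched


# Assumed line shapes; verify against your Ghostscript version.
FONT_RE = re.compile(r"^F\s+(-?\d+)\s+(-?\d+)\s+\((.*)\)\s*$")
STRING_RE = re.compile(r"^S\s+(-?\d+)\s+(-?\d+)\s+\((.*)\)\s+(-?\d+)\s*$")


def parse_runs(lines: Iterable[str]) -> Iterator[TextRun]:
    """Yield one TextRun per string line, tracking current font and page."""
    page = 1
    font = ""
    for raw in lines:
        line = raw.rstrip("\n")
        m = FONT_RE.match(line)
        if m:
            font = m.group(3)
            continue
        m = STRING_RE.match(line)
        if m:
            yield TextRun(page, font, int(m.group(1)), int(m.group(2)), m.group(3))
            continue
        if line.strip() == "P":
            page += 1
        # Any other line type is ignored in this sketch.


def drop_margins(runs: Iterable[TextRun], y_min: int, y_max: int) -> List[TextRun]:
    """Keep only runs whose y lies in [y_min, y_max], dropping the bands
    where headers and footers usually sit."""
    return [r for r in runs if y_min <= r.y <= y_max]


if __name__ == "__main__":
    # Hand-written sample input, not real Ghostscript output.
    sample = [
        "F 100 28 (Helvetica-Bold)",
        "S 720 7500 (Chapter 1) 400",
        "F 120 33 (Times-Roman)",
        "S 720 6000 (Body text here.) 900",
        "S 3900 300 (Page 1) 200",
        "P",
        "S 720 6000 (More body.) 500",
    ]
    for run in drop_margins(parse_runs(sample), y_min=500, y_max=7000):
        print(run.page, run.font, run.text)
```

The font name carried on each run is what you would key on to decide which
element to tag (for example, treating a bold font as a heading); the right
thresholds and font names depend entirely on the document being converted.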


Exegenix (http://exegenix.com/) has a commercial solution for converting
PostScript or PDF to XML; it looks intriguing.

--
Larry Kollar k o l l a r @ a l l t e l . n e t
"The hardest part of all this is the part that requires thinking."
-- Paul Tyson, on xml-doc



XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


