Subject: [dssslist] New script to produce indexes with duplicates removed and ranges collapsed
From: Jeremy Malcolm <Jeremy@xxxxxxxxxxxxx>
Date: Sun, 12 Mar 2006 23:11:57 +0800
I have written a script to fix my problem with indexes coming up with ? instead of page numbers. As you may recall, a couple of my indexes were situated near the top rather than the bottom of my document, which meant that the all-element-numbers (AENs) that openjade generated to keep track of the page numbers for use in the print versions were thrown out as soon as the completed indexes were inserted; a catch-22 situation.

My script, really just a modified version of one from the XSL stylesheets called pdf2index, has the handy side-effect of achieving something heretofore thought impossible with DSSSL: it removes duplicate page references like "2, 2, 2" and it collapses ranges like "1, 2, 3" into a nicer format like "1-3".

The pdf2index script is hackier than what we can achieve with DSSSL: pdf2index uses a special stylesheet to generate a PDF format that can be converted back to text using pdftotext (from xpdf) and parsed. In contrast, my script, which I've called aux2index.pl, simply obtains the page numbers from the .aux file that is generated by the last pass of pdfjadetex. It then recreates the index file with those page numbers "hard-coded" in, so that the index won't be corrupted when the AENs change.

The use of the script is as follows:

(a) Generate your PDF format, including the index/es, in the usual way, generally with openjade to create a tex file and three passes of pdfjadetex to turn it into a PDF. If you're like me and you have one or more indexes that are above the text that you're indexing, this will corrupt the index in the PDF format and you'll see ? characters instead of page numbers. Never fear.

(b) Don't delete the .aux file that was generated by the last run of pdfjadetex. Run aux2index.pl with two arguments: the .aux file as the first argument and the index file as the second. For example, "./aux2index.pl myfile.aux index.sgml > index.sgml.new". If index.sgml.new seems OK, copy it back to index.sgml.
(c) Generate your PDF file again, again with openjade and three runs of pdfjadetex (or however you normally do it). Hey presto, you will have a nice index with no duplicates and with ranges collapsed.

The script is a quick hack, which is released to the public domain, but it works for me. I don't trust the mailing list not to filter it out, so for now it may be downloaded from http://www.malcolm.id.au/files/software/unix/aux2index.pl (which will also allow me to keep improving it over the next day or so).

Feel free to forward this message to any other appropriate developers, lists or newsgroups if others might find it useful (I tried to join docbook-apps in order to forward it there, but the mail server seems down).

-- 
Jeremy Malcolm LLB (Hons) B Com
Internet and Open Source lawyer, IT consultant, actor
host -t NAPTR 1.0.8.0.3.1.2.9.8.1.6.e164.org|awk -F! '{print $3}'

[demime 1.01d removed an attachment of type application/x-pkcs7-signature which had a name of smime.p7s]
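The duplicate removal and range collapsing described above can be sketched roughly as follows. This is only an illustration in Python, not the code of aux2index.pl itself (which is a Perl script); the function name collapse_pages is made up for the example, and it assumes the page numbers for one index entry have already been extracted as integers.

```python
def collapse_pages(pages):
    """Drop duplicate page numbers and collapse consecutive runs into ranges.

    Hypothetical helper for illustration; not taken from aux2index.pl.
    """
    unique = sorted(set(pages))          # "2, 2, 2" becomes a single 2
    ranges = []                          # list of [start, end] pairs
    for page in unique:
        if ranges and page == ranges[-1][1] + 1:
            ranges[-1][1] = page         # extend the current run
        else:
            ranges.append([page, page])  # start a new run
    return ", ".join(
        str(start) if start == end else f"{start}-{end}"
        for start, end in ranges
    )


print(collapse_pages([2, 2, 2]))   # prints: 2
print(collapse_pages([1, 2, 3]))   # prints: 1-3
```

Sorting before collapsing is a choice made for this sketch; it means the order in which the page references were collected does not matter.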