[dssslist] New script to produce indexes with duplicates removed and ranges collapsed

From: Jeremy Malcolm <Jeremy@xxxxxxxxxxxxx>
Date: Sun, 12 Mar 2006 23:11:57 +0800
I have written a script to fix my problem with indexes showing "?" 
instead of page numbers.  As you may recall, a couple of my indexes are 
situated near the top rather than the bottom of my document, which meant 
that the all-element-numbers (AENs) that openjade generates to keep 
track of the page numbers for use in the print versions were thrown off 
as soon as the completed indexes were inserted: a catch-22 situation.

My script, really just a modified version of one from the XSL 
stylesheets called pdf2index, has the handy side-effect of achieving 
something heretofore thought impossible with DSSSL: it removes duplicate 
page references like "2, 2, 2" and it collapses ranges like "1, 2, 3" 
into a nicer format like "1-3".
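
To illustrate the idea only, here is a minimal Python sketch of the 
de-duplication and range collapsing described above.  It is not the code 
from aux2index.pl (which is Perl); the function name collapse_pages and 
the choice to write a two-page run as "1-2" are assumptions made for 
this example.

```python
def collapse_pages(pages):
    """Turn page numbers such as [1, 2, 2, 3, 7] into the string "1-3, 7"."""
    unique = sorted(set(pages))  # drop duplicates and put pages in order
    runs = []                    # each run is [first_page, last_page]
    for page in unique:
        if runs and page == runs[-1][1] + 1:
            runs[-1][1] = page          # page continues the current run
        else:
            runs.append([page, page])   # page starts a new run
    return ", ".join(
        str(first) if first == last else f"{first}-{last}"
        for first, last in runs
    )


print(collapse_pages([2, 2, 2]))  # prints 2
print(collapse_pages([1, 2, 3]))  # prints 1-3
```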

The pdf2index script is hackier than what we can achieve with DSSSL: 
pdf2index uses a special stylesheet to generate a PDF format that can be 
converted back to text using pdftotext (from xpdf) and parsed.  In 
contrast, my script, which I've called aux2index.pl, simply obtains the 
page numbers from the .aux file that is generated by the last pass of 
pdfjadetex.  It then recreates the index file with those page numbers 
"hard-coded" in so that it won't be corrupted when the AENs change.

The use of the script is as follows:

(a) Generate your PDF format, including the index/es, in the usual way,
     generally with openjade to create a tex file and three passes of
     pdfjadetex to turn it into a PDF.  If you're like me and you have
     one or more indexes that are above the text that you're indexing,
     this will corrupt the index in the PDF format and you'll see ?
     characters instead of page numbers.  Never fear.

(b) Don't delete the .aux file that was generated by the last run of
     pdfjadetex.  Run aux2index.pl with two arguments: the .aux file
     as the first argument and the index file as the second.  For
     example, "./aux2index.pl myfile.aux index.sgml > index.sgml.new".
     If index.sgml.new seems OK, copy it back to index.sgml.

(c) Generate your PDF file again, again with openjade and three runs of
     pdfjadetex (or however you normally do it).  Hey presto, you will
     have a nice index with no duplicates and with ranges collapsed.
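
Steps (a) to (c) could be wrapped up roughly as in the Python sketch 
below.  This is only an outline under stated assumptions: the function 
names, the stylesheet argument and the "<doc>.sgml" / "<doc>.tex" naming 
are hypothetical, and it assumes openjade, pdfjadetex and 
./aux2index.pl can be run from the current directory.  Adjust the 
commands to however you normally build.

```python
import subprocess


def render_commands(doc, stylesheet):
    """Commands for one full build: openjade once, then pdfjadetex three times."""
    openjade = ["openjade", "-t", "tex", "-d", stylesheet,
                "-o", f"{doc}.tex", f"{doc}.sgml"]
    return [openjade] + [["pdfjadetex", f"{doc}.tex"] for _ in range(3)]


def build_pdf(doc, stylesheet):
    """Steps (a) and (c): run every build command, stopping on the first failure."""
    for command in render_commands(doc, stylesheet):
        subprocess.run(command, check=True)


def write_candidate_index(doc, index):
    """Step (b): save aux2index.pl's output as '<index>.new' for manual review."""
    candidate = f"{index}.new"
    with open(candidate, "w") as out:
        subprocess.run(["./aux2index.pl", f"{doc}.aux", index],
                       stdout=out, check=True)
    return candidate
```

You would call build_pdf("myfile", "print.dsl"), then 
write_candidate_index("myfile", "index.sgml"), check index.sgml.new by 
eye, copy it over index.sgml, and call build_pdf once more.  The manual 
check is deliberately left outside the code.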

The script is a quick hack, which is released to the public domain, but 
it works for me.  I don't trust the mailing list not to filter it out, 
so for now it may be downloaded from 
http://www.malcolm.id.au/files/software/unix/aux2index.pl (which will 
also allow me to keep improving it over the next day or so).

Feel free to forward this message to any other appropriate developers, 
lists or newsgroups if others might find it useful (I tried to join 
docbook-apps in order to forward it there, but the mail server seems down).

-- 
Jeremy Malcolm LLB (Hons) B Com
Internet and Open Source lawyer, IT consultant, actor
host -t NAPTR 1.0.8.0.3.1.2.9.8.1.6.e164.org|awk -F! '{print $3}'
