[xsl] Post-Processing PDF For Back-Of-The-Book Indexes

Regarding the earlier thread about eliminating duplicate page numbers
in back-of-the-book indexes generated from XSL-FO: I have successfully
done this as a PDF post-processing step, using the free PJ library from
www.Etymon.com.
With this library you can interact with PDF at the lowest level of
granularity (individual PDF operators within a page). In my case, I was
able to get to the individual lines of the index pages, find sequences
of repeated numbers, remove them from the document, and write a new PDF
document. It required about 150 lines of Python (using the Jython
interpreter to provide access to the PJ Java library) to implement the
initial functionality I needed.
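
To illustrate the idea of removing repeated numbers from an index line,
here is a minimal sketch. It is not the code described above and it does
not touch the PJ API at all. It assumes (hypothetically) that the text of
one index line has already been pulled out of the page as a plain string
of comma-separated items; the function name dedupe_page_numbers is made
up for illustration.

```python
def dedupe_page_numbers(line):
    """Drop consecutive repeated page numbers from one index line.

    Assumption: the line is a comma-separated string such as
    'templates, named, 7, 7, 9'. Only purely numeric items are
    treated as page numbers; index terms are left alone.
    """
    items = [item.strip() for item in line.split(",")]
    kept = []
    for item in items:
        # Skip a page number that repeats the item just before it.
        if kept and item.isdigit() and item == kept[-1]:
            continue
        kept.append(item)
    # Note: this normalizes the separator to a comma plus one space.
    return ", ".join(kept)
```

A real implementation would have to do the equivalent work on the PDF
text-showing operators for each line and then write the modified page
back out, which is the part the PJ library handles.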

I'm not quite ready to post code: I need to refine what I've written and
do more testing. But I wanted to report this initial success, as I know
others are struggling with this same problem.

Cheers,

Eliot
-- 
W. Eliot Kimber, eliot@xxxxxxxxxx
Consultant, ISOGEN International

1016 La Posada Dr., Suite 240
Austin, TX  78752 Phone: 512.656.4139

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
