RE: [xsl] DocBook to plain text - what do you use?

Subject: RE: [xsl] DocBook to plain text - what do you use?
From: David.Pawson@xxxxxxxxxxx
Date: Thu, 29 Jul 2004 07:48:09 +0100
    -----Original Message-----
    From: Paul DuBois 

    I agree that it seems like it should be much easier.  
    That's one reason I'm puzzled that such a thing doesn't 
    seem to exist.
I guess it's a low-priority requirement?
    Is it just that no one is interested in producing plain 
    text?  (For example, to produce README files and such from 
    a distribution's general DocBook documentation sources?)  
    Or is the need little enough that lynx -dump is good enough 
    for people's purposes?
I think that's partly it.
    > Of course, that depends on your xslt experience and the time you can
    > allocate. If you need a good control of the outputed text nothing will
    > beat an XSL IMHO.
    True, but naturally I had hoped to avoid writing such a 
    thing myself. :-)

Four points.

1. Starting with the HTML stylesheets, you already have the structure.
2. You'll need to look after white space at that stage.
3. The issue of formatting the plain text is difficult, but has been
done by Eric; it's in the FAQ IIRC. Basically, convert everything to
paras or lists, then Eric's Java code sorts out the text, line wrapping
at a given size.
4. No one, AFAIK, has worked on page breaks. Not easy in XSLT. Perhaps
a second pass with a bit of Perl or Python (or XSLT 2?) is the answer there,
maybe even an extension to Eric's Java.
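
For what it's worth, the second pass in points 3 and 4 might look
roughly like the Python sketch below. It assumes the stylesheet has
already emitted plain paragraphs separated by blank lines. The function
names, the 72-column width and the 60-line page are my own illustrative
choices, and this is not Eric's Java code, just the same general idea.

```python
import re
import textwrap


def wrap_paragraphs(text, width=72):
    """Re-wrap blank-line-separated paragraphs to at most `width` columns.

    Assumption: the XSLT pass emits one paragraph per block, with blocks
    separated by one or more blank lines.
    """
    # Split on blank lines, then collapse all internal white space so
    # stray indentation from the stylesheet does not survive.
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = [" ".join(block.split()) for block in blocks]
    wrapped = [textwrap.fill(p, width=width) for p in paragraphs if p]
    return "\n\n".join(wrapped)


def paginate(text, lines_per_page=60):
    """Separate every `lines_per_page` lines with a form feed character.

    This is a naive page break: it counts lines only and may split a
    paragraph across two pages.
    """
    lines = text.split("\n")
    pages = [
        "\n".join(lines[i:i + lines_per_page])
        for i in range(0, len(lines), lines_per_page)
    ]
    return "\f".join(pages)
```

Something like paginate(wrap_paragraphs(raw_text)) would then give
wrapped, paged output. Lists, headings and keeping a paragraph together
on one page would all need more work than this.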

I'd thought about it; just never needed it badly enough.

HTH DaveP.
