David.Pawson@xxxxxxxxxxx wrote:

I'd like to know if anyone can share some experience on the best way to prep MS Word content for XSL processing.

lists some Word to XML tools.
My favourite is upcast from

I use it regularly.

Seems like an awesome product. I'd be mainly interested in the Java API, but that version is excessively expensive for the small contracts I do.
Really looks like one very nice product though; too bad it's out of reach.

YAWC seems good though. Reasonably priced anyway. And it uses Simplified DocBook :)))
I'm not a big fan of VB(A) though :(
However, I may be able to write a simple COM wrapper and use it from within PHP :/
Will have to do some more investigation to see how effective the conversion is.

Thanks for the tip!

I realise this may depend on the way the Word document is prepared, but since I haven't got a copy yet, I'm hoping it will use Word's structural capabilities -- i.e. sections/chapters, whatever.

Suggestion, applicable only if the style usage is a mess.
Ctrl-A, then apply the plain/normal style (or whatever the base style is called).
Go through and mark up headers as deep as needed.
Do any other styling.

It's so much easier to work with default Word styles;
then tools can get some level of structure out of it.

Sounds like an excellent idea to me. If I'm gonna do this, I may as well use the DocBook template (which has all the styles named after the DocBook tags) in OpenOffice Writer, do *exactly* what you have just suggested, and then export the DocBook XML file. Hardly automated, but probably the only choice I have.
(you'll find the DocBook template here)

Might have to write a little SAX routine to chop it up into chapter files or whatever -- or perhaps a batch file which uses XT :/
Better yet, chop it up into collections, put it into Xindice and allow website users to slice/dice how they want :)
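
For what it's worth, the SAX chopping routine could look roughly like the sketch below. It is only a minimal sketch in Python's standard xml.sax, and it assumes the exported file wraps each chapter in a <chapter> element; I haven't checked that against the real export. The names ChapterSplitter, split_chapters and write_chapters are made up for illustration.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class ChapterSplitter(xml.sax.ContentHandler):
    """Collect each <chapter> element of a DocBook file as its own XML string."""

    def __init__(self, tag="chapter"):
        super().__init__()
        self.tag = tag        # element to split on (assumed name)
        self.depth = 0        # nesting depth inside the current chapter
        self.buffer = None    # text buffer for the chapter being copied
        self.writer = None    # XMLGenerator serialising into the buffer
        self.chapters = []    # finished chapters, in document order

    def startElement(self, name, attrs):
        # Start a new output document when a top-level chapter opens.
        if self.writer is None and name == self.tag:
            self.buffer = io.StringIO()
            self.writer = XMLGenerator(self.buffer, encoding="utf-8")
            self.writer.startDocument()
        # While inside a chapter, forward every event to the writer.
        if self.writer is not None:
            self.depth += 1
            self.writer.startElement(name, attrs)

    def characters(self, content):
        if self.writer is not None:
            self.writer.characters(content)

    def endElement(self, name):
        if self.writer is None:
            return
        self.writer.endElement(name)
        self.depth -= 1
        # Depth zero means the chapter element itself has just closed.
        if self.depth == 0:
            self.writer.endDocument()
            self.chapters.append(self.buffer.getvalue())
            self.buffer = self.writer = None


def split_chapters(xml_text):
    """Return a list of XML strings, one per chapter."""
    handler = ChapterSplitter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.chapters


def write_chapters(chapters, stem="chapter"):
    """Write chapter01.xml, chapter02.xml, ... into the current directory."""
    for number, text in enumerate(chapters, 1):
        with open(f"{stem}{number:02d}.xml", "w", encoding="utf-8") as fh:
            fh.write(text)
```

Calling write_chapters(split_chapters(text)) on the exported file's contents would give one file per chapter. Anything outside the chapters (book title, front matter) is simply dropped, so it would need handling separately if it matters.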

Thanks for the help.
Much appreciated.

