This framework depends on some unique features of DITA, but it could be
adapted to generate HTML directly rather than DITA.
The transformation is implemented as a two-phase process:
Phase 1: Generate a simplified form of the Word XML, which I call "simple
word processing format". This captures the essential structure and style
details of the original Word document while eliminating all of the hideous
verbosity of the Office Open markup design.
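As a rough illustration of the Phase 1 idea (in Python rather than XSLT, and not the framework's actual code), the sketch below reduces WordprocessingML paragraphs to a flat list of styled paragraphs. The output element names `doc` and `p`, the `style` attribute, and the "Normal" fallback are assumptions made for this example; the real "simple word processing format" may look quite different. It only handles top-level paragraphs and ignores tables, run formatting, and everything else a real Phase 1 would need.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def simplify(document_xml: str) -> ET.Element:
    """Reduce top-level Word paragraphs to <p style="...">text</p>.

    The output vocabulary (doc, p, @style) is hypothetical.
    """
    root = ET.fromstring(document_xml)
    simple = ET.Element("doc")
    for para in root.iterfind(".//w:body/w:p", NS):
        style = para.find("w:pPr/w:pStyle", NS)
        p = ET.SubElement(simple, "p")
        # Paragraphs without an explicit style get an assumed default name.
        p.set("style", style.get(f"{{{W}}}val") if style is not None else "Normal")
        # Concatenate the text of all runs, dropping run-level formatting.
        p.text = "".join(t.text or "" for t in para.iterfind("w:r/w:t", NS))
    return simple


# A tiny hand-written sample standing in for word/document.xml.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
    "<w:r><w:t>Intro</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Hello </w:t></w:r>"
    "<w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

simple_doc = simplify(SAMPLE)
```

Here `simple_doc` holds one `p` per Word paragraph, each carrying only its style name and plain text, which is the kind of input a style-driven second phase can work from.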
Phase 2: Transform the simple word processing doc into DITA. This relies
on a separate style-to-tag mapping document that relates Word styles to
DITA structures. This depends heavily on for-each-group and the code is
a bit gnarly--it grew rather organically and, while it works, I can't
claim it reflects the best engineering approach. If I were to ever rewrite
the code I'm sure I would make it much cleaner and clearer.
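To make the Phase 2 idea concrete, here is a minimal sketch (again in Python, not the actual XSLT) of a style-to-tag map driving adjacent grouping. `itertools.groupby` plays roughly the role that for-each-group with group-adjacent plays in XSLT 2.0: runs of neighbouring paragraphs that map to the same container are wrapped together. The style names, the map layout, and the `to_dita` function are all assumptions for illustration; the real mapping document is a separate XML file with its own design.

```python
from itertools import groupby
import xml.etree.ElementTree as ET

# Hypothetical style-to-tag map: Word style -> (container or None, element).
STYLE_MAP = {
    "Normal": (None, "p"),
    "ListBullet": ("ul", "li"),
    "ListNumber": ("ol", "li"),
}


def to_dita(paras):
    """Build a DITA-like body from (style, text) pairs.

    Adjacent paragraphs whose styles map to the same container are
    grouped under a single container element.
    """
    body = ET.Element("body")
    # Unknown styles fall back to a plain paragraph (an assumption).
    mapped = [(STYLE_MAP.get(style, (None, "p")), text) for style, text in paras]
    for container, group in groupby(mapped, key=lambda m: m[0][0]):
        parent = body if container is None else ET.SubElement(body, container)
        for (_, tag), text in group:
            ET.SubElement(parent, tag).text = text
    return body


paras = [
    ("Normal", "Intro"),
    ("ListBullet", "a"),
    ("ListBullet", "b"),
    ("ListNumber", "1"),
    ("Normal", "End"),
]
body = to_dita(paras)
```

Because the grouping is driven entirely by the map, swapping in a map whose targets are HTML element names would produce HTML from the same loop.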
This second phase could be replaced with a new HTML-generation phase,
driven either by the existing style-to-tag map or by a new one or just by
some static binding from styles to HTML markup (if such a thing is
The Phase 1 process is pretty stable--I only have to update it when some
new Word feature requires support from a client.
The code is in GitHub here:
Eliot Kimber, Owner
On 5/28/16, 8:39 PM, "adam adam@xxxxxxxxxxxxxxx"
>I'm new to the list. My usual home is at the Collaborative Knowledge
>So, I was poking around looking for any community/co-ordinated attempts
>at creating some robust XSL transformations from docx to HTML. I'm aware
>of TEI stylesheets and have had a good poke around in github and
>elsewhere, but I'm looking at straight docx->html (sans TEI) and the few
>stylesheet repos I find are not so well maintained. I am probably
>missing some, so any recommendations for a thriving hub of energy around
>this particular conversion would be appreciated.
>However, what I'm really looking for is an active community, possibly
>with its own list or web based presence where there is a community
>effort to improve specific conversion types. Essentially, I'm wondering
>if this already exists for docx->html or if not, then are there similar
>attempts I can learn from?....my inclination is to look for, or set up,
>something that had a web based component for testing so that non-XSL
>experts could also contribute through manual QA of results etc...
>Any thoughts or tips welcomed....