Hi,
At 03:53 PM 10/14/2009, XMLizer wrote:
> Well ODF and OOXML have pretty different structure (ODF is block
> oriented and OOXML is run oriented)
> I would go for option 1 waiting for tools to do option 3
> That's funny that you didn't propose option 4 (XML --> OOXML --> ODF)
> Probably Option 2 could be interesting with a recognized intermediate
> format (DocBook, TEI or DITA) but I'm not sure there is converter
> available yet
I don't know enough about OOXML to address it, but I can say
something about the ODF conversion.
I've written an application that makes ODF in two transformation
steps. The first creates an "ODF-ready" intermediate format that
allows XML-native structures such as arbitrary nesting of block or
inline elements. The second converts this format into ODF (actually
the second step has an internal pipeline of its own).
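To make the shape of the two steps concrete, here is a minimal sketch. It is in Python rather than XSLT, it is not the application described above, and every element name in it (article, sec, p, emph, doc, block, inline) is a hypothetical stand-in. Phase 1 only renames source elements into a small intermediate vocabulary and keeps the nesting; phase 2 is a placeholder for where the real conversion work would go.

```python
import xml.etree.ElementTree as ET

# Hypothetical source-to-intermediate mapping; not the real vocabulary.
TO_INTERMEDIATE = {"article": "doc", "sec": "block", "p": "block", "emph": "inline"}

def phase1(src):
    """Phase 1: rename elements into the intermediate vocabulary, keeping nesting."""
    out = ET.Element(TO_INTERMEDIATE[src.tag], {"role": src.tag})
    out.text, out.tail = src.text, src.tail
    for child in src:
        out.append(phase1(child))
    return out

def phase2(mid):
    """Phase 2 placeholder: a real step would flatten the tree and emit ODF.
    Here it only lists what it was handed, to show the hand-off."""
    return [(el.tag, el.get("role")) for el in mid.iter()]

def convert(xml_text):
    """Run both phases in order on a source document given as a string."""
    return phase2(phase1(ET.fromstring(xml_text)))
```

Supporting a different source format would, in this sketch, mean swapping only the mapping used by phase1; phase2 never sees the source vocabulary.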
Since the hard parts of the conversion are all handled in the second
phase, the first is fairly straightforward. (This is good for
maintenance, plus it ought to be fairly easy to replace this XSLT
with another, from a different source format.) The second-phase
transform is not so straightforward, largely because in the general
case, the problems you have to solve tend to be harder than they
look: nested structures (sometimes deeply nested) have to be
flattened and sometimes split, while whatever relevant semantics are
implied by their nesting must be preserved in the result.
Fortunately, in XSLT 2.0 this is at least doable. (The advantage of
exposing and formalizing an intermediate format is essentially that
it gives you a place to manage and constrain exactly what sorts of
semantics will be handled in the second phase, thus reducing the
impact of combinatorial explosion of N-ary relations between element
types and attribute values in the source.)
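A rough illustration of the flattening problem, again in Python rather than XSLT 2.0 and with hypothetical names (block, inline, and the role attribute are assumptions, not the actual intermediate format): nested inline elements are turned into a flat sequence of text runs, and each run carries the stack of roles it was nested inside, so the semantics implied by the nesting survive. Splitting blocks is a harder case and is not attempted here.

```python
import xml.etree.ElementTree as ET

def flatten(el, inherited=()):
    """Yield (text, roles) runs in document order.

    roles is the tuple of enclosing inline roles, outermost first,
    so the nesting that was removed is still recorded on each run.
    """
    if el.text:
        yield el.text, inherited
    for child in el:
        # Text inside the child gains the child's role on top of the inherited ones.
        yield from flatten(child, inherited + (child.get("role", child.tag),))
        # Text after the child belongs to the parent again.
        if child.tail:
            yield child.tail, inherited

block = ET.fromstring(
    '<block>plain <inline role="emph">em '
    '<inline role="strong">both</inline> tail</inline> end</block>'
)
runs = list(flatten(block))
```

For the sample above, runs contains five entries; the middle one is ("both", ("emph", "strong")), which a later step could map to a single flat span with a combined style.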
The experience of building this has given me, I think, some insight
into the problem:
1. This two-phase approach does work, and it simplifies the problem
of getting from descriptive tagging into word processors. The
intermediate format, however (as I think Eliot suggested), does have
to be designed for the purpose. An arbitrary descriptive format such
as NLM or DocBook really won't do -- although it could work nicely as
the original format (and so also as a post-editorial,
"pre-production" format for getting to the intermediate format).
In fact, it works well enough that I think there is real potential
for ODF applications like OpenOffice to expose such an intermediate
format as a more robust and easier option for interfacing with native
XML formats than ODF itself.
2. Round-tripping is an entirely different kettle of fish. One
reason this approach works is that you can map highly descriptive
elements (such as document metadata) into formatting analogues --
when (and only when) you want to -- in effect making a page design
for them. Getting back the other way isn't going to be easy, even if
the requirements can be defined in such a way as to make it technically
feasible. Plus, there are a myriad of tricky technical problems with
structural inferencing, etc.
Not that round-tripping won't eventually be done. But I think we may
have to see significant evolution in word processors before it gets
to be nice, transparent and stress-free. So far, word processors have
been almost entirely beholden to the requirement to be what I call
"paintbrush applications", which are very valuable for many purposes
-- just not for creating and managing structurally sound documents
for semantic interchange.
Cheers,
Wendell
======================================================================
Wendell Piez mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================