Hi,
At 01:50 PM 10/22/2009, you wrote:
Another issue I have is that even if the XMLized version is well
formed, I have to deal with the inclusions in the data. Since the XML
format that I am converting to conforms to a schema, I am having a very
hard time writing a transformation to handle these inclusions.
SGML inclusions are bad news if you need to write a truly generalized
transformation, i.e. a transformation capable of converting any
document conforming to (SGML) DTD A into an equivalent conforming to
DTD B. Of course, some cases are worse than others.
Your choices for alleviating the problem are pretty much limited to
one of these or a combination:
1. Identify an element in your target format that is legal everywhere
the inclusion may appear, and use that.
2. If no such element is available, construct a more complex mapping
that accounts for the inclusion in different ways, depending on where
the elements turn up.
3. Don't attempt a fully general transformation; instead, control the
problem by identifying (usually by analyzing the source data itself,
irrespective of its DTD) where the inclusion is actually used, and
work from there.
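To make option 3 concrete, here is a minimal sketch in Python of the kind of survey I mean. It assumes the data has already been converted to well-formed XML, and the element names (`footnote` as the inclusion, plus `doc`, `title`, `para`, `emph`) are hypothetical stand-ins for whatever your DTD declares. It simply tallies the parents under which the inclusion is actually attested:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def survey_inclusion_contexts(xml_text, inclusion_name):
    """Count the parent element names under which `inclusion_name` occurs."""
    root = ET.fromstring(xml_text)
    contexts = Counter()
    # ElementTree has no parent pointers, so visit every element
    # and inspect its direct children instead.
    for parent in root.iter():
        for child in parent:
            if child.tag == inclusion_name:
                contexts[parent.tag] += 1
    return contexts


# Hypothetical sample data; the element names are illustrative only.
SAMPLE = """<doc>
  <title>Report<footnote>draft</footnote></title>
  <para>Text<footnote>one</footnote> and <emph>more<footnote>two</footnote></emph></para>
  <para>Plain<footnote>three</footnote></para>
</doc>"""

if __name__ == "__main__":
    counts = survey_inclusion_contexts(SAMPLE, "footnote")
    for parent, count in counts.most_common():
        print(parent, count)
```

Run over a real corpus, a tally like this usually shows that the inclusion turns up in only a handful of contexts, and those are the ones your mapping has to cover.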
If this isn't possible (maybe your data set is open-ended), then
perform a kind of triage: declare what's in scope for your
transformation, and which kinds of structures, formally legal in your
source data (but hopefully unattested in actual data and unlikely in
future data), are out of scope.
(It can sometimes be helpful to formalize this limitation, for
example by having an XML variant of your SGML DTD, without
inclusions, to which the data must validate before you accept it as
fit for transformation.)
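Short of maintaining a second DTD, a lightweight stand-in for that kind of gate is a pre-flight check that rejects any document using the inclusion outside the contexts you have declared in scope. This is only a sketch of the idea, not a substitute for real validation: the function names, the `footnote` element, and the allowed parents `para` and `title` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def out_of_scope_uses(xml_text, inclusion_name, allowed_parents):
    """Return the parent tag of each inclusion found outside the declared scope."""
    root = ET.fromstring(xml_text)
    violations = []
    for parent in root.iter():
        for child in parent:
            if child.tag == inclusion_name and parent.tag not in allowed_parents:
                violations.append(parent.tag)
    return violations


def accept_for_transformation(xml_text, inclusion_name, allowed_parents):
    """True only if every use of the inclusion sits in an in-scope context."""
    return not out_of_scope_uses(xml_text, inclusion_name, allowed_parents)


# Hypothetical scope declaration: footnotes are handled only in these parents.
IN_SCOPE_PARENTS = {"para", "title"}

GOOD = "<doc><para>Text<footnote>ok</footnote></para></doc>"
BAD = "<doc><para><emph>Text<footnote>stray</footnote></emph></para></doc>"

if __name__ == "__main__":
    for name, text in (("GOOD", GOOD), ("BAD", BAD)):
        ok = accept_for_transformation(text, "footnote", IN_SCOPE_PARENTS)
        print(name, "accepted" if ok else "rejected")
```

Documents that fail the check get set aside for manual handling rather than being fed to a transformation that was never designed for them.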
In other words, you need to regard an SGML inclusion as what it
actually is: a modeling escape hatch. If you can't close the hatch,
you have to either find ways to handle the things that go through it
or decide they aren't worth the effort, because they're unimportant
or unlikely. Often, the best available alternative is some judicious
combination of these.
Cheers,
Wendell
======================================================================
Wendell Piez mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================