Hi,
At 05:19 PM 3/27/2009, you wrote:
> I have a question. While converting SGML to XML, there is an XML
> schema that the XML has to be validated against. I did the mapping
> between the SGML elements and the XML schema elements. I was not
> clear on how to use the n2x to be able to use the XML schema while
> converting the SGML.

You probably wouldn't use the schema directly during conversion. You
would perform the conversion and then validate the results against
the XML schema as a separate, secondary step.
This is actually a good thing: although you might have a schema that
is intended to describe the results of the conversion, it might not
be correct or complete. If your conversion is meant to be a
straightforward syntactic conversion, any validation errors would be
indications of unaddressed requirements for your XML schema, not for
your conversion process. That is, if you are only changing the syntax
of your documents, and are not allowed to rearrange or rename elements
and attributes, then your XML schema has to be fitted to the results
of your conversion, not the other way around.
And even if you assume the schema is correct and complete, you might
not want the processor to decide what changes to make, if it has to
make changes, to get the output to be valid. If you have an XML
schema in hand, known (or defined) to be correct and complete, and
the XML you get from rewriting your SGML as XML is not already valid,
part of your conversion may have to involve transformation.
If this is the case, the problem is more complicated: you need to use
SP or n2x or another SGML tool to produce XML syntax, and then you need
to alter this XML, producing new XML that is valid against your schema.
Accordingly, it's easier to split your process up into distinct
phases, dealing with syntax and tagging semantics (restructuring and
renaming) separately:
1. Convert your SGML to XML syntax (a fairly straightforward
syntactic conversion)
2. Optionally, derive an XML schema as a *descriptive* exercise, to
help show what your new XML looks like and reveal where adjustments
have to be made
3. Design and implement a transformation that maps your data from
this XML to your target XML format
4. Validate against your target XML schema to check your results
Phase 1 can use SP or n2x. Phase 2 is actually optional, although
very useful (it will help you do a better job with phase 3).
Phase 3 can use XSLT, which is why this post is (barely) on topic.
Phase 4 is essentially a test to see whether Phase 3 has been
performed correctly. It does not guarantee the transformation is
correct (a machine cannot do that without help), but it is necessary.
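
To give a rough idea of what the renaming part of Phase 3 amounts to,
here is a minimal sketch. It is written in Python with the standard
library rather than in XSLT, purely for illustration, and the element
names and the mapping are made up; your own mapping between SGML and
schema elements would go in their place. It also lists any names the
target does not know about, which is the sort of thing Phase 4 would
flag.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from converted-SGML element names to target names.
RENAME = {"para": "p", "ttl": "title", "sect": "section"}


def transform(root):
    """Rename elements in place according to RENAME; leave others untouched."""
    for elem in root.iter():
        elem.tag = RENAME.get(elem.tag, elem.tag)
    return root


def unmapped_names(root, target_names):
    """Return element names present in the tree but absent from the target."""
    return sorted({elem.tag for elem in root.iter()} - set(target_names))


# Stand-in for the output of Phase 1 (SGML already rewritten as XML).
source = "<sect><ttl>Intro</ttl><para>Some <emph>text</emph>.</para></sect>"

root = transform(ET.fromstring(source))
result = ET.tostring(root, encoding="unicode")

# Names the (assumed) target schema allows; "emph" was never mapped.
leftover = unmapped_names(root, {"section", "title", "p"})

print(result)
print(leftover)
```

Renaming is the easy case. Restructuring (moving, wrapping or
splitting elements) is where XSLT templates earn their keep, and a
check like the one above is no substitute for real schema validation.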
At no point do you need to use an XML schema directly with your
SGML-to-XML conversion. (You do need your SGML DTD to parse your SGML though.)
How hard this all is really depends on how close your target XML
schema already is to the SGML DTD. Making an XML schema to which your
SGML (once it is syntactically XML) can be guaranteed valid without
alteration is easier for some SGML DTDs than others. If your SGML DTD
is very XML-like it might be fairly easy.
If it isn't, it may be easier to do the opposite: first make your
XML, then the schema for it. Especially if your data set is bounded,
you can cast your SGML into XML syntax, and then derive an XML schema
to describe it. You would do this particularly if your data were more
important than your schema. (Similarly, if you had a collection of
fine porcelain, you might acquire the right number and size of boxes
to store it in, instead of getting rid of some of it to fit the boxes you had.)
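
A first step toward such a descriptive schema can be as simple as
taking inventory of what actually occurs in the converted data. The
following is a sketch only, in Python with the standard library, run
over two invented sample documents; it records which children and
attributes each element is seen with. It does not produce an XML
schema, and it says nothing about order or repetition, but it shows
the boxes you would need.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def describe(documents):
    """Record, for every element name, the child and attribute names seen."""
    children = defaultdict(set)
    attributes = defaultdict(set)
    for text in documents:
        for elem in ET.fromstring(text).iter():
            children[elem.tag].update(child.tag for child in elem)
            attributes[elem.tag].update(elem.attrib)
    return children, attributes


# Invented samples standing in for SGML already cast into XML syntax.
docs = [
    '<doc id="a"><ttl>One</ttl><para>Text</para></doc>',
    "<doc><para>Text <emph>more</emph></para></doc>",
]

children, attributes = describe(docs)
for name in sorted(children):
    print(name, sorted(children[name]), sorted(attributes[name]))
```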
The deeper reasons for all this are rooted in modeling features of
SGML that are not particularly XML-friendly and which cannot be
readily expressed in XML schemas. The job you are looking at will be
easier if your SGML does not use these features.
I hope that helps,
Wendell
======================================================================
Wendell Piez mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================