[xsl] XSLT vs Schematron Decision: Sanity Check

From: Norm Birkett <Norm.Birkett@xxxxxxxxx>
Date: Wed, 12 Oct 2011 13:25:31 -0400

I'm a bit of an XML neophyte who has jumped (or fallen) into the deep end
of the pool, so I'd like to subject a piece of my thinking to the
informed criticism a group like this can provide: a chance to learn from
other people's experiences rather than from my own mistakes. I'll try to
distill the question down to its essentials. Here goes:

The crux: I'm trying to build a validating-and-transforming XML filter
whose design also yields human-readable documentation of its input XML
language.

The context:

(1) The goal here is to replace a nasty sprawl of legacy code. I'll call
the replacement "NAI" (for "new and improved").
(2) NAI will receive XML documents, produced by various people, systems,
and organizations, representing a variety of more or less complex
financial transactions in pretty gruesome detail.
(3) The input documents are written in an XML language referred to
locally as GENERIC (for reasons lost to history).
(4) This GENERIC language is poorly documented.
(5) It would be very useful if GENERIC were well documented (because it
is constantly growing, and we are constantly adding more people and
organizations who want to feed documents written in GENERIC to NAI).
(6) The first thing NAI must do when it gets an input document is to
validate it.
(7) If the input document is valid, NAI next converts it to a different
XML language, which I will call INTERNAL. (It is the internal data
representation of the big system for which NAI serves as the gateway.)
(8) The INTERNAL language is even more poorly documented, but that is a
subject for another day.

The proposed design of NAI:

(Step 1) "Loosely" validate the input document using a schema written in
RELAX NG's compact syntax ("RNC").
(Step 2) If the document passes Step 1, "tightly" validate it using a
schema written in Schematron.
(Step 3) If the document passes Step 2, transform it into INTERNAL using
XSLT.
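
Steps 1 to 3 form a short-circuiting pipeline: each stage runs only if
the previous one passed. The minimal Python sketch below (standard
library only) shows that control flow under loudly stated assumptions:
the GENERIC names (`transactions`, `transaction`, `amount`, `currency`)
and the INTERNAL names (`ledger`, `entry`, `ccy`) are hypothetical;
Step 1's RNC schema is approximated by a hand-written structural check;
Step 2's Schematron rules are modeled as (context, test, message)
triples in the spirit of Schematron's `rule`/`assert`, with Python
predicates standing in for XPath tests; and Step 3's XSLT is replaced by
a hand-written mapping. A real build would call a RELAX NG validator, a
Schematron processor and an XSLT engine at those three points.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Rule:
    """Schematron-style assertion: every node matching `context` must satisfy `test`."""
    context: str                          # ElementTree path, e.g. ".//transaction"
    test: Callable[[ET.Element], bool]    # stands in for an XPath test expression
    message: str                          # doubles as error message and documentation


def loose_check(root: ET.Element) -> List[str]:
    """Step 1 stand-in for the RNC schema: coarse structure only (hypothetical)."""
    errors = []
    if root.tag != "transactions":
        errors.append(f"root element must be <transactions>, not <{root.tag}>")
    for tx in root.findall("transaction"):
        if tx.find("amount") is None:
            errors.append("each <transaction> needs an <amount>")
    return errors


def is_decimal(text: Optional[str]) -> bool:
    """True for digits with at most one '.', e.g. '12.50' (no sign allowed)."""
    return (text or "").strip().replace(".", "", 1).isdigit()


# Step 2 stand-in for the Schematron schema: tight, business-level rules (hypothetical).
TIGHT_RULES = [
    Rule(".//transaction",
         lambda tx: tx.get("currency") in {"USD", "CAD", "EUR"},
         "currency must be USD, CAD or EUR"),
    Rule(".//amount",
         lambda amt: is_decimal(amt.text),
         "amount must be a non-negative decimal number"),
]


def tight_check(root: ET.Element, rules: List[Rule]) -> List[str]:
    """Report one message per (rule, failing node) pair, in rule order."""
    return [r.message for r in rules for node in root.findall(r.context) if not r.test(node)]


def to_internal(root: ET.Element) -> ET.Element:
    """Step 3 stand-in for the XSLT stylesheet: GENERIC -> hypothetical INTERNAL."""
    ledger = ET.Element("ledger")
    for tx in root.findall("transaction"):
        entry = ET.SubElement(ledger, "entry", ccy=tx.get("currency"))
        entry.text = tx.findtext("amount").strip()
    return ledger


def process(xml_text: str) -> Tuple[str, List[str], Optional[ET.Element]]:
    """Run Steps 1-3, stopping at the first stage that reports errors."""
    root = ET.fromstring(xml_text)
    errors = loose_check(root)
    if errors:
        return "loose", errors, None
    errors = tight_check(root, TIGHT_RULES)
    if errors:
        return "tight", errors, None
    return "ok", [], to_internal(root)


good = '<transactions><transaction currency="USD"><amount>12.50</amount></transaction></transactions>'
status, errors, internal = process(good)
print(status, ET.tostring(internal, encoding="unicode"))
# prints: ok <ledger><entry ccy="USD">12.50</entry></ledger>
```

Keeping the tight rules as data, rather than burying them inside
transformation code, is what makes step (C) below workable: the same
`message` strings that reject a bad document can be listed in the
generated documentation.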

The proposed process to produce the human-readable documentation of the
input language:

(A) Translate the RNC used in Step 1 into RELAX NG's XML syntax
("RNG").
(B) Use XSLT to translate the RNG into a simple HTML depiction of the
elements and structure of the GENERIC language.
(C) Use XSLT to augment that simple HTML depiction with the tight
validation rules (expressed in Schematron; see Step 2 above) that
further define the GENERIC language.
(D) Use XSLT to further augment that increasingly less simple HTML
depiction with links into the HTML depiction of the INTERNAL language.
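
Steps (B) and (C) are both "read a schema, emit HTML" passes. The plan is
to write them in XSLT; purely to make the data flow concrete, here is a
minimal Python sketch (standard library only) of the same idea. It
collects names from RELAX NG `<element name="...">` patterns and attaches
the text of each ISO Schematron `<assert>` to the element named by its
`<rule context="...">`. The schema fragments are hypothetical, only the
attribute form of `name` is handled, and contexts are assumed to be bare
element names rather than full XPath patterns. Step (D) could add one
link per element inside the same loop.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

RNG = "{http://relaxng.org/ns/structure/1.0}"    # RELAX NG XML syntax namespace
SCH = "{http://purl.oclc.org/dsdl/schematron}"   # ISO Schematron namespace


def rng_element_names(rng_root: ET.Element) -> List[str]:
    """Step (B): element names declared via <element name="..."> in document order.
    Name classes given as child <name> elements are ignored in this sketch."""
    names: List[str] = []
    for el in rng_root.iter(RNG + "element"):
        name = el.get("name")
        if name and name not in names:
            names.append(name)
    return names


def sch_messages(sch_root: ET.Element) -> Dict[str, List[str]]:
    """Step (C): map each rule context to its assert messages.
    Assumes contexts are bare element names; real contexts are XPath patterns."""
    messages: Dict[str, List[str]] = defaultdict(list)
    for rule in sch_root.iter(SCH + "rule"):
        for assertion in rule.iter(SCH + "assert"):
            text = " ".join((assertion.text or "").split())  # normalize whitespace
            messages[rule.get("context")].append(text)
    return messages


def render_html(names: List[str], messages: Dict[str, List[str]]) -> str:
    """Minimal HTML: one <dt> per element, one <dd> per rule that applies to it."""
    dl = ET.Element("dl")
    for name in names:
        ET.SubElement(dl, "dt").text = name
        for msg in messages.get(name, []):
            ET.SubElement(dl, "dd").text = msg
    return ET.tostring(dl, encoding="unicode")


# Hypothetical fragments of the GENERIC schemas.
RNG_SRC = """<element name="transactions" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="transaction">
      <attribute name="currency"/>
      <element name="amount"><text/></element>
    </element>
  </oneOrMore>
</element>"""

SCH_SRC = """<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="transaction">
      <assert test="@currency = 'USD' or @currency = 'CAD' or @currency = 'EUR'">
        currency must be USD, CAD or EUR
      </assert>
    </rule>
  </pattern>
</schema>"""

html = render_html(rng_element_names(ET.fromstring(RNG_SRC)),
                   sch_messages(ET.fromstring(SCH_SRC)))
print(html)
# prints: <dl><dt>transactions</dt><dt>transaction</dt><dd>currency must be USD, CAD or EUR</dd><dt>amount</dt></dl>
```

The `<assert>` text is also what a Schematron processor reports when a
document fails the rule, which fits assumption (A1): the error message
and the documentation line come from a single source.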

An important underlying assumption:
(A1) It is an unyielding law of nature that documentation cannot
accurately describe program code unless it is itself part of, or
derived from, that code. (By "that code" I do not mean comments,
though I am grudgingly willing to include error messages.)

My question:
===========
Regarding Step 2 of the proposed design, I find myself asking myself,
"Look, you already have to acquire a lot of XSLT expertise to pull off
the rest of this stunt, and XSLT can express the same sorts of
validations that Schematron can. Why on earth do you want to introduce
yet another language/technology into this mix? Just write your tight
validation rules in XSLT, and have one less thing to learn and worry
about."

To which myself replies (you see why I need a sanity check here):
"Schematron is tidy and small, which will make (C) [see above] much,
much simpler. It also means that learning Schematron isn't a big cost.
Plus, it was designed for validation, whereas XSLT was designed for
transformation. Use tools for what they're designed for: validators
for validation, transformers for transformations."

I'm leaning pretty strongly toward the-myself-that-favors-Schematron's
view of the matter, for the reasons just articulated.

But I can't claim to have mastered either Schematron or XSLT yet, so
I'm a bit skeptical of my ability to compare their capabilities.

It should be added that I'm working in a .NET environment, where XSLT
tools seem to be somewhat more plentiful than Schematron tools.

So: What say you, XSLers? Does Step 2 sound to you like a job for
Schematron or for XSLT? Does the .NET environment affect that decision?
Are there any pitfalls you would urge me to watch out for? Any thoughts
you want to share will be appreciated.

Norm Birkett
