RE: [xsl] [XSLT 2.0] Checking that an element's value has the desired datatype?

From: "Costello, Roger L." <costello@xxxxxxxxx>
Date: Fri, 20 Oct 2006 07:09:22 -0400
Hi Folks,

I am forwarding the below message from Rick Jelliffe.  It's well worth
the read.  /Roger
------------------------------------------------------------------------

Michael Kay wrote:
>> Isn't this a valid use of a schema language?  /Roger
>>
>
> I would say that you are using Schematron here as a query language,
> not as a schema language.
>
Roger invited me to post a comment.

When anything becomes large, complicated and changeable, it is useful
to layer it and, if it grows to a certain size, to have custom
languages to express the layers tersely, to have good analytical models
for each of the layers, and to have a division of labour to take
advantage of the layers.

One very simple division is that used by database people, that static
constraints (all data at all times) belong to schemas, while dynamic
constraints (co-occurrence constraints, value-value constraints, etc)
belong to a level of constraint language, while constraints relating to
links (keys, indexing) belong to database administration; and other
kinds of constraints are deemed "business rules" and belong to the
world of queries and applications. (When database schemas change, of
course, it turns out their schemas are not so static...)

This is quite a useful model, and clearly when many database-background
people say the word "schema" they mean static constraints and typing
with only the simplest context, equivalent to fields and tables or
simple content models. To them, Schematron is not a schema language but
is a constraint language or a business rules language or a query
language.

The one line description of Schematron I give is "a language for
asserting the presence or absence of patterns in an XML document";  I
don't care how people class it: whether people think it is a schema
language or not says nothing about Schematron; it just reveals their
particular world-view of how constraints should be partitioned.
To me, that is a practical issue not a theoretical one.
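
As a minimal sketch of that one-line description: an <assert> fires when
an expected pattern is absent, and a <report> fires when an unwanted
pattern is present. The element names (invoice, total, draftNote) are
hypothetical and chosen only for illustration; the namespace is the ISO
Schematron one.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <!-- "invoice", "total" and "draftNote" are made-up names -->
    <rule context="invoice">
      <!-- assert: the message is issued when the test is false,
           i.e. when the pattern is absent -->
      <assert test="total">An invoice should contain a total.</assert>
      <!-- report: the message is issued when the test is true,
           i.e. when the pattern is present -->
      <report test="draftNote">This invoice still contains a draftNote.</report>
    </rule>
  </pattern>
</schema>
```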

However, Schematron, especially when using XPath (perhaps hiding an
underlying XSLT implementation), is a general-purpose XML
language. So rather than saying "Schematron is a schema language" some
people (Michael) might be more comfortable if we say "Schematron can be
used as a schema language" and "Schematron can be used as a constraint
language" and "Schematron can be used as a business rules language" and
"Schematron can be used as a query language".

As a schema language, Schematron has the advantage of being able to
model relations better than grammars can: information can be divided into
different branches and into different documents. There are many simple
constraints that cannot be expressed by regular grammars (such as "each
news story needs at least one <who>, <what>, <where> and <when> but
they can go anywhere") or which cannot be expressed by particular
grammar-based schema languages (such as XSD's quaint refusal to allow
attribute values to influence type, except for xsi:type); Schematron
can be a perfectly good schema language for those. Swings and
roundabouts, of course: some kinds of complex content models are
difficult to express, but these are probably signs of underlayering
anyway.
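
The two kinds of constraint mentioned above might be sketched in
Schematron roughly as follows. This is an illustration under stated
assumptions, not a schema from the thread: the container element name
"story" and the "measure"/"unit" example are hypothetical, and
".//who" is used on the reading that "anywhere" means any depth inside
the story.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <!-- Presence anywhere in the subtree, in any order -->
  <pattern id="story-parts">
    <rule context="story">
      <assert test=".//who">A story needs at least one who element.</assert>
      <assert test=".//what">A story needs at least one what element.</assert>
      <assert test=".//where">A story needs at least one where element.</assert>
      <assert test=".//when">A story needs at least one when element.</assert>
    </rule>
  </pattern>

  <!-- An attribute value deciding what the content may be
       (hypothetical "measure" element with a "unit" attribute) -->
  <pattern id="value-by-attribute">
    <rule context="measure[@unit = 'percent']">
      <assert test="number(.) &gt;= 0 and number(.) &lt;= 100">
        A percent measure should be a number between 0 and 100.
      </assert>
    </rule>
  </pattern>
</schema>
```

The two rules sit in separate patterns so that neither can shadow the
other: within a single pattern, a node is checked only against the first
rule whose context matches it.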

One reason that people use Schematron seems to be that XSD frequently
allows you to test non-problems: it gives you a lot of power to be able
to validate what probably are invariants in your system anyway! If you
make your schema based on your database, for example, then the XSD
structures and datatyping may just serve to document what you have
rather than provide much value in finding problems. And when people
architect in a Schematron stage in order to validate these "non-schema"
constraints (i.e., those constraints not inevitable from the DBMS'
schemas) and abandon XSD validation as fat and complex, it is only
natural that they then start to retro-fit simple structural constraints
in the Schematron schemas too.

So the issue of what a schema language is, and what a query language or
constraint language is, is less interesting, perhaps, than the issue "How
do I arrange my validation and datatyping requirements into an optimal
workflow both for people and for processing?"  Schematron has a
mechanism called <phase> specifically to allow constraints to be
grouped: for example, Michael might be more at home grouping
constraints into phases called "schema", "constraint", "businessRules"
and "queries".
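
A rough sketch of such a grouping follows, showing three of the four
phase names for brevity. The phase ids come from the paragraph above;
everything else (the pattern ids and the order, item, shipDate, status
and price names) is hypothetical and only there to give each phase
something to activate.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <!-- Each phase names the patterns that are active when it is selected -->
  <phase id="schema">
    <active pattern="structure"/>
  </phase>
  <phase id="constraint">
    <active pattern="co-occurrence"/>
  </phase>
  <phase id="businessRules">
    <active pattern="rules"/>
  </phase>

  <!-- Simple structural check -->
  <pattern id="structure">
    <rule context="order">
      <assert test="item">An order contains at least one item.</assert>
    </rule>
  </pattern>

  <!-- Co-occurrence check: an attribute value requires an element -->
  <pattern id="co-occurrence">
    <rule context="order[@status = 'shipped']">
      <assert test="shipDate">A shipped order carries a shipDate.</assert>
    </rule>
  </pattern>

  <!-- Business rule: an arbitrary limit, made up for this sketch -->
  <pattern id="rules">
    <rule context="order">
      <assert test="sum(item/@price) &lt;= 10000">
        The item prices of an order should not total more than 10000.
      </assert>
    </rule>
  </pattern>
</schema>
```

Which phase runs is chosen when the validator is invoked, so the same
schema file can serve a quick structural check at one stage of a
workflow and the fuller business-rule check at another.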

So I agree with Michael, if he is saying that not all constraints
belong to "schemas" in the sense of invariant constraints that are of
interest for data storage; however, frequently in XML we manipulate
data that lives outside any database, and in that situation a
constraint-layering model based on storage concerns is a carbuncle at
best and folly at worst. In those cases, the constraints that belong in
the "schema" may indeed fruitfully involve more complex constraints.

Cheers
Rick Jelliffe
