On 20.05.2010 22:59, Costello, Roger L. wrote:
Hi Folks,
My thinking is that the input data should be checked using XML Schema and Schematron. Ditto for the data that is output by XSLT.
I see no reason to include in an XSLT program any code to check data. It just makes the program bigger, harder to understand, harder to maintain, and harder to debug.
Do you agree?
Let me give two counterexamples.
1.
Today I wrote XSLT code to check some intermediate data (neither input
nor output). The task was to convert tabular data (any of CSV, XHTML,
and Excel XML export) to a custom XML format. The first line of each
table is expected to contain column names.
A test was needed to make sure that all required columns (or rather,
their names) are present, and columns whose name isn't expected should
be reported. That is to make sure that the authors use standardized
names such as "Abbreviation", "Synonym(s)", "Subject classification"
rather than, for example, "Abreviation", "equiv.", "Main subject".
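A minimal sketch of such a column-name check in XSLT 2.0. The normalized intermediate format (table/row/cell) and the mode name are assumptions made for illustration, and the list of names contains only the three quoted above; a real list would be longer and might distinguish required from merely permitted columns.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Assumption: the full set of standardized column names -->
  <xsl:variable name="expected-columns" as="xs:string+"
      select="('Abbreviation', 'Synonym(s)', 'Subject classification')"/>

  <xsl:template match="/">
    <xsl:apply-templates select="table" mode="check-columns"/>
  </xsl:template>

  <!-- Hypothetical normalized format: table/row/cell, first row = column names -->
  <xsl:template match="table" mode="check-columns">
    <xsl:variable name="found" as="xs:string*"
        select="row[1]/cell/normalize-space(.)"/>
    <!-- Expected names that do not occur in the header row -->
    <xsl:for-each select="$expected-columns[not(. = $found)]">
      <xsl:message>Missing column: <xsl:value-of select="."/></xsl:message>
    </xsl:for-each>
    <!-- Header names that are not in the expected list -->
    <xsl:for-each select="$found[not(. = $expected-columns)]">
      <xsl:message>Unexpected column name: <xsl:value-of select="."/></xsl:message>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With a header row reading "Abreviation", this would report "Abbreviation" as missing and "Abreviation" as unexpected.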
Evidently, no grammar-based validation can perform a test like this.
But Schematron doesn't work out of the box here either. This is because a)
there may be non-XML input such as CSV and b) the test should be carried
out in the middle of a multi-pass XSLT transformation, after data import
and normalization, prior to custom-format generation, hyphenation, etc.
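The position of the check within a multi-pass stylesheet can be sketched as follows, using XSLT 2.0 temporary trees and modes. The mode names and the placeholder check are hypothetical; the identity templates merely stand in for the real normalization and generation passes.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Pass 1: import and normalization into a temporary tree -->
    <xsl:variable name="normalized">
      <xsl:apply-templates mode="normalize"/>
    </xsl:variable>
    <!-- Check on the intermediate data, independent of the input format -->
    <xsl:apply-templates select="$normalized" mode="check"/>
    <!-- Pass 2: custom-format generation from the same temporary tree -->
    <xsl:apply-templates select="$normalized" mode="generate"/>
  </xsl:template>

  <!-- Placeholder passes: an identity copy stands in for the real logic -->
  <xsl:template match="node() | @*" mode="normalize generate">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" mode="#current"/>
    </xsl:copy>
  </xsl:template>

  <!-- Placeholder check: only reports, writes nothing to the result -->
  <xsl:template match="/*" mode="check">
    <xsl:if test="empty(*)">
      <xsl:message>Normalized data contains no rows.</xsl:message>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```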
Having the test after normalization saves me from writing
input-format-specific test code. Or if I don't want to write
format-specific Schematron, I could import and call my XSLT 2.0 input
normalization functions in Schematron assertions. This is possible
because, ironically, Schematron compiles to XSLT. An XSLT stylesheet
actually performs the check. Then I will have written part of the test
(not the actual assertion but a function called from within the
assertion) in XSLT...
Alternatively, I can split my multi-pass stylesheet into several
distinct XSLT invocations for the individual
normalization/transformation steps, with some Schematron tests on an
intermediate, normalized output format. I guess that's what XProc is
designed for: having a processing pipeline of distinct
transformation/check stages instead of a monolithic XSLT stylesheet that
does everything. It's feasible and reasonable to split my
transformation/checks in such a way.
In that sense you are partly right. Assembling a pipeline from distinct
transform/check steps may lead to code that is easier to understand and
to maintain.
2.
But sometimes practical considerations force you to check within XSLT.
Consider an import/export scenario such as in OpenOffice or InDesign.
You may only specify a single XSLT file for each transformation
direction (import / export). Even if you shipped a Schematron schema that
is already compiled to XSLT or embedded in a RELAX NG schema, you couldn't
run both the transformation and the check during import/export.
But you can include the checks as XSLT code in the monolithic XSLT file.
The code may add comments in the generated files, making the user aware
of issues in their data. You will be able to distribute the
transformation/check as a single file (a .jar file containing XSLT in
OpenOffice, .xsl in InDesign). You don't have to tell the user how to
set up Schematron or Schematron-aware RELAX NG validators and how to make
sense of SVRL messages. Their main application will report the issues in
a custom yet comprehensible format (for example, OpenOffice comments,
InDesign paragraphs in red ink, XML comments, ...).
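As an illustration of the XML-comment variant, here is a minimal sketch of a check built into a single stylesheet. It is written in XSLT 1.0 on the assumption that the host application's processor may not support 2.0; the element names entry and abbreviation and the rule itself are hypothetical, and the identity transform stands in for the real import/export logic.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform as a stand-in for the actual transformation -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical rule: every entry needs a non-empty abbreviation.
       Offending entries are still transformed, preceded by a comment. -->
  <xsl:template match="entry[not(normalize-space(abbreviation))]">
    <xsl:comment> CHECK: entry no. <xsl:value-of
        select="count(preceding-sibling::entry) + 1"/> has no abbreviation </xsl:comment>
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The same template could just as well emit an application-specific annotation instead of xsl:comment; the point is only that transformation and check travel in one file.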
Maybe these are all-too-mundane considerations, given the categorical
nature of your statement.
But fortunately XSLT is well suited and frequently used for many mundane
tasks, and, as Wendell said, it's hard to give a categorical answer to
your question.
Gerrit
--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930
Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler