Re: [xsl] Why should I put code in XSLT to check the input data or the output data?

From: "Imsieke, Gerrit, le-tex" <gerrit.imsieke@xxxxxxxxx>
Date: Fri, 21 May 2010 01:31:52 +0200
On 20.05.2010 22:59, Costello, Roger L. wrote:
> Hi Folks,
>
> My thinking is that the input data should be checked using XML Schema and Schematron. Ditto for the data that is output by XSLT.
>
> I see no reason to include in an XSLT program any code to check data. It just makes the program bigger, harder to understand, harder to maintain, and harder to debug.
>
> Do you agree?

Let me give two counterexamples.


1.

Today I wrote XSLT code to check some intermediate data (neither input nor output). The task was to convert tabular data (CSV, XHTML, or Excel XML export) to a custom XML format. The first row of each table is expected to contain the column names.

I needed a test to make sure that all required columns (or rather, their names) are present, and to report any columns whose names are not expected. The point is to ensure that authors use standardized names such as "Abbreviation", "Synonym(s)", "Subject classification" rather than, for example, "Abreviation", "equiv.", "Main subject".
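Once the header row has been extracted, the core of such a check comes down to two set comparisons. The following is a minimal sketch in Python rather than XSLT, purely to illustrate the logic. Treating the three standardized names above as required is an assumption, and the optional column "Definition" and the function name check_header are hypothetical.

```python
# Sketch only: which names are required vs. merely allowed is an assumption.
REQUIRED_COLUMNS = {"Abbreviation", "Synonym(s)", "Subject classification"}
# Hypothetical optional column: allowed, but not mandatory.
OPTIONAL_COLUMNS = {"Definition"}


def check_header(header):
    """Compare a normalized header row against the expected column names.

    Returns (missing, unexpected): required names that are absent, and
    names present in the header that are neither required nor optional.
    """
    present = [name.strip() for name in header]
    missing = sorted(REQUIRED_COLUMNS - set(present))
    allowed = REQUIRED_COLUMNS | OPTIONAL_COLUMNS
    unexpected = [name for name in present if name not in allowed]
    return missing, unexpected


missing, unexpected = check_header(["Abreviation", "equiv.", "Main subject"])
print("missing:", missing)
print("unexpected:", unexpected)
```

Reporting both lists, instead of failing on the first problem, lets an author fix all misnamed columns in one pass.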

Clearly, no grammar-based validation can perform a test like this.

Schematron doesn't work out of the box here either, because (a) there may be non-XML input such as CSV, and (b) the test should be carried out in the middle of a multi-pass XSLT transformation: after data import and normalization, but before custom-format generation, hyphenation, etc.

Running the test after normalization saves me from writing input-format-specific test code. Or, if I didn't want to write format-specific Schematron, I could import my XSLT 2.0 input-normalization functions and call them in Schematron assertions. This is possible because, ironically, Schematron compiles to XSLT: an XSLT stylesheet actually performs the check. But then I would have written part of the test (not the actual assertion, but a function called from within it) in XSLT...
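The benefit of testing after normalization can be sketched like this. Each input format gets its own small reader that produces the same list-of-rows representation, and the header check then needs to run only once, on that common form. This is a Python illustration of the idea, not the XSLT described here. It covers only CSV and XHTML (the Excel XML export is left out), and the function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def rows_from_csv(text):
    """Normalize CSV text into a list of rows (lists of cell strings)."""
    return [[cell.strip() for cell in row] for row in csv.reader(io.StringIO(text))]


def rows_from_xhtml(text):
    """Normalize an XHTML <table> into the same list-of-rows shape."""
    root = ET.fromstring(text)
    rows = []
    for tr in root.iter(XHTML + "tr"):
        cells = [c for c in tr if c.tag in (XHTML + "th", XHTML + "td")]
        rows.append(["".join(c.itertext()).strip() for c in cells])
    return rows


def header_row(rows):
    """The first row carries the column names; the check only needs this."""
    return rows[0] if rows else []


csv_input = "Abbreviation,Synonym(s)\nXSLT,XSL Transformations\n"
xhtml_input = (
    '<table xmlns="http://www.w3.org/1999/xhtml">'
    "<tr><th>Abbreviation</th><th>Synonym(s)</th></tr>"
    "<tr><td>XSLT</td><td>XSL Transformations</td></tr>"
    "</table>"
)

# Both formats reduce to the same header, so one format-independent check suffices.
print(header_row(rows_from_csv(csv_input)))
print(header_row(rows_from_xhtml(xhtml_input)))
```

Supporting a new input format then means adding another reader function. The check itself stays unchanged.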

Alternatively, I could split my multi-pass stylesheet into several distinct XSLT invocations for the individual normalization/transformation steps, with some Schematron tests on an intermediate, normalized output format. I guess that's what XProc is designed for: a processing pipeline of distinct transformation/check stages instead of a monolithic XSLT stylesheet that does everything. Splitting my transformations and checks this way would be feasible and reasonable.

In that sense you are partly right. Assembling a pipeline from distinct transform/check steps may lead to code that is easier to understand and to maintain.
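A pipeline of distinct transform and check stages might look roughly like this in outline. It is a Python analogy only, not XProc. The stage functions, the required column names, and the run_pipeline helper are all hypothetical. Transform stages return new data. Check stages only inspect the intermediate result and return messages, so each stage stays small and can be tested on its own.

```python
def normalize(rows):
    """Transform stage: trim whitespace in every cell."""
    return [[cell.strip() for cell in row] for row in rows]


def check_required_columns(rows):
    """Check stage: report required column names missing from the header."""
    header = rows[0] if rows else []
    required = ["Abbreviation", "Synonym(s)"]  # hypothetical required set
    return [f"missing column: {name}" for name in required if name not in header]


def to_records(rows):
    """Transform stage: turn data rows into dicts keyed by column name."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, values)) for values in body]


def run_pipeline(data, stages):
    """Run stages in order; check stages add messages but never change data."""
    messages = []
    for kind, stage in stages:
        if kind == "check":
            messages.extend(stage(data))
        else:
            data = stage(data)
    return data, messages


pipeline = [
    ("transform", normalize),
    ("check", check_required_columns),  # runs on the normalized intermediate
    ("transform", to_records),
]

result, messages = run_pipeline(
    [[" Abbreviation ", "Main subject"], ["XSLT", "Markup"]], pipeline
)
print(result)
print(messages)
```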

2.

Sometimes, though, practical considerations force you to check within XSLT. Consider an import/export scenario such as in OpenOffice or InDesign, where you can specify only a single XSLT file for each direction (import or export). Even if you shipped a Schematron that is already compiled to XSLT, or one embedded in a RELAX NG schema, you couldn't run both the transformation and the check during import/export.

You can, however, include the checks as XSLT code in the monolithic XSLT file. This code can add comments to the generated files, making users aware of issues in their data. You can distribute the transformation and check as a single file (a .jar file containing XSLT for OpenOffice, a .xsl file for InDesign). You don't have to tell users how to set up Schematron or Schematron-aware RELAX NG validators, or how to make sense of SVRL messages. Their main application will report the issues in a custom yet comprehensible form (for example, OpenOffice comments, InDesign paragraphs in red ink, XML comments, ...).
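Embedding the check in the single transformation and surfacing its findings as XML comments in the output could be sketched like this. Again, Python stands in for the XSLT, where the equivalent would be an xsl:comment instruction. The output vocabulary (entries, entry, field) and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def safe_comment(text):
    """XML comments may not contain '--' or end with '-'; pad defensively."""
    while "--" in text:
        text = text.replace("--", "- -")
    return " " + text + " "


def transform_with_checks(rows, required):
    """Convert rows to XML, embedding any data issues as XML comments."""
    header, body = (rows[0], rows[1:]) if rows else ([], [])
    root = ET.Element("entries")
    # The check runs inside the same transformation; no separate validator.
    for name in required:
        if name not in header:
            warning = f"WARNING: required column '{name}' is missing"
            root.append(ET.Comment(safe_comment(warning)))
    for values in body:
        entry = ET.SubElement(root, "entry")
        for name, value in zip(header, values):
            field = ET.SubElement(entry, "field", name=name)
            field.text = value
    return ET.tostring(root, encoding="unicode")


print(transform_with_checks([["Abbreviation"], ["XSLT"]], ["Abbreviation", "Synonym(s)"]))
```

The output is still produced when the data has problems. The warnings simply travel along with it, and the host application can display them.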


Maybe these are all-too-mundane considerations, given the categorical nature of your statement.


Fortunately, though, XSLT is well suited to, and frequently used for, many mundane tasks, and, as Wendell said, it's hard to give a categorical answer to your question.

Gerrit


-- 
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler
