Re: [xsl] How do you ensure that data is not altered/corrupted in a transformation?

Subject: Re: [xsl] How do you ensure that data is not altered/corrupted in a transformation?
From: "Joel Kalvesmaki director@xxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 19 May 2023 15:49:02 -0000

General agreement here, and some additional principles:

1. Define your expectations. The XSLT was written ostensibly to do some changes. Document for yourself (and others) exactly what changes are expected and which are tolerated. Any other change is, presumably, not desirable.

2. Try one or more tools. XSpec, Schematron, and hashing have already been mentioned in this thread. You want to make sure that the tool you pick can handle the expectations you defined in #1, and that you're comfortable with it.

3. Automate your testing tool. Again, it depends upon what you've picked and your comfort level, but you can use shell/batch scripts, XProc, ant, Python, C#, etc.

One more method to consider: write a reverse XSLT to attempt to change the output back into the original. Then you can apply one of several available straightforward comparisons.
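As a rough illustration of the comparison step only, here is a minimal Python sketch. It assumes the forward and reverse transformations have already been run by whatever XSLT processor you use, and that you hold the original and the round-tripped result as XML strings. The function name same_xml and the sample strings are hypothetical. Canonicalization (C14N) is just one of the possible comparisons; it ignores attribute order and tag-internal spacing but keeps text whitespace significant.

```python
import xml.etree.ElementTree as ET


def same_xml(original: str, round_tripped: str) -> bool:
    """Return True if both XML strings are identical after C14N canonicalization."""
    # ET.canonicalize (Python 3.8+) returns the canonical form as a string.
    return ET.canonicalize(original) == ET.canonicalize(round_tripped)


# Hypothetical inputs: the source document and the output of the reverse XSLT.
original = "<alt>12000 feet</alt>"
restored = "<alt >12000 feet</alt>"    # differs only in markup spacing
corrupted = "<alt>1200 feet</alt>"     # a digit was lost somewhere

print(same_xml(original, restored))    # True
print(same_xml(original, corrupted))   # False
```

If whitespace differences in text are tolerated under your expectations from #1, you would need to normalize them before comparing; the sketch above treats them as real differences.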

I used this reverse XSLT strategy in a situation where I needed to parse thousands of plain text files into lossless XML. I wrote the reverse XSLT while developing the main code, which helped me both to define my expectations (on, e.g., whitespace, line breaks) and to quickly find errors in my primary code base. When the project was finished, I was able to demonstrate to my colleagues that the application was completely faithful to the original. Because both the forward and the reverse XSLT were written declaratively, auditors were better placed to understand both the intent and the reality of the code than if I had adopted a procedural/imperative approach.

(If the output you're testing is lossy vis-à-vis the original, you might need to write two sets of reverse XSLT, one to scrap the losses in the original and the second to restore the output as best as possible.)

Hope this helps,

Joel

On 2023-05-19 04:13, Michael Kay mike@xxxxxxxxxxxx wrote:
Testing, testing, testing. Plus tools to help prevent the mistakes
arising in the first place.

Schema validation can catch a lot of the errors, but it won't catch them all.

We had a case like this where a customer was flagging up dangerous
levels of some reading in medical reports by displaying the relevant
figures in red. When they upgraded from XSLT 1.0 to 2.0, the test
$level > $dangerLevel started doing a string comparison rather than a
numeric comparison, with the effect that the red flags weren't
appearing -- and it took them months to notice, because they weren't
doing enough testing. Hopefully no-one died. Tests are the only
answer: and because stylesheets can be thrown together quickly, people
often neglect to follow good software engineering disciplines when
changing them.
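The failure mode is easy to reproduce outside XSLT. The following is only a Python analogy, not the customer's stylesheet: the values and function names are hypothetical, chosen so that the numeric test raises the flag and the string test silently does not.

```python
def is_dangerous_numeric(level: str, danger_level: str) -> bool:
    """Compare the readings as numbers (the intended behaviour)."""
    return float(level) > float(danger_level)


def is_dangerous_string(level: str, danger_level: str) -> bool:
    """Compare the readings character by character, as strings."""
    return level > danger_level


# Hypothetical reading and threshold, both arriving as untyped text.
level, danger_level = "1200", "900"

print(is_dangerous_numeric(level, danger_level))  # True: 1200 exceeds 900
print(is_dangerous_string(level, danger_level))   # False: "1" sorts before "9"
```

A regression test with a single pair of values like this one would have caught the change in behaviour on the first run after the upgrade.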

Michael Kay
Saxonica

On 19 May 2023, at 09:37, Roger L Costello costello@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

Hi Folks,

In certain domains loss of life may occur if data is altered/corrupted in any way.

Suppose you write an XSLT program which transforms this:

<alt>12000 feet</alt>

to this:

<altitude>12000 feet</altitude>

How do you ensure that the data -- 12000 feet -- was not altered/corrupted in the transformation?

I have heard of people doing a hash on the data prior to the transformation, a hash on the data after the transformation, and then comparing the hashes. Is that what you would do when lives are on the line? What is your recommendation?
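For concreteness, the hashing idea might look something like the Python sketch below. It is an assumption about what "the data" means: here only the text content is hashed, so the element rename from alt to altitude is tolerated. The function name text_digest is hypothetical, and SHA-256 is one choice of hash among many.

```python
import hashlib
import xml.etree.ElementTree as ET


def text_digest(xml_text: str) -> str:
    """Hash the concatenated text content of a document, ignoring element names."""
    root = ET.fromstring(xml_text)
    data = "".join(root.itertext())  # all text nodes in document order
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


before = "<alt>12000 feet</alt>"
after = "<altitude>12000 feet</altitude>"

# Same text content, different element name: the digests match.
print(text_digest(before) == text_digest(after))  # True
```

Note that this sketch ignores attributes and the boundaries between text nodes, so text that moved from one element to another would go undetected; whether that matters depends on which changes are expected and which are not.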

/Roger



--
Joel Kalvesmaki
Director, Text Alignment Network
http://textalign.net
