Re: [xsl] How do you ensure that data is not altered/corrupted in a transformation?

From: "C. M. Sperberg-McQueen cmsmcq@xxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 19 May 2023 17:11:45 -0000
"Roger L Costello costello@xxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> writes:

> In certain domains loss of life may occur if data is altered/corrupted in
> any way.
>
> ...
>
> I have heard of people doing a hash on the data prior to the
> transformation, a hash on the data after the transformation, and then
> comparing the hashes. Is that what you would do when lives are on the
> line? What is your recommendation?

When lives are on the line, I would be inclined to attempt a formal
proof that the transformation has in fact preserved all the information
it is critical to preserve, has dropped only information it was intended
to drop (if any), and has added only information it was intended to add
(if any).
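The three conditions above can be made concrete, at least for a single input/output pair, if one assumes that the information carried by each document has already been reduced to a set of facts. Obtaining those sets is the hard part and is not shown. The triple representation and the names `check_conversion` and `ConversionReport` are hypothetical, and a set comparison like this is a mechanical check, not a formal proof.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical representation: one fact = (subject, property, value).
Fact = Tuple[str, str, str]


class ConversionReport(NamedTuple):
    missing_critical: Set[Fact]   # critical facts absent from the output
    unexpected_drops: Set[Fact]   # facts lost that were not meant to be dropped
    unexpected_adds: Set[Fact]    # facts gained that were not meant to be added

    @property
    def ok(self) -> bool:
        # The conversion passes only if all three sets are empty.
        return not (self.missing_critical or self.unexpected_drops
                    or self.unexpected_adds)


def check_conversion(source: Set[Fact], target: Set[Fact],
                     critical: Set[Fact], may_drop: Set[Fact],
                     may_add: Set[Fact]) -> ConversionReport:
    """Compare the facts carried by an input document and an output document."""
    return ConversionReport(
        missing_critical=critical - target,
        unexpected_drops=(source - target) - may_drop,
        unexpected_adds=(target - source) - may_add,
    )


if __name__ == "__main__":
    source = {("doc1", "title", "Dosage table"),
              ("doc1", "dose", "5 mg"),
              ("doc1", "font", "Times")}
    target = {("doc1", "title", "Dosage table"),
              ("doc1", "dose", "50 mg"),      # corrupted value
              ("doc1", "format", "HTML")}
    report = check_conversion(source, target,
                              critical={("doc1", "dose", "5 mg")},
                              may_drop={("doc1", "font", "Times")},
                              may_add={("doc1", "format", "HTML")})
    print(report.ok)  # False
```

In this example the corrupted dose is caught twice: once as a critical fact that went missing and once as an addition nobody asked for, while the intended drop (the font) and the intended addition (the output format) pass silently.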

The general approach is described in

  Sperberg-McQueen, C. M. "What constitutes successful format
      conversion? Towards a formalization of 'intellectual content'."
      International Journal of Digital Curation (IJDC) 6.1 (2011):
      153-164.

      http://www.ijdc.net/article/view/170/238
      http://www.ijdc.net/article/download/170/238/0

A concrete but simple example is described in

  Sperberg-McQueen, C. M., Yves Marcoux and Claus Huitfeldt. "Document
      lattices: Equivalence, compatibility, and contradiction in
      document markup." Presented at Balisage: The Markup Conference
      2014, Washington, DC, August 5-8, 2014. In Proceedings of
      Balisage: The Markup Conference 2014. Balisage Series on Markup
      Technologies, vol. 13 (2014).

      https://doi.org/10.4242/BalisageVol13.Sperberg-McQueen01
      https://balisage.net/Proceedings/vol13/html/Sperberg-McQueen01/BalisageVol13-Sperberg-McQueen01.html

The papers mentioned assume that you want to be confident that a
particular transformation of a particular dataset has not corrupted the
information you care about.  If you want to be confident that a given
transformation will never corrupt information you care about, the task
will be more challenging, both because you will need to construct
correctness proofs and because you will need to define what it means for
a transformation to be correct, neither of which is terribly simple.

In cases where lives are not on the line, it will often make sense to
settle for a less stringent standard of confidence, for example (as
Michael Kay suggests) by investing heavily in testing.  And as others have
already implicitly or explicitly suggested, a good deal of benefit --
and possibly a good deal of effort -- will accrue already from the task
of *identifying* exactly what information it is critical to preserve and
what additions or omissions of information are expected or allowed.
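The hash comparison mentioned in the original question fits this framing once the hash is computed over the critical information rather than over the whole document: unless the transformation is an identity copy, hashes of the raw input and output bytes will differ even when nothing important has changed. A minimal sketch, assuming (hypothetically) that the critical information is the `id` attribute and whitespace-normalized text of every `dose` element, that both vocabularies use that element name, and that order does not matter. Comparing the extracted lists directly would do just as well; a hash earns its keep mainly when the two sides are checked at different times or places.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import List, Tuple


def critical_fields(xml_text: str) -> List[Tuple[str, str]]:
    """Extract hypothetical critical data: (id, normalized text) of every <dose>."""
    root = ET.fromstring(xml_text)
    fields = [(el.get("id", ""), " ".join((el.text or "").split()))
              for el in root.iter("dose")]
    return sorted(fields)  # order-insensitive comparison (an assumption)


def fingerprint(fields: List[Tuple[str, str]]) -> str:
    """SHA-256 over an unambiguous serialization of the extracted fields."""
    h = hashlib.sha256()
    for ident, text in fields:
        for part in (ident, text):
            data = part.encode("utf-8")
            h.update(len(data).to_bytes(8, "big"))  # length prefix avoids ambiguity
            h.update(data)
    return h.hexdigest()


def same_critical_content(before: str, after: str) -> bool:
    """True if input and output carry the same (hypothetical) critical fields."""
    return fingerprint(critical_fields(before)) == fingerprint(critical_fields(after))


if __name__ == "__main__":
    before = '<order><dose id="d1">5 mg</dose><note>ignore</note></order>'
    after = '<html><body><dose id="d1"> 5  mg </dose></body></html>'
    print(same_critical_content(before, after))  # True
```

Note that a matching fingerprint says only that the extracted fields agree; it says nothing about information the extraction does not look at, which is why identifying the critical information comes first.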

--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
