[xsl] Unit Testing — Good but Not Sufficient

Hi Folks,

A couple of weeks ago I posted a question titled:
"Approaches for validating database transformations using XML/XSLT tools?"
Thank you all for your thoughtful responses. I wanted to follow up now that
I've completed the task (or at least the first phase of it) and share what I
learned.

In brief, the task was to validate a data transformation pipeline that takes
data from a complex source system, applies Python-based transformations, and
loads the results into a simplified, user-facing database. The goal was to
determine whether the transformations were complete, consistent, and
accurate.

The developer who implemented the Python scripts appears to have done
extensive unit testing, and the transformation logic itself is generally
solid. Several of you recommended unit testing and SQL-based comparisons,
which are clearly valuable approaches.

However, I ended up taking a different path: I converted the datasets to XML,
expressed the transformation logic as explicit business rules, and implemented
those rules using Schematron. I then validated those rules against the entire
dataset.
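
To give a flavor of what "a business rule checked against every record" means, here is a minimal sketch. It is written in Python with only the standard library rather than in Schematron, and the element names (`customer`, `status`), the allowed values, and the rule itself are made up for illustration; none of them come from the actual datasets.

```python
import xml.etree.ElementTree as ET

# Hypothetical target data; element names and values are illustrative only.
TARGET_XML = """
<customers>
  <customer id="1"><status>ACTIVE</status></customer>
  <customer id="2"><status>active</status></customer>
  <customer id="3"><status>CLOSED</status></customer>
  <customer id="4"/>
</customers>
"""

# Assumed normalization rule: status must be one of these upper-case values.
ALLOWED_STATUS = {"ACTIVE", "CLOSED"}


def check_status_rule(root):
    """Return one message per record that violates the assumed rule."""
    failures = []
    for rec in root.iter("customer"):
        status = rec.findtext("status")  # None when the element is absent
        if status is None:
            failures.append(f"customer {rec.get('id')}: status is missing")
        elif status not in ALLOWED_STATUS:
            failures.append(
                f"customer {rec.get('id')}: status '{status}' is not normalized"
            )
    return failures


# Every record is checked, not a hand-picked sample.
failures = check_status_rule(ET.fromstring(TARGET_XML))
for message in failures:
    print(message)
```

In Schematron the same idea would be an assertion attached to a rule context; the sketch only shows the shape of the check, not the rules I actually wrote.
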
This led to a key realization:

Unit testing is necessary, but not sufficient.

More specifically:

  *   Python unit tests verify that transformation logic behaves correctly for
      limited, predefined test cases.
  *   Schematron-based validation verifies that the transformation is
      correctly applied across all data, including real-world values and edge
      cases.

By expressing the transformation rules in Schematron and applying them to the
full dataset, I was able to:

  *   validate all records, not just samples
  *   detect cross-table inconsistencies
  *   surface transformation behaviors that were not fully documented (e.g.,
normalization rules, default values, filtering logic)
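
A cross-table check of the kind listed above can be sketched as follows. This is again Python with the standard library, not the Schematron I used, and the table and attribute names (`customer`, `order`, `customer-ref`) are hypothetical. It assumes both tables have been exported into a single XML document.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-table export; all names are illustrative only.
DATA_XML = """
<db>
  <customers>
    <customer id="1"/>
    <customer id="2"/>
  </customers>
  <orders>
    <order id="10" customer-ref="1"/>
    <order id="11" customer-ref="2"/>
    <order id="12" customer-ref="7"/>
  </orders>
</db>
"""


def find_orphan_orders(root):
    """Return ids of orders whose customer-ref matches no customer id."""
    customer_ids = {c.get("id") for c in root.iter("customer")}
    return [
        o.get("id")
        for o in root.iter("order")
        if o.get("customer-ref") not in customer_ids
    ]


orphans = find_orphan_orders(ET.fromstring(DATA_XML))
print(orphans)
```

A unit test on the transformation function would not normally see this kind of problem, because it only shows up when one table is compared against another.
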
Importantly, this approach also uncovered a discrepancy that unit testing had
not surfaced: a mismatch in row counts between the source system and the
transformed dataset (specifically, additional records present in the target
that were not present in the source). This turned out to be explainable, but
it highlighted the value of validating the entire dataset, not just test
cases.
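
The row-count comparison itself is simple to sketch. The example below is Python with the standard library, the `row` element and `key` attribute are hypothetical, and it assumes each record has a key that is stable across source and target. It compares the counts and then lists which keys account for the difference.

```python
import xml.etree.ElementTree as ET

# Hypothetical exports of the same table from source and target.
SOURCE_XML = '<rows><row key="A"/><row key="B"/></rows>'
TARGET_XML = '<rows><row key="A"/><row key="B"/><row key="C"/></rows>'


def row_keys(xml_text):
    """Return the key of every row, in document order."""
    return [r.get("key") for r in ET.fromstring(xml_text).iter("row")]


source_keys = row_keys(SOURCE_XML)
target_keys = row_keys(TARGET_XML)

# Counts tell you that something differs; the set differences tell you what.
extra_in_target = sorted(set(target_keys) - set(source_keys))
missing_from_target = sorted(set(source_keys) - set(target_keys))

print("source rows:", len(source_keys), "target rows:", len(target_keys))
print("extra in target:", extra_in_target)
print("missing from target:", missing_from_target)
```

Listing the extra keys is what makes a mismatch explainable: once you can see which records were added, you can ask why.
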
In other words, this exercise highlighted an important distinction:

Unit tests verify code correctness; dataset-level validation verifies data
correctness.

Both are needed to have confidence in a data transformation pipeline.

Thanks again for your suggestions--they were helpful in shaping the approach,
even if I ultimately went in a different direction.

Best regards,
Roger
