Subject: [xsl] Unit Testing — Good but Not Sufficient
From: "Roger L Costello costello@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 29 Mar 2026 11:27:36 -0000
Hi Folks,

A couple of weeks ago I posted a question titled "Approaches for validating database transformations using XML/XSLT tools?" Thank you all for your thoughtful responses. I wanted to follow up now that I've completed the task (or at least the first phase of it) and share what I learned.

In brief, the task was to validate a data transformation pipeline that takes data from a complex source system, applies Python-based transformations, and loads the results into a simplified, user-facing database. The goal was to determine whether the transformations were complete, consistent, and accurate.

The developer who implemented the Python scripts appears to have done extensive unit testing, and the transformation logic itself is generally solid.

Several of you recommended unit testing and SQL-based comparisons, which are clearly valuable approaches. However, I ended up taking a different path: I converted the datasets to XML, expressed the transformation logic as explicit business rules, and implemented those rules using Schematron. I then validated those rules against the entire dataset.

This led to a key realization: unit testing is necessary, but not sufficient.
More specifically:

* Python unit tests verify that transformation logic behaves correctly for limited, predefined test cases
* Schematron-based validation verifies that the transformation is correctly applied across all data, including real-world values and edge cases

By expressing the transformation rules in Schematron and applying them to the full dataset, I was able to:

* validate all records, not just samples
* detect cross-table inconsistencies
* surface transformation behaviors that were not fully documented (e.g., normalization rules, default values, filtering logic)

Importantly, this approach also uncovered a discrepancy that unit testing had not surfaced: a mismatch in row counts between the source system and the transformed dataset (specifically, additional records present in the target that were not present in the source). This turned out to be explainable, but it highlighted the value of validating the entire dataset, not just test cases.

In other words, this exercise highlighted an important distinction: unit tests verify code correctness; dataset-level validation verifies data correctness. Both are needed to have confidence in a data transformation pipeline.

Thanks again for your suggestions--they were helpful in shaping the approach, even if I ultimately went in a different direction.

Best regards,
Roger
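
To make the distinction concrete, here is a minimal Python sketch of a dataset-level check over XML exports. It is not the Schematron rule set described above. The element names, the `id` and `status` attributes, the sample rows, and the trim-and-uppercase normalization rule are all assumptions invented for illustration. The point is only that the check runs over every record in both datasets rather than over hand-picked test cases.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element and attribute names are invented.
SOURCE_XML = """<source>
  <row id="1" status=" Active "/>
  <row id="2" status="inactive"/>
</source>"""

TARGET_XML = """<target>
  <row id="1" status="ACTIVE"/>
  <row id="2" status="INACTIVE"/>
  <row id="3" status="ACTIVE"/>
</target>"""


def index_rows(xml_text):
    """Map each row's id attribute to its element."""
    root = ET.fromstring(xml_text)
    return {row.get("id"): row for row in root.iter("row")}


def validate(source_xml, target_xml):
    """Check every row in both datasets and return a list of findings."""
    source = index_rows(source_xml)
    target = index_rows(target_xml)
    findings = []

    # Dataset-level check: records present in one dataset but not the other.
    for row_id in sorted(target.keys() - source.keys()):
        findings.append(f"row {row_id}: present in target but not in source")
    for row_id in sorted(source.keys() - target.keys()):
        findings.append(f"row {row_id}: present in source but missing from target")

    # Assumed business rule: target status is the source status,
    # trimmed and upper-cased.
    for row_id in sorted(source.keys() & target.keys()):
        expected = source[row_id].get("status").strip().upper()
        actual = target[row_id].get("status")
        if actual != expected:
            findings.append(f"row {row_id}: status {actual!r}, expected {expected!r}")

    return findings


findings = validate(SOURCE_XML, TARGET_XML)
for finding in findings:
    print(finding)
```

With this sample data, the only finding is the extra target row, which is the same kind of discrepancy as the row-count mismatch mentioned above. A unit test of the normalization function alone would pass without ever noticing it.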