Subject: Re: [xsl] Verifying large XSL transform output
From: Paul Tyson <phtyson@xxxxxxxxxxxxx>
Date: Tue, 11 Feb 2014 20:35:09 -0600
Hi Matthew,

Schematron is your best friend for validating XML content.

1. Sketch out some Schematron rules that would validate your output (with reference to the source).
2. Write one or more XSLT stylesheets to generate Schematron rules from your input, to validate specific content in the target document.
3. Run your production transformation and your Schematron rule generator over the source documents.
4. Compile the Schematron files into XSLT.
5. Run the Schematron XSLT files against your output to get SVRL (Schematron Validation Report Language) files (or whatever format you like).
6. (Optional) Transform the SVRL files to a readable report form (e.g. HTML).
7. (Optional) Put it all together in an XProc pipeline and automate!
8. Iterate steps 1-6 until there are no further improvements to be made and you are satisfied with the validation.

Have fun.

Regards,
--Paul

P.S. I did this a while back to validate several thousand XML documents that were generated by a sausage-grinder conversion (not XSLT) from flat files (think spreadsheets). The requirement was to check hundreds--sometimes thousands--of data fields in each file for an exact match with the input. The process worked very well.

There are a few gotchas to watch out for. You'll have to be careful with quoting, variables, curly braces, namespaces, and XPath expressions, since you're writing XSLT to generate a file (in the Schematron language) that will itself be turned into XSLT. Character entities may also be a problem, so you'll have to preserve those through all the transformation steps. But once you get the hang of it, it goes very well.

P.P.S. I'm not in the regular business of writing XSLT to transform documents, but it seems this approach would be a good way to implement test-driven stylesheet development. You could co-develop the validation rules and the transformation, and test as you go, using real input.
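To make step 2 and the quoting gotcha a little more concrete, here is a minimal sketch of a rule generator. It is written in Python with only the standard library rather than in XSLT, purely to show the shape of the generated rules; it is not the stylesheet used in the project described above. The source layout (`record`/`field`), the target layout (`item`/`value`), and the function names are all hypothetical and would need to be replaced with your own formats.

```python
import xml.etree.ElementTree as ET

# Namespace of ISO Schematron.
SCH_NS = "http://purl.oclc.org/dsdl/schematron"

# Hypothetical source document: one record with two fields.
SOURCE = """<records>
  <record id="r1">
    <field name="title">Moby-Dick</field>
    <field name="note">the whale's "revenge"</field>
  </record>
</records>"""


def xpath_literal(text):
    """Return text as an XPath 1.0 string literal.

    XPath 1.0 has no escape character inside string literals, so a value
    containing both quote kinds has to be assembled with concat().
    """
    if "'" not in text:
        return "'" + text + "'"
    if '"' not in text:
        return '"' + text + '"'
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join("'" + p + "'" for p in parts) + ")"


def generate_schematron(source_xml):
    """Build one Schematron rule per source record.

    Assumes (hypothetically) that each source <record id="..."> becomes an
    <item id="..."> in the target, and each <field name="..."> becomes a
    <value name="..."> child holding the same text.
    """
    ET.register_namespace("sch", SCH_NS)
    source = ET.fromstring(source_xml)
    schema = ET.Element("{%s}schema" % SCH_NS)
    pattern = ET.SubElement(schema, "{%s}pattern" % SCH_NS)
    for record in source.iter("record"):
        record_id = record.get("id", "")
        rule = ET.SubElement(pattern, "{%s}rule" % SCH_NS)
        rule.set("context", "item[@id=%s]" % xpath_literal(record_id))
        for field in record.iter("field"):
            name = field.get("name", "")
            expected = field.text or ""
            check = ET.SubElement(rule, "{%s}assert" % SCH_NS)
            check.set(
                "test",
                "value[@name=%s] = %s"
                % (xpath_literal(name), xpath_literal(expected)),
            )
            check.text = "Field %s of record %s does not match the source." % (
                name,
                record_id,
            )
    # The serializer escapes attribute values, so quotes inside the
    # XPath expressions survive as valid XML.
    return ET.tostring(schema, encoding="unicode")


if __name__ == "__main__":
    print(generate_schematron(SOURCE))
```

This covers only the generation step. The resulting file would still need to be compiled and run against the transform output with a Schematron implementation of your choice (steps 4 and 5).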
Best,
--Paul

On Tue, 2014-02-11 at 10:36 -0500, Matthew Stoeffler wrote:
> I have a question about verifying XSL transform output. I'm moving
> somewhat large XML docs --digital books-- from one format into another
> (archival) format, with lots of pulling and pushing. The source
> format is, euphemistically speaking, 'interesting', and not the kind
> of thing you'd necessarily want to emulate: lots of too-loose content
> models granting multiple structural variations for the same
> intellectual object; much cross-document referencing via PIs, etc.
> The transform scripts are large. I know my results are valid in the
> new format; I'm now trying to confirm that I'm capturing all the
> content. I've done analysis of ID's from source to output. I have
> contemplated ways of counting text nodes, or text string length, as
> another possible approach. I'd love some feedback from the list on
> metrics others have tried and what seems to work best. Thanks in
> advance.
>
> m./