Re: [xsl] Does the new structure include the same text content?

Subject: Re: [xsl] Does the new structure include the same text content?
From: "Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 22 Jan 2021 18:56:26 -0000
Yes, it is often hard to go full round-trip.

What sometimes works better is this: if HTML rendering stylesheets exist for both formats, tweak both stylesheets so that, after a small or large amount of normalization, they produce identical textual output.

I don't know enough about S1000D to know whether this approach is feasible though.

Gerrit

On 22.01.2021 16:58, ian.proudfoot@xxxxxxxxxxx wrote:
Hi Gerrit,
Good to know that I may be on the right track with the normalized text diff.
It would be almost impossible to go back to the original SGML structure from the XML. The main difficulty is that a lot of the structure in the SGML uses inclusions to allow tables and figures in almost any location. That SGML feature was always a recipe for untidy documents!

Ian

-----Original Message-----
From: Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: 22 January 2021 11:45
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Does the new structure include the same text content?

Hi Ian,

diffing normalized text output is a good approach in my experience.
However, if the 4.1 structures differ significantly from 1.7 as you say, it might be a good idea to transform the 4.1 output back to 1.7 prior to the diff. Or maybe not "transform it back to match the input exactly", but only to such a degree that the text files will be the same if no content was lost or duplicated.
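As a rough illustration of checking that no content was "lost or duplicated" even when content has moved, here is a minimal Python sketch. It is only one possible way to do this, not an established tool: it compares word counts rather than word order, so it tolerates relocated tables and figures but will not notice reordering. The function name and the sample strings are hypothetical.

```python
from collections import Counter


def lost_and_duplicated(source_text: str, target_text: str):
    """Compare word counts of two plain-text strings.

    Returns (lost, extra): words that occur less often, and words that
    occur more often, in the target than in the source.
    """
    src = Counter(source_text.split())
    tgt = Counter(target_text.split())
    lost = src - tgt    # Counter subtraction keeps only positive counts
    extra = tgt - src
    return lost, extra


# Hypothetical sample text: the clauses are reordered, one "the" is
# dropped and "plate" is duplicated.
lost, extra = lost_and_duplicated(
    "Remove the cover plate then torque the bolt",
    "torque the bolt then Remove cover plate plate",
)
print(lost)
print(extra)
```

Because word order is ignored, a check like this is best treated as a coarse first pass; an ordered diff of the normalized text is still needed to catch content that ended up in the wrong place.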

Gerrit

On 22.01.2021 12:28, ian.proudfoot@xxxxxxxxxxx wrote:
Hi everyone,

I am working on a project to convert several thousand SGML files
(S1000D 1.7) into a more recent XML version (S1000D 4.1). My finished
XSLT style sheet does the job that is expected. However during the
development I did run into a problem where an error in the stylesheet
allowed the output to pass schema validation but by omitting some
content! For me that's very bad news and I was lucky to notice it.
Ultimately the final output will be verified by the subject matter
experts, but I really don't want to give them any reason to doubt the
reliability of the conversion.

This got me thinking about ways to verify the output text content
against the input despite significantly different structure. Is there
an established way to do that? If so what is it called and how well
does it work?

Perhaps it's something that I should build into the XSLT as it is
written? Or perhaps it could be run as a post process batch comparison
operation?

My initial thought is to output normalized text from input and output
and compare the resulting text files...
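
The idea of comparing normalized text could be sketched roughly as follows in Python. This is an assumption-laden illustration, not a tested S1000D workflow: it assumes both sides are available as well-formed XML (the SGML input would first need to be parsed or exported by some other means), and the element names in the sample fragments are made up.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET


def normalized_text(xml_source: str) -> str:
    """Return the text content of an XML string with whitespace collapsed."""
    root = ET.fromstring(xml_source)
    # itertext() yields every text node in document order, ignoring markup.
    raw = " ".join(root.itertext())
    # NFC so that equivalent Unicode sequences compare equal.
    raw = unicodedata.normalize("NFC", raw)
    # Collapse every run of whitespace to a single space.
    return re.sub(r"\s+", " ", raw).strip()


# Hypothetical fragments, not real S1000D markup.
old = "<para>Remove the <b>cover</b>\n   plate.</para>"
new = "<step><p>Remove the cover plate.</p></step>"
same = normalized_text(old) == normalized_text(new)
print(same)
```

Joining text nodes with a space means inline markup inside a word (for example a subscript) introduces a space on that side only, so the normalization rules may need tuning per element before the two outputs can be compared reliably.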

I've searched the archives, but I probably don't know the correct
terminology to get any useful results...

Thanks in advance for all responses.

Ian

Ian Proudfoot

Bembridge

Isle of Wight

United Kingdom
