Re: [xsl] Does the new structure include the same text content?

Subject: Re: [xsl] Does the new structure include the same text content?
From: "ian.proudfoot@xxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 22 Jan 2021 15:58:55 -0000
Hi Gerrit,
Good to know that I may be on the right track with the normalized text diff.
It would be almost impossible to go back to the original SGML structure from
the XML. The main difficulty is that a lot of the structure in the SGML uses
inclusions to allow tables and figures in almost any location. That SGML
feature was always a recipe for untidy documents!

Ian

-----Original Message-----
From: Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: 22 January 2021 11:45
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Does the new structure include the same text content?

Hi Ian,

Diffing normalized text output is a good approach in my experience.
However, if the 4.1 structures differ significantly from 1.7 as you say, it
might be a good idea to transform the 4.1 output back to 1.7 prior to the
diff. Or maybe not "transform it back to match the input exactly", but only to
such a degree that the text files will be the same if no content was lost or
duplicated.
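
A minimal sketch of such a check in Python, under some loud assumptions: both
sides are available as well-formed XML strings (the SGML would first need to be
normalized to XML or dumped to text by an SGML parser), and the names
normalized_words and compare_text are hypothetical, made up for illustration.
It compares word counts rather than word order, so content that merely moved
(tables and figures, for instance) does not show up as a difference.

```python
import unicodedata
import xml.etree.ElementTree as ET
from collections import Counter


def normalized_words(xml_text):
    """Return the text content of an XML string as a list of words."""
    root = ET.fromstring(xml_text)
    # Join text nodes with a space so words in adjacent elements do not fuse.
    # Assumption: inline markup never splits a word in only one of the files.
    text = " ".join(root.itertext())
    # Unicode normalization so equivalent character sequences compare equal.
    text = unicodedata.normalize("NFC", text)
    # split() with no argument collapses every run of whitespace.
    return text.split()


def compare_text(source_xml, result_xml):
    """Return (lost, extra): words missing from, or added to, the result."""
    src_counts = Counter(normalized_words(source_xml))
    out_counts = Counter(normalized_words(result_xml))
    lost = src_counts - out_counts   # in the source more often than in the result
    extra = out_counts - src_counts  # in the result more often than in the source
    return dict(lost), dict(extra)
```

Two empty dictionaries would suggest that nothing was lost or duplicated at the
word level. Generated text that the stylesheet adds on purpose would appear
under "extra" and would have to be filtered out or accepted.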

Gerrit

On 22.01.2021 12:28, ian.proudfoot@xxxxxxxxxxx wrote:
> Hi everyone,
>
> I am working on a project to convert several thousand SGML files
> (S1000D
> 1.7) into a more recent XML version (S1000D 4.1). My finished XSLT
> stylesheet does the job that is expected. However, during
> development I ran into a problem where an error in the stylesheet
> allowed the output to pass schema validation while omitting some
> content! For me that's very bad news and I was lucky to notice it.
> Ultimately the final output will be verified by the subject matter
> experts, but I really don't want to give them any reason to doubt the
> reliability of the conversion.
>
> This got me thinking about ways to verify the output text content
> against the input despite significantly different structure. Is there
> an established way to do that? If so what is it called and how well
> does it work?
>
> Perhaps it's something that I should build into the XSLT as it is
> written? Or perhaps it could be run as a post process batch comparison
> operation?
>
> My initial thought is to output normalized text from input and output
> and compare the resulting text files...
>
> I've searched the archives, but I probably don't know the correct
> terminology to get any useful results...
>
> Thanks in advance for all responses.
>
> Ian
>
> Ian Proudfoot
>
> Bembridge
>
> Isle of Wight
>
> United Kingdom
