Re: [xsl] Tracking entity references

Subject: Re: [xsl] Tracking entity references
From: "Peter Flynn peter@xxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 28 Mar 2023 11:28:31 -0000
On 28/03/2023 11:23, Wegmann, Frank frank.wegmann@xxxxxxxxxxxxxx wrote:
This may be a very silly question, but I couldn't find an answer on
it, with Saxon, or outside it, using tools relying on expat.

Use case is a set of old XML documents (traditional, inhouse DTD)
that make overly excessive use of entities of all kinds. While it is
easy to get all entity declarations, and also to locate references of
external entities, I could not locate references of text entities. By
the time I see it the entity reference has already been resolved. How
can I achieve that (ideally with line/column of the actual place in
the original file or entity)?

I would edit or otherwise process a copy of the DTD (and any entity
files) and extract all the entity names into another file, redeclaring
each one there as a system entity that points to a non-existent file.
Then edit the DTD and entity files to delete or comment out all the
original entity declarations.

(This does of course need robust checking because it's a non-SGML
operation, and DTD syntax is notorious for authors putting stuff in
weird places because the parser is super-tolerant of spacing. Entities
within inclusion or exclusion exceptions may need hand working :-)
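
As a rough illustration of the extraction step, here is a minimal Python sketch. It is an assumption-laden shortcut, not a DTD parser: it only recognises simple internal general entities of the form `<!ENTITY name "value">`, skips parameter entities and anything declared with SYSTEM or PUBLIC, and will also match declarations that sit inside comments. The function names and the `no-such-dir` path are hypothetical, chosen only for this example.

```python
import re

# Naive pattern for internal general entities: <!ENTITY name "value">
# (?!%) skips parameter entities; requiring a quote straight after the
# name skips SYSTEM/PUBLIC (external) declarations.
INTERNAL_ENTITY = re.compile(
    r"""<!ENTITY\s+(?!%)(?P<name>[^\s"'>]+)\s+(?P<q>["'])(?P<value>.*?)(?P=q)\s*>""",
    re.DOTALL,
)


def bogus_system_declarations(dtd_text, missing_dir="no-such-dir"):
    """Return one SYSTEM declaration per internal entity found in dtd_text.

    Each declaration points at a file that is assumed not to exist, so
    that every reference to the entity fails to resolve when parsed.
    """
    names = []
    for match in INTERNAL_ENTITY.finditer(dtd_text):
        name = match.group("name")
        if name not in names:  # keep first occurrence, preserve order
            names.append(name)
    return [f'<!ENTITY {n} SYSTEM "{missing_dir}/{n}.ent">' for n in names]


def strip_internal_declarations(dtd_text):
    """Return dtd_text with the matched internal declarations deleted."""
    return INTERNAL_ENTITY.sub("", dtd_text)
```

Run this on a copy of the DTD only, and check the result by hand: anything the pattern misses (unusual spacing, SGML-style keywords such as CDATA before the literal) still needs the manual pass described above.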

If you then run (eg) onsgmls invoking the newly-made catalog of bogus
system entity declarations and set the max error number up high, you
should get a ton of error messages, one for each entity reference that
fails to resolve, giving the line and character number in the document
where it occurs.
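
To turn that pile of messages into a list of locations, something like the following sketch could help. It assumes the messages were captured to text and follow the colon-separated shape `program:file:line:column:severity: message`; that layout, the function name `error_locations`, and the sample message text in the comment are assumptions for illustration, so check them against what your onsgmls build actually prints.

```python
import re

# Assumed message layout: program:file:line:column:severity: message
MESSAGE = re.compile(
    r"^(?P<prog>[^:]+):(?P<file>.+?):(?P<line>\d+):(?P<col>\d+):"
    r"(?P<sev>[A-Z]):\s*(?P<msg>.*)$"
)


def error_locations(stderr_text):
    """Return (file, line, column, message) for each error-severity line.

    Lines that do not fit the assumed layout, and non-error severities,
    are silently skipped.
    """
    found = []
    for raw in stderr_text.splitlines():
        match = MESSAGE.match(raw)
        if match and match.group("sev") == "E":
            found.append(
                (
                    match.group("file"),
                    int(match.group("line")),
                    int(match.group("col")),
                    match.group("msg"),
                )
            )
    return found


# Hypothetical input, e.g. the captured stderr of an onsgmls run:
#   onsgmls:doc.xml:12:7:E: cannot open "no-such-dir/co.ent"
```

For the run itself, the onsgmls `-s` option suppresses the normal output and `-E` sets the error limit (0 means no limit), which is the "max error number" mentioned above.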

Peter
