Re: [xsl] Tracking entity references

Subject: Re: [xsl] Tracking entity references
From: "Wegmann, Frank frank.wegmann@xxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 28 Mar 2023 11:45:08 -0000
Thanks, Michael. That explains my futile search for an event handler that
reports the start of every entity reference.

Thanks also to Chris and Peter for their interesting suggestions. I think I
would probably be better off making a second pass over each file and doing the
parsing myself. At that point I at least know exactly which entities have been
declared and where the external ones are located.
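A minimal Java sketch of such a second pass follows. It assumes a plain regular-expression scan of the raw text is good enough: it reports every `&name;` and `%name;` reference with its 1-based line and column. It is not an XML parser. It does not skip comments, CDATA sections, or processing instructions. It treats `%name;` as a parameter-entity reference wherever it appears, although that is only meaningful inside a DTD. It uses an ASCII-only approximation of XML names. It does not follow external entities, so each entity file has to be scanned separately. Because it reads the raw text, it does see references inside attribute values. The class and record names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds entity references in raw XML/DTD text, with line/column. */
public class EntityRefScanner {

    /** One reference found in the source text; line and column are 1-based. */
    public record EntityRef(String name, boolean parameter, int line, int column) {}

    // '&' or '%', then an ASCII-only approximation of an XML Name, then ';'.
    // Character references (&#...;) never match because '#' cannot start a name.
    // Predefined entities such as &amp; do match; filter them out if unwanted.
    private static final Pattern REF =
        Pattern.compile("([&%])([A-Za-z_:][A-Za-z0-9._:-]*);");

    public static List<EntityRef> scan(String text) {
        List<EntityRef> refs = new ArrayList<>();
        int line = 1;
        int lineStart = 0; // index of the first character of the current line
        int pos = 0;       // newlines have been counted up to (not including) this index
        Matcher m = REF.matcher(text);
        while (m.find()) {
            // Advance the line counter to the start of this match.
            for (; pos < m.start(); pos++) {
                if (text.charAt(pos) == '\n') {
                    line++;
                    lineStart = pos + 1;
                }
            }
            // Column is counted in UTF-16 chars from the start of the line.
            refs.add(new EntityRef(m.group(2), m.group(1).equals("%"),
                                   line, m.start() - lineStart + 1));
        }
        return refs;
    }

    public static void main(String[] args) throws IOException {
        String text = Files.readString(Path.of(args[0]), StandardCharsets.UTF_8);
        for (EntityRef r : scan(text)) {
            System.out.printf("%s%s; at %d:%d%n",
                r.parameter() ? "%" : "&", r.name(), r.line(), r.column());
        }
    }
}
```

Run it with, for example, `java EntityRefScanner.java doc.xml` (Java 16 or later, because of the record). Filtering the results against the entity names collected from the declarations, and grouping by name, gives simple usage counts per file.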

The thing is that these documents are simply not readable by anyone other than
the original author. I want to compile statistics showing which entities have
been used, how many times, and in what context, so that I can assess which uses
are reasonable and which just push a principle to its limits at the cost of
readability and maintainability. The alternative is to take a copy with all
references resolved and pass that to the people who now have to work on the
documents. In fact, I have already done just that so that they can read and
understand them.

f.


From: Michael Kay mike@xxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Tuesday, March 28, 2023 12:33 PM
To: xsl-list <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [xsl] Tracking entity references

You have to dive in at a lower level than XSLT, because as you say, XSLT only
sees the document after all XML entities have been resolved by the XML
parser.

In the Java world, a SAX parser will report entity boundaries to the
LexicalHandler:

https://docs.oracle.com/javase/8/docs/api/org/xml/sax/ext/LexicalHandler.html#startEntity-java.lang.String-

Well, some of the entity boundaries, anyway: it doesn't report entity
boundaries within attribute values.
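A minimal sketch of this approach follows, assuming the JDK's built-in SAX parser. `org.xml.sax.ext.DefaultHandler2` implements `LexicalHandler`, and it is registered through the standard SAX2 property `http://xml.org/sax/properties/lexical-handler`. The class name `EntityStats` and the "name in <element>" keys are hypothetical choices aimed at per-context usage statistics. As said above, references inside attribute values are not reported, so they will be missing from the counts. The line/column the `Locator` gives during `startEntity` is parser-dependent and may point inside the entity rather than at the reference, so treat it as approximate.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.ext.DefaultHandler2;

/**
 * Hypothetical example: counts general entity references per enclosing element,
 * using the LexicalHandler callbacks that DefaultHandler2 implements.
 */
public class EntityStats extends DefaultHandler2 {

    private final Deque<String> openElements = new ArrayDeque<>();
    private final Map<String, Integer> counts = new TreeMap<>();
    private final Map<String, String> firstSeenAt = new TreeMap<>();
    private Locator locator;

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openElements.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.pop();
    }

    @Override
    public void startEntity(String name) {
        // Parameter entities arrive as "%name" and the external DTD subset as "[dtd]";
        // only general entity references in content are counted here.
        if (name.startsWith("%") || name.startsWith("[")) {
            return;
        }
        String context = openElements.isEmpty() ? "?" : openElements.peek();
        String key = name + " in <" + context + ">";
        counts.merge(key, 1, Integer::sum);
        if (locator != null) {
            // Approximate: the reported position may lie inside the entity.
            firstSeenAt.putIfAbsent(key, locator.getLineNumber() + ":" + locator.getColumnNumber());
        }
    }

    public Map<String, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EntityStats handler = new EntityStats();
        // Standard SAX2 property name for registering a LexicalHandler.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new File(args[0]), handler);
        handler.counts.forEach((key, n) -> System.out.println(
            n + "\t" + key + "\tfirst at " + handler.firstSeenAt.getOrDefault(key, "?")));
    }
}
```

Keying on the innermost open element is one possible notion of "context"; joining the whole element stack into a path would give a finer breakdown at the cost of many more distinct keys.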

Michael Kay
Saxonica


On 28 Mar 2023, at 11:23, Wegmann, Frank frank.wegmann@xxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

This may be a very silly question, but I couldn't find an answer to it, either
with Saxon or outside it, using tools based on expat.

The use case is a set of old XML documents (traditional, in-house DTD) that
make excessive use of entities of all kinds. While it is easy to get all the
entity declarations, and also to locate references to external entities, I
could not locate references to text entities: by the time I see them, the
entity references have already been resolved. How can I achieve that (ideally
with the line/column of the actual position in the original file or entity)?

I very much hope that I just overlooked something here...

Thanks,
Frank Wegmann
Software AG
