Subject: Re: [xsl] Tracking entity references
From: "Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 28 Mar 2023 14:31:31 -0000
As entity references are reliably found with simple string matching, it might be sufficient to just do a regex search:

    % egrep -hoR '&([^;]+);' . --include="*.dita" | sort | uniq -dc
         94 ​
        175
          7
       6807 &amp;
       1952 &apos;
      54987 &gt;
      60125 &lt;
      25692 &quot;

Cheers,

E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com <https://www.servicenow.com>

From: Wegmann, Frank frank.wegmann@xxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tuesday, March 28, 2023 at 6:51 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [xsl] Tracking entity references

Thanks Michael. That explains my futile search for an event handler that fires at the start of an entity reference. Thanks also to Chris and Peter for the interesting suggestions.

I think I would probably be better off with a second pass over each file, this time doing the parsing myself. At that point I at least know exactly which entities have been declared and where the external ones are located.

The thing is that these documents are simply not readable by anyone other than the original author. I want to come up with statistics telling me which entities have been used, how many times, and in what particular context, to assess which ones may constitute reasonable use and which ones just push a principle to its limits at the cost of readability and maintainability.

The alternative is to use a file with all references resolved and pass that to those who now have to work on the documents. In fact, I did just that to enable them to read and understand the documents.

f.
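The regex search and the per-entity statistics described above could be combined in a small script. What follows is only a minimal Python sketch, not something posted in the thread: the function names and the sample text are made up for illustration. It applies the same pattern as the egrep call line by line, and like any plain text scan it will also count matches inside comments and CDATA sections.

```python
import re
from collections import Counter

# Same pattern as the egrep call: '&', one or more non-';' characters, ';'.
ENTITY_REF = re.compile(r"&([^;]+);")


def find_entity_refs(text):
    """Yield (line, column, name) for every match; line and column are 1-based."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ENTITY_REF.finditer(line):
            yield lineno, match.start() + 1, match.group(1)


def count_entity_refs(text):
    """Count how often each reference name occurs in the text."""
    return Counter(name for _, _, name in find_entity_refs(text))


# Hypothetical sample input, standing in for the contents of one file.
sample = '<p>&prod; &amp; &prod;</p>\n<p title="&co;">&#xA0;</p>\n'

refs = list(find_entity_refs(sample))
counts = count_entity_refs(sample)

for lineno, column, name in refs:
    print(f"line {lineno}, column {column}: &{name};")
for name, total in counts.most_common():
    print(f"{total:6d} &{name};")
```

Because the scan works on the raw text, it also sees references inside attribute values, which a parser-level approach may not report.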
From: Michael Kay mike@xxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Tuesday, March 28, 2023 12:33 PM
To: xsl-list <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [xsl] Tracking entity references

You have to dive in at a lower level than XSLT, because as you say, XSLT only sees the document after all XML entities have been resolved by the XML parser.

In the Java world, a SAX parser will report entity boundaries to the LexicalHandler - https://docs.oracle.com/javase/8/docs/api/org/xml/sax/ext/LexicalHandler.html#startEntity-java.lang.String-

Well, some of the entity boundaries anyway. It doesn't report entity boundaries within attribute values.

Michael Kay
Saxonica

On 28 Mar 2023, at 11:23, Wegmann, Frank frank.wegmann@xxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

This may be a very silly question, but I couldn't find an answer to it, with Saxon or outside it, using tools relying on expat.

The use case is a set of old XML documents (traditional, in-house DTD) that make excessive use of entities of all kinds. While it is easy to get all entity declarations, and also to locate references to external entities, I could not locate references to text entities. By the time I see it, the entity reference has already been resolved. How can I achieve that (ideally with the line/column of the actual place in the original file or entity)?
I very much hope that I just overlooked something here...

Thanks,
Frank Wegmann
Software AG

XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
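Since the original question mentions tools relying on expat, one possible route there is expat's default handler. This is a different technique from the SAX LexicalHandler approach described above. The following is a hedged Python sketch using `xml.parsers.expat`; the function name and sample document are hypothetical. It assumes that when only `DefaultHandler` is set, references in element content are handed to that handler unexpanded, and that `CurrentLineNumber` and `CurrentColumnNumber` give the position of the reference. References inside attribute values are not reported this way, and the sketch does not parse external entities.

```python
import xml.parsers.expat


def trace_entity_refs(xml_bytes):
    """Return (line, column, reference) for references found in element content.

    Lines and columns are 1-based; expat itself counts columns from 0.
    """
    refs = []
    parser = xml.parsers.expat.ParserCreate()

    def on_default(data):
        # With no other handlers installed, expat passes all markup here.
        # Unexpanded references arrive as their own chunk, e.g. "&prod;".
        if data.startswith("&") and data.endswith(";"):
            refs.append(
                (parser.CurrentLineNumber, parser.CurrentColumnNumber + 1, data)
            )

    # DefaultHandler (unlike DefaultHandlerExpand) inhibits the expansion
    # of internal entities, so the references stay visible.
    parser.DefaultHandler = on_default
    parser.Parse(xml_bytes, True)
    return refs


# Hypothetical sample document with one internal entity declaration.
sample = (
    b'<!DOCTYPE doc [\n'
    b'<!ENTITY prod "Widget">\n'
    b']>\n'
    b'<doc>\n'
    b'<p title="&prod;">&prod; and &amp;</p>\n'
    b'</doc>\n'
)

found = trace_entity_refs(sample)
for line, column, ref in found:
    print(f"line {line}, column {column}: {ref}")
```

In this sample, the reference in the `title` attribute does not show up in the result, which mirrors the limitation noted for the LexicalHandler.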