Subject: Re: [xsl] Adding entity declarations to DOCTYPE in xml output
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 26 Feb 2019 21:52:28 -0000
That general requirement (simple string macros) can be satisfied using XInclude, which is implemented by most, if not all, of the modern XML parsers. XInclude has many limitations (it is not a true use-by-reference facility) but it does have at least the same level of utility as text entities without requiring the use of DTDs and without some of the problematic aspects of DTDs (for example, you can choose to defer or ignore XInclude elements if you want, which I often do want depending on processing context).

I could go farther and say that the original SGML design of DTDs was entirely misguided as well and should never have been done that way and certainly shouldn't have been carried into XML (again, I certainly argued *for* them at the time) but that's easy for me to say now. At the time that SGML was being defined and implemented the DTD syntax seemed perfectly sensible and it took a long time for us to recognize the inherent problems with DTDs as they exist in SGML and XML.

In particular, because they are a purely syntactic mechanism, DTDs are a security risk and provide no reliable declaration of the actual semantic document type of the document that exhibits the DOCTYPE declaration.

Consider this example:

    <!DOCTYPE foo [
      <!ENTITY gotcha SYSTEM "/usr/etc/.passwords">
    ]>
    <foo>&gotcha;</foo>

Now load that into a CMS that shall remain nameless running as "root" and look at the content that gets stored. Oops.

Or consider:

    <!DOCTYPE notabook PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
      "http://docbook.org/xml/4.5/docbookx.dtd" [
      <!ELEMENT notabook (foo, bar?)>
      <!ELEMENT foo EMPTY>
      <!ELEMENT bar EMPTY>
    ]>
    <notabook>
      <foo/>
    </notabook>

Here the DOCTYPE appears to declare this to be a DocBook 4 book and many, if not most, DTD-aware systems will use the public ID to bind this document to its DocBook-specific configuration. But this is clearly not a DocBook document (at least to a human observer).
But an XML system that simply requires the document to be A) valid and B) associated with a known external DTD will likely happily accept this document. Thus, the DOCTYPE declaration tells you *nothing* actionable about the document itself. It's completely valid (assuming I didn't introduce typos in the internal declaration subset) but meaningless.

By having the grammar declared only by reference, i.e., RELAX NG, XSD, or some other grammar, and by using namespaces to qualify at least one thing in the document (as the DITA standard does with the @dita:DITAArchVersion attribute) the document is unalterably associated with the definition of the thing it's supposed to be (that is, the namespace name and the URIs of any associated grammars function as names of the "true type" of the document, as opposed to just pointers to syntactic rules that guide parsing and validation).

Compare with:

    <?xml-model href="http://docbook.org/xml/5.1/rng/docbook.rng"
      schematypens="http://relaxng.org/ns/structure/1.0"?>
    <?xml-model href="http://docbook.org/xml/5.1/rng/docbook.rng"
      type="application/xml"
      schematypens="http://purl.oclc.org/dsdl/schematron"?>
    <notabook>
      <foo/>
    </notabook>

This is clearly, and unambiguously, not a DocBook document. The model references unalterably bind the document to governing schemas that will detect the document's invalidity. The lack of the expected (and required) DocBook namespace on the root element also exposes this as not being a DocBook document. Likewise, there is no simple syntactic macro expansion happening here, so the security exposure is lower.

So not a fan of DTDs.
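
As a rough illustration of the XInclude point at the top of this message (string macros without a DTD, with expansion that can be deferred or refused), here is a minimal Python sketch using the standard library's xml.etree.ElementInclude. The sample document, the href "product-name.txt", the in-memory RESOURCES table, and the load() helper are all made up for the example; this is one possible arrangement under those assumptions, not how any particular CMS or toolchain does it.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical document: the product name is pulled in by reference
# rather than declared as a text entity in a DTD.
DOC = """\
<para xmlns:xi="http://www.w3.org/2001/XInclude">Welcome to <xi:include
    href="product-name.txt" parse="text"/>!</para>
"""

# Hypothetical in-memory "files" so the sketch runs without touching disk.
RESOURCES = {"product-name.txt": "Example Product"}


def loader(href, parse, encoding=None):
    """Resolve only text includes we explicitly know about; refuse the rest."""
    if parse != "text" or href not in RESOURCES:
        raise ElementInclude.FatalIncludeError(
            "refusing to include %r" % (href,)
        )
    return RESOURCES[href]


def load(xml_text, expand):
    """Parse xml_text; expand xi:include elements only when asked to."""
    root = ET.fromstring(xml_text)
    if expand:
        # include() rewrites the tree in place using our restricted loader.
        ElementInclude.include(root, loader=loader)
    return root


if __name__ == "__main__":
    deferred = load(DOC, expand=False)
    # The xi:include element is still there, untouched, as ordinary markup.
    print(ET.tostring(deferred, encoding="unicode"))

    expanded = load(DOC, expand=True)
    # The include has been replaced by the referenced text.
    print("".join(expanded.itertext()))
```

The contrast with the &gotcha; example is that the include is just an element in the parsed tree: nothing is resolved until the application asks for it, and the application decides which references are acceptable.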