Subject: Re: [xsl] Switching off character entity resolution in XSL
From: Peter Flynn <pflynn@xxxxxx>
Date: Tue, 03 Feb 2004 12:23:17 +0000

On Tue, 2004-02-03 at 03:11, AHynes@xxxxxxxxxx wrote:
> Hello All,
>
> Unlike what most people would use XSL for (i.e. conversion of XML to HTML
> or another output format), I have a requirement to transform from one XML
> structure to another (subsequent presentation rendering occurring way
> downstream). No big deal, I guess, but the annoying thing here is that by
> the time an XML parser has done its job as per the XML specification, all
> those pesky character entities have been resolved (as defined in the DTD
> for the source document) and the output contains square brackets.
>
> Example:
> source document contains: &bull;
> After transformation: [bull ] (of course, the entity declared
> in the DTD is this, i.e. <!ENTITY bull "[bull ]">)
> What I would like: &bull;

This looks like either an old DTD converted from SGML without editing, or
a DTD written by someone who was unaware that XML, being Unicode-based,
shouldn't need character entities at all. In practice there are always
reasons: an editor which cannot generate all the required characters is
one common problem.

> I really don't want to go messing with the DTD either, and I really don't
> think a parser would like there being unparsed entities within an entity
> declaration in a DTD, i.e. <!ENTITY bull &bull;> is illegal.

So, alas, is a recursive reference like <!ENTITY bull "&bull;">. Saxon
rejects it, and so will any other conforming processor, because the XML 1.0
"No Recursion" well-formedness constraint forbids an entity from referring
to itself.

> I realise there is some way of dealing with this with character
> substitutions before or after the transformation, using something like
> sed, but this isn't really a great solution, particularly across
> platforms. Is there any way of manipulating the output using XSL, or
> alternatively switching off entity resolution in the parser?

I don't think so, but you can add to the internal subset a declaration of
each character entity you want output as something else, e.g.

<?xml version="1.0"?>
<!DOCTYPE whatever SYSTEM "some.dtd" [
<!ENTITY bull "•">
]>

This works because the internal subset is read before the external DTD,
and the first declaration of an entity is the binding one, so your
declaration wins over the DTD's "[bull ]". The parser then hands the
stylesheet a "real" bullet character, which the serializer writes out
literally, or as a numeric character reference if the output encoding
cannot represent it.
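The first-declaration rule can be checked outside any XSLT processor. The following is a minimal sketch in Python using the standard library's expat binding (not Saxon, which this thread is about). The DTD text in SOME_DTD is a hypothetical stand-in for some.dtd, and parse_with_dtd is a made-up helper that serves that text as the external subset and returns the parsed character data.

```python
import xml.parsers.expat

# Hypothetical stand-in for the external "some.dtd": it maps the bullet
# entity to the SGML-era "[bull ]" text, as in the original question.
SOME_DTD = '<!ENTITY bull "[bull ]">'

WITHOUT_OVERRIDE = '''<?xml version="1.0"?>
<!DOCTYPE whatever SYSTEM "some.dtd">
<whatever>&bull; item</whatever>'''

# Same document, plus an internal-subset declaration of bull.
# &#x2022; is equivalent to writing the literal bullet character.
WITH_OVERRIDE = '''<?xml version="1.0"?>
<!DOCTYPE whatever SYSTEM "some.dtd" [
<!ENTITY bull "&#x2022;">
]>
<whatever>&bull; item</whatever>'''


def parse_with_dtd(document: str, dtd_text: str) -> str:
    """Parse document, serving dtd_text as its external DTD subset,
    and return the document's character data with entities expanded."""
    parser = xml.parsers.expat.ParserCreate()
    # expat does not read the external DTD subset unless asked to.
    parser.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def load_external(context, base, system_id, public_id):
        # Called once the internal subset is done. Instead of opening
        # system_id from disk, feed the stand-in DTD text to a sub-parser
        # that shares the main parser's entity tables.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(dtd_text, True)
        return 1  # non-zero tells expat the external entity was handled

    chunks = []
    parser.ExternalEntityRefHandler = load_external
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)


print(parse_with_dtd(WITHOUT_OVERRIDE, SOME_DTD))  # [bull ] item
print(parse_with_dtd(WITH_OVERRIDE, SOME_DTD))     # • item
```

Serving the DTD from a string keeps the sketch self-contained; in practice the parser would resolve some.dtd itself, but the precedence is the same, since it is fixed by the XML specification rather than by any one parser.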
If you have copies of the character entity declaration files (e.g. from
the DocBook distribution), you could reference them from the internal
subset instead, via parameter entities, so that all their declarations
override any in the DTD.

Is there a reason why your output needs to preserve the character entity
format?

///Peter

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list