Re: [xsl] Switching off character entity resolution in XSL

Subject: Re: [xsl] Switching off character entity resolution in XSL
From: Peter Flynn <pflynn@xxxxxx>
Date: Tue, 03 Feb 2004 12:23:17 +0000
On Tue, 2004-02-03 at 03:11, AHynes@xxxxxxxxxx wrote:
> Hello All,
> Unlike what most people would use XSL for (i.e. conversion of XML to HTML
> or other output format), I have a requirement to transform from one XML
> structure to another (subsequent presentation rendering occurring way
> downstream). No big deal I guess, but the annoying thing here is that by
> the time an XML parser has done its job as per the XML specification, all
> those pesky character entities have been resolved (as defined in the DTD
> for the source document) and the output contains square brackets.
> Example:
> source document contains:     &bull;
> After transformation:         [bull  ]    (of course, the entity declared
> in the DTD is this, i.e. <!ENTITY bull "[bull  ]">)
> What I would like:            &bull;

This looks like it's either an old DTD converted from SGML unedited,
or a DTD written by someone who was unaware that XML shouldn't need
character entities: XML is based on Unicode, so any character can be
typed directly or written as a numeric character reference. In
practice people always have their reasons; an editor that cannot
generate all the required characters is a common one.

> I really don't want to go messing with the DTD either, and I really don't
> think a parser would like there being unparsed entities within an entity
> declaration in a DTD i.e. <!ENTITY bull &bull;> is illegal.

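So, alas, is a recursive reference like <!ENTITY bull "&#38;bull;">:
the &#38; is expanded when the declaration is read, leaving &bull; as
the replacement text, and XML forbids an entity from referring to
itself. Saxon rejects it, and any conforming parser should too.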

> I realise there is some way of dealing with this with character
> substitutions before or after using something like sed, but this isn't
> really a great solution, particularly across platforms. Is there any way of
> manipulating the output using XSL, or alternatively switching off entity
> resolution in the parser? 

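I don't think so, but you can add declarations to the document's
internal subset for the character entities you want output as
something else. The internal subset is read before the external DTD,
and the first declaration of an entity is the one that counts, so
these take precedence. For example: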

<?xml version="1.0"?>
<!DOCTYPE whatever SYSTEM "some.dtd" [
<!ENTITY bull "&#x2022;">
]>

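This will give you a "real" bullet (U+2022) in the output: as the
literal character if the output encoding can represent it, or as a
numeric character reference such as &#8226; if it cannot.
If you have copies of the character entity declaration files (e.g.
from the DocBook distribution), you could reference them from the
internal subset instead, via a parameter entity, so that all the
declarations override any in the DTD.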
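A minimal sketch of that, assuming a local copy of the ISO publishing entity set (ISOpub, which declares bull) saved as isopub.ent next to the document. The file name is hypothetical, so substitute whatever your distribution actually ships; this also assumes the parser reads external parameter entities.

```xml
<?xml version="1.0"?>
<!DOCTYPE whatever SYSTEM "some.dtd" [
  <!-- Hypothetical path to a local copy of the ISOpub entity set.
       These declarations are read before some.dtd, so they take precedence. -->
  <!ENTITY % isopub SYSTEM "isopub.ent">
  %isopub;
]>
<whatever>&bull; first item</whatever>
```

Single overrides such as <!ENTITY bull "&#x2022;"> can still be added after the %isopub; line, but they only take effect for names the entity file does not already declare, since the first declaration wins.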

Is there a reason why your output should need to preserve the
character entity format?
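If the only requirement is that the output stay 7-bit safe, the entity names themselves may not matter: an XSLT 1.0 processor is expected to write any character the chosen output encoding cannot represent as a numeric character reference. A minimal sketch, assuming a plain identity copy is all that is needed (a real stylesheet would add its own restructuring templates); the choice of US-ASCII is an assumption, and whether the reference comes out in decimal or hex depends on the processor.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- US-ASCII cannot represent U+2022, so the serializer should
       write each bullet as a character reference instead. -->
  <xsl:output method="xml" encoding="US-ASCII"/>

  <!-- Identity template: copy every attribute and node unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Run over a source document in which &bull; resolves to U+2022, the output should contain &#8226; (or &#x2022;) wherever a bullet was. An identity copy does not carry the DOCTYPE across; the doctype-system attribute of xsl:output can add one back if the downstream process expects it.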

