Subject: Re: [xsl] Switching off character entity resolution in XSL
From: Richard Light <richard@xxxxxxxxxxxxxxxxx>
Date: Tue, 3 Feb 2004 10:09:06 +0000
In message
<OFD1EA90EE.C86A23F7-ONCA256E2F.000FE8D4-CA256E2F.00118833@xxxxxxxxxx>,
AHynes@xxxxxxxxxx writes
>Hello All,
>
>Unlike what most people would use XSL for (i.e. conversion of XML to HTML
>or other output format), I have a requirement to transform from one XML
>structure to another (subsequent presentation rendering occurring way
>downstream). No big deal, I guess, but the annoying thing here is that by
>the time an XML parser has done its job as per the XML specification, all
>those pesky character entities have been resolved (as defined in the DTD
>for the source document) and the output contains square brackets.

I've done this for entities which map to character references, rather
than to the SGML-style "SDATA" strings you quote below.  My strategy is
to live with the fact that the parser has carried out all the entity
mappings, and to use a "mappings" document containing entries like this:

<char>
<name>Delta</name>
<value>&#x0394;</value>
<unicode>0394</unicode>
<description>Delta       Dec:916 </description>
<mapping>[capital Delta]</mapping>
<!--U0394 /Delta capital Delta, Greek -->
</char>

to reverse the process on output.  (For your purposes, all you need is
the <name> and <value> elements - the other element types have different
uses.)
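For comparison, here is a minimal Python sketch of loading such a mappings document into a lookup table. It assumes a hypothetical root element <chars> wrapping the <char> entries, and it keeps only <name> and <value>. The parser resolves &#x0394; to the character itself, just as it resolves the entities in the source document, so the table is keyed on actual characters.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings document: an assumed root element <chars> wrapping
# <char> entries like the one above. Only <name> and <value> are used here.
MAPPINGS_XML = """<chars>
  <char>
    <name>Delta</name>
    <value>&#x0394;</value>
    <unicode>0394</unicode>
  </char>
  <char>
    <name>bull</name>
    <value>&#x2022;</value>
    <unicode>2022</unicode>
  </char>
</chars>"""


def load_char_names(xml_text):
    """Return a dict mapping each character (the parsed <value>) to its <name>."""
    root = ET.fromstring(xml_text)
    table = {}
    for char in root.iter("char"):
        name = char.findtext("name")
        value = char.findtext("value")
        # The character reference in <value> has already been resolved by
        # the parser, so value is the character itself (e.g. U+0394).
        if name and value:
            table[value] = name
    return table


names = load_char_names(MAPPINGS_XML)
print(names["\u0394"])  # Delta
```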

Essentially, when you come to output text(), iterate through it
character by character.  Have a convenience variable $normal-chars,
e.g.:

<xsl:variable name="normal-chars"
          select="concat('ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                  'abcdefghijklmnopqrstuvwxyz',
                  '0123456789 ',
                  '!$%^*()-_+={}[];:@#~/?.,')"/>

so you can quickly test and output characters which can be output as
found.  For all others, look up the <char> with the appropriate <value>,
and output its <name>:

    <xsl:value-of select="concat('&amp;', $ch-name, ';')"
                  disable-output-escaping="yes"/>
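The same character-by-character loop, sketched in Python purely for illustration (this is not the XSLT above). CHAR_NAMES is a hypothetical table of the kind built from the mappings document. The post does not say what happens to characters that are neither "normal" nor mapped, so this sketch assumes they are passed through with ordinary XML escaping.

```python
from xml.sax.saxutils import escape

# Characters that can be written out as found; mirrors $normal-chars above.
NORMAL_CHARS = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789 "
    "!$%^*()-_+={}[];:@#~/?.,"
)

# Hypothetical lookup table (<value> character -> <name>), as would be
# loaded from the mappings document.
CHAR_NAMES = {"\u0394": "Delta", "\u2022": "bull"}


def unresolve(text, names=CHAR_NAMES):
    """Walk text one character at a time, re-emitting mapped characters as &name;."""
    out = []
    for ch in text:
        if ch in NORMAL_CHARS:
            # Safe to output as found.
            out.append(ch)
        elif ch in names:
            # Emit a raw entity reference, the equivalent of
            # disable-output-escaping="yes" in the XSLT version.
            out.append("&" + names[ch] + ";")
        else:
            # Assumption: unmapped characters get normal escaping (&, <, >).
            out.append(escape(ch))
    return "".join(out)


print(unresolve("Angle \u0394 & more \u2022"))
# Angle &Delta; &amp; more &bull;
```

The output refers to entities such as &Delta; and &bull;, so whatever reads it downstream needs a DTD that declares them.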

Yes, it is slow and clumsy, and yes, it does use the deprecated
disable-output-escaping, but it does work ...

Richard Light

>Example:
>source document contains:     &bull;
>After transformation:         [bull  ]    (of course, the entity declared
>in the DTD is this, i.e. <!ENTITY bull "[bull  ]">)
>What I would like:            &bull;
>
>I really don't want to go messing with the DTD either, and I really don't
>think a parser would like there being unparsed entities within an entity
>declaration in a DTD, i.e. <!ENTITY bull &bull;> is illegal.
>
>I realise there is some way of dealing with this with character
>substitutions before or after using something like sed, but this isn't
>really a great solution, particularly across platforms. Is there any way of
>manipulating the output using XSL, or alternatively switching off entity
>resolution in the parser? I've played with custom entity resolvers with
>Java XML parsers (i.e. resolving URLs for example) but cannot see how this
>could be used for external character entities, and also realise there is
>some scope for writing a solution in something like JDOM - but what a pain!
>That defeats the whole purpose of XSL. I have gotten used to a pretty good
>compromise of using Saxon with the Xerces parser and the Norm Walsh entity
>resolver classes if that's of any help.
>
>Either there's a simple solution to this, it's something XML 2.0 (or
>whatever is on the horizon) might address (which is no help for me really),
>I'm on the wrong mailing list or I should just resort back to ("the good
>ol' days of" - yes, sarcasm) Omnimark which was really good at "unparsing"
>entities. I'm sure others experience similar problems so hopefully the
>first option is the right one (i.e. easy?).
>
>Thanks very much,
>Alan Hynes.
> XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>

-- 
Richard Light
SGML/XML and Museum Information Consultancy
richard@xxxxxxxxxxxxxxxxx

