Subject: Re: [xsl] Switching off character entity resolution in XSL
From: Richard Light <richard@xxxxxxxxxxxxxxxxx>
Date: Tue, 3 Feb 2004 10:09:06 +0000
In message
<OFD1EA90EE.C86A23F7-ONCA256E2F.000FE8D4-CA256E2F.00118833@xxxxxxxxxx>,
AHynes@xxxxxxxxxx writes
>Hello All,
>
>Unlike what most people would use XSL for (i.e. conversion of XML to HTML
>or other output format), I have a requirement to transform from one XML
>structure to another (subsequent presentation rendering occurring way
>downstream). No big deal I guess, but the annoying thing here is that by
>the time an XML parser has done its job as per the XML specification, all
>those pesky character entities have been resolved (as defined in the DTD
>for the source document) and the output contains square brackets.
I've done this for entities which map to character references, rather
than to the SGML-style "SDATA" strings you quote below. My strategy is
to live with the fact that the parser has carried out all the entity
mappings, and to use a "mappings" document containing entries like this:
  <char>
    <name>Delta</name>
    <value>Δ</value>
    <unicode>0394</unicode>
    <description>Delta Dec:916 </description>
    <mapping>[capital Delta]</mapping>
    <!--U0394 /Delta capital Delta, Greek -->
  </char>
to reverse the process on output. (For your purposes, all you need is
the <name> and <value> elements - the other element types have different
uses.)
Essentially, when you come to output text(), iterate through it
character by character. Have a convenience variable $normal-chars,
e.g.:
  <xsl:variable name="normal-chars"
      select="concat('ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                     'abcdefghijklmnopqrstuvwxyz',
                     '0123456789 ',
                     '!$%^*()-_+={}[];:@#~/?.,')"/>
so you can quickly test and output characters which can be output as
found. For all others, look up the <char> with the appropriate <value>,
and output its <name>:
  <xsl:value-of select="concat('&amp;', $ch-name, ';')"
      disable-output-escaping="yes"/>
Yes, it is slow and clumsy, and yes, it does use the deprecated
disable-output-escaping, but it does work ...
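
Put together, the character-by-character loop described above might look
roughly like the following XSLT 1.0 sketch. The file name mappings.xml,
the template name emit-chars and the parameter name s are assumptions
made purely for illustration. The sketch also assumes each <value> holds
a single character, so it covers the character-reference case and not
multi-character SDATA-style strings such as "[bull ]".

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the mappings live in mappings.xml next to the
       stylesheet, as <char> entries under some root element. -->
  <xsl:variable name="chars" select="document('mappings.xml')//char"/>

  <!-- Characters that can be written out exactly as found. -->
  <xsl:variable name="normal-chars"
      select="concat('ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                     'abcdefghijklmnopqrstuvwxyz',
                     '0123456789 ',
                     '!$%^*()-_+={}[];:@#~/?.,')"/>

  <!-- Identity template: copy everything that is not a text node. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Route every text node through the recursive template. -->
  <xsl:template match="text()">
    <xsl:call-template name="emit-chars">
      <xsl:with-param name="s" select="."/>
    </xsl:call-template>
  </xsl:template>

  <!-- Emit the first character of $s, then recurse on the rest. -->
  <xsl:template name="emit-chars">
    <xsl:param name="s"/>
    <xsl:if test="$s != ''">
      <xsl:variable name="ch" select="substring($s, 1, 1)"/>
      <!-- Empty node-set if the character has no mapping entry. -->
      <xsl:variable name="ch-name" select="$chars[value = $ch]/name"/>
      <xsl:choose>
        <!-- Plain or unmapped characters go out with normal escaping. -->
        <xsl:when test="contains($normal-chars, $ch) or not($ch-name)">
          <xsl:value-of select="$ch"/>
        </xsl:when>
        <!-- Mapped characters are written back as &name; -->
        <xsl:otherwise>
          <xsl:value-of select="concat('&amp;', $ch-name, ';')"
              disable-output-escaping="yes"/>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:call-template name="emit-chars">
        <xsl:with-param name="s" select="substring($s, 2)"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Two caveats on this sketch: the recursion goes one level deep per
character, so very long text nodes may hit stack limits on processors
that do not optimise tail calls, and the result only parses again if the
output document declares the entity names that are written out.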
Richard Light
>Example:
>source document contains: &bull;
>After transformation: [bull ] (of course, the entity declared
>in the DTD is this, i.e. <!ENTITY bull "[bull ]">)
>What I would like: &bull;
>
>I really don't want to go messing with the DTD either, and I really don't
>think a parser would like there being unparsed entities within an entity
>declaration in a DTD i.e. <!ENTITY bull &bull;> is illegal.
>
>I realise there is some way of dealing with this with character
>substitutions before or after using something like sed, but this isn't
>really a great solution, particularly across platforms. Is there any way of
>manipulating the output using XSL, or alternatively switching off entity
>resolution in the parser? I've played with custom entity resolvers with
>Java XML parsers (i.e. resolving URLs for example) but cannot see how this
>could be used for external character entities, and also realise there is
>some scope for writing a solution in something like JDOM - but what a pain!
>That defeats the whole purpose of XSL. I have gotten used to a pretty good
>compromise of using Saxon with the Xerces parser and the Norm Walsh entity
>resolver classes if that's of any help.
>
>Either there's a simple solution to this, it's something XML 2.0 (or
>whatever is on the horizon) might address (which is no help for me really),
>I'm on the wrong mailing list or I should just resort back to ("the good
>ol' days of" - yes, sarcasm) Omnimark which was really good at "unparsing"
>entities. I'm sure others experience similar problems so hopefully the
>first option is the right one (i.e. easy ?).
>
>Thanks very much,
>Alan Hynes.
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
>
--
Richard Light
SGML/XML and Museum Information Consultancy
richard@xxxxxxxxxxxxxxxxx