Re: [xsl] Confused about entities

Subject: Re: [xsl] Confused about entities
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Tue, 14 Mar 2006 10:01:42 -0500
Gary,

At 09:56 AM 3/14/2006, you wrote:
> On 14/03/06, andrew welch <andrew.j.welch@xxxxxxxxx> wrote:
>
> > If you are outputting as XML where are the &nbsp;'s coming from?
> >
> > Are you writing them out manually in the stylesheet - as in &amp;nbsp;?
>
> Nope but thanks for the clue. The *input* is double escaped (I must
> stop trusting a browser when viewing the output). That is, I've
> actually got &amp;nbsp; as the input, which explains why there is no
> conversion going on. Later on in a different transform I do a
> saxon:parse which then obviously can't find the reference to a &nbsp;
> and therefore throws an exception.

Oh the input is *double* escaped? Fun.


Two available options:
1. Isolate a transformation step that writes a file in which the double-escaping is removed, in effect by resolving the "& amp;" entity to "&" so the file presents an honest "& nbsp;" -- then parse as normal. But this step in the pipeline has either to write a file or to pass the data through, say, a SAX filter -- it has to serialize the data somehow for reparsing; it can't work completely within XSLT's world of trees (the logical view). As Mike just observed, you're having to work with the lexical layer of the markup before it represents what it's supposed to represent (what you, but not the computer, know it "actually" represents through the double-escaped entities).


2. Use string processing. Since you're using XSLT 2.0 this is a reasonable option. A regular expression could be used to match the fake entities and turn them into something more useful. Probably this process would have to write a file too, to be parsed again, unless you used some kind of internal lookup table to take the place of the set of entity declarations (which are only available to a parser).
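
To make the lookup-table idea in option 2 concrete, here is a minimal sketch in Python rather than XSLT (in XSLT 2.0 the matching would presumably go through xsl:analyze-string). It assumes the first level of escaping has already been resolved by the parser, so the string holds a literal "&" followed by "nbsp;". The function name is made up for illustration, and Python's stock HTML 4 entity table stands in for whatever entity set your data really uses.

```python
import re
from html.entities import name2codepoint  # stdlib table of HTML 4 entity names

# Matches a named reference that is still sitting in a string as plain
# text after the parser has resolved one level of escaping.
FAKE_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve_fake_entities(text: str) -> str:
    """Replace known named references with their characters.

    Hypothetical helper: names missing from the table are left untouched,
    so nothing is silently dropped.
    """
    def lookup(match):
        codepoint = name2codepoint.get(match.group(1))
        return chr(codepoint) if codepoint is not None else match.group(0)

    return FAKE_ENTITY.sub(lookup, text)


# The string below is what a text node would hold after the first parse.
fixed = resolve_fake_entities("a&nbsp;b")
print(repr(fixed))
```

Because the table replaces the entity declarations, nothing has to be reparsed for plain character entities. If the escaped text also contains markup, it would still need a reparse afterwards.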

I hesitate to say more, as XSLT 2.0 gives much better facilities for handling such things than 1.0 did. (I could tell you about 1.0 tricks, but why?) But since I haven't tried them out myself, I can only direct your attention to them.

Note that both these approaches assume that your files actually parse. The error message you reported before suggests they don't.

But maybe you have the unescaping thing working and need to invoke the entity declarations on the output to get it to parse properly -- that error message was upon parsing the *output*? (You're not the only one confused now.)
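
If that is the situation, a sketch of what "invoking the entity declarations" could look like follows, again in Python and only as an illustration of the principle. It assumes the escaped payload is a single p element and that only nbsp needs declaring; the wrapper element, the function name and the one-entity DOCTYPE are all invented for the example. The first parse removes one level of escaping, and the second parse succeeds only because the declaration is prepended.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring the single entity used in this example.
# A real pipeline would declare (or reference) the full set it expects.
DOCTYPE = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]>'


def parse_escaped_fragment(source: str) -> ET.Element:
    """Parse a wrapper whose text content is escaped markup (hypothetical helper)."""
    outer = ET.fromstring(source)
    # After the first parse the text holds markup with an "honest"
    # entity reference in it.
    fragment = outer.text or ""
    # Without the declaration this second parse fails on the undefined entity.
    return ET.fromstring(DOCTYPE + fragment)


# Double-escaped input: the ampersand of the entity reference is itself escaped.
source = "<wrapper>&lt;p&gt;a&amp;nbsp;b&lt;/p&gt;</wrapper>"
p = parse_escaped_fragment(source)
print(p.tag, repr(p.text))
```

The same idea should carry over to a step that serializes the unescaped text with a DOCTYPE in front of it before handing it to the next parser.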

Cheers,
Wendell




======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
