RE: [xsl] Upper ASCII chars

Subject: RE: [xsl] Upper ASCII chars
From: Jay Burgess <jburgess@xxxxxxxxxxxxxx>
Date: Tue, 05 Feb 2002 09:18:02 -0600

[Thanks Jeni, Michael, and David for the replies. I'll reply to all three here.]

> (Jeni) It depends on what processor you're using.
>
> (Michael) Nevertheless, many people do care, so some
> processors give you a way of controlling it. Saxon has
> an attribute saxon:character-representation, and I
> think Xalan has some kind of configuration file.
>
I am using Xalan. I'll look for its saxon:character-representation equivalent. That would seem to solve the problem.


> (Jeni) Out of interest, are you experiencing problems with browsers
> recognising the character entity references, or is it purely that you
> don't like the space that they take up, or find them less readable
> than the native characters?
>
> (David) although if you are writing HTML why do you care? the two
> forms that you show are equivalent to any HTML system.
>
Unfortunately, it's not an "HTML system". I'm building server-side include pages from an XML configuration file. The parameters for the <SERVLET> block (e.g. <param name="input1" value="£©®ÄËÓáöÿ.DTD">) need to be able to contain both "lower ASCII" and "upper ASCII" characters. value="£" is completely different data for the SSI parser than value="&pound;".


> (Michael) Oh dear: "upper ASCII". There's no such thing. ASCII stops
> at 0x7F. A good first rule in understanding character coding issues
> is to get your terminology straight!
>
Yes, ASCII is a 7-bit protocol. But in all the years I've been in this business, when someone says "upper ASCII", everyone else knows what they're talking about. Since my goal was to define my problem, and all three of you seemed to understand the issue, I believe it accomplished its purpose.


> (Jeni) As an alternative, you could change the output method to xml
> and generate well-formed HTML (or full XHTML if you want).
>
I did try this already, and this led to a different set of problems (mostly formatting related) which I didn't try to address at the time. If I can't get your first suggestion to work, then I'll go back to this option and try to make it work.


> (Jeni) [There's been a recent suggestion on xsl-editors@xxxxxx that
> a similar functionality to saxon:character-representation be offered
> in XSLT 2.0 - you might want to post this example there to demonstrate
> another use case.]
>
I'll do that this morning.  Thanks for the suggestion.

Jay


-----Original Message-----
From: Jeni Tennison [mailto:jeni@xxxxxxxxxxxxxxxx]
Sent: Tuesday, February 05, 2002 1:35 AM
To: Jay Burgess
Cc: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Upper ASCII chars


Hi Jay,


> I get the following in the file:
>
>    <param name="input1"
> value="&pound;&copy;&reg;&Auml;&Euml;&Oacute;&aacute;&ouml;&yuml;.DTD">
>
> What I want, though, is:
>
>    <param name="input1" value="£©®ÄËÓáöÿ.DTD">
>
> Is there a way to achieve this?

It depends on what processor you're using. The XSLT 1.0 Rec states
that if the output method is html and the processor knows the
character entity reference for a character, then that character may be
output using the character entity reference, which is what you're
experiencing.

Some processors, notably Saxon (someone tell me if other processors
offer this), give you a bit of control over how the characters are
serialized. With Saxon, you can do:

  <xsl:output method="html"
              saxon:character-representation="native;entity" />

to tell Saxon to serialize non-ASCII characters that can be serialized
as native characters in your character encoding as native characters,
and those that cannot be represented in your character encoding as
entities (if Saxon knows such an entity). This should give you the
result that you're after (assuming that the characters that you're
using are representable within your encoding).
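
For context, here is a minimal sketch of a complete stylesheet using
that attribute. It rests on a few assumptions that are not in the
snippet above: the saxon prefix is bound to http://icl.com/saxon (the
namespace used by the Saxon 6.x releases; later releases use a
different one), the output encoding is iso-8859-1 so that all nine
characters are representable, and the literal <param> element simply
stands in for whatever your real templates generate.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://icl.com/saxon"
    exclude-result-prefixes="saxon">

  <!-- Assumption: iso-8859-1 is chosen because it can represent every
       character in the example value; use whatever encoding the SSI
       parser actually expects. -->
  <xsl:output method="html"
              encoding="iso-8859-1"
              saxon:character-representation="native;entity" />

  <!-- Placeholder template: the numeric references keep this
       stylesheet file pure ASCII; they denote the same characters
       as the literal value in the question. -->
  <xsl:template match="/">
    <param name="input1"
           value="&#163;&#169;&#174;&#196;&#203;&#211;&#225;&#246;&#255;.DTD"/>
  </xsl:template>

</xsl:stylesheet>
```

Other processors should ignore an extension attribute they don't
recognise, so the same stylesheet ought still to run elsewhere, just
without the effect on serialization.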

[There's been a recent suggestion on xsl-editors@xxxxxx that a similar
functionality to saxon:character-representation be offered in XSLT 2.0
- you might want to post this example there to demonstrate another use
case.]

As an alternative, you could change the output method to xml and
generate well-formed HTML (or full XHTML if you want). The characters
won't be represented as entities in that case because XSLT 1.0
processors can't tell the difference between normal XML and
well-formed HTML, so won't escape any of the characters.
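
A rough sketch of that alternative, under the same assumptions as
before (iso-8859-1 as the output encoding, and placeholder <SERVLET>
and <param> elements standing in for your real output):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- method="xml" has no knowledge of HTML entity names, so a
       character that the chosen encoding can represent is written
       as itself. omit-xml-declaration keeps the XML declaration
       out of the generated page. -->
  <xsl:output method="xml"
              encoding="iso-8859-1"
              omit-xml-declaration="yes" />

  <!-- Placeholder template standing in for the real SSI output. -->
  <xsl:template match="/">
    <SERVLET>
      <param name="input1"
             value="&#163;&#169;&#174;&#196;&#203;&#211;&#225;&#246;&#255;.DTD"/>
    </SERVLET>
  </xsl:template>

</xsl:stylesheet>
```

One thing to watch with this approach: the xml method follows XML
rules, so an empty element such as <param> will typically come out as
<param ... /> rather than <param ...>, and whatever reads the page has
to accept that form.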

Out of interest, are you experiencing problems with browsers
recognising the character entity references, or is it purely that you
don't like the space that they take up, or find them less readable
than the native characters?

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/



XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list

