Subject: RE: I/O of iso-8859-1 characters?
From: Miles Sabin <msabin@xxxxxxxxxxxxxxxxxxx>
Date: Thu, 12 Aug 1999 18:04:08 +0100
Kai Grossjohann wrote,
> Right now, all I seem to be able to get is "character
> not allowed" for non-ASCII iso-8859-1 characters on
> the input side. If I change the input side to
> "ä" style entities, all I've been able to get is
> gibberish (might be UTF-8 or UTF-7, I don't know) on
> the output side. I played around a bit with the
> NXMLOutputHandler but since I didn't know what I was
> doing: no cigar.
>
> What is the most painless way to deal with iso-8859-1
> characters?
It sounds like it's interpreting your input document as
UTF-8 (and barfing when you feed it top bit set
ISO-8859-1 characters which it's treating as malformed
UTF-8 octet sequences).
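To see why that fails, here is a minimal sketch in plain Java (not XT code; the class and method names are made up for illustration). It uses a strict UTF-8 decoder to show that a single ISO-8859-1 byte for a-umlaut (0xE4) looks like the start of a three-byte UTF-8 sequence with no valid continuation bytes after it, so the input is rejected as malformed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class Utf8StrictCheck {

    // Returns true only if the bytes are a well-formed UTF-8 sequence.
    static boolean isWellFormedUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String doc = "<a>\u00e4</a>"; // \u00e4 is a-umlaut
        byte[] latin1 = doc.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = doc.getBytes(StandardCharsets.UTF_8);
        System.out.println(isWellFormedUtf8(latin1)); // false
        System.out.println(isWellFormedUtf8(utf8));   // true
    }
}
```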
I presume it's doing that because you've omitted the
encoding declaration. For ISO-8859-1 it would be,
<?xml version="1.0" encoding="ISO-8859-1"?>
If you're using the SAX interfaces to XT, there are a
couple of alternatives. When you construct your
InputSource you could do,
yourInputSource.setEncoding("ISO-8859-1");
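As a rough sketch of that first alternative: the code below uses the SAX parser bundled with the JDK (via JAXP) rather than XT itself, which is an assumption on my part, and the class and method names are hypothetical. It feeds the parser ISO-8859-1 bytes with no encoding declaration and relies on setEncoding() to say how they should be decoded.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
class Latin1InputSource {

    // Parses the bytes and returns the concatenated character data.
    // A null encoding leaves the parser to detect the encoding itself.
    static String parseText(byte[] xmlBytes, String encoding) {
        InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
        if (encoding != null) {
            source.setEncoding(encoding);
        }
        final StringBuilder text = new StringBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        } catch (Exception e) {
            // Wrap parser, configuration and I/O failures in one unchecked type.
            throw new IllegalStateException("parse failed: " + e.getMessage(), e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // No <?xml ... encoding=...?> declaration in this document.
        byte[] latin1 = "<doc>\u00e4</doc>".getBytes(StandardCharsets.ISO_8859_1);
        String text = parseText(latin1, "ISO-8859-1");
        // Print the code point in hex to sidestep console encoding issues.
        System.out.println(Integer.toHexString(text.charAt(0)));
    }
}
```

Calling parseText(latin1, null) on the same bytes should fail instead, since the parser then falls back to treating the stream as UTF-8.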
Alternatively, if you're reading the input doc via a
system id you could ensure that the server on the other
end is correctly setting the Content-Type to,
text/xml; charset=ISO-8859-1
XT should then be able to pick up the character
encoding from the URLConnection it constructs to resolve
the system id.
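If it turns out the parser does not pick up the charset for you, one workaround is to read it from the Content-Type header yourself and pass it to setEncoding(). The helper below is a hypothetical sketch of the header parsing only; it is deliberately simple and does not handle every corner of the MIME parameter syntax.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only.
class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/xml; charset=ISO-8859-1". Returns null if there is none.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/xml; charset=ISO-8859-1")); // ISO-8859-1
        System.out.println(charsetOf("text/xml"));                     // null
    }
}
```

With a java.net.URLConnection in hand, the header value is available from getContentType(), so the result of charsetOf(connection.getContentType()) can be handed to yourInputSource.setEncoding() when it is non-null.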
There may also be some XT-specific mechanism, in which
case your best bet would be to RTFM ;-)
Cheers,
Miles
--
Miles Sabin Cromwell Media
Internet Systems Architect 5/6 Glenthorne Mews
+44 (0)181 410 2230 London, W6 0LJ
msabin@xxxxxxxxxxxxxxxxxxx England
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list