RE: I/O of iso-8859-1 characters?

Subject: RE: I/O of iso-8859-1 characters?
From: Miles Sabin <msabin@xxxxxxxxxxxxxxxxxxx>
Date: Thu, 12 Aug 1999 18:04:08 +0100
Kai Grossjohann wrote,
> Right now, all I seem to be able to get is "character 
> not allowed" for non-ASCII iso-8859-1 characters on 
> the input side.  If I change the input side to 
> "&auml;" style entities, all I've been able to get is
> gibberish (might be UTF-8 or UTF-7, I don't know) on 
> the output side. I played around a bit with the 
> NXMLOutputHandler but since I didn't know what I was 
> doing: no cigar.
>
> What is the most painless way to deal with iso-8859-1 
> characters?

It sounds like it's interpreting your input document as 
UTF-8 (and barfing when you feed it top-bit-set
ISO-8859-1 characters, which it's treating as malformed 
UTF-8 octet sequences).
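
To illustrate why that fails, here is a minimal sketch using only the
standard java.nio.charset classes. It is not how XT itself checks its
input; the class and method names (Utf8Check, isValidUtf8) are made up
for the example. It shows that the single byte ISO-8859-1 uses for
a-umlaut (0xE4) is not a well-formed UTF-8 sequence when followed by
ASCII markup.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
public class Utf8Check {
    // Returns true if the bytes decode cleanly as UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently
        // substituting a replacement character.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String doc = "<a>\u00e4</a>"; // a-umlaut as element content
        byte[] latin1 = doc.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = doc.getBytes(StandardCharsets.UTF_8);
        // 0xE4 announces a three-byte UTF-8 sequence, but the next
        // byte is '<', so the Latin-1 bytes are rejected.
        System.out.println("latin1 bytes valid as UTF-8: " + isValidUtf8(latin1));
        System.out.println("utf8 bytes valid as UTF-8:   " + isValidUtf8(utf8));
    }
}
```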

I presume it's doing that because you've omitted the
encoding declaration. For ISO-8859-1 it would be,

 <?xml version="1.0" encoding="ISO-8859-1"?>

If you're using the SAX interfaces to XT, there are a
couple of alternatives. When you construct your InputSource you could
do,

  yourInputSource.setEncoding("ISO-8859-1");
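
A slightly fuller sketch of the same idea follows. It assumes the SAX
parser bundled with the JDK (obtained through SAXParserFactory) rather
than XT's own driver, and the class and method names (Latin1Input,
parseText) are invented for the example. It feeds the parser
ISO-8859-1 bytes that carry no encoding declaration and relies on
setEncoding to say how the byte stream should be decoded.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of XT.
public class Latin1Input {
    // Parses the bytes and returns the concatenated character data.
    // If encoding is null the parser falls back to its own detection.
    static String parseText(byte[] xmlBytes, String encoding) throws Exception {
        InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
        if (encoding != null) {
            source.setEncoding(encoding);
        }
        final StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // No <?xml ... encoding=...?> declaration in this document.
        byte[] doc = "<a>\u00e4</a>".getBytes(StandardCharsets.ISO_8859_1);
        String text = parseText(doc, "ISO-8859-1");
        // Print the code point rather than the character so the result
        // does not depend on the console encoding; e4 is expected if
        // the parser honours the override.
        System.out.println(Integer.toHexString(text.charAt(0)));
    }
}
```

Note that the encoding set on an InputSource only applies to a byte
stream or a system id; if you hand the InputSource a character stream
(a Reader), the decoding has already happened and the setting has no
effect.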

Alternatively, if you're reading the input doc via a
system id you could ensure that the server on the other 
end is correctly setting the Content-Type to,

  text/xml; charset=ISO-8859-1

XT should then be able to pick up the character
encoding from the URLConnection it constructs to resolve
the system id.
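
How XT actually does this internally is not shown here. As a rough
sketch of the general idea, the charset parameter can be pulled out of
a Content-Type value such as the one returned by
URLConnection.getContentType(). The class and method names
(ContentTypeCharset, charsetOf) are hypothetical.

```java
// Hypothetical helper, for illustration only; not XT's code.
public class ContentTypeCharset {
    // Returns the charset parameter of a Content-Type value,
    // or null if there is none.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        // parts[0] is the media type; the rest are parameters.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/xml; charset=ISO-8859-1"));
        System.out.println(charsetOf("text/xml"));
    }
}
```

If the processor turns out not to pick the charset up by itself, the
same value could be passed to InputSource.setEncoding() before
parsing.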

There may also be some XT-specific mechanism, in which
case your best bet would be to RTFM ;-)

Cheers,


Miles

-- 
Miles Sabin                          Cromwell Media
Internet Systems Architect           5/6 Glenthorne Mews
+44 (0)181 410 2230                  London, W6 0LJ
msabin@xxxxxxxxxxxxxxxxxxx           England


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

