Re: [xsl] unreadable characters from indesign

Subject: Re: [xsl] unreadable characters from indesign
From: David Carlisle <davidc@xxxxxxxxx>
Date: Thu, 18 Jan 2007 02:01:57 GMT
>  because I still don't understand how those characters end up 
> in my xml
the usual cause of unexpected characters is incorrectly specified
encodings.

for example if the original document had a "PARAGRAPH SEPARATOR"
character Unicode hex 2029, decimal 8233, and that file was written 
using utf-8 then this character would take three bytes, with
hex codes E2 80 A9, that is, decimal 226 128 169, which are the three
numbers you mentioned in the original post.
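
The arithmetic above can be checked with a short sketch. Python is used here only for illustration (it is not mentioned anywhere in the thread), and the variable names are made up for the example.

```python
# U+2029 PARAGRAPH SEPARATOR, the character discussed above
ch = "\u2029"

# Encode the single character as utf-8 and inspect the resulting bytes
utf8_bytes = ch.encode("utf-8")
hex_codes = [format(b, "02X") for b in utf8_bytes]   # byte values in hex
decimal_codes = list(utf8_bytes)                      # byte values in decimal

print(ord(ch))          # code point in decimal
print(hex_codes)        # the three utf-8 bytes, hex
print(decimal_codes)    # the same three bytes, decimal
```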

If the file is correctly read as utf-8 these three bytes will make a
single character, accessible using the same code, or &#x2029; for
example, but if it is incorrectly read using iso-8859-1 then the three
bytes will appear as three spurious characters
> U+00E2, LATIN SMALL LETTER A WITH CIRCUMFLEX
> U+0080, control
> U+00A9, COPYRIGHT SIGN
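
A minimal sketch of the same misreading, again in Python purely as an illustration (the thread itself does not use it). It decodes the same three bytes both ways, and assumes the spurious characters were carried through unchanged, so that they can be turned back into the original bytes.

```python
import unicodedata

# The three bytes that utf-8 uses for U+2029
raw = bytes([0xE2, 0x80, 0xA9])

# Read with the correct encoding: one character
right = raw.decode("utf-8")

# Read with the wrong encoding: one character per byte
wrong = raw.decode("iso-8859-1")

for c in wrong:
    # U+0080 has no Unicode name, so fall back to the label "control"
    print(f"U+{ord(c):04X}", unicodedata.name(c, "control"))

# If nothing else altered the text, the misreading can be undone by
# turning the characters back into bytes and decoding them as utf-8
repaired = wrong.encode("iso-8859-1").decode("utf-8")
print(unicodedata.name(repaired))
```

The round trip in the last step only works because iso-8859-1 maps every byte value to exactly one character, so no information is lost by the wrong decode. The real fix is still to declare or detect the right encoding when the file is read.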


David
