Subject: Re: [xsl] text extraction
From: "Andrew Welch" <andrew.j.welch@xxxxxxxxx>
Date: Thu, 12 Oct 2006 17:05:30 +0100
Andrew Welch wrote:
> On 10/12/06, mus47@xxxxxxxx <mus47@xxxxxxxx> wrote:
>> And also I want to know how the output file encoding can be set to
>> iso8859-1 instead of utf8.
>> I use the xsltproc tool.
>
> You can set the output encoding using <xsl:output/>
But it is not guaranteed that the processor supports anything different from UTF-8/UTF-16.
"The value of the encoding attribute provides the value of the encoding parameter to the serialization method. The default value is implementation-defined, but in the case of the xml and xhtml methods it must be either UTF-8 or UTF-16."
...which took me a little by surprise: it seems to say that when the output method is xml or xhtml, the encoding MUST be either UTF-8 or UTF-16? Saxon doesn't seem to mind...
Also note that the first 128 code points (0 to 127, the ASCII range) are encoded identically in ISO-8859-1 and UTF-8. Only code points 128 and above are encoded differently (128 is the euro sign in Windows-1252 but a control character in ISO-8859-1 itself, so you may see something different).
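The overlap between the two encodings is easy to check outside of XSLT. The following is a minimal Python sketch (not something from this thread; the variable names are illustrative only) that encodes each code point from 0 to 255 both ways and keeps the ones whose bytes match:

```python
# Collect the code points in 0..255 whose ISO-8859-1 and UTF-8 byte
# sequences are identical.
same = [cp for cp in range(256)
        if chr(cp).encode("iso-8859-1") == chr(cp).encode("utf-8")]

# Number of matching code points, plus the first and the last one.
print(len(same), same[0], same[-1])

# U+00E9 (e with acute accent) is one byte in ISO-8859-1
# but two bytes in UTF-8.
latin1_bytes = "\u00e9".encode("iso-8859-1")
utf8_bytes = "\u00e9".encode("utf-8")
print(latin1_bytes, utf8_bytes)
```

Only the ASCII range ends up in the list; everything from 128 upwards becomes a multi-byte sequence in UTF-8.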
Note that ISO-8859-1 covers far fewer characters than UTF-8 (256 code points versus all of Unicode), so you may end up with missing or replaced characters (not sure what they will be replaced with, though, when they don't exist in the encoding) in the output stream.
No, you don't end up with missing or replaced characters... Any characters not in the encoding should be output as character references. It's a well-known technique to use an output encoding of US-ASCII so that all non-ASCII characters get output as character references, which gets around encoding problems further down the pipe.
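To illustrate the character-reference behaviour, here is a small Python sketch. It is only an analogy: it uses Python's built-in "xmlcharrefreplace" codec error handler, not an XSLT serializer, and the sample string is made up for the example. The effect is the same kind of substitution, with unencodable characters becoming numeric references rather than being dropped.

```python
# Sample text: U+00E9 exists in ISO-8859-1, U+20AC (euro sign) does not,
# and neither exists in US-ASCII.
text = "caf\u00e9 costs 5\u20ac"

# Characters the target encoding cannot represent are written as
# decimal numeric character references instead of being lost.
as_latin1 = text.encode("iso-8859-1", errors="xmlcharrefreplace")
as_ascii = text.encode("us-ascii", errors="xmlcharrefreplace")

print(as_latin1)  # b'caf\xe9 costs 5&#8364;'
print(as_ascii)   # b'caf&#233; costs 5&#8364;'
```

With US-ASCII as the target, every non-ASCII character comes out as a reference, so the result survives any later step that misreads the encoding.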