Re: [xsl] xml -> htmlhelp and character 8220

Subject: Re: [xsl] xml -> htmlhelp and character 8220
From: Allin Cottrell <cottrell@xxxxxxx>
Date: Fri, 12 Nov 2004 21:10:20 -0500 (EST)
On Fri, 12 Nov 2004, Jirka Kosek wrote:

> Output encoding for HTML Help couldn't be implicit, as HTML Help is a very buggy piece of software. It doesn't support UTF-8 or character entity references, so all characters must be written raw in some single-byte encoding.
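To make the subject line concrete: character 8220 is U+201C LEFT DOUBLE QUOTATION MARK. The following Python sketch is only an illustration and is not part of the XSLT toolchain discussed here. It shows how that one character is stored in UTF-8 (several bytes, which HTML Help reportedly cannot handle) and in two single-byte encodings; `encodings_of` is a hypothetical helper name.

```python
def encodings_of(ch: str) -> dict:
    """Return ch encoded in several encodings, or None where it is unrepresentable."""
    result = {}
    for enc in ("utf-8", "cp1252", "iso-8859-1"):
        try:
            result[enc] = ch.encode(enc)
        except UnicodeEncodeError:
            # The character has no slot in this single-byte encoding.
            result[enc] = None
    return result


left_quote = chr(8220)  # U+201C LEFT DOUBLE QUOTATION MARK
for enc, data in encodings_of(left_quote).items():
    print(enc, data.hex() if data is not None else "not representable")
```

The point of the comparison: in UTF-8 the quote takes three bytes. Windows-1252 (CP1252) holds it in a single byte, 0x93. ISO-8859-1 has no slot for it at all.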

Granted. After some more experimentation I see what the relevant difference is between docbook-xsl-1.59.1 (with which I did not perceive a problem) and docbook-xsl-1.67.0 (which triggered my reports of a problem). The earlier stylesheets silently discarded characters, such as left and right double quotes, that could not be represented in the output encoding for the htmlhelp TOC file. With the new stylesheets, you get an error message about such characters. The new behavior seems right.
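The two behaviors correspond to two familiar error policies for encoding. The Python sketch below uses codec error handlers purely as an analogy. It is not how the DocBook stylesheets or Saxon actually implement output encoding. The encoding `iso-8859-1` and the label `toc_entry` are assumed examples, and `encode_toc_text` is a hypothetical helper.

```python
def encode_toc_text(text: str, encoding: str, strict: bool) -> bytes:
    """Encode TOC text, either dropping unrepresentable characters or failing loudly."""
    # errors="ignore" silently drops characters (analogous to the older behavior);
    # errors="strict" raises UnicodeEncodeError (analogous to the newer error message).
    return text.encode(encoding, errors="strict" if strict else "ignore")


toc_entry = "The \u201cquoted\u201d chapter"  # hypothetical TOC label with curly quotes

print(encode_toc_text(toc_entry, "iso-8859-1", strict=False))
try:
    encode_toc_text(toc_entry, "iso-8859-1", strict=True)
except UnicodeEncodeError as err:
    print("error:", err.reason)
```

Under the lenient policy the quotes disappear without any warning. The strict policy makes you notice that the chosen encoding cannot carry them.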


As I mentioned in a previous post, I am working around this by outputting UTF-8 and then using GNU recode to convert the TOC to Windows CP1252 (this circumambulation is required because Saxon 6.5.3 does not support CP1252 output).
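For comparison, the same UTF-8 to CP1252 step can be sketched in a few lines of Python. This is an alternative to recode and not what the post describes using. The function name and file names are hypothetical, and the sketch assumes the TOC file contains only characters that CP1252 can represent.

```python
from pathlib import Path


def utf8_to_cp1252(src: Path, dst: Path) -> None:
    """Re-encode a UTF-8 file as Windows-1252, failing on unrepresentable characters."""
    # Decode raw bytes so line endings (e.g. CRLF) pass through unchanged.
    text = src.read_bytes().decode("utf-8")
    # The default "strict" error handler raises UnicodeEncodeError instead of
    # silently dropping characters that CP1252 cannot hold.
    dst.write_bytes(text.encode("cp1252"))
```

Usage might look like `utf8_to_cp1252(Path("toc.hhc"), Path("toc-cp1252.hhc"))`, where both paths are placeholders. Writing to a separate file keeps the UTF-8 original available for checking.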

In the process I have discovered a bug in GNU recode 3.6 (or in libiconv): recode produces corrupt output from the u8..cp1252 conversion unless it is given the -x: option, which prevents it from using libiconv.
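Whatever tool does the conversion (here, `recode -x: u8..cp1252`), corrupt output like this is easy to catch by decoding the result and comparing it with the original. The Python check below is a hypothetical safeguard, not something from the post. It assumes you keep a copy of the UTF-8 file, because recode converts files in place. `same_text` is an illustrative name.

```python
from pathlib import Path


def same_text(utf8_file: Path, cp1252_file: Path) -> bool:
    """True if the CP1252 file decodes to exactly the text of the UTF-8 file."""
    original = utf8_file.read_bytes().decode("utf-8")
    try:
        converted = cp1252_file.read_bytes().decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves a few bytes (e.g. 0x81) undefined,
        # so such bytes are treated as a sign of corrupt output.
        return False
    return original == converted
```

A mismatch does not say which tool is at fault. It only tells you that the converted TOC no longer matches the source text.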

Allin Cottrell
