Re: [xsl] xml -> htmlhelp and character 8220

Subject: Re: [xsl] xml -> htmlhelp and character 8220
From: Jirka Kosek <jirka@xxxxxxxx>
Date: Fri, 12 Nov 2004 22:59:03 +0100
Allin Cottrell wrote:

> There isn't.  HTML output is implicit in context, and up till now I have
> let the output encoding be implicit too (given that we're generating
> HTML help for Windoze).  The input encoding is specified in the input
> xml files.

Output encoding for HTML Help cannot be implicit, because HTML Help is a
very buggy piece of software. It doesn't support UTF-8 or character entity
references, so all characters must be written raw in some single-byte
encoding. For Western European languages the most appropriate encoding
is windows-1252, because it contains both dashes and typographic quotes.
So in your case you should have the following settings:

<xsl:param name="htmlhelp.encoding" select="'windows-1252'"/>
<xsl:param name="chunker.output.encoding" select="'windows-1252'"/>
<xsl:param name="saxon.character.representation" select="'native'"/>

In most cases the documents themselves can be in UTF-8, so you can change
the second line to:

<xsl:param name="chunker.output.encoding" select="'UTF-8'"/>
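
As a rough sketch, these parameters could live in a small customization
layer that imports the stock HTML Help stylesheet. The import URL below is
an assumption about where the DocBook XSL distribution can be found; point
it at your local copy if you have one. The UTF-8 variant is shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Assumed location of the stock stylesheet; adjust to your installation -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/htmlhelp/htmlhelp.xsl"/>

  <!-- Single-byte encoding for the HTML Help output -->
  <xsl:param name="htmlhelp.encoding" select="'windows-1252'"/>

  <!-- Encoding of the generated HTML chunks -->
  <xsl:param name="chunker.output.encoding" select="'UTF-8'"/>

  <!-- Ask Saxon to write characters raw instead of as character references -->
  <xsl:param name="saxon.character.representation" select="'native'"/>

</xsl:stylesheet>
```

You would then run your XSLT processor against this file instead of
htmlhelp.xsl. Since saxon.character.representation only affects Saxon,
this sketch assumes Saxon is the processor in use.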

> Thanks very much for your help.  But I'm coming to the conclusion this
> is a bug (or at least a feature regression) in the xsl stylesheets.  It
> seems that any required character re-encoding should be implicit from
> the context
>
>   iso-8859-1 xml input -> Windows html help output
>
> and should be handled correctly without the user having to specify up to
> 4 encoding variables.  And this did happen OK with the earlier release
> of the stylesheets.

This is not possible, because it would be very hard to select the proper
encoding automatically (without hardwiring the character repertoire of
each encoding into the stylesheets). You must use different
single-byte encodings for different languages, depending on the fancy
characters appearing in titles that go into the project files (.hhc, .hhk, .hhp).
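
To illustrate the per-language choice: for a Czech document (a
hypothetical case, not the one in this thread) the analogous settings
would use the Central European code page windows-1250 rather than
windows-1252. Treat this as a sketch and check the code page against the
characters your titles actually contain.

```xml
<!-- Hypothetical settings for a Czech document; windows-1250 is the
     Central European single-byte code page -->
<xsl:param name="htmlhelp.encoding" select="'windows-1250'"/>
<xsl:param name="chunker.output.encoding" select="'windows-1250'"/>
<xsl:param name="saxon.character.representation" select="'native'"/>
```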

You should blame MS for not supporting Unicode in HTML Help. But that is
a waste of time, because HTML Help was frozen a long time ago, MS Help 2 is
only for Visual Studio .NET, and the next help system will be available in
Longhorn (whose release date is constantly shifting).

				Jirka
	(author of HTML Help output in DocBook stylesheets)

--
------------------------------------------------------------------
   Jirka Kosek     e-mail: jirka@xxxxxxxx     http://www.kosek.cz
------------------------------------------------------------------
   Professional training and consulting in XML technologies.
      Take a look at our newly launched website http://DocBook.cz
        Detailed overview of training courses http://xmlguru.cz/skoleni/
------------------------------------------------------------------
