[xsl] Character encoding problem

Subject: [xsl] Character encoding problem
From: Matt Gushee <mgushee@xxxxxxxxxxxxx>
Date: Mon, 21 May 2001 11:26:50 -0600 (MDT)
Hi, folks--

I'm developing a simple XSLT transformation for selecting languages
(English or Japanese) on a bilingual website. It takes a source XHTML
document with paired elements (headings, paragraphs) in English and
Japanese, e.g.:

         <p xml:lang="en">
           [ some stuff in English ]
         </p>
         <p xml:lang="ja">
           [ same content in Japanese ]
         </p>

... and outputs everything in the selected language plus any content
that has no language specified. At least that's the theory. I've tried
processing it with the command-line interfaces of (full) Saxon and
4XSLT, but I keep getting errors:

Saxon:
	$ saxon main.html i18n.xsl currentLanguage=en
	Transform failed: =US-ASCII

	The above 'saxon' is a simple shell script I wrote just to
	save typing. It just invokes 'java com.icl.saxon.Whatever
	[<args>]'.

4XSLT:
	$ 4xslt -DcurrentLanguage=en main.html i18n.xsl
	[ long stack trace ]
	TypeError: argument(2) to filter() must be a sequence type

The 4XSLT error looks like a possible bug, but the Saxon output is
just plain puzzling. Where is 'US-ASCII' coming from? I edit the
source in EUC-JP, then convert it to UTF-8 or UTF-16 (same results
either way) using iconv.

So, can anybody give me a clue? Any leads would be much appreciated.

Matt Gushee


---- i18n.xsl ---------------------------------------------

<?xml version="1.0"?>
<!-- None of the commentings-out made any difference -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="currentLanguage" select="'en'"/>

  <xsl:variable name="charEncoding">
    <xsl:choose>
      <xsl:when test="$currentLanguage='en'">iso-8859-1</xsl:when>
      <xsl:when test="$currentLanguage='ja'">euc-jp</xsl:when>
      <xsl:otherwise>utf-8</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>

  <xsl:output method="html" encoding="$charEncoding"/>

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- <xsl:template match="*[lang($currentLanguage) or not(@xml:lang)]"> -->
  <xsl:template match="*[lang($currentLanguage)]">
    <xsl:copy>
      <!-- <xsl:for-each select="@*[name() != 'id']"> -->
      <xsl:for-each select="@*">
        <xsl:copy/>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
    
</xsl:stylesheet>
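
A note on the stylesheet above, offered as a hedged sketch rather than a
confirmed diagnosis of either error message. Two points are worth checking
against the XSLT 1.0 Recommendation: the encoding attribute of xsl:output is
a literal string, not an attribute value template, so
encoding="$charEncoding" asks for an encoding literally named
"$charEncoding"; and section 5.3 does not allow a variable reference such as
$currentLanguage inside a template's match pattern. The variant below
assumes a fixed utf-8 output encoding (an assumption, not a choice made in
the original post) and moves the language test into the template body.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumes UTF-8 output is acceptable for both languages -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="currentLanguage" select="'en'"/>

  <!-- The encoding is written literally; a variable cannot be used here -->
  <xsl:output method="html" encoding="utf-8"/>

  <!-- Match every element, then filter in the body, where variable
       references are allowed in XSLT 1.0 -->
  <xsl:template match="*">
    <xsl:if test="lang($currentLanguage) or not(@xml:lang)">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates/>
      </xsl:copy>
    </xsl:if>
    <!-- Elements tagged with another language are dropped with their
         whole subtree, so their text never reaches the built-in rules -->
  </xsl:template>

</xsl:stylesheet>
```

Because a rejected element is dropped together with its subtree, selecting
ja against the sample main.html below would discard the whole document: its
html and body elements are themselves marked xml:lang="en". Under this
sketch those wrapper elements would need to carry no xml:lang attribute.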


--- main.html [pre-conversion: euc-jp encoding] --------------

<?xml version="1.0" encoding="UTF-16"?>
<!--
<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.1//EN"
  "/usr/local/share/xml/xhtml/xhtml11.dtd"
>
-->
<html xmlns="http://www.w3.org/1999/xhtml"
  version="-//W3C//DTD XHTML 1.1//EN"
  xml:lang="en">
  <head>
    <title>Welcome</title>
  </head>

  <body xml:lang="en">
    <h1 xml:lang="en">Welcome</h1>
    <h1 xml:lang="ja">ようこそ</h1>
    <hr xmlns="http://www.w3.org/1999/xhtml"/>
    <p xml:lang="en">
The Kaiwa Club is an informal group for people who want to practice
Japanese conversation. We welcome members at all levels of
proficiency.
</p>
    <p xml:lang="ja">
会話倶楽部は日本語の会話を練習したい人のためのインフォーマルなグループで
ございます。レベルはかかわらず、新しい会員を大歓迎しております。
</p>
  </body>
</html>

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

