Unicode and XSL (was substring())

Subject: Unicode and XSL (was substring())
From: Richard Light <richard@xxxxxxxxxxxxxxxxx>
Date: Sat, 5 Jun 1999 10:58:29 +0100
In message <93CB64052F94D211BC5D0010A800133170EECF@xxxxxxxxxxxxxxxxxxxxx
uk>, Kay Michael <Michael.Kay@xxxxxxx> writes
>We had this conversation a few weeks ago (regarding length()). As I learnt
>then, it's all due to the appalling decision to allow non-spacing
>diacriticals in Unicode, which makes it quite hard to define what you mean by
>"the first character" in a string.

It isn't just diacriticals.  Unicode has a concept of "combining
characters" which is used for a wide range of purposes, most of which I
don't begin to understand.  It divides them into combining character
classes, which group together characters that appear over, under,
around (etc.!) the base character.  It also has a detailed algorithm
for handling multiple combining characters applied to a single base
character.
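The classes mentioned above can be inspected with Python's standard
unicodedata module.  This is only an illustration, not anything XSL
provides: combining_classes is a hypothetical helper, and the
reordering shown is canonical ordering as performed during NFD
normalization, which is presumably part of the algorithm referred to
above.

```python
import unicodedata

# A few combining marks; the comment gives the canonical combining class
# (ccc) the Unicode Character Database assigns. Base letters have ccc 0.
MARKS = {
    "COMBINING TILDE OVERLAY": "\u0334",   # ccc 1: overlays the base
    "COMBINING CEDILLA": "\u0327",         # ccc 202: attached below
    "COMBINING DOT BELOW": "\u0323",       # ccc 220: below
    "COMBINING ACUTE ACCENT": "\u0301",    # ccc 230: above
}

def combining_classes(s: str) -> list[int]:
    """Return the canonical combining class of each code point in s."""
    return [unicodedata.combining(ch) for ch in s]

for name, ch in MARKS.items():
    print(f"U+{ord(ch):04X} {name}: ccc {unicodedata.combining(ch)}")

# Canonical ordering: NFD sorts a run of marks following a base character
# by ccc, so two spellings of "a with acute above and dot below" coincide.
s1 = "a\u0301\u0323"   # acute first, then dot below
s2 = "a\u0323\u0301"   # dot below first, then acute
print(s1 == s2)                                                              # False
print(unicodedata.normalize("NFD", s1) == unicodedata.normalize("NFD", s2))  # True
print(combining_classes(unicodedata.normalize("NFD", s1)))                   # [0, 220, 230]
```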

Pinning down the *semantics* of "the first character" may be difficult.
However, if you are simply trying to count characters, surely all you
have to do is ignore any combining characters that occur within the
string.  (The first character should be a 'real one': combining
characters always follow the base character they qualify.)
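The counting rule suggested above can be sketched in Python, assuming
"combining character" means any code point whose Unicode general
category is Mn, Mc or Me.  The combining class alone is not a reliable
test, because enclosing marks such as U+20DD COMBINING ENCLOSING CIRCLE
have class 0.  The names is_combining, base_length and first_character
are hypothetical helpers, not XSL or XPath functions.

```python
import unicodedata

def is_combining(ch: str) -> bool:
    """True if ch is a combining mark (general category Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")

def base_length(s: str) -> int:
    """Count characters, skipping combining marks."""
    return sum(1 for ch in s if not is_combining(ch))

def first_character(s: str) -> str:
    """Return the first character plus any combining marks that follow it.

    If s starts with a stray mark, that mark and its followers are returned.
    """
    if not s:
        return ""
    end = 1
    while end < len(s) and is_combining(s[end]):
        end += 1
    return s[:end]

word = "e\u0301te\u0301"                   # "été" spelled with combining acutes
print(len(word))                           # 5 code points
print(base_length(word))                   # 3
print(first_character(word) == "e\u0301")  # True
```

Later Unicode versions formalize this idea as the extended grapheme
cluster (UAX #29), which also covers cases this simple rule misses,
such as decomposed Hangul syllables built from conjoining jamo.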

Since XML adopts Unicode without qualification, I assume that XSL
back-ends will support the rendering of these combined characters, just
as I assume that all XML editors will support Unicode. ;-(

Richard Light.

Richard Light
SGML/XML and Museum Information Consultancy

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
