Re: Character mapping in DSSSL/Jade

Subject: Re: Character mapping in DSSSL/Jade
From: "Stephen J. Tinney" <stinney@xxxxxxxxxxxxx>
Date: Wed, 7 May 1997 13:13:23 -0400 (EDT)
>> I am generating the separate d and x parts quite happily using modes,
>> and what I would like to do now is use Jade to map the UTF to the
>> ASCII subset used by the indexer.
>
>I'm not knowledgeable about character sets and would be curious to know
>what this means. How do you map an arbitrary Kanji character to ASCII?
>Or do all of the Unicode characters already fit into the subset that 
>Unicode has in common with ASCII?

Serves me right for trying to keep out what I thought to be
unnecessary application-specific detail.  I work with Sumerian, and am
indexing graphemes in transliteration.  For display, I want to show
the graphemes with various diacritics including subscript numerals.
For indexing, it is acceptable to ignore certain of the diacritics,
and map others onto otherwise unused characters (we use scaron for
/sh/, but my indexer uses 'c' for this purpose; the identifiable
phonological repertoire of Sumerian can be mapped onto about 20 roman
characters).  Subscript digits, used to disambiguate homophones, are
treated as ordinary digits by the indexer.  So, to rephrase the
question, can I turn "<U-0161>a<U-2083>" into "ca3" using some
clever/efficient Jade?  It occurred to me, actually, that a better
route might be to do the UTF-ASCII translation in the indexer scanner,
but I'd still be curious to know the answer.
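
For the indexer-side route, the mapping described above could be sketched
roughly as follows. This is only an illustration in Python, not Jade or
DSSSL, and not the actual indexer. It assumes the input has already been
decoded to real code points (rather than the "<U-0161>" notation used
above); the names SPECIAL and to_index_key are made up for the example,
and only the scaron-to-'c' rule and the subscript-digit rule come from the
description above. Treating every other combining mark as ignorable is an
assumption.

```python
import unicodedata

# Hypothetical table of characters that map onto otherwise unused letters.
# Only the scaron -> 'c' rule is taken from the description above.
SPECIAL = {"\u0161": "c"}

SUBSCRIPT_ZERO = 0x2080  # U+2080..U+2089 are the subscript digits 0-9
SUBSCRIPT_NINE = 0x2089


def to_index_key(text):
    """Reduce a transliterated string to the indexer's ASCII subset."""
    out = []
    for ch in text:
        if ch in SPECIAL:
            # Explicit remapping wins over generic diacritic stripping.
            out.append(SPECIAL[ch])
        elif SUBSCRIPT_ZERO <= ord(ch) <= SUBSCRIPT_NINE:
            # Subscript digits become ordinary digits.
            out.append(chr(ord(ch) - SUBSCRIPT_ZERO + ord("0")))
        else:
            # Assumption: any remaining diacritic can be ignored, so
            # decompose the character and drop the combining marks.
            for part in unicodedata.normalize("NFD", ch):
                if not unicodedata.combining(part):
                    out.append(part)
    return "".join(out)


print(to_index_key("\u0161a\u2083"))
```

Characters with no decomposition and no entry in the table pass through
unchanged, so a real table would need an entry for each of them.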

 Steve

 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist

