Subject: Re: Character mapping in DSSSL/Jade From: "Stephen J. Tinney" <stinney@xxxxxxxxxxxxx> Date: Wed, 7 May 1997 13:13:23 -0400 (EDT) |
>> I am generating the separate d and x parts quite happily using modes,
>> and what I would like to do now is use Jade to map the UTF to the
>> ASCII subset used by the indexer.
>
> I'm not knowledgable about character sets and would be curious to know
> what this means. How do you map an arbitrary Kanji character to ASCII?
> Or do all of the Unicode characters already fit into the subset that
> Unicode has in common with ASCII?

Serves me right for trying to keep out what I thought to be unnecessary application-specific detail.

I work with Sumerian, and am indexing graphemes in transliteration. For display, I want to show the graphemes with various diacritics, including subscript numerals. For indexing, it is acceptable to ignore certain of the diacritics and map others onto otherwise unused characters (we use scaron for /sh/, but my indexer uses 'c' for this purpose; the identifiable phonological repertoire of Sumerian can be mapped onto about 20 roman characters). Subscript digits, used to disambiguate homophones, are treated as ordinary digits by the indexer.

So, to rephrase the question: can I turn "<U-0161>a<U-2083>" into "ca3" using some clever/efficient Jade?

It occurred to me, actually, that a better route might be to do the UTF-ASCII translation in the indexer scanner, but I'd still be curious to know the answer.

Steve

DSSSList info and archive: http://www.mulberrytech.com/dsssl/dssslist
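[Editorial note: the mapping described above — scaron to 'c', subscript digits to ordinary digits, so that "<U-0161>a<U-2083>" becomes "ca3" — can be sketched outside of Jade. A minimal Python illustration follows; the mapping table and function names are assumptions for demonstration, not part of the original post or of DSSSL:]

```python
# Sketch of the UTF -> ASCII index mapping described in the post.
# U+0161 (s caron) maps to the indexer's 'c'; subscript digits
# U+2080..U+2089 map to ordinary ASCII digits. Unmapped characters
# pass through unchanged. The table is illustrative, not exhaustive.

SUBSCRIPT_DIGITS = {chr(0x2080 + d): str(d) for d in range(10)}

INDEX_MAP = {
    "\u0161": "c",   # scaron -> 'c' for /sh/, per the indexer's convention
    **SUBSCRIPT_DIGITS,
}

def to_index_form(grapheme: str) -> str:
    """Map a transliterated grapheme to the indexer's ASCII subset."""
    return "".join(INDEX_MAP.get(ch, ch) for ch in grapheme)

print(to_index_form("\u0161a\u2083"))  # prints: ca3
```

[Doing this in the indexer's scanner, as the post suggests, amounts to the same table lookup applied one character at a time during tokenization.]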