Re: [xsl] Flattening characters to plain latin

Colin Paul Adams wrote:

> >> codepoints-to-string(string-to-codepoints(normalize-unicode($in,
> >> 'NFKD'))[. lt 127])
>
> No. Not unless you correct the error in it first.

What error?

The following:

codepoints-to-string(string-to-codepoints(normalize-unicode('ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ', 'NFKD'))[. le 127])

returns "AAAAAACEEEEIIIINOOOOO"

It drops the Æ and the Ð, but I do not know the NFKD normalization algorithm well enough to say whether that is an error. I also changed lt to le, although I was under the impression that codepoint 127 is not part of Latin-1. The code itself was correct, but the OP's definition of "plain latin" perhaps needs some clarification.
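It helps to see what NFKD does to each character on its own. Accented letters such as À decompose into a base letter below 127 followed by a combining mark. Æ (U+00C6) and Ð (U+00D0) have no decomposition in Unicode, so they remain single code points above 127, and the [. lt 127] filter throws them away.

The stylesheet below is a minimal diagnostic sketch, not a definitive test. It rests on these assumptions:

- It needs an XSLT 2.0 processor that supports the NFKD form of normalize-unicode(). XPath 2.0 only requires NFC.
- It builds the test string from code points 192 to 214 (U+00C0 to U+00D6) so that no mailer can mangle it.
- It must be started at the named template main, for instance with Saxon's -it option.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Test input built from code points so the mailer cannot mangle it:
       U+00C0 (À) .. U+00D6 (Ö), the same range as the test string -->
  <xsl:param name="in" select="codepoints-to-string(192 to 214)"/>

  <!-- Entry point: run with an initial named template (e.g. Saxon -it) -->
  <xsl:template name="main">
    <xsl:for-each select="string-to-codepoints($in)">
      <!-- The context item here is one integer code point -->
      <xsl:variable name="decomposed"
          select="string-to-codepoints(normalize-unicode(codepoints-to-string(.), 'NFKD'))"/>
      <!-- Print the character, its NFKD code points, and whether the
           'lt 127' filter would keep any part of it -->
      <xsl:value-of separator=" " select="
          codepoints-to-string(.), ':',
          string-join(for $c in $decomposed return string($c), ' '),
          if (some $c in $decomposed satisfies $c lt 127)
          then '(kept)' else '(dropped)'"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For À this should report 65 768 (A plus U+0300 COMBINING GRAVE ACCENT). For Æ it should report only 198, which is why Æ is dropped.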

But here's one that removes all diacritics (combining marks) but leaves the other symbols alone, like 0, . and ', and also keeps the missing Æ and Ð:

codepoints-to-string(
   string-to-codepoints(
       normalize-unicode('ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ0.''', 'NFKD'))
   (: replace() yields '' for a combining mark, which is false, so marks are dropped :)
   [replace(codepoints-to-string(.), '[\p{M}]', '')])

This returns "AAAAAAÆCEEEEIIIIÐNOOOOO0.'"

but it is not nearly as pretty as Michael's! Note the double codepoints-to-string (which makes it u-u-u-u-gly!). The alternative would be a replace() on the codepoints-to-string of the whole string-to-codepoints + normalize-unicode result. That would automatically normalize the result back before the regular expression can do its work. But like I said, it is u-u-u-ugly!
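If the double codepoints-to-string() is what makes the expression ugly, one option is to hide it behind a stylesheet function. This is a minimal sketch that assumes an XSLT 2.0 processor supporting the optional NFKD form of normalize-unicode(). The namespace urn:example:flatten and the function name f:strip-marks are hypothetical.

The function uses matches() with \p{M} inside a for expression instead of relying on the effective boolean value of replace(). This is a stylistic choice only: the filter still works one code point at a time, so it should behave like the predicate version. Æ and Ð are kept, because they are not marks.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:flatten"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: decompose with NFKD, then drop every code point
       whose one-character string is a combining mark (\p{M}).
       Characters without a decomposition (e.g. Æ, Ð) pass through unchanged. -->
  <xsl:function name="f:strip-marks" as="xs:string">
    <xsl:param name="in" as="xs:string"/>
    <xsl:sequence select="codepoints-to-string(
        for $c in string-to-codepoints(normalize-unicode($in, 'NFKD'))
        return if (matches(codepoints-to-string($c), '\p{M}')) then () else $c)"/>
  </xsl:function>

  <!-- Entry point: run with an initial named template (e.g. Saxon -it) -->
  <xsl:template name="main">
    <!-- U+00C0..U+00D6 followed by 0.' as in the test string -->
    <xsl:value-of select="f:strip-marks(concat(codepoints-to-string(192 to 214), '0.'''))"/>
  </xsl:template>
</xsl:stylesheet>
```

A call site then only needs f:strip-marks($in). The code-point juggling stays in one place.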

-- Abel

PS: hope the mailer does not mess too much with the high Latin-1 characters....

Current Thread

RE: [xsl] Flattening characters to plain latin, (continued)
- Michael Kay - Thu, 15 Feb 2007 15:01:10 -0000
  - Peter Hickman - Thu, 15 Feb 2007 17:03:59 +0000
  - Abel Braaksma - Fri, 16 Feb 2007 11:59:13 +0100
    - Colin Paul Adams - 16 Feb 2007 12:38:56 +0000
    - Abel Braaksma - Fri, 16 Feb 2007 14:05:47 +0100 <=
    - Colin Paul Adams - 16 Feb 2007 13:25:17 +0000
    - Abel Braaksma - Fri, 16 Feb 2007 16:06:13 +0100
    - Colin Paul Adams - 16 Feb 2007 15:38:03 +0000
    - Michael Kay - Sat, 17 Feb 2007 17:22:31 -0000
