Subject: [xsl] Re: replace() and translate() second try
From: Kenneth Reid Beesley <krbeesley@xxxxxxxxx>
Date: Sat, 4 Jun 2011 11:35:56 -0600
Thanks to Michael Kay and David Carlisle for their responses.

In an example like

    translate(string, "wxyz", "ABCD")

where w, x, y and z are supplementary characters, I find (using saxonhe9-3) that it works if the supplementary characters are indicated with the hex-escape &#xHHHHHHHH; notation, but _not_ if the supplementary characters are simply typed in using a Unicode-savvy text editor that handles supplementary characters.

In case the hex-escape sequence I just typed got garbled by email filters, it consists of an ampersand, a pound/hash sign, an 'x', and a sequence of hex digits, terminated with a semicolon. Thanks to David Carlisle for suggesting the hex-escape notation.

My original XML file (containing supplementary Unicode characters from the Deseret Alphabet block) and my XSLT script are both in UTF-8 encoding.

So something like this works:

    translate(string, '&#x10428;&#x10429;&#x1042A;&#x1042B;', 'ABCD')

In case things get garbled again by email filters, the second argument to translate() contains (without the spaces shown here) four supplementary characters indicated by the hex code point values:

    & #x 10428 ; & #x 10429 ; & #x 1042A ; & #x 1042B ;

If, using a Unicode-savvy text editor, with UTF-8 encoding for the file, I simply type in the four supplementary characters in the second string argument, this script does not work. This is a shame because the script with the real characters is far more readable (if you have a Unicode editor that can display the supplementary character glyphs).

The same goes for the replace(string, 'orig', 'repl') function. If the second argument contains supplementary characters, they need to be indicated in the hex-escape notation or I get results that are at least inconsistent.

I have a little example that I would gladly forward to anyone who is interested.

Thanks,

Ken

>
> ----------------------------------------------------------------------
> Date: Fri, 3 Jun 2011 00:00:47 -0600
> To: xslt <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
> From: Kenneth Reid Beesley <krbeesley@xxxxxxxxx>
> Subject: replace() and translate() second try
> Message-Id: <ACE8B20D-B21E-4551-8852-4EEE0EF398A7@xxxxxxxxx>
>
> I see that my previous message got rather garbled. Here's a simplified
> version of the question.
> Assume we have an XSLT transform with something like
>
> translate(string, 'abcd', 'ABCD')
>
> Obviously 'a' gets replaced with 'A', 'b' with 'B', etc.
>
> Should this still work if the 'abcd' is replace by a string of 4
> Unicode _supplementary_ characters?
> That is, does translate() work with Characters (including supplementary
> characters) or just chars?
>
> Thanks,
>
> Ken
>
> ******************************
> Kenneth R. Beesley, D.Phil.
> P.O. Box 540475
> North Salt Lake, UT
> 84054 USA
>
> ------------------------------
>
> Date: Fri, 03 Jun 2011 08:05:08 +0100
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> From: Michael Kay <mike@xxxxxxxxxxxx>
> Subject: Re: [xsl] replace() and translate() second try
> Message-ID: <4DE887A4.8000905@xxxxxxxxxxxx>
>
> On 03/06/2011 07:00, Kenneth Reid Beesley wrote:
>> I see that my previous message got rather garbled. Here's a simplified version of the question.
>> Assume we have an XSLT transform with something like
>>
>> translate(string, 'abcd', 'ABCD')
>>
>> Obviously 'a' gets replaced with 'A', 'b' with 'B', etc.
>>
>> Should this still work if the 'abcd' is replace by a string of 4 Unicode _supplementary_ characters?
>> That is, does translate() work with Characters (including supplementary characters) or just chars?
>>
>
> Yes, it should work correctly, and I have tests to show that it does, so
> please raise a bug report with a reproducible test case.
>
> The replace() function should also work with all Unicode characters,
> though there may be question marks here about which version of Unicode
> the characters are defined in, especially if you are trying to match
> them against Unicode character categories such as \p{Ll}.
>
> Michael Kay
> Saxonica
>
>
> ------------------------------
>
> Date: Fri, 03 Jun 2011 09:09:40 +0100
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> From: David Carlisle <davidc@xxxxxxxxx>
> Cc: Kenneth Reid Beesley <krbeesley@xxxxxxxxx>
> Subject: Re: [xsl] replace(), translate() and Unicode supplementary characters
> Message-ID: <4DE896C4.6060309@xxxxxxxxx>
>
> On 03/06/2011 05:00, Kenneth Reid Beesley wrote:
>> Questions: Are translate() and replace() supposed to work with Unicode supplementary characters?
>
> yes
>
>> If so, what am I doing wrong?
>
> hard to say as there is rather a large chance that the input you showed
> has been through several aggressive mail filters.
>
> Chances are it's an encoding error and one way to avoid those is to code
> your stylesheet in ascii.
>
> translate(
> .,
> '& #x10428& #x10429;& #x1042A& #x1042B;',
> '& #x0069;& #x0065;& #x0251;& #x0254;'
> )
>
> without the spaces after the &
> should work for the translation you cited.
>
> David
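
For anyone who wants to reproduce this, here is a minimal sketch of a self-contained XSLT 2.0 stylesheet that keeps the file pure ASCII by writing each supplementary character as a hexadecimal character reference, each one terminated by a semicolon. The template name "main" and the variable name "input" are made up for illustration and do not come from the original scripts; the stylesheet ignores the content of whatever XML document it is applied to. Besides the translate() call, it prints string-length() and string-to-codepoints() for the test string, which is one way to check whether the processor really sees four characters rather than eight surrogate halves or a run of mis-decoded bytes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical test template: the name "main" is an assumption.
       It matches the document node, so any XML input will trigger it. -->
  <xsl:template match="/" name="main">

    <!-- Four Deseret Alphabet characters, U+10428 to U+1042B, written as
         character references so that the stylesheet file itself is ASCII. -->
    <xsl:variable name="input"
        select="'&#x10428;&#x10429;&#x1042A;&#x1042B;'"/>

    <!-- Map each supplementary character to an ASCII letter. -->
    <xsl:value-of
        select="translate($input,
                          '&#x10428;&#x10429;&#x1042A;&#x1042B;',
                          'ABCD')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Diagnostic: number of characters the processor sees in the string. -->
    <xsl:value-of select="string-length($input)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Diagnostic: the code points as decimal integers. -->
    <xsl:value-of select="string-to-codepoints($input)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

On a conforming XSLT 2.0 processor the three output lines should be ABCD, then 4, then 66600 66601 66602 66603 (the decimal values of the four code points). Replacing the character references with the typed-in characters and comparing the two diagnostic lines should show whether the literal characters survive the editor and the parser intact, which would be a reasonable basis for a reproducible test case.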