Subject: Re: [xsl] special character encoding, two problems
From: "Jonina Dames jdames@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 23 Oct 2014 20:39:00 -0000

Thanks for the advice!

The

    <xsl:value-of select="normalize-unicode(replace(normalize-unicode(.,'NFKD'),'\p{Mn}',''),'NFKC')" />

function works for most of the entities, but it's still missing a couple dozen characters. Some of the author names still have Unicode entities instead of plain ASCII (for example, several characters with a stroke, several ligatures, and thorn characters, both upper and lowercase).

Is there a variation of this function or a parameter that will catch and convert ALL of these to plain ASCII, as well as the standard acute and cedilla characters? Or do I need to address these outlying characters with something else (not translate(), since I can't use a one-to-one replacement for ligature entities)?

Thanks,
Joni

On Wed, Oct 15, 2014 at 05:56:40PM -0000, Jonina Dames jdames@xxxxxxxxx scripsit:
> Problem 2:
> I'm trying to use a stylesheet with a character map so I can convert
> accented letters to their plain ascii equivalents in a surname element of my
> output XML to create indexing values. I'm new to XSLT 2.0 and I'm having
> trouble figuring out the syntax so my mappings will work correctly. Is there
> a simpler way to convert numeric unicode entities of accented letters to
> plain ascii characters, or is this my best bet?

First off, the instant the XML document is parsed, all the numeric entities ought to get converted straight to the character represented by that entity, so when you're using XSLT to process the document you're dealing with regular old characters of whatever code point.

If what you want to do is to take the accented characters and remove their accents, leaving the base character behind, the traditional way to remove accents is the translate() function, thus

    translate(.,'é','e')

and you just keep going for all the characters you want to de-accent when you create the surname element's string contents.

    <surname>
      <xsl:value-of select="translate(.,'ÀÂÇÉÈÊËÎÏÔÛÙÜŸÑàâçéèêëîïôûùüÿñ','AACEEEEIIOUUUYNaaceeeeiiouuuyn')"/>
    </surname>

This is pretty much like the character maps, except that character maps aren't obliged to be one-to-one, which translate()'s replacement of characters _is_. (The first character in the search list gets swapped for the first character in the replacement list, and so on.)

If you've got XSLT 2.0, it's much better (because you don't have to list every accented character that might show up!) to use

    <surname>
      <xsl:value-of select="normalize-unicode(replace(normalize-unicode(.,'NFKD'),'\p{Mn}',''),'NFKC')" />
    </surname>

because that will get everything without any explicit list requirement. You're normalizing the Unicode string to decomposed form (so the "e" and the accent-aigu are separate code points), then using Unicode character categories to delete all the "Mark, Nonspacing" characters (all the accents) via replace(), and then re-normalizing the result back into the composed form which XSLT expects.

Both examples assume you're operating on the context node (that dot character), which you might not be.

-- Graydon

On 10/16/14 5:24 AM, XSL-List: The Open Forum on XSL wrote:
> This message contains the recent posts to the XSL-List: The Open Forum on XSL
> mailing list managed by Mulberry Technologies, Inc. (http://lists.mulberrytech.com).
>
>

--
Jonina Dames
Customer Support Specialist
Inera Inc.
+1 617 932 1932
eXtyles on Twitter <https://twitter.com/extyles>
jdames@xxxxxxxxx

-----------------------------------------------------------------
This email message and any attachments are confidential. If you are not the intended recipient, please immediately reply to the sender or call 617-932-1932 and delete the message from your email system. Thank you.
-----------------------------------------------------------------
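
On the follow-up question at the top of this message: the decompose-and-strip expression only reaches characters that Unicode defines as a base letter plus a combining mark. Letters such as ø, đ, ł, æ, œ, þ and ß have no decomposition, so NFKD leaves them untouched and they need an explicit list. What follows is a minimal XSLT 2.0 sketch of one way to combine the two steps, not a definitive implementation. The function name my:to-ascii, the namespace urn:example:ascii-fold, and the choice of ASCII spellings (for example 'Th' for thorn) are assumptions made for illustration, and the replacement list is deliberately incomplete.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:ascii-fold"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: strip accents, then spell out the letters
       that NFKD does not decompose. Extend the lists as needed. -->
  <xsl:function name="my:to-ascii" as="xs:string">
    <xsl:param name="text" as="xs:string"/>

    <!-- Step 1: decompose, delete nonspacing marks, recompose
         (the expression from the thread, applied to $text). -->
    <xsl:variable name="stripped" as="xs:string"
        select="normalize-unicode(
                  replace(normalize-unicode($text, 'NFKD'), '\p{Mn}', ''),
                  'NFKC')"/>

    <!-- Step 2: one-to-one cases (stroke letters) fit translate(). -->
    <xsl:variable name="single" as="xs:string"
        select="translate($stripped, 'ØøĐđŁł', 'OoDdLl')"/>

    <!-- Step 3: one-to-many cases need replace(), since translate()
         can only map one character to one character. -->
    <xsl:sequence select="
        replace(replace(replace(replace(replace(replace(replace($single,
          'Æ', 'AE'), 'æ', 'ae'), 'Œ', 'OE'), 'œ', 'oe'),
          'Þ', 'Th'), 'þ', 'th'), 'ß', 'ss')"/>
  </xsl:function>

  <!-- Usage: apply the helper when writing the surname element. -->
  <xsl:template match="surname">
    <surname>
      <xsl:value-of select="my:to-ascii(.)"/>
    </surname>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet file has to be saved in an encoding that can hold the literal characters (UTF-8, say), or the literals can be written as numeric character references instead. Anything still outside ASCII after these steps can be found by testing the result with matches(., '[^\p{IsBasicLatin}]'), which is a convenient way to discover which characters the list is still missing.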