Subject: Re: [xsl] Katakana substitution regex
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Sat, 7 Aug 2010 09:34:18 +0200
I suppose that there can't be a sequence of two or more ー characters. If so, I'd just go ahead and replace all substrings with the substring + #12540 and then, in a second call, replace all #12540#12540 by #12540. Sometimes it is simpler not to try to avoid doing something that can easily be undone.

Below is the stylesheet. Substrings are sorted by descending length. I don't know whether there are substrings similar to 'abcd' and 'bc', where the suffix must be appended to 'abcd' but not to the 'bc' within.

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:wl="w.l">

  <xsl:function name="wl:make-pattern" as="xs:string">
    <xsl:param name="reps" as="xs:string*"/>
    <xsl:variable name="sorted" as="xs:string*">
      <xsl:perform-sort select="$reps">
        <xsl:sort select="string-length(.)" order="descending"/>
      </xsl:perform-sort>
    </xsl:variable>
    <xsl:sequence select="concat('(',string-join($sorted,'|'),')')"/>
  </xsl:function>

  <xsl:function name="wl:rep-subs" as="xs:string">
    <xsl:param name="text" as="xs:string"/>
    <xsl:param name="pattern" as="xs:string"/>
    <xsl:sequence select="replace(replace($text, $pattern, '$1ー'), 'ーー', 'ー')"/>
  </xsl:function>

  <xsl:variable name="pattern" select="wl:make-pattern(('ab', 'abcd', 'cd', 'bc'))"/>

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="text">
    <xsl:copy>
      <xsl:value-of select="wl:rep-subs(text(),$pattern)"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

On 6 August 2010 22:14, Hoskins & Gretton <hoskgret@xxxxxxxxxxxxxxxx> wrote:
>
> Hi, I have to convert some Katakana strings from "original" to "new" by adding ー (#x30fc;), a pronunciation character (see http://www.fileformat.info/info/unicode/char/30fc/index.htm).
> In Japanese, there aren't any word boundaries, so essentially all of my search strings are substrings of the text of the current element.
>
> When substring "a" is followed by the character ー I do not want to make the replacement.
>
> example: ブラウザ is a search string but it is followed by ー already -- do nothing
>
> When substring "a" is not followed by the character ー I want to make the replacement to create "a" followed by ー.
>
> example: ブラウザ is a search string but it is not followed by #x30fc; already
> add to the end to make it
> ブラウザー
>
> If I was going to just add the ー, I was able to do that with a regex that contained the strings that I wanted to find by using regex and analyze-string, where $regexSearch contains all of my search Katakana strings:
>
> <xsl:analyze-string select="." regex="({$regexSearch})">
>   <xsl:matching-substring>
>     <xsl:value-of select="regex-group(1)"/>
>     <xsl:text>ー</xsl:text>
>   </xsl:matching-substring>
>   <xsl:non-matching-substring>
>     <xsl:value-of select="."/>
>   </xsl:non-matching-substring>
> </xsl:analyze-string>
>
> However, I can't figure out how I should fit this into an overall xslt, where I need to check ahead in the element text before I decide to make the substitution. Currently, if there is a string: ブラウザー
> it becomes: ブラウザーー (doubling the last character).
>
> If someone has some experience with this type of search and replace problem, I would appreciate some guidance.
> Regards, Dorothy
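
As a worked illustration of the two-pass idea in the reply above (append ー after every match, then collapse ーー back to ー), here is a minimal sketch run against the two cases from the question. The `ex` namespace, the function names `ex:two-pass` and `ex:one-pass`, the single-entry pattern and the `main` entry template are all assumptions made for the example, not part of the thread. `ex:one-pass` is a variant that does not appear in the thread: it lets the match consume an optional trailing ー, so it does not rely on the assumption that ーー never occurs legitimately in the text.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="urn:example:katakana"
    exclude-result-prefixes="xs ex">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical search list with a single entry; a real list would be
       built as a longer alternation, longest strings first. -->
  <xsl:variable name="pattern" as="xs:string" select="'(ブラウザ)'"/>

  <!-- Two-pass approach from the reply: append ー after every match,
       then collapse each ーー to a single ー. -->
  <xsl:function name="ex:two-pass" as="xs:string">
    <xsl:param name="text" as="xs:string"/>
    <xsl:sequence select="replace(replace($text, $pattern, '$1ー'), 'ーー', 'ー')"/>
  </xsl:function>

  <!-- Variant (not from the thread): the match swallows an optional
       trailing ー, so an existing ー is replaced by itself. -->
  <xsl:function name="ex:one-pass" as="xs:string">
    <xsl:param name="text" as="xs:string"/>
    <xsl:sequence select="replace($text, concat($pattern, 'ー?'), '$1ー')"/>
  </xsl:function>

  <!-- Entry point: run with a named initial template,
       e.g. the -it:main option of Saxon. -->
  <xsl:template name="main">
    <xsl:for-each select="('ブラウザで', 'ブラウザーで')">
      <xsl:value-of select="concat(., ' | ', ex:two-pass(.), ' | ', ex:one-pass(.), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For both sample inputs, both functions should yield ブラウザーで: the first input gains the missing ー, the second is left as it was. The two functions differ only on text that already contains ーー elsewhere, which the two-pass version would collapse and the one-pass version would leave alone.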