[xsl] Katakana substitution regex

Subject: [xsl] Katakana substitution regex
From: Hoskins & Gretton <hoskgret@xxxxxxxxxxxxxxxx>
Date: Fri, 06 Aug 2010 16:14:00 -0400
Hi, I have to convert some Katakana strings from "original" to "new" by adding &#12540; (&#x30fc;), a pronunciation character (see http://www.fileformat.info/info/unicode/char/30fc/index.htm).
In Japanese, there aren't any word boundaries, so essentially all of my search strings are substrings of the text of the current element.
When substring "a" is followed by the character &#12540; I do not want to make the replacement.


example: &#12502;&#12521;&#12454;&#12470; is a search string but it is followed by &#12540; already -- do nothing

When substring "a" is not followed by the character &#12540; I want to make the replacement to create "a" followed by &#12540;.

example: &#12502;&#12521;&#12454;&#12470; is a search string and it is not already followed by &#12540; (&#x30fc;), so add it to the end to make:
&#12502;&#12521;&#12454;&#12470;&#12540;


When I only needed to add the &#12540; unconditionally, I was able to do that with analyze-string and a regex built from the strings I wanted to find, where $regexSearch contains all of my Katakana search strings:

<xsl:analyze-string select="." regex="({$regexSearch})">
  <xsl:matching-substring>
    <xsl:value-of select="regex-group(1)"/>
    <xsl:text>&#12540;</xsl:text>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <xsl:value-of select="."/>
  </xsl:non-matching-substring>
</xsl:analyze-string>
However, I can't figure out how to fit this into an overall XSLT, where I need to look ahead in the element text before I decide to make the substitution. Currently, if there is a string: &#12502;&#12521;&#12454;&#12470;&#12540;
it becomes: &#12502;&#12521;&#12454;&#12470;&#12540;&#12540; (doubling the last character).
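
One direction that might be worth trying, offered as a sketch only and not a confirmed answer: the XSLT 2.0 regex dialect has no lookahead, but the regex can optionally consume an existing &#12540; after the search string, and the matching-substring branch can then always write the search string followed by exactly one &#12540;. The stylesheet below is hypothetical apart from the analyze-string pattern above; the parameter value, the identity template, and the match="text()" template are assumptions added only to make it self-contained.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: stands in for the real alternation of search strings,
       e.g. 'string1|string2|string3' -->
  <xsl:param name="regexSearch"
             select="'&#12502;&#12521;&#12454;&#12470;'"/>

  <!-- Identity template: copy everything that is not a text node as is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- priority="1" avoids a priority conflict with the identity template -->
  <xsl:template match="text()" priority="1">
    <!-- The trailing &#12540;? swallows an existing mark, if there is one -->
    <xsl:analyze-string select="." regex="({$regexSearch})&#12540;?">
      <xsl:matching-substring>
        <!-- Write the search string, then exactly one &#12540; -->
        <xsl:value-of select="regex-group(1)"/>
        <xsl:text>&#12540;</xsl:text>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

With this pattern a search string already followed by &#12540; is matched together with that character and written back unchanged, while a search string without it gains one, so the doubling should not occur.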


If someone has some experience with this type of search and replace problem, I would appreciate some guidance.
Regards, Dorothy

