Subject: Re: [xsl] Katakana substitution regex
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Sat, 7 Aug 2010 09:34:18 +0200

I suppose that there can't be a sequence of two or more ー
characters. If so, I'd just go ahead and replace every search string with
the search string + ー (&#12540;) and then, in a second call, replace every
ーー by ー.
Sometimes it is simpler not to try to avoid doing something that can
easily be undone.
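
As a minimal sketch of the two passes, here they are applied to the example string from the question. The named template "main" and the hard-coded strings are only for illustration and are not part of the stylesheet in this message; it assumes an XSLT 2.0 processor that can start at a named template.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Illustrative entry point: invoke as the initial named template. -->
  <xsl:template name="main">
    <xsl:for-each select="('ブラウザ', 'ブラウザー')">
      <!-- Pass 1: append ー after every match, even where one already follows. -->
      <xsl:variable name="pass1" select="replace(., '(ブラウザ)', '$1ー')"/>
      <!-- Pass 2: collapse any doubled ーー that pass 1 produced. -->
      <xsl:value-of select="replace($pass1, 'ーー', 'ー')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Both input strings should come out as ブラウザー. Note that pass 2 would also collapse a ーー that was already in the source text, which is why this only works under the assumption above.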
Below is the stylesheet. Substrings are sorted by descending length -
I don't know whether there are substrings similar to 'abcd' and 'bc',
where the suffix must be appended to 'abcd' but not to the 'bc'
within it.
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:wl="w.l">
<xsl:function name="wl:make-pattern" as="xs:string">
  <xsl:param name="reps" as="xs:string*"/>
  <xsl:variable name="sorted" as="xs:string*">
    <xsl:perform-sort select="$reps" >
      <xsl:sort select="string-length(.)" order="descending"/>
    </xsl:perform-sort>
  </xsl:variable>
  <xsl:sequence select="concat('(',string-join($sorted,'|'),')')"/>
</xsl:function>
<xsl:function name="wl:rep-subs" as="xs:string">
  <xsl:param name="text"    as="xs:string"/>
  <xsl:param name="pattern" as="xs:string"/>
  <xsl:sequence select="replace(replace($text, $pattern, '$1ー'),
                                'ーー', 'ー')"/>
</xsl:function>
<xsl:variable name="pattern"
              select="wl:make-pattern(('ab', 'abcd', 'cd', 'bc'))"/>
<xsl:template match="/">
   <xsl:apply-templates/>
</xsl:template>
<xsl:template match="text">
  <xsl:copy>
    <!-- string(.) avoids a type error when the element is empty or has several text nodes -->
    <xsl:value-of select="wl:rep-subs(string(.),$pattern)"/>
  </xsl:copy>
</xsl:template>
</xsl:stylesheet>
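
To show why the order of the alternatives matters: when two alternatives match at the same position, XPath 2.0's replace() picks the first one, so a shorter string listed first hides a longer one. The following sketch compares an unsorted pattern with the one wl:make-pattern should build from ('ab', 'abcd', 'cd', 'bc'). The named template "main" and the '-' suffix are illustrative only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Illustrative entry point: invoke as the initial named template. -->
  <xsl:template name="main">
    <!-- Unsorted: 'ab' wins at the start, then 'cd' matches the rest.
         Result: ab-cd- -->
    <xsl:value-of select="replace('abcd', '(ab|abcd|cd|bc)', '$1-')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Longest first: 'abcd' is matched as a whole.
         Result: abcd- -->
    <xsl:value-of select="replace('abcd', '(abcd|ab|cd|bc)', '$1-')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The search strings go into the pattern unescaped, so this assumes they contain no regex metacharacters, which should hold for plain Katakana.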
On 6 August 2010 22:14, Hoskins & Gretton <hoskgret@xxxxxxxxxxxxxxxx> wrote:
>
> Hi, I have to convert some Katakana strings from "original" to "new" by
> adding ー (&#x30fc;), a pronunciation character (see
> http://www.fileformat.info/info/unicode/char/30fc/index.htm).
> In Japanese, there aren't any word boundaries, so essentially all of my
> search strings are substrings of the text of the current element.
> When substring "a" is followed by the character ー I do not want to
> make the replacement.
>
> example:        ブラウザ is a search string but it
>                 is followed by ー already -- do nothing
>
> When substring "a" is not followed by the character ー I want to make
> the replacement to create "a" followed by ー.
>
> example:        ブラウザ is a search string but it
>                 is not followed by ー already
>                 add to the end to make it
>                 ブラウザー
>
> If I just wanted to add the ー, I was able to do that with a regex
> that contained the strings that I wanted to find, using
> analyze-string, where $regexSearch contains all of my search Katakana
> strings:
>
>                <xsl:analyze-string select="." regex="({$regexSearch})">
>                    <xsl:matching-substring>
>                        <xsl:value-of select="regex-group(1)"/>
>                        <xsl:text>ー</xsl:text>
>                    </xsl:matching-substring>
>                    <xsl:non-matching-substring>
>                        <xsl:value-of select="."/>
>                    </xsl:non-matching-substring>
>                </xsl:analyze-string>
> However, I can't figure out how I should fit this into an overall XSLT,
> where I need to check ahead in the element text before I decide to make
> the substitution. Currently, if there is a string:
>                 ブラウザー
> it becomes:     ブラウザーー (doubling the last character).
>
> If someone has some experience with this type of search and replace problem,
> I would appreciate some guidance.
> Regards, Dorothy