[xsl] Generate identifier

Subject: [xsl] Generate identifier
From: "Vladimir Nesterovsky" <vladimir@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 29 Dec 2009 07:13:58 -0800
Hello!

I need to convert a string into an identifier.
Earlier I was using the following function:

  <!--
    Creates a normalized name for the specified name components.
      $components - name components to generate a normalized name for.
      $default-name - a default name in case a name cannot be built.
      Returns a normalized name (upper case first).
  -->
  <xsl:function name="t:create-name" as="xs:string?">
    <xsl:param name="components" as="xs:string*"/>
    <xsl:param name="default-name" as="xs:string?"/>

    <xsl:variable name="parts" as="xs:string*">
      <xsl:for-each select="$components">
        <xsl:analyze-string
          regex="(\p{{L}}|\d)+"
          flags="imx"
          select=".">
          <xsl:matching-substring>
            <xsl:sequence select="."/>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </xsl:variable>

    <xsl:choose>
      <xsl:when test="empty($parts)">
        <xsl:sequence select="$default-name"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="
          string-join
          (
            (
              for
                $i in 1 to count($parts),
                $part in $parts[$i]
              return
                if
                (
                  ($i = 1) and
                  (
                    for $c in substring($part, 1, 1) return
                      ($c ge '0') and ($c le '9')
                   )
                )
                then
                  (
                    ($default-name, 'name')[1],
                    upper-case($part)
                  )
                else
                  (
                    upper-case($part)
                  )
            ),
            '-'
          )"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>
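
For illustration, two hypothetical calls and the results the function above
should produce (the argument values are made up for this example, and the
default codepoint collation is assumed for the digit test):

  <!-- Yields 'ORDER-ITEM-2ND': every run of letters or digits
       becomes one upper-cased part, joined with '-'. -->
  <xsl:sequence select="t:create-name(('order item', '2nd'), 'name')"/>

  <!-- Yields 'name-2ND-PLACE': the first part starts with a digit,
       so the fallback prefix 'name' is put in front of it. -->
  <xsl:sequence select="t:create-name('2nd place', ())"/>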

Now I have to build a name containing only [A-Za-z0-9] characters.
My problem is that I often see characters with modifiers like
00E0 à LATIN SMALL LETTER A WITH GRAVE
00E1 á LATIN SMALL LETTER A WITH ACUTE
00E2 â LATIN SMALL LETTER A WITH CIRCUMFLEX
00E3 ã LATIN SMALL LETTER A WITH TILDE
00E4 ä LATIN SMALL LETTER A WITH DIAERESIS
...

My questions:
  is it acceptable, from the perspective of a Western language, to replace
those characters with the same character without the modifier;
  is there a way to do this in XSLT;
  is there any better option?
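
For the second question, one possible approach in XSLT 2.0 is sketched below.
It decomposes the string with normalize-unicode() and then removes the
combining marks with replace(). The function name t:strip-diacritics is
hypothetical, and support for the 'NFD' normalization form is optional in the
specification, so this assumes a processor that implements it.

  <!--
    Sketch only: removes diacritics from a string.
      $value - a string to process.
      Returns the string with combining marks removed.
    t:strip-diacritics is a hypothetical helper name.
  -->
  <xsl:function name="t:strip-diacritics" as="xs:string">
    <xsl:param name="value" as="xs:string?"/>

    <!--
      NFD splits a letter such as U+00E0 into 'a' followed by U+0300;
      \p{M} then matches and removes the combining mark.
    -->
    <xsl:sequence select="
      replace(normalize-unicode($value, 'NFD'), '\p{M}', '')"/>
  </xsl:function>

With this, t:strip-diacritics('&#xE0;&#xE1;&#xE2;&#xE3;&#xE4;') gives 'aaaaa'.
Letters that have no canonical decomposition (for example U+00F8 or U+00DF)
pass through unchanged, so a final replace(..., '[^A-Za-z0-9]', '') would
still be needed to guarantee an ASCII-only result.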

Thanks
--
Vladimir Nesterovsky
http://www.nesterovsky-bros.com
