Re: [xsl] find capital letters in string and split it

Subject: Re: [xsl] find capital letters in string and split it
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Mon, 10 Feb 2003 04:44:18 -0800 (PST)
---- "bryan" <bry@xxxxxxxxxx> wrote:

> In RDF/XML it's often the habit to camel-case strings in IDs and 
> such. 
> 
> Let's suppose I want to split the string at the upper case letters, 
> the easiest way I can see to do that (the only way that pops into my 
> mind) is to parse the string twice, using translate() and replacing 
> upper-case letters with a string sequence not very likely to occur 
> normally, and then reparse the string splitting it at these 
> occurrences. This is of course resource intensive and not foolproof. 
> Anybody have any thoughts on how to do this?

Hi Bryan,

Do you want to preserve the capital letters? If *not*, then the
following is a very straightforward solution using the
"str-split-to-words" template of FXSL:

This transformation:
-------------------
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>

   <xsl:import href="strSplit-to-Words.xsl"/>
<!-- This transformation must be applied to:
        testSplitToWords4.xml               
-->

   <xsl:output indent="yes" omit-xml-declaration="yes"/>

   <xsl:variable name="vCaps" 
    select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
    
    <xsl:template match="/">
      <xsl:call-template name="str-split-to-words">
        <xsl:with-param name="pStr" select="/*"/>
        <xsl:with-param name="pDelimiters" 
                        select="$vCaps"/>
      </xsl:call-template>
    </xsl:template>
</xsl:stylesheet>

when applied against this source.xml:

<t>thisIsACamelCasedWord</t>

Produces:

<word>this</word>
<word>s</word>
<word>amel</word>
<word>ased</word>
<word>ord</word>


If you need to preserve the capital letters, the solution is slightly
different. A first pass over the string inserts a space in front of
every capital letter; the resulting string is then tokenised, using the
space as the only delimiter. For the first pass I use the "str-map"
template from FXSL.
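For readers without FXSL, the first pass on its own can be sketched with a plain recursive template instead of "str-map". The template name "mark-caps" is hypothetical, and again only A-Z are assumed to be capitals. Applied to <t>thisIsACamelCasedWord</t>, it should output the text "this Is A Camel Cased Word".

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Assumption: only ASCII A-Z count as capitals -->
  <xsl:variable name="vCaps" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="/">
    <xsl:call-template name="mark-caps">
      <xsl:with-param name="pStr" select="string(/*)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper (not FXSL): copies the string one character
       at a time, putting a space in front of every capital letter -->
  <xsl:template name="mark-caps">
    <xsl:param name="pStr"/>
    <xsl:if test="$pStr != ''">
      <xsl:variable name="vChar" select="substring($pStr, 1, 1)"/>
      <xsl:if test="contains($vCaps, $vChar)">
        <xsl:text> </xsl:text>
      </xsl:if>
      <xsl:value-of select="$vChar"/>
      <xsl:call-template name="mark-caps">
        <xsl:with-param name="pStr" select="substring($pStr, 2)"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

If the string starts with a capital, the result starts with a space; that only yields an empty leading token, which a tokeniser that skips empty tokens will ignore.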

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:myMark="f:MarkAnUppercase" 
 exclude-result-prefixes="myMark"
>

   <xsl:import href="str-map.xsl"/>
   <xsl:import href="strSplit-to-Words.xsl"/>
<!-- This transformation must be applied to:
        testSplitToWords4.xml               
-->

   <xsl:output indent="yes" omit-xml-declaration="yes"/>

   <xsl:variable name="vCaps" 
    select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
    
    <myMark:myMark/>
    <xsl:template match="myMark:*">
      <xsl:param name="arg1"/>
      
      <xsl:if test="contains($vCaps, $arg1)">
        <xsl:text> </xsl:text>
      </xsl:if>
      <xsl:value-of select="$arg1"/>
    </xsl:template>
    
    <xsl:template match="/">
    
      <xsl:variable name="vSpaceDelimited">
        <xsl:call-template name="str-map">
          <xsl:with-param name="pFun" 
            select="document('')/*/myMark:*[1]"/>
          <xsl:with-param name="pStr" select="/*"/>
        </xsl:call-template>
      </xsl:variable>
      
      <xsl:call-template name="str-split-to-words">
        <xsl:with-param name="pStr" select="$vSpaceDelimited"/>
        <xsl:with-param name="pDelimiters" 
                        select="' '"/>
      </xsl:call-template>
    </xsl:template>
</xsl:stylesheet>

when applied against the same source.xml produces:

<word>this</word>
<word>Is</word>
<word>A</word>
<word>Camel</word>
<word>Cased</word>
<word>Word</word>
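Both passes can also be folded into a single recursive template that emits the <word> elements directly. The following is a sketch in plain XSLT 1.0, not FXSL code; the name "split-keep-caps" is hypothetical and only A-Z are assumed to be capitals. Each capital closes the word collected so far and starts a new one.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes" omit-xml-declaration="yes"/>

  <!-- Assumption: only ASCII A-Z count as capitals -->
  <xsl:variable name="vCaps" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="/">
    <xsl:call-template name="split-keep-caps">
      <xsl:with-param name="pStr" select="string(/*)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper (not FXSL): walks the string one character
       at a time, collecting the current word in $pWord -->
  <xsl:template name="split-keep-caps">
    <xsl:param name="pStr"/>
    <xsl:param name="pWord" select="''"/>
    <xsl:variable name="vChar" select="substring($pStr, 1, 1)"/>
    <xsl:choose>
      <!-- End of input: flush the last word, if any -->
      <xsl:when test="$pStr = ''">
        <xsl:if test="$pWord != ''">
          <word><xsl:value-of select="$pWord"/></word>
        </xsl:if>
      </xsl:when>
      <!-- A capital closes the current word and starts a new one with it -->
      <xsl:when test="contains($vCaps, $vChar)">
        <xsl:if test="$pWord != ''">
          <word><xsl:value-of select="$pWord"/></word>
        </xsl:if>
        <xsl:call-template name="split-keep-caps">
          <xsl:with-param name="pStr" select="substring($pStr, 2)"/>
          <xsl:with-param name="pWord" select="$vChar"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Any other character extends the current word -->
      <xsl:otherwise>
        <xsl:call-template name="split-keep-caps">
          <xsl:with-param name="pStr" select="substring($pStr, 2)"/>
          <xsl:with-param name="pWord" select="concat($pWord, $vChar)"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The end-of-input test must come first: for an empty $pStr, $vChar is also empty, and contains() with an empty second argument is always true. As with the two-pass version, a run of capitals gives one single-letter word per capital, so "parseXMLDocument" becomes parse, X, M, L, Document.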


Hope this helped.






=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list