[xsl] A Functional Tokenizer (Was: Re: Looping over a CSV in XSL)

Subject: [xsl] A Functional Tokenizer (Was: Re: Looping over a CSV in XSL)
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Mon, 19 Nov 2001 21:03:40 -0800 (PST)
> Now in XSL land I want to iterate over a
> nodelist and compare some attribute of the current node to each value in the
> CSV for equality.

You have a CSV string (a list of characters). You need to inspect every character
and gradually accumulate the result: a list of words that were delimited by
special characters (in this particular case a comma and/or white space).

A "generic accumulator" function over the elements of a list is the "foldl" function
-- the classic king of generic list processing. We pass to "foldl" as parameter a
function that will be called with two arguments -- the accumulated result until now
(the list of tokens so far) and the next character in the input string.
Based on these two arguments, this function updates the accumulated result
appropriately -- it either appends the character to the last token, or "cuts" the
last token and starts a new one.
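
The stylesheets in this message import str-foldl.xsl, which is not shown here. As a
rough sketch only, a string fold of this kind might look like the following. The
parameter names pFunc, pA0, pStr, arg1 and arg2 are taken from the code in this
message; everything else (the variable name vrtfStep, and converting each step's
result with msxsl:node-set before passing it on) is an assumption, not the actual
contents of that file.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:msxsl="urn:schemas-microsoft-com:xslt"
 exclude-result-prefixes="msxsl">

  <!-- Hypothetical sketch of a left fold over the characters of a string.
       This is NOT the real str-foldl.xsl, only an illustration. -->
  <xsl:template name="str-foldl">
    <xsl:param name="pFunc" select="/.."/> <!-- node whose template acts as the function -->
    <xsl:param name="pA0" select="/.."/>   <!-- accumulated result so far -->
    <xsl:param name="pStr"/>               <!-- characters still to be processed -->

    <xsl:choose>
      <!-- No characters left: the accumulator is the final result -->
      <xsl:when test="not(string($pStr))">
        <xsl:copy-of select="$pA0"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Call the function with the accumulator and the next character -->
        <xsl:variable name="vrtfStep">
          <xsl:apply-templates select="$pFunc[1]">
            <xsl:with-param name="arg1" select="$pA0"/>
            <xsl:with-param name="arg2" select="substring($pStr, 1, 1)"/>
          </xsl:apply-templates>
        </xsl:variable>

        <!-- Recurse on the rest of the string with the updated accumulator.
             Assumption: the result tree fragment is converted to a node-set
             so that the function can navigate it with XPath. -->
        <xsl:call-template name="str-foldl">
          <xsl:with-param name="pFunc" select="$pFunc"/>
          <xsl:with-param name="pA0" select="msxsl:node-set($vrtfStep)"/>
          <xsl:with-param name="pStr" select="substring($pStr, 2)"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

The function is selected by applying templates to the node passed in pFunc, which
is why a stylesheet using such a fold declares a marker element in its own
namespace and a template that matches it.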

And here's the solution:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:str-split2words-func="f:str-split2words-func"
exclude-result-prefixes="xsl msxsl str-split2words-func"
>

   <xsl:import href="str-foldl.xsl"/>

   <str-split2words-func:str-split2words-func/>

   <xsl:param name="pDelimiters" select="', &#9;&#10;&#13;'"/>

   <xsl:output indent="yes" omit-xml-declaration="yes"/>
   
    <xsl:template match="/">
      <xsl:call-template name="str-split-to-words">
        <xsl:with-param name="pStr" select="/*/*"/>
      </xsl:call-template>
    </xsl:template>

    <xsl:template name="str-split-to-words">
      <xsl:param name="pStr" select="dummy"/>
      
      <xsl:variable name="vsplit2wordsFun"
                    select="document('')/*/str-split2words-func:*[1]"/>

      <xsl:call-template name="str-foldl">
        <xsl:with-param name="pFunc" select="$vsplit2wordsFun"/>
        <xsl:with-param name="pStr" select="$pStr"/>
        <xsl:with-param name="pA0" select="/.."/>
      </xsl:call-template>

    </xsl:template>

    <xsl:template match="str-split2words-func:*">
      <xsl:param name="arg1" select="/.."/>
      <xsl:param name="arg2"/>
         
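      <xsl:choose>
        <!-- Delimiter: keep all tokens so far and open a new empty token,
             but only if the last token is non-empty -->
        <xsl:when test="contains($pDelimiters, $arg2)">
            <xsl:copy-of select="$arg1/*"/>
            <xsl:if test="string($arg1/*[last()])">
              <word/>
            </xsl:if>
        </xsl:when>
        <!-- Ordinary character: append it to the last token -->
        <xsl:otherwise>
          <xsl:copy-of select="$arg1/*[position() &lt; last()]"/>
          <word><xsl:value-of select="concat($arg1/*[last()], $arg2)"/></word>
        </xsl:otherwise>
      </xsl:choose>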
    </xsl:template>

</xsl:stylesheet>

When applied to the following XML document:

<contents>
  <csv>Fredrick, Aaron, john, peter</csv>
</contents>

The result is:

<word>Fredrick</word><word>Aaron</word><word>john</word><word>peter</word>

We need just one more small step to obtain a general-purpose tokenizer. If we
manage to pass the list of delimiters to the accumulating function that we pass as
a parameter to str-foldl, the tokenizer no longer depends on a global parameter.
You'll never need to code your own tokenizer again; just call this one with your
own parameters.

The solution is to always specify the list of delimiters as the first element of the
"accumulator" list:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:str-split2words-func="f:str-split2words-func"
exclude-result-prefixes="xsl msxsl str-split2words-func"
>

   <xsl:import href="str-foldl.xsl"/>

   <str-split2words-func:str-split2words-func/>

   <xsl:output indent="yes" omit-xml-declaration="yes"/>
   
    <xsl:template match="/">
      <xsl:call-template name="str-split-to-words">
        <xsl:with-param name="pStr" select="/*/*"/>
        <xsl:with-param name="pDelimiters" select="', &#9;&#10;&#13;'"/>
      </xsl:call-template>
    </xsl:template>

    <xsl:template name="str-split-to-words">
      <xsl:param name="pStr"/>
      <xsl:param name="pDelimiters"/>
      
      <xsl:variable name="vsplit2wordsFun"
                    select="document('')/*/str-split2words-func:*[1]"/>
                    
      <xsl:variable name="vrtfParams">
       <delimiters><xsl:value-of select="$pDelimiters"/></delimiters>
      </xsl:variable>

      <xsl:variable name="vResult">
        <xsl:call-template name="str-foldl">
          <xsl:with-param name="pFunc" select="$vsplit2wordsFun"/>
          <xsl:with-param name="pStr" select="$pStr"/>
          <xsl:with-param name="pA0" select="msxsl:node-set($vrtfParams)"/>
        </xsl:call-template>
      </xsl:variable>
      
      <xsl:copy-of select="msxsl:node-set($vResult)/word"/>

    </xsl:template>

    <xsl:template match="str-split2words-func:*">
      <xsl:param name="arg1" select="/.."/>
      <xsl:param name="arg2"/>
         
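      <!-- Carry the delimiters element and all completed words forward -->
      <xsl:copy-of select="$arg1/*[1]"/>
      <xsl:copy-of select="$arg1/word[position() != last()]"/>
      
      <xsl:choose>
        <!-- Delimiter: keep the last word if it is non-empty,
             then open a new empty word -->
        <xsl:when test="contains($arg1/*[1], $arg2)">
          <xsl:if test="string($arg1/word[last()])">
             <xsl:copy-of select="$arg1/word[last()]"/>
          </xsl:if>
          <word/>
        </xsl:when>
        <!-- Ordinary character: append it to the last word -->
        <xsl:otherwise>
          <word><xsl:value-of select="concat($arg1/word[last()], $arg2)"/></word>
        </xsl:otherwise>
      </xsl:choose>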
    </xsl:template>

</xsl:stylesheet>

And with the same XML source document, here's the result:

<word>Fredrick</word>
<word>Aaron</word>
<word>john</word>
<word>peter</word>
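
Since the delimiters are now an ordinary parameter, other separators can be tried
without touching the tokenizer itself. A minimal sketch, assuming the second
stylesheet has been saved under the hypothetical name str-split-to-words.xsl (this
message does not give it a file name):

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical file name for the second stylesheet in this message -->
  <xsl:import href="str-split-to-words.xsl"/>

  <!-- This template has higher import precedence than the imported
       match="/" template, so it is the one that gets applied -->
  <xsl:template match="/">
    <xsl:call-template name="str-split-to-words">
      <xsl:with-param name="pStr" select="/*/*"/>
      <!-- Split on semicolons and vertical bars instead of commas -->
      <xsl:with-param name="pDelimiters" select="';|'"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

With a source such as <contents><csv>a;b|c</csv></contents> this should produce
one word element each for a, b and c, provided str-foldl behaves the way the
stylesheet above expects.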

Hope this helped.

Cheers,
Dimitre Novatchev.


