[xsl] tokenize

Subject: [xsl] tokenize
From: Peter Flynn <pflynn@xxxxxx>
Date: Fri, 14 Oct 2011 13:51:20 +0100

It's either my brain slowing down, or the fact that it's nearly the
weekend, or my lack of sleep and coffee, but I can't understand this: I
need to break up the content of a td element which represents a Unix
filepath, tokenizing on slashes, and getting rid of bogus visual formatting:

  <xsl:template match="h:tbody/h:tr">
    <!-- tokenise the uri so that we only extract valid data, eg
         <td class="xl">
&#160;&#160;&#160;&#160;/researchprofiles/A015/pcrowley/</td>
    -->
    <xsl:variable name="uri">
      <xsl:value-of
           select="translate(h:td[@class='xl'],'&#160;&#xa;','')"/>
    </xsl:variable>
    <xsl:variable name="urifrag" select="tokenize($uri,'/')"/>
    <xsl:text>"</xsl:text>
    <xsl:value-of select="$urifrag[1]"/>
    <xsl:text>" </xsl:text>
    <xsl:text>&#xa;</xsl:text>
    ...
  </xsl:template>

(The commented example is Tidy'd output from the 'analog' web logfile
analyser.) The result for the example td element is output as:

   "/researchprofiles/A015/pcrowley"

In other words, not only has it not tokenized the string, but something
has gobbled the trailing slash from the input content. I suspected that
there was some character encoding error (slashes except the final one
not being real slashes, perhaps) but they are all genuine.
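
One way to confirm that the slashes are genuine is to dump the codepoints of the cleaned string. The following is only a minimal sketch, assuming an XSLT 2.0 processor and any XML input document; the literal string is copied from the commented example and stands in for the real translate(...) expression.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Assumption: a literal stand-in for the cleaned td content -->
    <xsl:variable name="uri" select="'/researchprofiles/A015/pcrowley/'"/>
    <!-- List every codepoint; a genuine slash (U+002F) shows up as 47 -->
    <xsl:value-of select="string-to-codepoints($uri)" separator=" "/>
    <xsl:text>&#xa;</xsl:text>
    <!-- Count how many characters are genuine slashes -->
    <xsl:text>slashes: </xsl:text>
    <xsl:value-of select="count(string-to-codepoints($uri)[. = 47])"/>
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For the literal above the count should be 4; a lookalike character in the real data would appear as a codepoint other than 47.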

I have clearly misunderstood how tokenize works (except that I have been
using it perfectly happily elsewhere for years). The first item of
$urifrag seems to contain the entire string apart from the trailing
slash, which suggests that the string is being split on its final slash
only, instead of on every slash.
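
To isolate the problem, it may help to run tokenize on the literal string outside the real stylesheet. This is a sketch under stated assumptions (an XSLT 2.0 processor, any XML input document, and a literal copied from the commented example), not a diagnosis of the stylesheet above. As defined in XPath 2.0, tokenize yields a zero-length string wherever a separator occurs at the start or end of the input, so the expected result here is five items, the first and last of them empty.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Assumption: a literal stand-in for the cleaned td content -->
    <xsl:variable name="uri" select="'/researchprofiles/A015/pcrowley/'"/>
    <xsl:variable name="urifrag" select="tokenize($uri,'/')"/>

    <!-- How many tokens did we get? -->
    <xsl:text>count: </xsl:text>
    <xsl:value-of select="count($urifrag)"/>
    <xsl:text>&#xa;</xsl:text>

    <!-- Print each token in quotes so that empty ones are visible -->
    <xsl:for-each select="$urifrag">
      <xsl:value-of select="position()"/>
      <xsl:text>: "</xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>"&#xa;</xsl:text>
    </xsl:for-each>

    <!-- First non-empty token, skipping the zero-length strings -->
    <xsl:text>first non-empty: </xsl:text>
    <xsl:value-of select="$urifrag[. ne ''][1]"/>
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

If this standalone test prints the five separate tokens but the real stylesheet still emits the whole path, the difference presumably lies in the input data or in which template is actually firing, rather than in tokenize itself.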
