From: "Imsieke, Gerrit, le-tex" <gerrit.imsieke@xxxxxxxxx>
Date: Fri, 25 Dec 2009 15:14:57 +0100
Dear Brian,

There are two issues:

1. empty (or whitespace-only) elements
2. everything seems to be position()=1

ad 1.
This is because if you split '|a|b|c|' at '|', you will get five (!) elements, the first and last being empty. You should process the input so that your inner tokenizer encounters strings such as 'a|b|c'.
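
A quick way to see the difference, as a minimal sketch (the literal strings are only illustrations, and stripping the outer bars with replace() is one possible way to preprocess, not the only one):

 <!-- leading and trailing '|' each produce an empty token: 5 items -->
 <xsl:value-of select="count(tokenize('|a|b|c|', '[|]'))"/>
 <!-- strip the outer bars first, then split: 3 items -->
 <xsl:value-of select="count(tokenize(replace('|a|b|c|', '^[|]|[|]$', ''), '[|]'))"/>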

ad 2.
This is because in the for-each loop, the dynamic context is set to each item in turn. So inside for-each, the current item is the only item in the dynamic context, and its position is always 1. In these circumstances it often helps to store the sequence in a variable (you might use something like fn:index-of($line, .) inside for-each if you really want to check the position of the current element -- but this is only unambiguous if everybody has a unique name within a family).
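
To illustrate the index-of idea, here is a small sketch. The variable name $pos and the sample names are made up for the example, and it assumes the xs prefix is bound to the XML Schema namespace:

 <xsl:variable name="line" select="tokenize('Smith|John|Mary', '[|]')" as="xs:string*"/>
 <xsl:for-each select="$line">
   <!-- index-of returns every position at which the current string occurs in $line -->
   <xsl:variable name="pos" select="index-of($line, .)" as="xs:integer*"/>
   <!-- with unique names, $pos holds exactly one integer -->
   <xsl:value-of select="concat(., ' is item ', $pos[1], '&#10;')"/>
 </xsl:for-each>

If a name occurs twice in the same family, $pos holds more than one integer, which is the ambiguity mentioned above.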

Here's a solution:

 <xsl:template match="/">
     <xsl:if test="unparsed-text-available('test.txt', 'ISO-8859-1')">
       <xsl:variable name="datafile" select="unparsed-text('test.txt', 'ISO-8859-1')" as="xs:string"/>
       <xsl:for-each select="tokenize(normalize-space($datafile), '[\s*\r\n]+')">
           <xsl:variable name="line"
             select="tokenize(
                        replace(., '^[|]Family[|](.+?)[|]?\s*$', '$1'),
                        '[|]'
                     )"
             as="xs:string*" />
             <xsl:value-of select="$line[1]"/>
             <xsl:value-of select="$line[2]"/>
           <xsl:for-each select="$line[position() gt 2]">
               <xsl:value-of select="."/>


