Re: [xsl] Linenumbering & word index

Subject: Re: [xsl] Linenumbering & word index
From: James Cummings <James.Cummings@xxxxxxxxxxxxxx>
Date: Fri, 6 Aug 2004 17:39:34 +0100 (BST)
On Fri, 6 Aug 2004, David Carlisle wrote:

> You can't do 
> tokenize(l/text(), '\s+')
> because it wants a single string as its first argument and that's
> probably more than one. 

Yup.  And that's one of the places I was getting confuddled. :-(

> You can do
>  select="for $l in l return tokenize($l,'\s+')"
> or same with for-each and tokenize them one at a time.

OK, I think I understand that, and it might work for smaller things.
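For my own notes, a minimal sketch of the one-at-a-time version as I 
understand it.  The match on lg and the text output method are my 
assumptions for illustration, not part of David's stylesheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumes each lg holds several l children, so that l/text()
       would hand tokenize() more than one item as its first argument. -->
  <xsl:template match="lg">
    <!-- The for expression calls tokenize() once per line and
         returns all the results as one flat sequence of strings. -->
    <xsl:for-each select="for $l in l return tokenize($l, '\s+')">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The catch, as far as I can see, is that the flat sequence no longer 
knows which line each word came from, which is presumably why the 
temporary tree is the better route for an index.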

> however you really want to make yourself a tree first something like:
> 
Let's see if I understand the way this works. (I do like getting 
solutions, but also want to learn ;-)   )

> <xsl:template match="/">
> <xsl:variable name="x">
> <xsl:apply-templates mode="a" select="div[@type='poem']"/>
> </xsl:variable>

Creates variable $x from the templates of mode a below for 
only the poem divs.  (See, now *that* is how to avoid the 
stuff I don't want to include.. *doh*)

> [
> <xsl:copy-of  select="$x"/>
> ]

Copies out the temporary tree, which lists each poem and, within 
it, each line and the words in that line.
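To check my reading, here is what I would expect $x to hold for a 
made-up input.  The sample text is mine, not from the real files, and 
it assumes the poem div is the document element so that the select 
from the root finds it.  Indentation is added for readability.

```xml
<!-- hypothetical input -->
<div type="poem">
  <head>A Song</head>
  <lg>
    <l>The cat sat</l>
    <l>on the mat</l>
  </lg>
</div>

<!-- temporary tree I would expect in $x -->
<div poem="1">
  <l n="head"><word>a</word><word>song</word></l>
  <l n="1"><word>The</word><word>cat</word><word>sat</word></l>
  <l n="2"><word>on</word><word>the</word><word>mat</word></l>
</div>
```

Note that "The" and "the" stay distinct here, because only the head 
template lower-cases its words.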

> <xsl:for-each-group select="$x/div/l/word" group-by=".">

Groups the words in the temporary tree by their value, sorts 
the groups, and outputs each distinct word once
>  <xsl:sort />
>   <xsl:text>&#10;</xsl:text>
>   <xsl:value-of select="."/>

then for each instance of that word (keys always confuse me) it 
outputs the @poem number and the @n line number.

>   <xsl:for-each select="key('w',.)">
>   <xsl:text> </xsl:text>
>   <xsl:value-of select="../../@poem"/>:<xsl:value-of select="../@n"/>
>   </xsl:for-each>
> </xsl:for-each-group>
> </xsl:template>
> 

The mode a template for div then applies mode a only to 
its head and lg/l children (modes...yes, must use modes more.)
> <xsl:template mode="a" match="div">
> <div poem="{position()}">
> <xsl:apply-templates mode="a" select="head"/>
> <xsl:apply-templates mode="a" select="lg/l"/>
> </div>
> </xsl:template>
> 

When you find a head, tokenize it on whitespace and , . ! 
into lower-cased <word> elements in the temporary tree
> <xsl:template mode="a" match="head">
> <l n="head">
> <xsl:for-each select="tokenize(.,'(\s|[,\.!])+')">
> <word><xsl:value-of select="lower-case(.)"/></word>
> </xsl:for-each>
> </l>
> </xsl:template>
> 

When you find an l, tokenize it on whitespace into <word> 
elements in the temporary tree, recording the line's position 
as @n

> <xsl:template mode="a" match="l">
> <l n="{position()}">
> <xsl:for-each select="tokenize(.,'\s+')">
> <word><xsl:value-of select="."/></word>
> </xsl:for-each>
> </l>
> </xsl:template>
> 

Declares a key named w that indexes every <word> element by 
its string value, so key('w',.) finds all instances of a word.
> <xsl:key name="w" match="word" use="."/>

Seems to work absolutely perfectly.  (well, I'll customise 
the tokenize string...)
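The sort of customisation I have in mind, as a rough sketch only: a 
variant of the mode a template for l that splits on the same 
characters as the head template and lower-cases the line first.  The 
[. != ''] filter is my addition; tokenize() returns zero-length 
strings when the separator matches at the very start or end of the 
input (e.g. a line ending in "!"), and without the filter those would 
become empty <word> elements in the index.

```xml
<!-- Fragment: meant to replace the mode a template for l in the
     stylesheet quoted above, not to run on its own. -->
<xsl:template mode="a" match="l">
  <l n="{position()}">
    <!-- Split on runs of whitespace and , . ! then drop empty tokens. -->
    <xsl:for-each select="tokenize(lower-case(.), '[\s,.!]+')[. != '']">
      <word><xsl:value-of select="."/></word>
    </xsl:for-each>
  </l>
</xsl:template>
```

The character class would still need extending for whatever other 
punctuation turns up in the poems.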

Many many thanks.

-James

---
Dr James Cummings, Oxford Text Archive, University of Oxford
James dot Cummings at oucs dot ox dot ac dot uk 
CALL FOR PAPERS: Digital Medievalism (Kalamazoo) and 
Early Drama (Leeds) see http://users.ox.ac.uk/~jamesc/cfp.html
