RE: [xsl] Linenumbering & word index

Subject: RE: [xsl] Linenumbering & word index
From: James Cummings <James.Cummings@xxxxxxxxxxxxxx>
Date: Fri, 6 Aug 2004 14:15:36 +0100 (BST)

On Thu, 5 Aug 2004, Michael Kay wrote:

> > But can't see how to get the word position whilst tokenizing the 
> > whole lot? Everything I try doesn't work.
> > 
> 
> I think the clue here is that you need a data structure consisting of a list
> of (word, position) pairs. As soon as you need more than a linear sequence,
> it's probably a good idea to use a temporary tree. So you probably want
> something like:
> 
> <xsl:variable name="words">
>   <xsl:for-each select="tokenize(...)">
>     <word value="{.}" position="{position()}"/>
>   </xsl:for-each>
> </xsl:variable>
> 
> and then do further processing on this tree.

OK, I'm horribly muddled now, sorry.

I understand that I build a temporary tree, $words, containing 
one <word> element per token, each storing the value of the 
word. What I don't understand is how position() helps me, since 
what I really want is the poem number and line number. Or, when 
processing $words, am I meant to work back from the position to 
the line number somehow?

What I currently have, which fails completely, is:
--------------
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:template match="/">
<!-- want: Word: 1:15, 4:12, etc. type of reference -->
<xsl:for-each-group select="$words" group-by=".">
<xsl:sort/>
<xsl:value-of select="word/@value"/> --   
<xsl:for-each select="current-group()">
<a href="#{concat('poem',@poemnumber,'line',@linenumber)}">
<xsl:value-of select="@poemnumber"/>:<xsl:value-of
select="@linenumber"/></a>
</xsl:for-each>
</xsl:for-each-group>
</xsl:template>

<xsl:variable name="words">
<xsl:for-each select="tokenize(lower-case(string(translate(.,',.!:;',' '))),'\s+')[string(.)]">
<!-- How do I only match text in 'head' and 'l' elements rather than 
all text? -->
<xsl:variable name="poemnumber">
<!-- How do I get poem number here?  i.e. xsl:number
     count="div[@type='poem'] when I was matching 'l' " -->
</xsl:variable>
<xsl:variable name="linenumber">
<!-- How do I get line number here? i.e. xsl:number
     from="div[@type='poem'] when I was matching 'l'-->
</xsl:variable>
 <word value="{.}" litposition="{position()}" poemnumber="$poemnumber"
       linenumber="$linenumber"/>
</xsl:for-each>
</xsl:variable>

<!-- some of the things I don't want to match -->
<xsl:template match="teiHeader|foreign|p|milestone|gap" priority="-1" />
</xsl:stylesheet>
------------------
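
For comparison, here is a rough sketch of one direction the temporary-tree
idea might take. It is untested against the real data and rests on several
assumptions: that poems are div[@type='poem'] elements, that verse lines are
l elements somewhere inside them, and that the output is HTML (the p wrapper
is purely illustrative). The main change is that the outer loop runs over the
l elements, so xsl:number can work out the poem and line numbers while the
context item is still a line, before tokenize() turns it into bare strings.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <!-- Temporary tree: one <word> per token, tagged with poem and line number. -->
  <xsl:variable name="words">
    <!-- Assumption: poems are div[@type='poem'], verse lines are l elements. -->
    <xsl:for-each select="//div[@type='poem']//l">
      <!-- Computed here, while the context item is still the l element. -->
      <xsl:variable name="poemnumber">
        <xsl:number count="div[@type='poem']" level="any"/>
      </xsl:variable>
      <xsl:variable name="linenumber">
        <xsl:number count="l" level="any" from="div[@type='poem']"/>
      </xsl:variable>
      <!-- Punctuation becomes a space; empty tokens are filtered out. -->
      <xsl:for-each select="tokenize(lower-case(replace(., '[,.!:;]', ' ')), '\s+')[. ne '']">
        <!-- Curly braces make these attribute value templates. -->
        <word value="{.}" poemnumber="{$poemnumber}" linenumber="{$linenumber}"/>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:variable>

  <xsl:template match="/">
    <!-- Group the <word> children of the tree by their value attribute. -->
    <xsl:for-each-group select="$words/word" group-by="@value">
      <xsl:sort select="@value"/>
      <p>
        <xsl:value-of select="current-grouping-key()"/>
        <xsl:text>: </xsl:text>
        <xsl:for-each select="current-group()">
          <a href="#poem{@poemnumber}line{@linenumber}">
            <xsl:value-of select="concat(@poemnumber, ':', @linenumber)"/>
          </a>
          <xsl:if test="position() ne last()">, </xsl:if>
        </xsl:for-each>
      </p>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

This sketch leaves the head elements out altogether; including them would
mean deciding what line number a heading should carry. It also assumes that
poem divs are not nested inside one another.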

I'm sure I'm being dense; I'm just not used to doing things with 
temporary trees.

Any more suggestions appreciated,

-James
---
Dr James Cummings, Oxford Text Archive, University of Oxford
James dot Cummings at oucs dot ox dot ac dot uk 
