Subject: Re: [xsl] Linenumbering & word index
From: James Cummings <James.Cummings@xxxxxxxxxxxxxx>
Date: Fri, 6 Aug 2004 17:39:34 +0100 (BST)

On Fri, 6 Aug 2004, David Carlisle wrote:
> You can't do
> tokenize(l/text(), '\s+')
> because it wants a single string as its first argument and that's
> probably more than one.
Yup. And that's one of the places I was getting confuddled. :-(
> You can do
> select="for $l in l return tokenize($l,'\s+')"
> or same with for-each and tokenize them one at a time.
OK, I think I understand that, and it might work for smaller things.
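To check my understanding, here is a minimal sketch of that
for-expression approach. It assumes the lines sit inside lg elements
(as in the code further down); the variable name "words" and the
text output are only for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <!-- Assumption: each lg holds the l elements of one stanza -->
  <xsl:template match="lg">
    <!-- tokenize() wants a single string, so hand it one l at a time -->
    <xsl:variable name="words" as="xs:string*"
        select="for $l in l return tokenize(normalize-space($l), '\s+')"/>
    <!-- one word per output line -->
    <xsl:value-of select="$words" separator="&#10;"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- suppress text outside lg so only the word list is output -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

The result is a flat sequence of strings, so it no longer records
which line each word came from. I take that to be the reason for
building a tree first.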
> however you really want to make yourself a tree first something like:
>
Let's see if I understand the way this works. (I do like getting
solutions, but also want to learn ;-) )
> <xsl:template match="/">
> <xsl:variable name="x">
> <xsl:apply-templates mode="a" select="div[@type='poem']"/>
> </xsl:variable>
This creates the variable $x by applying the mode "a" templates
below to the poem divs only. (See, now *that* is how to avoid the
stuff I don't want to include... *doh*)
> [
> <xsl:copy-of select="$x"/>
> ]
This outputs a copy of the temporary tree, listing each poem and
the words in each line of that poem.
> <xsl:for-each-group select="$x/div/l/word" group-by=".">
Groups the words in the temporary tree by value, sorts the
groups, and outputs each word,
> <xsl:sort />
> <xsl:text> </xsl:text>
> <xsl:value-of select="."/>
then, for each instance of that word (keys always confuse me), it
outputs the @poem number and the @n line number.
> <xsl:for-each select="key('w',.)">
> <xsl:text> </xsl:text>
> <xsl:value-of select="../../@poem"/>:<xsl:value-of select="../@n"/>
> </xsl:for-each>
> </xsl:for-each-group>
> </xsl:template>
>
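Since keys confuse me, here is a sketch of how I think the same loop
could be written with current-group() instead of key(). This is my
own variation, not what David posted. The inline $x is a made-up
stand-in for the real temporary tree so that the stylesheet can be
run against any input document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Hypothetical stand-in for $x: two tiny "poems", already
         split into word elements -->
    <xsl:variable name="x">
      <div poem="1">
        <l n="1"><word>the</word><word>cat</word></l>
        <l n="2"><word>the</word><word>hat</word></l>
      </div>
      <div poem="2">
        <l n="1"><word>cat</word></l>
      </div>
    </xsl:variable>

    <xsl:for-each-group select="$x/div/l/word" group-by=".">
      <xsl:sort/>
      <xsl:value-of select="current-grouping-key()"/>
      <!-- current-group() holds every word element in this group,
           so no key lookup is needed -->
      <xsl:for-each select="current-group()">
        <xsl:text> </xsl:text>
        <xsl:value-of select="../../@poem"/>:<xsl:value-of select="../@n"/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

As far as I can tell the two forms select the same nodes here,
because every word in the temporary tree is part of the grouped
selection.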
The mode "a" template for div then applies templates only to
head and lg/l (modes... yes, I must use modes more).
> <xsl:template mode="a" match="div">
> <div poem="{position()}">
> <xsl:apply-templates mode="a" select="head"/>
> <xsl:apply-templates mode="a" select="lg/l"/>
> </div>
> </xsl:template>
>
When you find a head, tokenize it into a temporary
tree of <word> elements
> <xsl:template mode="a" match="head">
> <l n="head">
> <xsl:for-each select="tokenize(.,'(\s|[,\.!])+')">
> <word><xsl:value-of select="lower-case(.)"/></word>
> </xsl:for-each>
> </l>
> </xsl:template>
>
When you find an l, tokenize it into a temporary tree
of <word> elements, recording the line's position
> <xsl:template mode="a" match="l">
> <l n="{position()}">
> <xsl:for-each select="tokenize(.,'\s+')">
> <word><xsl:value-of select="."/></word>
> </xsl:for-each>
> </l>
> </xsl:template>
>
Defines a key named w that indexes each <word> element
we've just created by its string value.
> <xsl:key name="w" match="word" use="."/>
Seems to work absolutely perfectly. (Well, I'll customise
the tokenize string...)
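A rough sketch of the sort of customisation I have in mind for the
l template: split on runs of whitespace and punctuation, lower-case
everything, and drop empty tokens. The pattern is my own guess at
what I'll need, and the match="/" driver template and the lines
wrapper element are only there so it can be tried on its own.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Illustrative driver: process every l in the input.
       Note that n then counts lines across the whole document,
       not within one poem. -->
  <xsl:template match="/">
    <lines>
      <xsl:apply-templates mode="a" select="//l"/>
    </lines>
  </xsl:template>

  <xsl:template mode="a" match="l">
    <l n="{position()}">
      <!-- \p{P} matches any punctuation character; the predicate
           removes the empty strings produced when a line starts
           or ends with a separator -->
      <xsl:for-each
          select="tokenize(lower-case(.), '[\s\p{P}]+')[. ne '']">
        <word><xsl:value-of select="."/></word>
      </xsl:for-each>
    </l>
  </xsl:template>
</xsl:stylesheet>
```

One caveat: \p{P} also covers apostrophes and hyphens, so a word
like "don't" would be split in two. A narrower character class may
suit the poems better.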
Many many thanks.
-James
---
Dr James Cummings, Oxford Text Archive, University of Oxford
James dot Cummings at oucs dot ox dot ac dot uk
CALL FOR PAPERS: Digital Medievalism (Kalamazoo) and
Early Drama (Leeds) see http://users.ox.ac.uk/~jamesc/cfp.html