Subject: RE: [xsl] Linenumbering & word index
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Fri, 6 Aug 2004 16:23:00 +0100
> -----Original Message-----
> From: James Cummings [mailto:James.Cummings@xxxxxxxxxxxxxx]
> Sent: 06 August 2004 14:41
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [xsl] Linenumbering & word index
>
> On Fri, 6 Aug 2004, David Carlisle wrote:
>
> >
> > I lost or forgot the start of this thread so I'll ignore your main
> > questions but I can answer one of the questions in comments
>
> Right, I'll start from the beginning again then.
> In a document with a lot of poems laid out as:
> <div type="poem">
> <head>headers should be included in word index</head>
> <lg>
> <l>This is a line that really should be included</l>
> <l>This is a line that should be included</l>
> </lg>
> <p>This shouldn't be included</p>
> <lg>
> <l>This is a line that really should be included</l>
> <l>This is a line that should be included</l>
> </lg>
> </div>
>
> What I want to produce is a word-index of
> poem number and line number, something like:
>
> a (4) -- 1:1, 1:2, 1:3, 1:4, 2:3, 2:5 (well, no poem 2 here ;-) )
> be (5) -- 1:head, 1:1, 1:2, 1:3, 1:4
> ...
> really (2) -- 1:1, 1:3, 2:1, 2:3 (if it was in poem 2 as well)

What I was trying to suggest was that you go in two phases:

(a) build a list containing (word, poem number, line number)

(b) group that list by word

and that the output of (a) should be a temporary tree.

Sorry if the reference to position() confused you - I was concentrating on
the top-level design, not the detail.

For example phase 1 might actually be

<xsl:variable name="wordlist">
  <xsl:for-each select="//text()">
    <xsl:for-each select="tokenize(., xxx)">
      <word w=".">
        <poem><xsl:number count="poem"/></poem>
        <line><xsl:number count="l"/></line>
      </word>
    </
  </
</

Michael Kay

>
> I had previously done word frequency lists as:
> -------
> <xsl:template match="/">
> <xsl:for-each-group
> select="tokenize(lower-case(string(translate(.,',.!:;','
> '))),'\s+')[string(.)]" group-by=".">
> <xsl:sort />[<xsl:value-of select="."/> - <xsl:value-of
> select="count(current-group())"/>]
> </xsl:for-each-group>
> </xsl:template>
> ------
>
> And Mike suggested I first build a temporary tree something like:
> <xsl:variable name="words">
> <xsl:for-each select="tokenize(., '\s+')">
> <word value="{.}" position="{position()}"/>
> </xsl:for-each>
>
> But I don't see how I a) tokenize only the output of l/text() and
> head/text() (it complains of multiple inputs when I do so), and
> b) how I get line-number and poem-number based on position()?
> --------------
> My completely messed up xsl so far is:
> <xsl:template match="l/text()">
> <xsl:for-each-group select="$words" group-by=".">
> <xsl:sort/>
> <xsl:value-of select="word/@value"/> --
> <xsl:for-each select="current-group()">
> <a href="#{concat('poem',@poemnumber,'line',@linenumber)}">
> <xsl:value-of select="@poemnumber"/>:<xsl:value-of
> select="@linenumber"/></a>
> </xsl:for-each>
> </xsl:for-each-group>
> </xsl:template>
>
> <xsl:variable name="words">
> <xsl:for-each select="tokenize(lower-case(string(translate(.,',.!:;','
> '))),'\s+')[string(.)]">
> <!-- How do I only match text in 'head' and 'l' elements? -->
> <xsl:variable name="poemnumber">
> <!-- How do I get poem number here? i.e. xsl:number
> count="div[@type='poem'] when I was matching 'l' " -->
> </xsl:variable>
> <xsl:variable name="linenumber">
> <!-- How do I get line number here? i.e. xsl:number
> from="div[@type='poem'] when I was matching 'l'-->
> </xsl:variable>
> <word value="{.}" litposition="{position()}" poemnumber="$poemnumber"
> linenumber="$linenumber"/>
> </xsl:for-each>
> </xsl:variable>
>
> <!-- some of the things I don't want to match -->
> <xsl:template match="teiHeader|foreign|p|milestone|gap"
> priority="-1" />
> ------------------
>
> Does that clarify my confuddled state of mind?
>
> -James
> ---
> Dr James Cummings, Oxford Text Archive, University of Oxford
> James dot Cummings at oucs dot ox dot ac dot uk
> CALL FOR PAPERS: Digital Medievalism (Kalamazoo) and
> Early Drama (Leeds) see http://users.ox.ac.uk/~jamesc/cfp.html
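
The two phases described in the reply (build a temporary tree of word, poem number and line number, then group that tree by word) could be sketched in XSLT 2.0 roughly as follows. This is an illustrative sketch under stated assumptions, not code from the thread: the attribute names w, poem and line, the punctuation set in the tokenize() regex, and the plain-text output format are all assumptions. It selects only head and l elements inside div[@type='poem'], and it computes both numbers while the context item is still an element, because inside the tokenize() loop the context item is a string and xsl:number has no node to count from.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Phase (a): temporary tree of (word, poem number, line number).
       Attribute names w, poem and line are assumed for illustration. -->
  <xsl:variable name="wordlist">
    <xsl:for-each select="//div[@type='poem']/(head | lg/l)">
      <!-- Number here, while the context item is an element -->
      <xsl:variable name="poem">
        <xsl:number count="div[@type='poem']" level="any"/>
      </xsl:variable>
      <xsl:variable name="line">
        <xsl:choose>
          <xsl:when test="self::head">head</xsl:when>
          <xsl:otherwise>
            <!-- Lines are counted continuously across lg groups,
                 restarting at each poem -->
            <xsl:number count="l" level="any" from="div[@type='poem']"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <!-- Split on whitespace and an assumed set of punctuation;
           the predicate drops empty tokens -->
      <xsl:for-each select="tokenize(lower-case(.), '[\s,.!:;]+')[. ne '']">
        <word w="{.}" poem="{$poem}" line="{$line}"/>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:variable>

  <!-- Phase (b): group the temporary tree by word -->
  <xsl:template match="/">
    <xsl:for-each-group select="$wordlist/word" group-by="@w">
      <xsl:sort select="current-grouping-key()"/>
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text> (</xsl:text>
      <xsl:value-of select="count(current-group())"/>
      <xsl:text>) -- </xsl:text>
      <xsl:value-of select="current-group()/concat(@poem, ':', @line)"
                    separator=", "/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Because the select path only reaches head and lg/l, text in p is never tokenized, so no suppressing template is needed for it. Text of elements nested inside a line (foreign, gap and so on) would still be picked up through the string value of l; excluding it would need a narrower expression than lower-case(.).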