Subject: Re: [xsl] Concordance with XSLT
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Sun, 6 Nov 2005 18:23:12 +1100
Here's a quick result.

The following is an XSLT 2.0 transformation, which produces the concordance for a given word. The example shows the results for all 56 occurrences of the word "loved" in the Old Testament. On my 3GHz PC this took 250 milliseconds.

By using the function f:wordConcord() it is straightforward to produce a complete concordance: first find all unique words in the text, then invoke f:wordConcord() for every word in this set.

Certainly, there is a much faster algorithm (one that hopefully doesn't require too much memory), in which the complete concordance is produced by reading the text just once (not reading the document once for every unique word) -- I'll play with this when I have some free time again.

Below is the XSLT code:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:f="http://fxsl.sf.net/"
 exclude-result-prefixes="f xs"
 >

  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <concord>
      <xsl:sequence select="f:wordConcord(/,'loved')"/>
    </concord>
  </xsl:template>

  <!-- Returns one <occurs> element per occurrence of $pWord in $pDoc -->
  <xsl:function name="f:wordConcord" as="element()*">
    <xsl:param name="pDoc" as="document-node()"/>
    <xsl:param name="pWord" as="xs:string"/>

    <xsl:for-each select=
      "$pDoc/tstmt/bookcoll/book/chapter/(.|div)
                               /v[contains(.,$pWord)]">
      <xsl:variable name="vverseWords" select="tokenize(lower-case(string(.)), '[\s.?!,;:\-]+')[.]"/>
      <xsl:if test="$pWord = $vverseWords">
        <xsl:variable name="vVerse" select="."/>
        <xsl:for-each select="$vverseWords[. = $pWord]">
          <occurs w="{$pWord}"
            book="{substring($vVerse/ancestor::book[1]/bktshort,1,3)}"
            chapter="{count($vVerse/ancestor::chapter[1]
                            /preceding-sibling::chapter)+1}"
            verse="{count($vVerse/preceding-sibling::v)+1}"
            >
            <xsl:sequence select=
              "f:displayContext(string($vVerse), $pWord, position(), 15)"
            />
          </occurs>
        </xsl:for-each>
      </xsl:if>
    </xsl:for-each>
  </xsl:function>

  <!-- Returns the text surrounding the $pwordNum-th occurrence of $pWord,
       with the word itself abbreviated to its first letter plus '.' -->
  <xsl:function name="f:displayContext" as="xs:string">
    <xsl:param name="pText" as="xs:string"/>
    <xsl:param name="pWord" as="xs:string"/>
    <xsl:param name="pwordNum" as="xs:integer"/>
    <xsl:param name="pRadius" />

    <xsl:variable name="vwOffset" select=
      "f:nthWord($pText, $pWord, $pwordNum, 0)"
    />
    <xsl:variable name="vWLen" select="string-length($pWord)"/>
    <xsl:variable name="vText2" select=
      "concat(substring($pText,1,$vwOffset ),
              substring($pWord,1,1), '.',
              substring($pText,$vwOffset+$vWLen+1)
              )"
    />
    <xsl:variable name="vStart" select=
      "if($vwOffset > $pRadius)
         then $vwOffset - $pRadius
         else 1"
    />
    <xsl:sequence select=
      "substring($vText2, $vStart, 2*$pRadius+$vWLen)"
    />
  </xsl:function>

  <!-- Recursive helper used by f:displayContext to locate the
       $pwordNum-th occurrence of $pWord in $pText -->
  <xsl:function name="f:nthWord" as="xs:integer">
    <xsl:param name="pText" as="xs:string"/>
    <xsl:param name="pWord" as="xs:string"/>
    <xsl:param name="pwordNum" as="xs:integer"/>
    <xsl:param name="pOffset" as="xs:integer"/>

    <xsl:variable name="vZ" select="'ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ'"/>
    <xsl:variable name="vWLen" select="string-length($pWord)"/>
    <xsl:variable name="vTLen" select="string-length($pText)"/>
    <xsl:variable name="vZWord" select=
      "substring($vZ,1,$vWLen)"/>

    <xsl:sequence select=
      "if($pwordNum lt 1)
         then $pOffset - $vWLen +1
         else
           for $txt in replace($pText,
                               concat('[^\w]',$pWord, '[^\w]'),
                               concat(' ',$vZWord, ' ')
                               ),
               $off in string-length(substring-before($txt,$vZWord))
             return
               f:nthWord(substring($pText, $off + $vWLen),
                         $pWord,
                         $pwordNum - 1,
                         $pOffset +$vTLen
                         -string-length(substring-after($txt,$vZWord)) -1
                         )
      "
    />
  </xsl:function>
</xsl:stylesheet>

When this moderate length (105 lines) transformation is applied on
the xml-ized version of the Old Testament, ot.xml (omitted for brevity), the following result is produced:

<concord>
  <occurs w="loved" book="Gen" chapter="24" verse="67">is wife; and he l. her: and Isaac w</occurs>
  <occurs w="loved" book="Gen" chapter="25" verse="28">And Isaac l. Esau, because he did e</occurs>
  <occurs w="loved" book="Gen" chapter="25" verse="28">on: but Rebekah l. Jacob. </occurs>
  <occurs w="loved" book="Gen" chapter="27" verse="14">h as his father l.. </occurs>
  <occurs w="loved" book="Gen" chapter="29" verse="18">And Jacob l. Rachel; and said, I wi</occurs>
  <occurs w="loved" book="Gen" chapter="29" verse="30"> Rachel, and he l. also Rachel more</occurs>
  <occurs w="loved" book="Gen" chapter="34" verse="3">f Jacob, and he l. the damsel, and </occurs>
  <occurs w="loved" book="Gen" chapter="37" verse="3">Now Israel l. Joseph more than all </occurs>
  <occurs w="loved" book="Gen" chapter="37" verse="4">at their father l. him more than al</occurs>
  <occurs w="loved" book="Deu" chapter="4" verse="37">And because he l. thy fathers, ther</occurs>
  <occurs w="loved" book="Deu" chapter="7" verse="8">ecause the LORD l. you, and because</occurs>
  <occurs w="loved" book="Deu" chapter="23" verse="5">he LORD thy God l. thee. </occurs>
  <occurs w="loved" book="Deu" chapter="33" verse="3">Yea, he l. the people; all his sain</occurs>
  <occurs w="loved" book="Jud" chapter="16" verse="4">erward, that he l. a woman in the v</occurs>
  <occurs w="loved" book="1 S" chapter="1" verse="5">portion; for he l. Hannah: but the </occurs>
  <occurs w="loved" book="1 S" chapter="16" verse="21">ore him: and he l. him greatly; and</occurs>
  <occurs w="loved" book="1 S" chapter="18" verse="1">d, and Jonathan l. him as his own s</occurs>
  <occurs w="loved" book="1 S" chapter="18" verse="3">ant, because he l. him as his own s</occurs>
  <occurs w="loved" book="1 S" chapter="18" verse="16">srael and Judah l. David, because h</occurs>
  <occurs w="loved" book="1 S" chapter="18" verse="20">Saul's daughter l. David: and they </occurs>
  <occurs w="loved" book="1 S" chapter="18" verse="28">Saul's daughter l. him. </occurs>
  <occurs w="loved" book="1 S" chapter="20" verse="17">ain, because he l. him: for he love</occurs>
  <occurs w="loved" book="1 S" chapter="20" verse="17">ved him: for he l. him as he loved </occurs>
  <occurs w="loved" book="1 S" chapter="20" verse="17">loved him as he l. his own soul. </occurs>
  <occurs w="loved" book="2 S" chapter="12" verse="24">n: and the LORD l. him. </occurs>
  <occurs w="loved" book="2 S" chapter="13" verse="1">he son of David l. her. </occurs>
  <occurs w="loved" book="2 S" chapter="13" verse="15">herewith he had l. her. And Amnon s</occurs>
  <occurs w="loved" book="1 K" chapter="3" verse="3">And Solomon l. the LORD, walking in</occurs>
  <occurs w="loved" book="1 K" chapter="10" verse="9">ecause the LORD l. Israel for ever,</occurs>
  <occurs w="loved" book="1 K" chapter="11" verse="1">ut king Solomon l. many strange wom</occurs>
  <occurs w="loved" book="2 C" chapter="2" verse="11">e the LORD hath l. his people, he h</occurs>
  <occurs w="loved" book="2 C" chapter="9" verse="8">because thy God l. Israel, to estab</occurs>
  <occurs w="loved" book="2 C" chapter="11" verse="21">And Rehoboam l. Maachah the daughte</occurs>
  <occurs w="loved" book="2 C" chapter="26" verse="10"> Carmel: for he l. husbandry. </occurs>
  <occurs w="loved" book="Est" chapter="2" verse="17">And the king l. Esther above all th</occurs>
  <occurs w="loved" book="Job" chapter="19" verse="19">and they whom I l. are turned again</occurs>
  <occurs w="loved" book="Psa" chapter="26" verse="8">LORD, I have l. the habitation of t</occurs>
  <occurs w="loved" book="Psa" chapter="47" verse="4">f Jacob whom he l.. Selah. </occurs>
  <occurs w="loved" book="Psa" chapter="78" verse="68">t Zion which he l.. </occurs>
  <occurs w="loved" book="Psa" chapter="109" verse="17">As he l. cursing, so let it come un</occurs>
  <occurs w="loved" book="Psa" chapter="119" verse="7">s, which I have l.. </occurs>
  <occurs w="loved" book="Psa" chapter="119" verse="8">s, which I have l.; and I will medi</occurs>
  <occurs w="loved" book="Isa" chapter="43" verse="4">ble, and I have l. thee: therefore </occurs>
  <occurs w="loved" book="Isa" chapter="48" verse="14">? The LORD hath l. him: he will do </occurs>
  <occurs w="loved" book="Jer" chapter="2" verse="25"> no; for I have l. strangers, and a</occurs>
  <occurs w="loved" book="Jer" chapter="8" verse="2"> whom they have l., and whom they h</occurs>
  <occurs w="loved" book="Jer" chapter="14" verse="10"> Thus have they l. to wander, they </occurs>
  <occurs w="loved" book="Jer" chapter="31" verse="3">ng, Yea, I have l. thee with an eve</occurs>
  <occurs w="loved" book="Eze" chapter="16" verse="37"> that thou hast l., with all them t</occurs>
  <occurs w="loved" book="Hos" chapter="9" verse="1"> God, thou hast l. a reward upon ev</occurs>
  <occurs w="loved" book="Hos" chapter="9" verse="10">cording as they l.. </occurs>
  <occurs w="loved" book="Hos" chapter="11" verse="1">a child, then I l. him, and called </occurs>
  <occurs w="loved" book="Mal" chapter="1" verse="2">I have l. you, saith the LORD. Yet </occurs>
  <occurs w="loved" book="Mal" chapter="1" verse="2">erein hast thou l. us? Was not Esau</occurs>
  <occurs w="loved" book="Mal" chapter="1" verse="2">the LORD: yet I l. Jacob, </occurs>
  <occurs w="loved" book="Mal" chapter="2" verse="11">e LORD which he l., and hath marrie</occurs>
</concord>

Hope this helped.

--
Cheers,
Dimitre Novatchev
---------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all.

On 11/6/05, Dimitre Novatchev <dnovatchev@xxxxxxxxx> wrote:
> > Just curious, do you think XSL is the best tool for this job or
> > something that can be used to do this job?
>
> Can't say in advance.
>
> As we know, XSLT 2.0 has better string processing capabilities (such
> as regular expressions) and is easier and more appropriate to use for
> string processing than XSLT1.0.
>
> My personal preference would be to use FXSL 2.0, having used it
> successfully for other string processing tasks such as spelling
> checking and text justification.
>
> Also, Saxon 8.6 just came out with a huge improvement in appending to
> a sequence -- it is logical to expect a very similar improvement for
> string concatenation in the future...
>
> To summarise: I wouldn't be surprised if XSLT 2.0 + FXSL 2.0 handle
> this task better than expected.
>
> --
> Cheers,
> Dimitre Novatchev
> ---------------------------------------
> To avoid situations in which you might make mistakes may be the
> biggest mistake of all.
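
As a rough illustration of the single-pass idea mentioned in the message (building the complete concordance without re-reading the document once per unique word), one possible approach is to tokenize every verse once and then group the tokens with xsl:for-each-group. This is only a sketch under stated assumptions, not the faster algorithm the post has in mind: the input element names (tstmt, bookcoll, book, chapter, div, v, bktshort) and the tokenizing regex are taken from the stylesheet in the message, while the temporary <t> elements, the <word> wrapper and its count attribute are hypothetical names chosen here. It omits the context display and keeps every token in memory, so it may well need a lot of memory on a text of this size.

```xslt
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 exclude-result-prefixes="xs">

  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <!-- Pass over the verses once, emitting one temporary <t> element
         (hypothetical name) per word, tagged with its location -->
    <xsl:variable name="vTokens" as="element(t)*">
      <xsl:for-each select="/tstmt/bookcoll/book/chapter/(.|div)/v">
        <xsl:variable name="vBook"
          select="substring(ancestor::book[1]/bktshort,1,3)"/>
        <xsl:variable name="vChapter"
          select="count(ancestor::chapter[1]/preceding-sibling::chapter)+1"/>
        <xsl:variable name="vVerseNo"
          select="count(preceding-sibling::v)+1"/>
        <!-- Same tokenization as in the posted stylesheet -->
        <xsl:for-each select=
          "tokenize(lower-case(string(.)), '[\s.?!,;:\-]+')[.]">
          <t w="{.}" book="{$vBook}"
             chapter="{$vChapter}" verse="{$vVerseNo}"/>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:variable>

    <!-- Group the tokens by word; each group lists every location -->
    <concord>
      <xsl:for-each-group select="$vTokens" group-by="@w">
        <xsl:sort select="current-grouping-key()"/>
        <word w="{current-grouping-key()}"
              count="{count(current-group())}">
          <xsl:for-each select="current-group()">
            <occurs book="{@book}" chapter="{@chapter}" verse="{@verse}"/>
          </xsl:for-each>
        </word>
      </xsl:for-each-group>
    </concord>
  </xsl:template>
</xsl:stylesheet>
```

The grouping step replaces the per-word scan of the document; whether this is actually faster than calling f:wordConcord() for every unique word would have to be measured on the real ot.xml.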