Subject: Re: [xsl] Using XSLT to build an index
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Sun, 30 Oct 2011 16:54:50 -0700

On Sun, Oct 30, 2011 at 2:47 PM, Mark <mark@xxxxxxxxxxxx> wrote:
> The list archives did not seem to contain an XSLT stylesheet that could
> index an XML file, but I may have missed it.

Perhaps my post from 2005 in this list on Concordance Building can help?

http://www.stylusstudio.com/xsllist/200511/post00190.html

--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.

On Sun, Oct 30, 2011 at 2:47 PM, Mark <mark@xxxxxxxxxxxx> wrote:
> The list archives did not seem to contain an XSLT stylesheet that could
> index an XML file, but I may have missed it. Is it practical to write my own
> XSLT 2 indexing stylesheet? If so, I have a bilingual XML file that I want
> to index. My assumptions are that I must get rid of the punctuation
> properly, then isolate the words, sort them, remove stop words, and so on.
> To get started, I need a bit of help. All of the phrases are found in two
> attributes: @czech and @eng.
>
> Three questions:
> (1) I am aware from Michael's book that regex expressions may be used in the
> replace() function, but I do not know how to write that regex expression. I
> would like to remove all the punctuation from a phrase as follows: for
> everything except a hyphen [-], replacement should be with an empty string;
> the hyphen should be replaced with a single space.
>
> (2) I assume that to get rid of extra spaces (if any), I can use a construct
> like: normalize-space(replace(@czech, 'some regex expression')).
>
> (3) I assume that tokenize(normalize-space(replace(@czech, 'some regex
> expression'))) will permit me to write out a list of the words found in
> those attributes to an XML document. I am not completely clear as to what
> tokenize() returns, or how to access that return.
>
> I would appreciate any comments, and especially the construction of the
> regex expression needed.
> Thanks,
> Mark
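
For illustration only, here is a minimal XSLT 2.0 sketch of the three steps Mark asks about. It is not the concordance approach from the 2005 post linked above, and it makes some assumptions: the phrases sit in @czech attributes anywhere in the input, and the output element names words and word are hypothetical. In XPath 2.0, replace() takes three arguments (input, pattern, replacement) and tokenize() takes two (input, pattern). Because \p{P} (any Unicode punctuation) also matches the hyphen, the hyphen is turned into a space first. tokenize() returns a sequence of strings, which xsl:for-each can walk one item at a time.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- "words" and "word" are hypothetical output names -->
    <words>
      <xsl:for-each select="//@czech">
        <!-- (1) hyphen becomes a space, then all remaining punctuation is dropped;
             (2) normalize-space() trims and collapses the leftover whitespace -->
        <xsl:variable name="clean"
            select="normalize-space(replace(replace(., '-', ' '), '\p{P}', ''))"/>
        <!-- (3) tokenize() yields a sequence of strings; inside this
             for-each the context item "." is one word -->
        <xsl:for-each select="tokenize($clean, ' ')">
          <word><xsl:value-of select="."/></word>
        </xsl:for-each>
      </xsl:for-each>
    </words>
  </xsl:template>

</xsl:stylesheet>
```

The same steps would apply to @eng. Sorting and stop-word removal are left out of this sketch; a sequence of words like this one could be passed through distinct-values() and xsl:sort as a next step.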