Subject: Re: [xsl] Using XSLT to build an index
From: "Mark" <mark@xxxxxxxxxxxx>
Date: Sun, 30 Oct 2011 17:14:28 -0700
The list archives did not seem to contain an XSLT stylesheet that could
index an XML file, but I may have missed it. Is it practical to write my own
XSLT 2 indexing stylesheet? If so, I have a bilingual XML file that I want
to index. My assumptions are that I must get rid of the punctuation
properly, then isolate the words, sort them, remove stop words, and so on.
To get started, I need a bit of help. All of the phrases are found in two
attributes: @czech and @eng.
Three questions:
(1) I am aware from Michael's book that regular expressions may be used in the
replace() function, but I do not know how to write the one I need. I
would like to remove all the punctuation from a phrase as follows: every
punctuation character except a hyphen [-] should be replaced with an empty
string; the hyphen should be replaced with a single space.
(2) I assume that to get rid of extra spaces (if any), I can use a construct
like: normalize-space(replace(@czech, 'some regex expression')).
(3) I assume that tokenize(normalize-space(replace(@czech, 'some regex expression'))) will permit me to write out a list of the words found in those attributes to an XML document. I am not completely clear as to what tokenize() returns, or how to access that return.
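For reference, the pipeline described in questions (1) to (3) might be sketched in XSLT 2.0 roughly as follows. This is only a sketch under stated assumptions, not a tested indexing stylesheet: the output element names words and word are hypothetical, the phrases are assumed to sit in @czech attributes anywhere in the input, and "punctuation" is assumed to mean the Unicode category \p{P} (symbols such as $ or + are in \p{S} and are not touched). Note that in XPath 2.0 replace() takes the replacement string as a third argument and tokenize() takes a separator pattern as a second argument; tokenize() returns a sequence of xs:string items, which can be walked with xsl:for-each.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- "words" and "word" are hypothetical output element names. -->
    <words>
      <!-- Assumption: phrases are held in @czech attributes anywhere in the input. -->
      <xsl:for-each select="//@czech">
        <!-- Inner replace(): each hyphen becomes a single space.
             Outer replace(): every remaining punctuation character (\p{P}) is deleted.
             normalize-space(): trims and collapses runs of whitespace.
             tokenize(): splits on a single space and yields a sequence of strings. -->
        <xsl:for-each select="tokenize(
                                normalize-space(
                                  replace(replace(., '-', ' '), '\p{P}', '')),
                                ' ')">
          <!-- Inside this loop the context item is one word (a string). -->
          <word><xsl:value-of select="."/></word>
        </xsl:for-each>
      </xsl:for-each>
    </words>
  </xsl:template>

</xsl:stylesheet>
```

With an attribute such as czech="Dobrý den, jak-se máte?" this should emit five word elements: Dobrý, den, jak, se, máte. The hyphen is handled first because \p{P} also matches hyphens; reversing the two replace() calls would delete the hyphen instead of turning it into a space. The same expression can be applied to @eng, and sorting or stop-word removal would be layered on top of the resulting sequence.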
I would appreciate any comments, and especially the construction of the regex expression needed. Thanks, Mark