Subject: Re: [xsl] Using XSLT to build an index
From: "Mark" <mark@xxxxxxxxxxxx>
Date: Sun, 30 Oct 2011 22:29:08 -0700
The list archives did not seem to contain an XSLT stylesheet that could index an XML file, but I may have missed it. Is it practical to write my own XSLT 2 indexing stylesheet? If so, I have a bilingual XML file that I want to index.
If you simply want every word except your stop words collected automatically to generate the index, be aware that I have never been successful with automated indexing myself. For my books I have authored the components of the index, and then pointed to those components from within the code.
My assumptions are that I must get rid of the punctuation properly, then isolate the words, sort them, remove stop words, and so on. To get started, I need a bit of help. All of the phrases are found in two attributes: @czech and @eng.
Three questions:
(1) I am aware from Michael's book that regular expressions may be used in the replace() function, but I do not know how to write the regular expression. I would like to remove all the punctuation from a phrase as follows: everything except a hyphen [-] should be replaced with an empty string; the hyphen should be replaced with a single space.
Simple character removal can be done with translate() in XSLT 1 or 2 rather than using a regular expression:
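A minimal sketch of that translate() approach follows. The punctuation list in $punct is an assumption; extend it to whatever characters actually occur in your data. Only the attribute name @czech comes from your description.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Assumed punctuation list. The hyphen must stay first, because
       translate() pairs characters by position. -->
  <xsl:variable name="punct" select="'-,.;:!?()$%#'"/>

  <xsl:template match="/">
    <xsl:for-each select="//@czech">
      <!-- The hyphen maps to the single space in the third argument.
           Every other listed character has no counterpart there and
           is therefore deleted. normalize-space() then collapses any
           runs of spaces left behind. -->
      <xsl:value-of select="normalize-space(translate(., $punct, ' '))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

If you would rather not enumerate the characters, a regular-expression variant in XSLT 2 could be replace(replace(., '-', ' '), '[\p{P}\p{S}]', ''), which handles the hyphen first and then strips the remaining Unicode punctuation and symbol characters. Check that against your data before relying on it, since it also removes apostrophes.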
(2) I assume that to get rid of extra spaces (if any), I can use a construct like: normalize-space(replace(@czech, 'some regex expression')).
(3) I assume that tokenize(normalize-space(replace(@czech, 'some regex expression'))) will permit me to write out a list of the words found in those attributes to an XML document. I am not completely clear as to what tokenize() returns, or how to access that return.
Actually, you want to turn the expression inside-out to get a list of words from the entire document. Something along these lines should work:
distinct-values( (//@czech)/tokenize(translate(normalize-space(.),'-,$%.#',' '),'\s+') )
That gives you a sequence of unique words. Can you work from that in order to do the hyperlinking, or do you need help there as well? Remember you will have to do the same translation when creating your links, so perhaps you should have a user function:
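One way such a user function might look is sketched below. Everything except @czech and the translate() character list is an assumption made for illustration: the idx namespace and the idx:words name, the stop-word list, the index/entry/ref output elements, and the use of generate-id() as a link target are all hypothetical. tokenize() returns a sequence of strings, which you can iterate over with xsl:for-each like any other sequence.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:idx="urn:example:index"
    exclude-result-prefixes="xs idx">

  <xsl:output method="xml" indent="yes"/>

  <!-- Keep a handle on the source document: inside a for-each over
       strings there is no context node to navigate from. -->
  <xsl:variable name="root" select="/"/>

  <!-- Hypothetical stop-word list; substitute your own. -->
  <xsl:variable name="stop-words" as="xs:string*"
                select="('a', 'an', 'the')"/>

  <!-- Hypothetical helper: one place for the punctuation handling, so
       the index entries and the links use identical word forms. -->
  <xsl:function name="idx:words" as="xs:string*">
    <xsl:param name="phrase" as="xs:string?"/>
    <xsl:sequence select="tokenize(
        normalize-space(translate($phrase, '-,$%.#', ' ')), ' ')"/>
  </xsl:function>

  <xsl:template match="/">
    <index>
      <!-- Unique words from every @czech, minus the stop words. -->
      <xsl:for-each select="distinct-values(//@czech/idx:words(.))
                            [not(. = $stop-words)]">
        <xsl:sort select="."/>
        <xsl:variable name="word" select="."/>
        <entry word="{$word}">
          <!-- Point back at each element whose phrase contains the word. -->
          <xsl:for-each select="$root//*[idx:words(@czech) = $word]">
            <ref target="{generate-id(.)}"/>
          </xsl:for-each>
        </entry>
      </xsl:for-each>
    </index>
  </xsl:template>

</xsl:stylesheet>
```

The same function can be applied to @eng for the second language. For Czech alphabetical order you would probably want a language-aware sort, and how well that is supported depends on your processor.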
--
Contact us for world-wide XML consulting and instructor-led training
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/s/
G. Ken Holman mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Google+ profile: https://plus.google.com/116832879756988317389/about
Legal business disclaimers: http://www.CraneSoftwrights.com/legal