Subject: Re: [xsl] Using memory addressing to retrieve a value vice using a software string library to retrieve a value From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> Date: Fri, 20 Nov 2015 16:56:41 -0000 |
> I am aware of at least two very efficient algorithms: While the two described algorithms (of contains() and index-of()) aren't directly related to the question, what I wanted to say was that just by saying "string search algorithms" this shouldn't imply in general that any such algorithm is inefficient. Cheers, Dimitre On Fri, Nov 20, 2015 at 8:38 AM, Dimitre Novatchev dnovatchev@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote: > In addition to the replies by Dr. Kay and David Carlisle: > > It depends on the concrete case and the concrete XML document and the > string itself. > > 1. Imagine a string starting with a space and then having the wanted > substring (that is not too-long). This may be faster than scanning > sequentially through thousands of nodes -- as in general an XPath > engine may use a linked-list and not a vector. > > 2. On the other side, imagine an XPath engine that has the sequence of > nodes in a vector (self-expandable array structure). Then the wanted > node can be obtained in O(1). Imagine at the same time that you have a > sting long many Megs, snd the first space happens only at the end of > the string, and the wanted substring is also very long. In this case > searching through the sting using the specified expression can be > orders of magnitude slower. > > A general remark about finding the first/all occurrence of a given > string as a substring of another given string, or whether a string1 > contains a string2. > > I am aware of at least two very efficient algorithms: > > 1. Using *suffix-trees* -- all possible suffixes of the given > string are organized as a trie. The search (same as the contains() > function) is very fast. > > 2. Using the hash of the search-string and scanning the given string > and sequentially computing the hash of every next substring with the > same length as the search string. Because computing the hash of the > moving "window" can be done incrementally, this algorithm is O(N) -- > much better than the naC/ve O(N*M) algorithm. More about this -- in the > book of Steven Skiena "The Algorithm Design Manual", > http://www.amazon.com/Algorithm-Design-Manual-Steven-Skiena/dp/1849967202/ref =sr_1_1?s=books&ie=UTF8&qid=1448037467&sr=1-1&keywords=skiena&pebp=1448037471 640&perid=0D1Z0T6Y90MCMDPDA330 > > > Cheers, > Dimitre > > > To summarize: All depends. > > > > On Fri, Nov 20, 2015 at 4:13 AM, Costello, Roger L. costello@xxxxxxxxx > <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote: >> Hi Folks, >> >> I want to retrieve "west". >> >> Which of these is faster? >> >> ------------------- >> Approach #1 >> ------------------- >> The <edge> element contains text: >> >> <edge>garden west door</edge> >> >> "west" is retrieved using this XPath: >> >> substring-before(substring-after(., ' '), ' ') >> >> Note: assume that <edge> is the context node. >> >> ------------------- >> Approach #2 >> ------------------- >> The <edge> element contains markup: >> >> <edge> >> <garden/> >> <west/> >> <door/> >> </edge> >> >> "west" is retrieved using this XPath: >> >> *[2]/name() >> >> >> I believe that Approach #2 is much, much faster. Do you agree? >> >> As I see it, Approach #2 is simply a memory addressing task (which computers do very well) to obtain <west/> and then output the symbols at that memory address. Approach #1, on the other hand, requires the use of software string libraries, which, I imagine, results in hundreds or thousands of machine instructions. In fact, I would imagine that Approach #2 is hundreds or thousands of times faster than Approach #1. Do you agree? >> >> /Roger >> > > > > -- > Cheers, > Dimitre Novatchev > --------------------------------------- > Truly great madness cannot be achieved without significant intelligence. > --------------------------------------- > To invent, you need a good imagination and a pile of junk > ------------------------------------- > Never fight an inanimate object > ------------------------------------- > To avoid situations in which you might make mistakes may be the > biggest mistake of all > ------------------------------------ > Quality means doing it right when no one is looking. > ------------------------------------- > You've achieved success in your field when you don't know whether what > you're doing is work or play > ------------------------------------- > To achieve the impossible dream, try going to sleep. > ------------------------------------- > Facts do not cease to exist because they are ignored. > ------------------------------------- > Typing monkeys will write all Shakespeare's works in 200yrs.Will they > write all patents, too? :) > ------------------------------------- > Sanity is madness put to good use. > ------------------------------------- > I finally figured out the only reason to be alive is to enjoy it. > -- Cheers, Dimitre Novatchev --------------------------------------- Truly great madness cannot be achieved without significant intelligence. --------------------------------------- To invent, you need a good imagination and a pile of junk ------------------------------------- Never fight an inanimate object ------------------------------------- To avoid situations in which you might make mistakes may be the biggest mistake of all ------------------------------------ Quality means doing it right when no one is looking. ------------------------------------- You've achieved success in your field when you don't know whether what you're doing is work or play ------------------------------------- To achieve the impossible dream, try going to sleep. ------------------------------------- Facts do not cease to exist because they are ignored. ------------------------------------- Typing monkeys will write all Shakespeare's works in 200yrs.Will they write all patents, too? :) ------------------------------------- Sanity is madness put to good use. ------------------------------------- I finally figured out the only reason to be alive is to enjoy it.
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
Re: [xsl] Using memory addressing t, Dimitre Novatchev dn | Thread | Re: [xsl] Using memory addressing t, Michael Kay mike@xxx |
Re: [xsl] Using memory addressing t, Dimitre Novatchev dn | Date | Re: [xsl] Using memory addressing t, Michael Kay mike@xxx |
Month |