Re: [xsl] Using memory addressing to retrieve a value vice using a software string library to retrieve a value

Subject: Re: [xsl] Using memory addressing to retrieve a value vice using a software string library to retrieve a value
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 20 Nov 2015 16:56:41 -0000
> I am aware of at least two very efficient algorithms:

While the two described algorithms (of contains() and index-of())
aren't directly related to the question, what I wanted to say was that
just by saying "string search algorithms" this shouldn't imply in
general that any such algorithm is inefficient.


Cheers,
Dimitre

On Fri, Nov 20, 2015 at 8:38 AM, Dimitre Novatchev
dnovatchev@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> In addition to the replies by Dr. Kay and David Carlisle:
>
> It depends on the concrete case and the concrete XML document and the
> string itself.
>
> 1. Imagine a string starting with a space and then having the wanted
> substring (that is not too-long). This may be faster than scanning
> sequentially through thousands of nodes -- as in general an XPath
> engine may use a linked-list and not a vector.
>
> 2. On the other side, imagine an XPath engine that has the sequence of
> nodes in a vector (self-expandable array structure). Then the wanted
> node can be obtained in O(1). Imagine at the same time that you have a
> sting long many Megs, snd the first space happens only at the end of
> the string, and the wanted substring is also very long. In this case
> searching through the sting using the specified expression can be
> orders of magnitude slower.
>
> A general remark about finding the first/all  occurrence of a given
> string as a substring of another given string, or whether a string1
> contains a string2.
>
> I am aware of at least two very efficient algorithms:
>
>   1. Using *suffix-trees*  -- all possible suffixes of the given
> string are organized as a trie.  The search (same as the contains()
> function) is very fast.
>
>   2. Using the hash of the search-string and scanning the given string
> and sequentially computing the hash of every next substring with the
> same length as the search string. Because computing the hash of the
> moving "window" can be done incrementally, this algorithm is O(N)  --
> much better than the naC/ve O(N*M) algorithm. More about this -- in the
> book of Steven Skiena "The Algorithm Design Manual",
>
http://www.amazon.com/Algorithm-Design-Manual-Steven-Skiena/dp/1849967202/ref
=sr_1_1?s=books&ie=UTF8&qid=1448037467&sr=1-1&keywords=skiena&pebp=1448037471
640&perid=0D1Z0T6Y90MCMDPDA330
>
>
> Cheers,
> Dimitre
>
>
> To summarize:  All depends.
>
>
>
> On Fri, Nov 20, 2015 at 4:13 AM, Costello, Roger L. costello@xxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> Hi Folks,
>>
>> I want to retrieve "west".
>>
>> Which of these is faster?
>>
>> -------------------
>> Approach #1
>> -------------------
>> The <edge> element contains text:
>>
>>         <edge>garden west door</edge>
>>
>> "west" is retrieved using this XPath:
>>
>>         substring-before(substring-after(., ' '), ' ')
>>
>> Note: assume that <edge> is the context node.
>>
>> -------------------
>> Approach #2
>> -------------------
>> The <edge> element contains markup:
>>
>>         <edge>
>>             <garden/>
>>             <west/>
>>             <door/>
>>         </edge>
>>
>> "west" is retrieved using this XPath:
>>
>>         *[2]/name()
>>
>>
>> I believe that Approach #2 is much, much faster. Do you agree?
>>
>> As I see it, Approach #2 is simply a memory addressing task (which
computers do very well) to obtain <west/> and then output the symbols at that
memory address. Approach #1, on the other hand, requires the use of software
string libraries, which, I imagine, results in hundreds or thousands of
machine instructions. In fact, I would imagine that Approach #2 is hundreds or
thousands of times faster than Approach #1. Do you agree?
>>
>> /Roger
>>
>
>
>
> --
> Cheers,
> Dimitre Novatchev
> ---------------------------------------
> Truly great madness cannot be achieved without significant intelligence.
> ---------------------------------------
> To invent, you need a good imagination and a pile of junk
> -------------------------------------
> Never fight an inanimate object
> -------------------------------------
> To avoid situations in which you might make mistakes may be the
> biggest mistake of all
> ------------------------------------
> Quality means doing it right when no one is looking.
> -------------------------------------
> You've achieved success in your field when you don't know whether what
> you're doing is work or play
> -------------------------------------
> To achieve the impossible dream, try going to sleep.
> -------------------------------------
> Facts do not cease to exist because they are ignored.
> -------------------------------------
> Typing monkeys will write all Shakespeare's works in 200yrs.Will they
> write all patents, too? :)
> -------------------------------------
> Sanity is madness put to good use.
> -------------------------------------
> I finally figured out the only reason to be alive is to enjoy it.
>



--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
To achieve the impossible dream, try going to sleep.
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs.Will they
write all patents, too? :)
-------------------------------------
Sanity is madness put to good use.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.

Current Thread