Re: [xsl] is there a way to hash an element?

Subject: Re: [xsl] is there a way to hash an element?
From: "Graydon graydon@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 13 Jun 2016 01:17:24 -0000
On Sat, Jun 11, 2016 at 05:21:09PM -0000, Dimitre Novatchev dnovatchev@xxxxxxxxx scripsit:
Hi Dimitre --
> Actually, I believe that calling deep-equal() can be more efficient
> than comparing hashes.
> 
> The reason is simple: deep-equal() most probably returns false at the
> first possible moment -- for example, noticing that an element has
> different attributes than its counterpart.
> 
> On the other side, with hashing,  the hashes for the two whole
> subtrees have to be calculated and only after that they can be
> compared.
> 
> To summarize, with the exception of the case when the two subtrees are
> equal, deep-equal may perform faster than generating and comparing
> hashes on the subtrees.

I've got one input document with ~5000 trees that are mappable to XSD
schema definitions; about half are complexTypes.  Many are structurally
the same but have different names. (All ~5000 have unique names.)

The idea is to group them by structural sameness; deep-equal, even very
efficiently implemented deep-equal, gives me n^2 as I have to go through
the whole tree for each element and ask "are you like me?" pairwise.
Some of the equivalent structures will have a lot of matches -- hundreds
-- where I can't expect deep-equal to fail quickly and thus efficiently.

Going through and decorating every element with its hash value
(@hash="something") and then using for-each-group on the lot on the
basis of the hash gives me 2n.  Even if it's a very naive hash
implementation, I'd expect 2n to beat n^2 performance.

Am I missing something?

(I'll certainly keep deep-equal in mind if the hash approach has
unacceptable performance.)

-- Graydon

Current Thread