Re: [xsl] is there a way to hash an element?

I agree with you: computing a hash, adding it as an attribute to the
top-level element, and then grouping on the hash looks like a good strategy.

I may have missed something in this thread, but I don't recall seeing a
specification of your matching rules that is sufficiently precise to enable
one to write a hash algorithm. We need to see a definition: "two elements A
and B are considered to be the same if and only if they satisfy the following
conditions: ....".
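
To illustrate why the definition has to come first, here is a minimal sketch in Python (not XSLT, and not anything proposed in this thread). It assumes one possible rule purely for the sake of example: two elements are the same if they have the same element name, the same attributes (ignoring a `name` attribute on the top-level element only), the same whitespace-trimmed text, and the same children in the same order. The functions `canonical`, `structure_hash` and `group_by_structure` and the sample document are all hypothetical. Change the rule and the hash has to change with it.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def canonical(elem, ignore=("name",), top=True):
    """Reduce elem to a nested tuple under the ASSUMED matching rule.

    Attributes listed in `ignore` are dropped on the top-level element only;
    attribute order is normalised; tail text between children is ignored.
    """
    attrs = sorted(
        (k, v) for k, v in elem.attrib.items() if not (top and k in ignore)
    )
    text = (elem.text or "").strip()
    children = [canonical(child, ignore, top=False) for child in elem]
    return (elem.tag, attrs, text, children)


def structure_hash(elem):
    """Hash the canonical form; equal forms always give equal hashes."""
    return hashlib.sha256(repr(canonical(elem)).encode("utf-8")).hexdigest()


def group_by_structure(root):
    """Group the children of root by hash, reporting their name attributes."""
    groups = defaultdict(list)
    for child in root:
        groups[structure_hash(child)].append(child.get("name"))
    return list(groups.values())


# Hypothetical input: A and B differ only in their top-level name.
doc = ET.fromstring("""
<schema>
  <complexType name="A"><sequence><element name="x" type="string"/></sequence></complexType>
  <complexType name="B"><sequence><element name="x" type="string"/></sequence></complexType>
  <complexType name="C"><sequence><element name="y" type="int"/></sequence></complexType>
</schema>
""")

print(group_by_structure(doc))  # [['A', 'B'], ['C']]
```

Every choice in `canonical` (attribute order, whitespace, child order, which names to ignore) is a decision the matching rules have to make explicitly.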

Michael Kay
Saxonica

> On 13 Jun 2016, at 02:17, Graydon graydon@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, Jun 11, 2016 at 05:21:09PM -0000, Dimitre Novatchev dnovatchev@xxxxxxxxx scripsit:
> Hi Dimitre --
>> Actually, I believe that calling deep-equal() can be more efficient
>> than comparing hashes.
>>
>> The reason is simple: deep-equal() most probably returns false at the
>> first possible moment -- for example, noticing that an element has
>> different attributes than its counterpart.
>>
>> On the other hand, with hashing, the hashes for the two whole
>> subtrees have to be calculated and only after that they can be
>> compared.
>>
>> To summarize, with the exception of the case when the two subtrees are
>> equal, deep-equal may perform faster than generating and comparing
>> hashes on the subtrees.
>
> I've got one input document with ~5000 trees that are mappable to XSD
> schema definitions; about half are complexTypes.  Many are structurally
> the same but have different names. (All ~5000 have unique names.)
>
> The idea is to group them by structural sameness; deep-equal, even very
> efficiently implemented deep-equal, gives me n^2 as I have to go through
> the whole tree for each element and ask "are you like me?" pairwise.
> Some of the equivalent structures will have a lot of matches -- hundreds
> -- where I can't expect deep-equal to fail quickly and thus efficiently.
>
> Going through and decorating every element with its hash value
> (@hash="something") and then using for-each-group on the lot on the
> basis of the hash gives me 2n.  Even if it's a very naive hash
> implementation, I'd expect 2n to beat n^2 performance.
>
> Am I missing something?
>
> (I'll certainly keep deep-equal in mind if the hash approach has
> unacceptable performance.)
>
> -- Graydon
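
The n^2 versus 2n comparison quoted above can be put in numbers with a small sketch. It assumes exactly 5000 trees (the thread says "~5000"), counts every unordered pair once for the pairwise approach, and counts one hash computation plus one grouping step per tree for the hash approach. The function names are hypothetical, and the figures count operations only, not their cost: each hash computation walks a whole subtree, whereas a deep-equal call on unequal trees can stop early.

```python
def pairwise_comparisons(n):
    """Number of deep-equal calls if every unordered pair is compared once."""
    return n * (n - 1) // 2


def hash_grouping_steps(n):
    """One hash computation plus one grouping step per tree."""
    return 2 * n


n = 5000  # assumed; the thread gives only an approximate count
print(pairwise_comparisons(n))  # 12497500
print(hash_grouping_steps(n))   # 10000
```

If the hash is weak enough that collisions are a concern, the two approaches combine naturally: group on the hash first, then confirm with deep-equal only within each group.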
