Re: [xsl] is there a way to hash an element?

From: "David Rudel fwqhgads@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 10 Jun 2016 07:59:38 -0000

Note that if these serializations end up being very long and you want
to reduce them to a short signature (comparable to a typical hash), you
can use the string-to-codepoints() function to turn any string into a
sequence of integers and build a roll-your-own hashing function from
that. Since you only need to check whether two descendant subtrees are
identical, and are not concerned with security, a very simple
compaction function would work fine. For example, you could write a
user-defined function that takes a sequence of integers $seq and
returns the string X---Y, where X is the length of $seq and Y is the
remainder of sum($seq ! (position() * .)) upon division by a suitably
large number (an extension of the standard UPC checksum algorithm).

On Fri, Jun 10, 2016 at 12:51 AM, Dimitre Novatchev
dnovatchev@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> You may not even need a hash function.
>
> Just use the standard XPath 3.0 function:
>
>   serialize()
>
>
> http://www.w3.org/TR/xpath-functions-30/#func-serialize
>
>
> Cheers,
> Dimitre
>
> On Thu, Jun 9, 2016 at 3:08 PM, Graydon graydon@xxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> Hello all --
>>
>> So I've got about half a gigabyte of XML messages describing various
>> health care actions.  Many of these are structural duplicates of each
>> other; the top elements differ by their attribute values, but the
>> structure and values of the descendant elements are the same.  The amount
>> of duplication varies from none to thousands.
>>
>> I've got an apparently useful heuristic based on descendant attribute
>> values, but would -- it is health care data -- really like to have a
>> more robust way to group the elements into sets of equivalent top-level
>> names by their structural sameness.  (I can't hand-check the whole data
>> set.)
>>
>> So I find myself wanting an equivalent of sha256sum for elements so I
>> could generate a grouping key from the descendant elements and their
>> associated attributes as a unit.
>>
>> Is there such a thing?  Equivalent approaches?
>>
>> Thanks!
>> Graydon
>>