Re: [xsl] is there a way to hash an element?

Subject: Re: [xsl] is there a way to hash an element?
From: "David Rudel fwqhgads@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 9 Jun 2016 22:19:40 -0000
How about

group-by="string-join(descendant::*!udf:sha(.),'-')"

where udf:sha is a user-defined function that returns the value (or
name if that is what you need) of the element and the value of each of
its attributes, sorted alphabetically by the name of the attribute.
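
To make that concrete, here is a rough XSLT 3.0 sketch of the idea, not a
tested solution. The urn:example:udf namespace, the groups/group output
elements, and the assumption that the messages are the children of the
document element are all invented for illustration. Note that udf:sha
here only builds a canonical string (element name, its own text, and
name=value attribute pairs sorted by name); it does not compute a real
SHA digest.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:udf="urn:example:udf"
    exclude-result-prefixes="xs udf">

  <!-- Hypothetical helper: one string per element, built from its name,
       its own text nodes, and its attributes sorted by name.
       This is a canonical string, not a cryptographic hash. -->
  <xsl:function name="udf:sha" as="xs:string">
    <xsl:param name="e" as="element()"/>
    <xsl:variable name="atts" as="xs:string*">
      <xsl:for-each select="$e/@*">
        <xsl:sort select="name()"/>
        <xsl:sequence select="concat(name(), '=', string(.))"/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:sequence select="string-join(
        (name($e),
         normalize-space(string-join($e/text(), '')),
         $atts), '|')"/>
  </xsl:function>

  <!-- Assumption: each message is a child of the document element. -->
  <xsl:template match="/*">
    <groups>
      <!-- The key covers descendants only, so top-level attributes
           that differ between duplicates do not affect grouping. -->
      <xsl:for-each-group select="*"
          group-by="string-join(descendant::* ! udf:sha(.), '-')">
        <group size="{count(current-group())}">
          <xsl:copy-of select="current-group()[1]"/>
        </group>
      </xsl:for-each-group>
    </groups>
  </xsl:template>

</xsl:stylesheet>
```

Two caveats with this sketch: the key is a flat list in document order,
so it does not record nesting depth, and a value containing '|' or '-'
could make two different trees produce the same key. If that matters,
the members of each group could be checked against one another with
deep-equal() on their child nodes.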





On Fri, Jun 10, 2016 at 12:08 AM, Graydon graydon@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hello all --
>
> So I've got about half a gigabyte of XML messages describing various
> health care actions.  Many of these are structural duplicates of each
> other; the top elements differ by their attribute values, but the
> structure and values of the descendant elements are the same.  The amount
> of duplication varies from none to thousands.
>
> I've got an apparently useful heuristic based on descendant attribute
> values, but would -- it is health care data -- really like to have a
> more robust way to group the elements into sets of equivalent top-level
> names by their structural sameness.  (I can't hand-check the whole data
> set.)
>
> So I find myself wanting an equivalent of sha256sum for elements so I
> could generate a grouping key from the descendant elements and their
> associated attributes as a unit.
>
> Is there such a thing?  Equivalent approaches?
>
> Thanks!
> Graydon
> 



-- 

"A false conclusion, once arrived at and widely accepted is not
dislodged easily, and the less it is understood, the more tenaciously
it is held." - Cantor's Law of Preservation of Ignorance.
