RE: [xsl] String hashing code

Subject: RE: [xsl] String hashing code
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 14 Dec 2007 09:43:22 -0000
It sounds as if you want a result that is ASCII, that is of modest length,
and that has a high probability of being unique without offering a
guarantee.

You could do the equivalent of

  string(sum(for $c at $p in string-to-codepoints(document-uri(/))
             return $c * $p))

(That expression is XQuery. The equivalent in XSLT is a bit more long-winded
because the XPath 2.0 "for" expression has no positional variable, i.e. no
"at $p".)
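
For readers who want the XSLT 2.0 form spelled out, here is a minimal sketch of one way to write it. It is not taken from the original message: the function name my:weighted-sum and its namespace URI are hypothetical, chosen only for illustration. The missing "at $p" is worked around by iterating over positions (1 to count(...)) and indexing into the codepoint sequence.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/hypothetical/hash"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: sum of (codepoint * position) over the string,
       returned as a string of decimal digits. -->
  <xsl:function name="my:weighted-sum" as="xs:string">
    <xsl:param name="s" as="xs:string?"/>
    <!-- An empty or absent string yields an empty sequence, so the sum is 0. -->
    <xsl:variable name="cps" select="string-to-codepoints($s)"/>
    <!-- Iterate over positions, since XPath 2.0 "for" has no "at $p". -->
    <xsl:sequence
        select="string(sum(for $p in 1 to count($cps)
                           return $cps[$p] * $p))"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- For the string 'abc' this would be 97*1 + 98*2 + 99*3 = 590. -->
    <xsl:value-of select="my:weighted-sum(document-uri(/))"/>
  </xsl:template>

</xsl:stylesheet>
```

As with the expression above, the result is short and pure ASCII (decimal digits only), but two different URIs can produce the same sum, so it gives a high probability of uniqueness rather than a guarantee.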

Michael Kay
http://www.saxonica.com/


> -----Original Message-----
> From: Deborah Pickett [mailto:debbiep-list-xsl@xxxxxxxxxx] 
> Sent: 14 December 2007 07:36
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] String hashing code
> 
> A challenge to the XSLT demigods...
> 
> I am processing a number of separate XML documents using an 
> Ant <xslt> task, pulling out the MathML that is embedded 
> inside them into their own XML files using 
> xsl:result-document (where I render them using Batik).
> I want to make sure that the result document names don't 
> clash, but because they are across several source files, 
> generate-id() isn't going to suffice.  There are thousands of 
> source files, all with English-sounding names spread across 
> many directories.
> 
> I was thinking of hashing document-uri(/) to produce a 
> probably-unique string that I can then append generate-id(.) 
> to.  I rejected
> encode-for-uri() as producing strings that are too long, and 
> for not anonymizing the document uri enough.  All the hashing 
> algorithms I know (MD5, for instance) happen to be heavy on 
> bitwise operations, and I feel dirty doing bitwise operations 
> with arithmetic.
> 
> I prefer not to escape to non-XSLT, because I am providing 
> this as part of a library that needs to run on almost any 
> XSLT 2.0 platform.
> 
> Any clever ideas?
