Re: [xsl] String hashing code

Subject: Re: [xsl] String hashing code
From: Sascha Mantscheff <922492@xxxxxx>
Date: Fri, 14 Dec 2007 09:07:58 +0100
I don't have a pure XSLT solution. For a similar problem I use a Saxon extension function that returns the MD5 hash of the serialized content; the source code is below. I call it from within XSLT like this:

<xsl:variable name="serialized_content">
  <xsl:value-of select="saxon:serialize(current-group()[1],'')"/>
</xsl:variable>
<xsl:variable name="hash">
  <xsl:value-of select="md5:md5($serialized_content)"/>
</xsl:variable>


--- file Md5.java ---
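import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/* Saxon extension for generating unique hash values. */

public class Md5 {
    /* Renders each byte as two upper-case hex digits. */
    public static String hex(byte[] array) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < array.length; ++i) {
            // OR-ing with 0x100 forces three hex digits, so substring(1, 3) keeps a leading zero.
            sb.append(Integer.toHexString((array[i] & 0xFF) | 0x100).toUpperCase().substring(1, 3));
        }
        return sb.toString();
    }

    /* Returns the MD5 digest of the message as a 32-character hex string. */
    public static String md5(String message) throws NoSuchAlgorithmException, UnsupportedEncodingException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        // UTF-8 can encode every character. A single-byte charset such as CP1252 maps
        // each unmappable character to '?', which would make distinct inputs collide.
        return hex(md.digest(message.getBytes("UTF-8")));
    }
}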
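
A note on the charset passed to getBytes in md5 above: it affects how unique the hashes are. The sketch below is a hypothetical, self-contained demo (the class name CharsetCollisionDemo and the helper md5Hex are made up for illustration and are not part of the extension). It assumes the JRE provides the windows-1252 charset, which is what the name CP1252 refers to. It hashes two MathML fragments that differ only in a Greek letter, once per charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class, not part of the original extension.
final class CharsetCollisionDemo {

    // MD5 of the text, encoded with the given charset, as upper-case hex.
    static String md5Hex(String text, Charset charset) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(text.getBytes(charset));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02X", b & 0xFF));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String alpha = "<mi>\u03B1</mi>"; // Greek alpha, not representable in CP1252
        String beta = "<mi>\u03B2</mi>";  // Greek beta, not representable in CP1252

        // Both letters are replaced by '?' before hashing, so the digests match.
        System.out.println(md5Hex(alpha, cp1252).equals(md5Hex(beta, cp1252))); // true

        // UTF-8 keeps the letters distinct, so the digests differ.
        System.out.println(md5Hex(alpha, StandardCharsets.UTF_8)
                .equals(md5Hex(beta, StandardCharsets.UTF_8))); // false
    }
}
```

For serialized MathML, which often contains characters outside CP1252, an encoding that covers all of Unicode avoids this kind of accidental clash.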



Deborah Pickett wrote:
A challenge to the XSLT demigods...

I am processing a number of separate XML documents using an Ant <xslt>
task, pulling out the MathML that is embedded inside them into their own
XML files using xsl:result-document (where I render them using Batik).
I want to make sure that the result document names don't clash, but
because they are across several source files, generate-id() isn't going
to suffice.  There are thousands of source files, all with
English-sounding names spread across many directories.

I was thinking of hashing document-uri(/) to produce a probably-unique
string that I can then append generate-id(.) to.  I rejected
encode-for-uri() as producing strings that are too long, and for not
anonymizing the document uri enough.  All the hashing algorithms I know
(MD5, for instance) happen to be heavy on bitwise operations, and I feel
dirty doing bitwise operations with arithmetic.

I prefer not to escape to non-XSLT, because I am providing this as part
of a library that needs to run on almost any XSLT 2.0 platform.

Any clever ideas?
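
As an illustration of the constraint in the question (hashing without bitwise operations), here is a minimal sketch of a polynomial string hash that uses only multiplication, addition and a modulus. It was not proposed in this thread. The class name ArithmeticHash, the multiplier 31 and the modulus 2147483647 are assumptions chosen for the example. It is written in Java for easy testing; the same loop should be expressible in XSLT 2.0 with string-to-codepoints(), integer arithmetic and the mod operator.

```java
// Hypothetical helper, not from the thread: an arithmetic-only string hash.
final class ArithmeticHash {

    // Assumed modulus: the largest 31-bit prime. Any large prime would do.
    static final long MODULUS = 2147483647L;

    // Folds the code points of s into one number using only *, + and %.
    static long hash(String s) {
        long h = 0;
        // Iterate over Unicode code points, mirroring string-to-codepoints() in XPath 2.0.
        for (int cp : s.codePoints().toArray()) {
            h = (h * 31 + cp) % MODULUS;
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("abc")); // 96354
        System.out.println(hash("file:/docs/a.xml") != hash("file:/docs/b.xml")); // true
    }
}
```

This is not a cryptographic hash, and with thousands of document URIs a 31-bit value can still collide, so appending generate-id(.) as described above helps only within one document. A larger modulus, or combining two such hashes with different multipliers, would reduce the risk further.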
