Re: [xsl] String hashing code

Subject: Re: [xsl] String hashing code
From: Deborah Pickett <debbiep-list-xsl@xxxxxxxxxx>
Date: Sat, 15 Dec 2007 11:31:43 +1100
Deborah Pickett wrote:
> A challenge to the XSLT demigods...

Thanks to everyone for the bewildering range of responses.  Some
very different solutions.

Some solutions (Rob's idea of appending the ID to the filename)
unfortunately won't work because of limitations with other, third-party,
stages of the pipeline. (In this case, the Batik SVG rasterizer code
dumps all of the rasterized files into the same output directory,
flattening any directory structure and risking collision again.)

Mostly I'm hamstrung by Ant (or the TraX Liaison implementation), which
may either instantiate the stylesheet once per source file, or bundle up
many source files to pass to one instance, at its own discretion.
Since I can't be sure that only one XSLT processor instance will process
*all* my files, I can't rely on the uniqueness of generate-id($node)
where $node is a locally-scoped variable containing a document (à la
Abel's UUID thread, which was nonetheless fascinating).

(I judged that there were too many source files to fit into memory at
once, so replacing the Ant for-each-file loop with an XSLT one isn't
practical.  saxon:discard-document() probably would help there, but I
can't be sure that those I provide the code to are using Saxon.)

So it's probably back to a hash, which means that I should look at my
actual filenames and try a few hashes on them to see how many collisions
there are.
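
That experiment could be run outside the pipeline altogether. Below is a
minimal sketch in Python, not part of the stylesheet: the three candidate
hashes (sum of code points, a multiply-by-31 rolling hash, CRC-32) and the
sample paths are assumptions chosen for illustration, and the real list of
source paths would be substituted in.

```python
import zlib
from collections import defaultdict


def collisions(names, hash_fn):
    """Group names by hash value; return only groups holding more than one name."""
    buckets = defaultdict(list)
    for name in names:
        buckets[hash_fn(name)].append(name)
    return {h: group for h, group in buckets.items() if len(group) > 1}


def sum_of_codepoints(name):
    # Deliberately weak candidate: any two anagrams hash to the same value.
    return sum(ord(ch) for ch in name)


def polynomial_31(name, modulus=2**32):
    # Rolling hash h = h * 31 + codepoint, reduced to 32 bits.
    h = 0
    for ch in name:
        h = (h * 31 + ord(ch)) % modulus
    return h


def crc32(name):
    # CRC-32 of the UTF-8 bytes, from the standard library.
    return zlib.crc32(name.encode("utf-8"))


# Hypothetical set of candidates to compare.
CANDIDATES = {
    "sum of code points": sum_of_codepoints,
    "polynomial (x31)": polynomial_31,
    "crc32": crc32,
}

if __name__ == "__main__":
    # Hypothetical paths; replace with the real source file list.
    names = [
        "chapter1/ab/figure.svg",
        "chapter1/ba/figure.svg",
        "chapter2/ab/figure.svg",
    ]
    for label, fn in CANDIDATES.items():
        groups = collisions(names, fn)
        print(f"{label}: {len(groups)} colliding group(s)")
        for group in groups.values():
            print("   ", group)
```

Only hashes that can also be computed inside the stylesheet are worth
testing this way; the sum and the rolling hash need nothing beyond
integer arithmetic on code points, whereas CRC-32 is included here only
as a reference point.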
