Subject: Re: [xsl] String hashing code
From: Deborah Pickett <debbiep-list-xsl@xxxxxxxxxx>
Date: Sat, 15 Dec 2007 11:31:43 +1100
Deborah Pickett wrote:
> A challenge to the XSLT demigods...

Thanks to everyone for the bewildering range of the responses. Some very different solutions.

Some solutions (Rob's idea of appending the ID to the filename) unfortunately won't work because of limitations with other, third-party, stages of the pipeline. (In this case, the Batik SVG rasterizer code dumps all of the rasterized files into the same output directory, flattening any directory structure and risking collision again.)

Mostly I'm hamstrung by Ant (or the TraX Liaison implementation), which may either instantiate the stylesheet once per source file, or bundle up many source files to pass to one instance, at its own discretion. Since I can't be sure that only one XSLT processor instance will process *all* my files, I can't rely on the uniqueness of generate-id($node) where $node is a locally-scoped variable containing a document (à la Abel's UUID thread, which was nonetheless fascinating).

(I judged that there were too many source files to fit into memory at once, so replacing the Ant for-each-file loop with an XSLT one isn't practical. saxon:discard-document() probably would help there, but I can't be sure that those I provide the code to are using Saxon.)

So it's probably back to a hash, which means that I should look at my actual filenames and try a few hashes on them to see how many collisions there are.
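
For anyone wanting to run the same kind of collision check outside the stylesheet, here is a rough Python sketch. It is not the hash that ends up in the XSLT. The three candidate functions (a plain sum of code points, a 16-bit djb2-style hash, and CRC-32 from the standard zlib module) are picked purely for illustration, and the function names and the sample paths at the bottom are made up.

```python
import zlib
from collections import Counter


def sum_of_codepoints(name):
    """Naive hash: add up the code points. Anagrams always collide."""
    return sum(ord(c) for c in name)


def djb2_16(name):
    """djb2-style multiply-and-add hash, truncated to 16 bits."""
    h = 5381
    for c in name:
        h = (h * 33 + ord(c)) & 0xFFFF
    return h


def crc32(name):
    """CRC-32 of the UTF-8 bytes, via the standard zlib module."""
    return zlib.crc32(name.encode("utf-8"))


def count_collisions(names, hash_fn):
    """Count distinct names whose hash is already taken by another name."""
    distinct = set(names)  # the same path listed twice is not a collision
    buckets = Counter(hash_fn(n) for n in distinct)
    return sum(count - 1 for count in buckets.values() if count > 1)


if __name__ == "__main__":
    # Hypothetical sample paths; substitute the real list of source files.
    sample = ["chapter1/fig1.svg", "chapter2/fig1.svg", "chapter1/1gif.svg"]
    for fn in (sum_of_codepoints, djb2_16, crc32):
        print(fn.__name__, count_collisions(sample, fn))
```

Feeding it the real file list (one path per entry) gives a collision count per candidate, which is enough to decide whether a hash simple enough to write in XSLT 1.0 is good enough for the actual filenames.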