[xsl] String cleaning in XSLT and XQuery

Subject: [xsl] String cleaning in XSLT and XQuery
From: James Cummings <james@xxxxxxxxxxxxxxxxx>
Date: Tue, 21 Dec 2010 11:07:13 +0000

Hiya,

Reading about processing text() reminds me of another question I've
been meaning to ask. I have an XSLT function that lower-cases the string
given to it, removes every character except a-z, 0-9 and whitespace,
and changes the remaining spaces to underscores. So if I have input of:

<p> <ref>Lorem'&amp;"-=%^$%^"£!&amp;!`¬,.;'@#~[}*() ipsum 9 dolor
sit</ref> amet.</p>

Some XSLT using it like:

<xsl:template match="tei:ref" xmlns="http://www.tei-c.org/ns/1.0">
    <ref target="{concat('/', jc:cleanString(.), '.html')}"><xsl:apply-templates/></ref>
</xsl:template>

<xsl:function name="jc:cleanString" as="xs:string">
    <xsl:param name="string"/>
    <xsl:variable name="cleanedString">
        <xsl:analyze-string select="lower-case(normalize-space($string))" regex="[a-zA-Z0-9\s]+">
            <xsl:matching-substring><xsl:value-of select="translate(., ' ', '_')"/></xsl:matching-substring>
            <xsl:non-matching-substring/>
        </xsl:analyze-string>
    </xsl:variable>
    <xsl:value-of select="$cleanedString"/>
</xsl:function>

which gives me:

<p> <ref target="/lorem_ipsum_9_dolor_sit.html">Lorem'&amp;"-=%^$%^"£!&amp;!`¬,.;'@#~[}*()
ipsum 9 dolor sit</ref> amet.</p>

which is exactly what I want. (hurrah, it works...maybe not the most
elegant, but works.)

Questions:
a) Am I doing this in a good way? Anything I've overlooked? (Always
happy to get tips on making things that already work better!)
b) I've decided I should probably replace all multiple spaces with a
single underscore...there seem to be all sorts of ways to do this,
ranging from a nested analyze-string to another function...
recommendations on how you'd do it?
c) I've got to implement *exactly* the same string transformation in
XQuery (identical so that the URL generated matches up with the
query being done). Any suggestions on the best way to do this in
XQuery? (Or I could go ask on the XQuery list...) I don't have
analyze-string in XQuery :-(
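
For (b) and (c), one option that might work in both languages is a pair
of nested replace() calls, since replace() is an XPath 2.0 function and
so is available in XSLT 2.0 and XQuery 1.0 alike. A rough sketch in
XQuery follows. The namespace URI bound to jc is a made-up placeholder,
and it assumes that a space left at the start or end by stripped
characters should still become an underscore, as it does now:

```xquery
xquery version "1.0";

(: Hypothetical namespace URI, for illustration only;
   the real binding for the jc prefix is not shown in this post. :)
declare namespace jc = "http://example.org/ns/jc";

(: Lower-case the input, drop everything except a-z, 0-9 and spaces,
   then turn each run of one or more spaces into a single underscore. :)
declare function jc:cleanString($string as xs:string?) as xs:string {
  let $kept := replace(lower-case(normalize-space($string)), '[^a-z0-9 ]', '')
  return replace($kept, ' +', '_')
};

(: Stripping the middle "&" leaves two adjacent spaces,
   which collapse to one underscore: lorem_ipsum_9_dolor_sit :)
jc:cleanString("Lorem'&amp; ipsum 9 &amp; dolor sit")
```

The same replace(replace(...)) expression could presumably go in the
select of an xsl:sequence inside the XSLT function, so that both sides
share one expression instead of relying on analyze-string.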

Thanks for any suggestions.

-James
