[xsl] Sorting chemical formulae in XSLT 2.0

Subject: [xsl] Sorting chemical formulae in XSLT 2.0
From: "Emma Burrows" <Emma.Burrows@xxxxxxxxxxx>
Date: Wed, 24 Nov 2010 16:55:33 -0000
Hello,

Using Saxon 9.2 and XSLT 2.0, I am currently sorting a list of chemical
formulae which appears in the following format:

<list>
 
  <item1>(C<sub>19</sub>H<sub>22</sub>N<sub>2</sub>O)<sub>2</sub>,H<sub>2</sub>SO<sub>4</sub>,7H<sub>2</sub>O</item1>
  <item1>C<sub>4</sub>H<sub>7</sub>Cl<sub>3</sub>O<sub>2</sub></item1>
  <item1>CHCl<sub>3</sub></item1>
  <item1>CNa<sub>3</sub>O<sub>5</sub>P </item1>
</list>

The desired sort order is:

CHCl3
CNa3O5P
C4H7Cl3O2
(C19H22N2O)2,H2SO4,7H2O

So the rules are
a. ignore brackets
b. sort letters before numbers
c. sort numbers numerically

Using the following templates, I've managed to get as far as a and b, but I
need a little help adding c to the mix:

<xsl:template match="list">
  <xsl:for-each select="item1">
    <!-- Inside for-each the context node is already item1, so pass "." -->
    <xsl:sort select="rps:molSort(.)" case-order="upper-first"/>
    <xsl:copy-of select="."/>
  </xsl:for-each>
</xsl:template>

<xsl:function name="rps:molSort" as="xs:string">
   <xsl:param name="node"/>
   <xsl:variable name="step1" select="replace(replace($node, '\(',''),
'\)','')"/>
   <xsl:variable name="step2" select="replace(replace($step1, '\[',''),
'\]','')"/>
   <xsl:variable name="step3"
     select="translate($step2,
       'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
       '0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ')"/>
   <xsl:value-of select="$step3"/>
</xsl:function>
     
This produces the following output:
CHCl3
CNa3O5P
(C19H22N2O)2,H2SO4,7H2O
C4H7Cl3O2

In other words, the digits are compared character by character as text
rather than as numbers, so the subscripts go "1 10 11 2 3..." instead of
"1 2 3... 10 11" (which is why C19 ends up ahead of C4 above). I need an
additional criterion somewhere to sort the numbers correctly but I haven't
found a solution that works yet, so a nudge in the right direction would be
great.
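
One possible direction, offered as a sketch under assumptions rather than a tested solution: zero-pad every run of digits to a fixed width, so that comparing the keys as plain text gives the same result as comparing the numbers. The version below does not reuse the translate() mapping above. Instead it prefixes each number with "~", which sorts after every ASCII letter, and asks for the codepoint collation so that the comparison is purely character by character. The function name rps:molSortPadded, the namespace URI urn:example:rps, and the pad width of six digits are all placeholders chosen for illustration. It also assumes that plain codepoint order is acceptable for the letters themselves (every uppercase letter before every lowercase one).

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:rps="urn:example:rps"
    exclude-result-prefixes="xs rps">

  <xsl:template match="list">
    <xsl:copy>
      <xsl:for-each select="item1">
        <!-- Codepoint collation: keys are compared character by character -->
        <xsl:sort select="rps:molSortPadded(.)"
          collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical helper: builds a sort key for one formula -->
  <xsl:function name="rps:molSortPadded" as="xs:string">
    <xsl:param name="node"/>
    <!-- Rule a: drop round and square brackets -->
    <xsl:variable name="stripped"
      select="replace(string($node), '[\(\)\[\]]', '')"/>
    <xsl:variable name="parts" as="xs:string*">
      <xsl:analyze-string select="$stripped" regex="[0-9]+">
        <xsl:matching-substring>
          <!-- Rule b: "~" sorts after all ASCII letters.
               Rule c: fixed width makes text order equal numeric order
               (assumes no number has more than six digits). -->
          <xsl:sequence
            select="concat('~', format-number(number(.), '000000'))"/>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:sequence select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:sequence select="string-join($parts, '')"/>
  </xsl:function>

</xsl:stylesheet>
```

With this sketch the key for C4H7Cl3O2 starts "C~000004" and the key for (C19H22N2O)2,H2SO4,7H2O starts "C~000019", so C4 sorts ahead of C19, while CHCl3 and CNa3O5P stay in front because "H" and "N" both sort before "~". Commas are left in the key and sort before letters under codepoint order; whether that is the right treatment for hydrates and salts is a separate decision.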

Thank you!
