Re: [xsl] Sorting chemical formulae in XSLT 2.0

From: Emmanuel Bégué <eb@xxxxxxxxxx>
Date: Thu, 25 Nov 2010 10:17:58 +0100
Hello,

I think regular expressions would help. It has been a while since I had
to deal with chemical elements, so I am not sure I completely
understand your requirements, but the following stylesheet gives the
expected result:

<xsl:template match="list">
	<xsl:for-each select="*">
		<xsl:sort select="ms:molSort2(.)"/>
		<xsl:copy-of select="."/>
		</xsl:for-each>
	</xsl:template>

<xsl:function name="ms:molSort2">
	<xsl:param name="node"/>
	<xsl:variable name="filter">
		<!-- take out unwanted characters and only keep letters and numbers -->
		<xsl:analyze-string select="string($node)" regex="[A-Za-z0-9]+">
			<xsl:matching-substring>
				<xsl:value-of select="."/>
				</xsl:matching-substring>
			</xsl:analyze-string>
		</xsl:variable>
	<xsl:variable name="sortString">
		<!-- does two things: pads numbers, and transforms letters to their code,
		so that at the end we only have a long string of numbers -->
		<xsl:analyze-string select="$filter" regex="\d+">
			<xsl:matching-substring><!-- this is a number; the three-digit padding assumes values below 1000 -->
				<xsl:value-of select="format-number(number(.), '000')"/>
				</xsl:matching-substring>
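			<xsl:non-matching-substring><!-- (at this point) this is a run of letters -->
				<!-- pad each code point to three digits: unpadded, "Ca" (67 97) would
				sort after "Cl" (67 108); value-of joins the codes with single spaces -->
				<xsl:value-of select="for $c in string-to-codepoints(.) return format-number($c, '000')"/>
				</xsl:non-matching-substring>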
			</xsl:analyze-string>
		</xsl:variable>
	<xsl:value-of select="$sortString"/>
	</xsl:function>
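
Note that the snippets above leave out the stylesheet wrapper, so the ms
prefix still has to be bound to a namespace. In case it helps, here is a
minimal self-contained sketch of the same idea. The namespace URI
urn:example:molsort is just a placeholder I made up, and the explicit
codepoint collation is my assumption about how the keys should be compared
(the key relies on a space sorting before a digit). In this sketch the
letter codes are also padded to three digits so that every token has the
same width:

```xml
<xsl:stylesheet version="2.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns:xs="http://www.w3.org/2001/XMLSchema"
	xmlns:ms="urn:example:molsort"
	exclude-result-prefixes="xs ms">
	<!-- urn:example:molsort is a placeholder namespace, not a standard one -->

	<!-- copy the list element and output its children sorted by the computed key -->
	<xsl:template match="list">
		<xsl:copy>
			<xsl:for-each select="*">
				<!-- assumption: plain codepoint comparison, so that ' ' sorts before '0' -->
				<xsl:sort select="ms:molSort2(.)"
					collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
				<xsl:copy-of select="."/>
			</xsl:for-each>
		</xsl:copy>
	</xsl:template>

	<xsl:function name="ms:molSort2" as="xs:string">
		<xsl:param name="node" as="node()"/>
		<!-- drop everything that is not an ASCII letter or digit -->
		<xsl:variable name="filter" select="replace(string($node), '[^A-Za-z0-9]', '')"/>
		<xsl:variable name="tokens" as="xs:string*">
			<xsl:analyze-string select="$filter" regex="\d+">
				<xsl:matching-substring>
					<!-- a number: zero-pad to three digits (assumes values below 1000) -->
					<xsl:sequence select="format-number(number(.), '000')"/>
				</xsl:matching-substring>
				<xsl:non-matching-substring>
					<!-- a run of letters: padded code points, separated by spaces -->
					<xsl:sequence select="string-join(for $c in string-to-codepoints(.)
						return format-number($c, '000'), ' ')"/>
				</xsl:non-matching-substring>
			</xsl:analyze-string>
		</xsl:variable>
		<!-- glue the tokens together without any extra separator -->
		<xsl:sequence select="string-join($tokens, '')"/>
	</xsl:function>

</xsl:stylesheet>
```

With this, CHCl3 gets the key "067 072 067 108003" and C4H7Cl3O2 gets
"067004072007067 108003079002". The two keys first differ right after the
initial 067, where the space sorts before the 0, and that is what puts
letters before numbers.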

Hope this helps.
Regards,
EB


On Wed, Nov 24, 2010 at 5:55 PM, Emma Burrows <Emma.Burrows@xxxxxxxxxxx> wrote:
> Hello,
>
> Using Saxon 9.2 and XSLT 2.0, I am currently sorting a list of chemical
> formulae which appears in the following format:
>
> <list>
>   <item1>(C<sub>19</sub>H<sub>22</sub>N<sub>2</sub>O)<sub>2</sub>,H<sub>2</sub>SO<sub>4</sub>,7H<sub>2</sub>O</item1>
>   <item1>C<sub>4</sub>H<sub>7</sub>Cl<sub>3</sub>O<sub>2</sub></item1>
>   <item1>CHCl<sub>3</sub></item1>
>   <item1>CNa<sub>3</sub>O<sub>5</sub>P </item1>
> </list>
>
> The desired sort order is:
>
> CHCl3
> CNa3O5P
> C4H7Cl3O2
> (C19H22N2O)2,H2SO4,7H2O
>
> So the rules are
> a. ignore brackets
> b. sort letters before numbers
> c. sort numbers numerically
>
> Using the following templates, I've managed to get as far as a and b, but I
> need a little help adding c to the mix:
>
> <xsl:template match="list">
>   <xsl:for-each select="item1">
>     <xsl:sort select="rps:molSort(item1)" case-order="upper-first"/>
>     <xsl:copy-of select="item1"/>
>   </xsl:for-each>
> </xsl:template>
>
> <xsl:function name="rps:molSort" as="xs:string">
>    <xsl:param name="node"/>
>    <xsl:variable name="step1" select="replace(replace($node, '\(',''), '\)','')"/>
>    <xsl:variable name="step2" select="replace(replace($step1, '\[',''), '\]','')"/>
>    <xsl:variable name="step3" select="translate($step2,'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789','0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ')"/>
>    <xsl:value-of select="$step3"/>
> </xsl:function>
>
> This produces the following output:
> CHCl3
> CNa3O5P
> (C19H22N2O)2,H2SO4,7H2O
> C4H7Cl3O2
>
> In other words, numbers are sorted as letters rather than numbers, so the
> subscripts go "1 10 11 2 3.." instead of "1 2 3... 10 11". I need an
> additional criterion somewhere to sort the numbers correctly but I haven't
> found a solution that works yet, so a nudge in the right direction would be
> great.
>
> Thank you!
