Subject: Re: [xsl] Sorting chemical formulae in XSLT 2.0
From: Emmanuel Bégué <eb@xxxxxxxxxx>
Date: Thu, 25 Nov 2010 10:17:58 +0100
Hello,
I think regular expressions would help. It has been a while since I had
to deal with chemical elements, so I am not sure I completely understand
your requirements, but the following stylesheet gives the expected
result on your sample:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ms="urn:x-example:molsort">
  <!-- the ms namespace URI above is only a placeholder; any URI will do -->

  <xsl:template match="list">
    <xsl:for-each select="*">
      <xsl:sort select="ms:molSort2(.)"/>
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:template>

  <xsl:function name="ms:molSort2">
    <xsl:param name="node"/>
    <!-- take out unwanted characters and only keep letters and numbers -->
    <xsl:variable name="filter">
      <xsl:analyze-string select="string($node)" regex="[A-Za-z0-9]+">
        <xsl:matching-substring>
          <xsl:value-of select="."/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <!-- does two things: pads numbers, and transforms letters to their
         code, so that at the end we only have a long string of numbers -->
    <xsl:variable name="sortString">
      <xsl:analyze-string select="$filter" regex="\d+">
        <xsl:matching-substring><!-- this is a number -->
          <xsl:value-of select="format-number(number(.), '000')"/>
        </xsl:matching-substring>
        <xsl:non-matching-substring><!-- (at this point) this is a character -->
          <xsl:value-of select="string-to-codepoints(.)"/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:value-of select="$sortString"/>
  </xsl:function>
</xsl:stylesheet>
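If you want to see what is actually being compared, a template along these lines could sit next to the function in the same stylesheet. It is only a sketch that I have not run against your data: the mode name "show-keys" is made up for the example, and it assumes the ms prefix and ms:molSort2 are declared as above. Applied to the list element in that mode, it should print one line per item, with the computed key first and the formula as plain text after it. For CHCl3 I would expect the key to come out as "67 72 67 108003".

```xslt
<!-- Debugging aid (sketch): the mode name "show-keys" is hypothetical -->
<xsl:template match="list" mode="show-keys">
  <xsl:for-each select="*">
    <!-- same sort key as in the real template -->
    <xsl:sort select="ms:molSort2(.)"/>
    <!-- one line per item: computed key, separator, formula text -->
    <xsl:value-of select="concat(ms:molSort2(.), ' | ', normalize-space(.), '&#10;')"/>
  </xsl:for-each>
</xsl:template>
```

Two things to keep an eye on in that output, if I read my own function correctly: the '000' picture only pads to three digits, so a number above 999 would not line up with the others, and because the filter also drops the commas, "SO4,7H2O" reaches the second step as "SO47H2O", that is, with the number 47.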
Hope this helps.
Regards,
EB
On Wed, Nov 24, 2010 at 5:55 PM, Emma Burrows <Emma.Burrows@xxxxxxxxxxx>
wrote:
> Hello,
>
> Using Saxon 9.2 and XSLT 2.0, I am currently sorting a list of chemical
> formulae which appears in the following format:
>
> <list>
> <item1>(C<sub>19</sub>H<sub>22</sub>N<sub>2</sub>O)<sub>2</sub>,H<sub>2</sub>SO<sub>4</sub>,7H<sub>2</sub>O</item1>
> <item1>C<sub>4</sub>H<sub>7</sub>Cl<sub>3</sub>O<sub>2</sub></item1>
> <item1>CHCl<sub>3</sub></item1>
> <item1>CNa<sub>3</sub>O<sub>5</sub>P </item1>
> </list>
>
> The desired sort order is:
>
> CHCl3
> CNa3O5P
> C4H7Cl3O2
> (C19H22N2O)2,H2SO4,7H2O
>
> So the rules are
> a. ignore brackets
> b. sort letters before numbers
> c. sort numbers numerically
>
> Using the following templates, I've managed to get as far as a and b, but I
> need a little help adding c to the mix:
>
> <xsl:template match="list">
>   <xsl:for-each select="item1">
>     <xsl:sort select="rps:molSort(.)" case-order="upper-first"/>
>     <xsl:copy-of select="."/>
>   </xsl:for-each>
> </xsl:template>
>
> <xsl:function name="rps:molSort" as="xs:string">
>   <xsl:param name="node"/>
>   <xsl:variable name="step1" select="replace(replace($node, '\(',''), '\)','')"/>
>   <xsl:variable name="step2" select="replace(replace($step1, '\[',''), '\]','')"/>
>   <xsl:variable name="step3" select="translate($step2,
>     'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
>     '0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ')"/>
>   <xsl:value-of select="$step3"/>
> </xsl:function>
>
> This produces the following output:
> CHCl3
> CNa3O5P
> (C19H22N2O)2,H2SO4,7H2O
> C4H7Cl3O2
>
> In other words, numbers are sorted as letters rather than numbers, so the
> subscripts go "1 10 11 2 3..." instead of "1 2 3... 10 11". I need an
> additional criterion somewhere to sort the numbers correctly but I haven't
> found a solution that works yet, so a nudge in the right direction would be
> great.
>
> Thank you!