[xsl] Different regex behaviour on Windows & Linux using Saxon

Subject: [xsl] Different regex behaviour on Windows & Linux using Saxon
From: AAS Contractor <AAS.Contractor@xxxxxxx>
Date: Thu, 30 Aug 2007 14:41:02 +0100
(I have posted this to the Saxon help forum on SourceForge, but thought 
I'd also ask here in case it is not a Saxon-specific problem, but 
something more general that I'm missing.)

I have a strange problem here. I am using the Java version of Saxon-B 8.9 
on both Linux and Windows. The code is being developed on a Windows PC but 
will eventually run in a production environment on a Linux box. However, I 
get different output from the same stylesheet depending on which machine I 
run it on. The input is something like:
 
<kwd>stars: individual (RX J0052.9-7158, 2E0053.7-7227, SMC X-2)</kwd> 
 
The desired output for this would be 
 
<kwd>stars: individual<ind>RX J0052.9-7158</ind><ind>2E0053.7-7227</ind><ind>SMC X-2</ind></kwd> 
 
And the relevant code I am using is 
 
<xsl:analyze-string select="." regex="\s*\(([^)]+)\)\s*"> 
<xsl:matching-substring> 
<xsl:for-each select="tokenize(regex-group(1),'\s*,\s*')"> 
<ind><xsl:value-of select="."/></ind> 
</xsl:for-each> 
</xsl:matching-substring> 
<xsl:non-matching-substring> 
<xsl:value-of select="."/> 
</xsl:non-matching-substring> 
</xsl:analyze-string> 
 
which does indeed produce the desired output on both platforms. However, 
if the string being matched contains an entity or character reference, the 
string is still matched on the Windows machine but not on the Linux 
one, e.g. 
 
<kwd>stars: individual (RX J0052.9&#8722;7158, 2E0053.7-7227, SMC X-2)</kwd> 
 
and 
 
<kwd>stars: individual (RX J0052.9&minus;7158, 2E0053.7-7227, SMC X-2)</kwd> 
 
 
produce output of  
 
<kwd>stars: individual<ind>RX J0052.9&#8722;7158</ind><ind>2E0053.7-7227</ind><ind>SMC X-2</ind></kwd> 
 
on the Windows box. On the Linux box the string is not matched and is 
passed through unchanged as the non-matching substring, e.g. 
 
<kwd>stars: individual (RX J0052.9&#8722;7158, 2E0053.7-7227, SMC X-2)</kwd> 
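
One way to narrow this down is to run the same two expressions through plain java.util.regex on each machine. The following is a minimal sketch under assumptions, not part of the stylesheet. It bypasses Saxon's own handling of XPath regular expressions, so it only shows whether the pattern and the JVM's regex engine cope with U+2212, the character that &#8722; denotes and that &minus; is assumed to resolve to in the DTD. The class name KwdRegexCheck and the method extractItems are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KwdRegexCheck {
    // The same two expressions as in the stylesheet, written as Java string literals.
    static final Pattern PARENS = Pattern.compile("\\s*\\(([^)]+)\\)\\s*");
    static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Returns the comma-separated items inside the first parenthesised group,
    // or an empty list when the text contains no such group.
    static List<String> extractItems(String text) {
        List<String> items = new ArrayList<>();
        Matcher m = PARENS.matcher(text);
        if (m.find()) {
            for (String item : COMMA.split(m.group(1))) {
                items.add(item);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        // First string uses an ASCII hyphen; second uses U+2212 (MINUS SIGN).
        String ascii = "stars: individual (RX J0052.9-7158, 2E0053.7-7227, SMC X-2)";
        String minus = "stars: individual (RX J0052.9\u22127158, 2E0053.7-7227, SMC X-2)";

        // Print only the counts so the result does not depend on console encoding.
        System.out.println("hyphen items: " + extractItems(ascii).size());
        System.out.println("minus items:  " + extractItems(minus).size());
    }
}
```

If both counts come out as 3 on both machines, the pattern itself is not at fault, and the difference more likely lies in how the input reaches the regex (parser, entity resolution, encoding) or in the Saxon and JVM versions installed on each box.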
 
Has anyone got a clue as to why this is happening? 
 
cheers, 
 
Bruce  
