RE: [xsl] length limitations on string input to xsl:analyze-string

Subject: RE: [xsl] length limitations on string input to xsl:analyze-string
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 27 Feb 2006 13:44:51 -0000
> I am having difficulty using xsl:analyze-string with some lengthy input
> (800+ chars).  Is this a known limitation of analyze-string?

If there were a limit, it would be a limit in the implementation, not in the
spec, so you need to say which product you are using. The error message you
quote suggests that it is Saxon. I'm not aware of any hard limits in the
Saxon implementation, though performance will of course depend on the length
of the string and the complexity of the regex. Any limits that do exist will
be in the Java regex implementation, not in Saxon itself, and may vary from
one Java VM to another.

> I saw recently that the regular expression itself is limited to about 30
> characters, 

It may be good practice to try to keep your regular expressions to this size
(I often find it useful to apply several simple regular expressions rather
than one complex one, if only because a long regex quickly becomes
unreadable and undebuggable), but there's no such limit enforced by the
product.

> 
> The error message, occurring on the for-each in the following code, is:
> "SXLM0001: Too many nested apply-templates calls. The stylesheet is
> probably looping.
> Transformation failed: Run-time errors were reported"

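This error message occurs when apply-templates hits a stack overflow. It's
trying to be helpful by guessing the cause, but it's only a guess and can be
wrong. In this case the overflow probably arises while the regex is being
matched against the long input, rather than from a loop in your stylesheet.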
> 
> The xsl:

>    <xsl:variable name="regular_expression" select="'([a-z][a-z0-9_]*)'"/>
>    <xsl:analyze-string select="$string" flags="i"
>       regex="^{$regular_expression}\s?=\s?([^;]+)\s?;\s?(.*)$">
>     <!--  ......   limit seems to be about 814 characters.
>           The input I need to handle is much greater -->

Try using the simplest regex that will do the job. It looks to me as if
you're expecting an input of the form

keyword = something ; something-else

So try matching it as 

^(.*?)=(.*?);(.*)$

Then pick out the three groups and do further analysis on them with new
regular expressions.
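
As a rough sketch of that two-stage approach (untested against your data,
and assuming XSLT 2.0): the sample value of $string, the template name
"main" and the output element names are all made up for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical sample input; substitute the real string here. -->
  <xsl:param name="string" select="'colour = dark red ; size = 10 ; rest'"/>

  <xsl:template name="main" match="/">
    <!-- Stage 1: a deliberately simple regex that only splits the input
         into three parts at the first '=' and the first ';' after it. -->
    <xsl:analyze-string select="$string" regex="^(.*?)=(.*?);(.*)$">
      <xsl:matching-substring>
        <xsl:variable name="key" select="normalize-space(regex-group(1))"/>
        <xsl:variable name="value" select="normalize-space(regex-group(2))"/>
        <xsl:variable name="rest" select="normalize-space(regex-group(3))"/>
        <entry>
          <!-- Stage 2: check each part with its own small regex.
               Here only the keyword is validated, case-insensitively. -->
          <xsl:if test="not(matches($key, '^[a-z][a-z0-9_]*$', 'i'))">
            <xsl:attribute name="bad-key">yes</xsl:attribute>
          </xsl:if>
          <key><xsl:value-of select="$key"/></key>
          <value><xsl:value-of select="$value"/></value>
          <rest><xsl:value-of select="$rest"/></rest>
        </entry>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Input that does not have the "a = b ; c" shape ends up here. -->
        <unparsed><xsl:value-of select="."/></unparsed>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

The whitespace handling that your original regex did with \s? is done here
by normalize-space(), which keeps the first regex small. Note that without
the "s" flag the dot does not match a newline, so a multi-line input would
fall into the non-matching branch.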

Alternatively, this is one that could be tackled without difficulty using
substring-before() and substring-after().
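
A minimal sketch of that alternative, again untested and with a made-up
sample string, template name and element names. It uses no regex at all:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical sample input; substitute the real string here. -->
  <xsl:param name="string" select="'colour = dark red ; size = 10 ; rest'"/>

  <xsl:template name="main" match="/">
    <!-- Everything after the first '='; empty if there is no '='. -->
    <xsl:variable name="after-eq" select="substring-after($string, '=')"/>
    <xsl:choose>
      <!-- substring-before() returns '' when the delimiter is missing,
           so test for both delimiters before trusting the pieces. -->
      <xsl:when test="contains($string, '=') and contains($after-eq, ';')">
        <entry>
          <key>
            <xsl:value-of
                select="normalize-space(substring-before($string, '='))"/>
          </key>
          <value>
            <xsl:value-of
                select="normalize-space(substring-before($after-eq, ';'))"/>
          </value>
          <rest>
            <xsl:value-of
                select="normalize-space(substring-after($after-eq, ';'))"/>
          </rest>
        </entry>
      </xsl:when>
      <xsl:otherwise>
        <unparsed><xsl:value-of select="$string"/></unparsed>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Since no regex is involved, the length of the input should not matter to
the regex engine at all. The functions used here are also available in
XSLT 1.0.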

Michael Kay
http://www.saxonica.com/
