RE: [xsl] Memory problem when tokenizing big data

Subject: RE: [xsl] Memory problem when tokenizing big data
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 10 Jan 2006 14:44:39 -0000
The str:tokenize() function defined in EXSLT constructs a tree containing
one element for each token. Unless the implementation is clever enough to
construct a virtual or lazy tree, this is going to take a fair bit of
memory.
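For instance, str:tokenize('5.1 4.9 4.7', ' ') materializes roughly <token>5.1</token><token>4.9</token><token>4.7</token>: one element node per value, all held in memory at once.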

By contrast, the XPath 2.0 tokenize() function returns a sequence of
strings, and it's a reasonable bet that any decent implementation is going
to be pipelined, so that it reads off the tokens one at a time as they are
needed.
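As a rough sketch, assuming an XSLT 2.0 processor such as Saxon, the template below could be rewritten along these lines, with the EXSLT delimiter characters replaced by an equivalent regular expression:

  <xsl:template match="textdata">
    <data>
      <!-- XPath 2.0 tokenize(): splits on the regex and returns a
           sequence of strings; the [.] predicate drops the empty
           tokens produced by leading/trailing whitespace -->
      <xsl:for-each select="tokenize(., '[ \n\r]+')[.]">
        <e>
          <xsl:value-of select="."/>
        </e>
      </xsl:for-each>
    </data>
  </xsl:template>

Because tokenize() yields strings rather than nodes, there is no intermediate result tree to hold in memory.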

Michael Kay
http://www.saxonica.com/  

> -----Original Message-----
> From: Richard Zhang [mailto:richard_zhang@xxxxxxxxxx] 
> Sent: 10 January 2006 14:30
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Memory problem when tokenizing big data
> 
> Thanks for your reply to my prior question about breaking 
> down strings.
> 
> Now I am trying to use str:tokenize to break down some big data.
> 
> The input big data is like:
> 
>           <textdata sep=" &#x000A;&#x000D;">
>             5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9
>             ...
>             ...
>           </textdata>
>             ...
>             ...
>           <textdata sep=" &#x000A;&#x000D;">
>             5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9
>             ...
>             ...
>           </textdata>
> 
> and my xsl template is like:
> 
>   <xsl:template match="textdata">
>     <data>
>     <xsl:for-each select="str:tokenize(.,' &#x000A;&#x000D;')">
>       <e>
>       <xsl:value-of select="."/>
>       </e>
>     </xsl:for-each>
>     </data>
>   </xsl:template>
> 
> The textdata can be very big. My question is, will the 
> tokenizing have problems when handling big data? If yes, how 
> big is the data that str:tokenize can handle? I ran the 
> transformation in JBuilder and it shows some '10mb heap 
> left' problem.
> 
> Thanks a lot.
> Richard 
