RE: [xsl] 10,000 document()'s

Subject: RE: [xsl] 10,000 document()'s
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Wed, 9 Apr 2003 20:38:42 +0100
I would suggest writing a SAX filter that invokes the XSLT
transformations (one transformation for each file) via JAXP, gets the
result back in a StringWriter, and adds an element containing the word
count to the output stream.
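
A minimal sketch of that approach, under some assumptions: the class name
WordCountFilter, the COUNT_XSL stylesheet and the annotate() helper are
hypothetical names made up for illustration; a "word" is taken to be a
whitespace-separated token of the document's text; and each <filename>
is assumed to hold a path the JVM can open directly. Because every file
gets its own short-lived Transformer, its source tree can be garbage
collected as soon as the count has been written.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical SAX filter: emits a wordcount element after each filename element. */
class WordCountFilter extends XMLFilterImpl {
    // Illustrative XSLT 1.0 stylesheet (an assumption, not from the thread):
    // counts spaces in the normalized text and adds 1 unless the text is empty.
    static final String COUNT_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:variable name='t' select='normalize-space(.)'/>"
      + "<xsl:value-of select=\"string-length($t) - string-length(translate($t, ' ', ''))"
      + " + number(boolean($t))\"/>"
      + "</xsl:template></xsl:stylesheet>";

    private final Templates templates;
    private StringBuilder filename; // non-null only while inside <filename>

    WordCountFilter(XMLReader parent, Templates templates) {
        super(parent);
        this.templates = templates;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if ("filename".equals(qName)) filename = new StringBuilder();
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int len) throws SAXException {
        if (filename != null) filename.append(ch, start, len);
        super.characters(ch, start, len);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        super.endElement(uri, local, qName);
        if (!"filename".equals(qName)) return;
        String path = filename.toString().trim();
        filename = null;
        StringWriter count = new StringWriter();
        try {
            // One transformation per file; nothing from it is retained afterwards.
            templates.newTransformer()
                     .transform(new StreamSource(new File(path)), new StreamResult(count));
        } catch (TransformerException e) {
            throw new SAXException(e);
        }
        char[] text = count.toString().trim().toCharArray();
        super.startElement("", "wordcount", "wordcount", new AttributesImpl());
        super.characters(text, 0, text.length);
        super.endElement("", "wordcount", "wordcount");
    }

    /** Runs the file list through the filter and returns the serialized result. */
    static String annotate(String listXml) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        Templates counter = tf.newTemplates(new StreamSource(new StringReader(COUNT_XSL)));
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader filter = new WordCountFilter(spf.newSAXParser().getXMLReader(), counter);
        StringWriter out = new StringWriter();
        // The identity transformation serializes the filtered SAX event stream.
        tf.newTransformer().transform(
            new SAXSource(filter, new InputSource(new StringReader(listXml))),
            new StreamResult(out));
        return out.toString();
    }
}
```

Calling WordCountFilter.annotate() with the <files> list as a string
returns the same list with a <wordcount> after each <filename>. For a
list of 10,000 entries it would probably be better to read the list from
a file and write to a FileWriter rather than build strings. Note also
that this simple count runs adjacent elements together when no
whitespace separates their text.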

Michael Kay
Software AG
home: Michael.H.Kay@xxxxxxxxxxxx
work: Michael.Kay@xxxxxxxxxxxxxx 

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx 
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of 
> Peter Binkley
> Sent: 08 April 2003 17:06
> To: 'xsl-list@xxxxxxxxxxxxxxxxxxxxxx'
> Subject: [xsl] 10,000 document()'s
> 
> 
> I need advice on how to tackle this problem: I've got a file 
> that contains a list of about 10,000 other files, and I want 
> to process the list so as to add a wordcount for each of the 
> external files. Something like this:
> 
> Input:
> 
> <files>
> 	<file>
> 		<filename>/path/to/file/2844942.xml</filename>
> 	</file>
> 	<file> .... </file>
> </files>
> 
> Output:
> 
> <files>
> 	<file>
> 		<filename>/path/to/file/2844942.xml</filename>
> 		<wordcount>2938</wordcount>
> 	</file>
> 	<file> .... </file>
> </files>
> 
> The obvious approach is to use a for-each loop that includes 
> a variable that opens the external file using a document() 
> call. The problem is that the process inevitably runs out of 
> memory, both with Saxon and Xalan. It seems that the 
> variables are passing out of scope and being destroyed as 
> they should, but I gather from a posting by Michael Kay
> (http://www.biglist.com/lists/xsl-list/archives/200212/msg00507.html)
> that all of those document() source trees are remaining in
> memory throughout the transformation, adding up to megabytes of data.
> 
> Can anyone suggest a strategy? The process doesn't have to be fast, it
> just has to finish.

> Peter Binkley
> Digital Initiatives Technology Librarian
> Information Technology Services
> 4-30 Cameron Library
> University of Alberta Libraries
> Edmonton, Alberta
> Canada T6G 2J8
> Phone: (780) 492-3743
> Fax: (780) 492-9243
> e-mail: peter.binkley@xxxxxxxxxxx




 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

