RE: [xsl] 10,000 document()'s

Subject: RE: [xsl] 10,000 document()'s
From: Peter Binkley <Peter.Binkley@xxxxxxxxxxx>
Date: Thu, 10 Apr 2003 12:26:51 -0600
Thanks to all who sent suggestions. It looked like David's suggestion of
using Xalan's no-caching feature would let me move forward, but the
transformation still ground to a halt. So for now I'm following Charles' line and have
written a PHP script to do the job. Too bad, though; I was looking forward
to being able to say I'd done the whole job in XSL. Ultimately I see I need
to get further into JAXP and learn to do these things properly. My
conclusion is that even where XSL isn't the right tool for full-scale
production, it's an awfully handy prototyping tool.

Peter


Peter Binkley
Digital Initiatives Technology Librarian
Information Technology Services
4-30 Cameron Library
University of Alberta Libraries
Edmonton, Alberta
Canada T6G 2J8
Phone: (780) 492-3743
Fax: (780) 492-9243
e-mail: peter.binkley@xxxxxxxxxxx



> -----Original Message-----
> From: Michael Kay [mailto:mhk@xxxxxxxxx] 
> Sent: Wednesday, April 09, 2003 1:39 PM
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: RE: [xsl] 10,000 document()'s
> 
> 
> I would suggest writing a SAX filter that invokes the XSLT 
> transformations (one transformation for each file) via JAXP, 
> gets the result back in a StringWriter, and adds an element 
> containing the word count to the output stream.
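The filter described above might look roughly like this in Java. It is a minimal sketch, not Michael's code. The class name `WordCountFilter`, the inline stylesheet `COUNT_XSL`, the `run` driver, and the definition of a "word" (a whitespace-separated token in the document's string value) are all assumptions made for illustration. The list is passed through as SAX events, and each listed file gets its own short-lived transformation, so its source tree can be garbage-collected before the next file is read.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical SAX filter: after each </filename>, emits <wordcount> for that file. */
public class WordCountFilter extends XMLFilterImpl {

    // Assumed definition of "word": a whitespace-separated token in the
    // document's string value, counted with XSLT 1.0 string functions.
    private static final String COUNT_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:variable name='t' select='normalize-space(.)'/>"
        + "<xsl:value-of select=\"string-length($t) - string-length(translate($t, ' ', ''))"
        + " + number($t != '')\"/>"
        + "</xsl:template></xsl:stylesheet>";

    private final Templates counter;              // stylesheet compiled once
    private final StringBuilder filename = new StringBuilder();
    private boolean inFilename = false;

    public WordCountFilter() throws TransformerException {
        counter = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(COUNT_XSL)));
    }

    /** One transformation per file; the result comes back in a StringWriter. */
    public int countWords(File file) throws TransformerException {
        StringWriter result = new StringWriter();
        counter.newTransformer().transform(new StreamSource(file), new StreamResult(result));
        return Integer.parseInt(result.toString().trim());
    }

    private static boolean isFilename(String local, String qName) {
        return "filename".equals(local == null || local.isEmpty() ? qName : local);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (isFilename(local, qName)) {
            inFilename = true;
            filename.setLength(0);
        }
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (inFilename) filename.append(ch, start, length);   // text may arrive in chunks
        super.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        super.endElement(uri, local, qName);
        if (!isFilename(local, qName)) return;
        inFilename = false;
        String count;
        try {
            count = Integer.toString(countWords(new File(filename.toString().trim())));
        } catch (TransformerException e) {
            throw new SAXException(e);
        }
        // Emit <wordcount>N</wordcount> as the next sibling of <filename>.
        super.startElement("", "wordcount", "wordcount", new AttributesImpl());
        super.characters(count.toCharArray(), 0, count.length());
        super.endElement("", "wordcount", "wordcount");
    }

    /** Hypothetical driver: passes listFile through the filter and an identity transform. */
    public static void run(File listFile, Writer out) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        WordCountFilter filter = new WordCountFilter();
        filter.setParent(spf.newSAXParser().getXMLReader());
        TransformerFactory.newInstance().newTransformer().transform(
            new SAXSource(filter, new InputSource(listFile.toURI().toString())),
            new StreamResult(out));
    }
}
```

Compiling the stylesheet once into a `Templates` object and creating a fresh `Transformer` per file is a common JAXP pattern. It avoids reparsing the stylesheet 10,000 times while keeping each file's tree local to a single `transform` call.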
> 
> Michael Kay
> Software AG
> home: Michael.H.Kay@xxxxxxxxxxxx
> work: Michael.Kay@xxxxxxxxxxxxxx 
> 
> > -----Original Message-----
> > From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of 
> > Peter Binkley
> > Sent: 08 April 2003 17:06
> > To: 'xsl-list@xxxxxxxxxxxxxxxxxxxxxx'
> > Subject: [xsl] 10,000 document()'s
> > 
> > 
> > I need advice on how to tackle this problem: I've got a file
> > that contains a list of about 10,000 other files, and I want 
> > to process the list so as to add a wordcount for each of the 
> > external files. Something like this:
> > 
> > Input:
> > 
> > <files>
> > 	<file>
> > 		<filename>/path/to/file/2844942.xml</filename>
> > 	</file>
> > 	<file> .... </file>
> > </files>
> > 
> > Output:
> > 
> > <files>
> > 	<file>
> > 		<filename>/path/to/file/2844942.xml</filename>
> > 		<wordcount>2938</wordcount>
> > 	</file>
> > 	<file> .... </file>
> > </files>
> > 
> > The obvious approach is to use a for-each loop that includes
> > a variable that opens the external file using a document() 
> > call. The problem is that the process inevitably runs out of 
> > memory with both Saxon and Xalan. It seems that the
> > variables are passing out of scope and being destroyed as
> > they should, but I gather from a posting by Michael Kay
> > (http://www.biglist.com/lists/xsl-list/archives/200212/msg00507.html)
> > that all of those document() source trees remain in memory
> > throughout the transformation, adding up to megabytes of data.
> > 
> > Can anyone suggest a strategy? The process doesn't have to be
> > fast, it just has to finish.
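One possible strategy, sketched here in plain Java with the JDK's DOM parser rather than XSLT, is to keep only the list itself as a tree and parse each external file in turn, so that no more than one external document is held at a time. This is a swapped-in technique, not one proposed in the thread. `WordCountLoop`, `countWords` and `addWordCounts` are hypothetical names, and counting whitespace-separated tokens in the text content is an assumed definition of a word.

```java
import java.io.File;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: adds <wordcount> after each <filename>, one file at a time. */
public class WordCountLoop {

    /**
     * Parses one file and counts whitespace-separated tokens in its text content
     * (an assumed definition of "word"). The parsed Document is local to this
     * call, so it becomes unreachable, and collectable, once the method returns.
     */
    public static int countWords(DocumentBuilder builder, File file) throws Exception {
        Document doc = builder.parse(file);
        String text = doc.getDocumentElement().getTextContent().trim();
        return text.isEmpty() ? 0 : text.split("\\s+").length;
    }

    /** Reads listFile, adds a <wordcount> sibling to every <filename>, writes the result. */
    public static void addWordCounts(File listFile, Writer out) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document list = builder.parse(listFile);   // the only tree kept across iterations
        NodeList names = list.getElementsByTagName("filename");
        for (int i = 0; i < names.getLength(); i++) {
            Element name = (Element) names.item(i);
            int count = countWords(builder, new File(name.getTextContent().trim()));
            Element wc = list.createElement("wordcount");
            wc.setTextContent(Integer.toString(count));
            // insertBefore(x, null) appends, so this also works for the last child.
            name.getParentNode().insertBefore(wc, name.getNextSibling());
        }
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(list), new StreamResult(out));
    }
}
```

Reusing one `DocumentBuilder` for sequential parses is allowed; what matters for memory is that no reference to a finished file's `Document` outlives its iteration.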
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
> 
