Searching huge xml-documents

Subject: Searching huge xml-documents
From: Thomas Weholt <u970130@xxxxxxxxxxxxx>
Date: Wed, 14 Apr 1999 10:53:37
Hi,

I was thinking of using XML as the file format for a CD database. Each CD can
contain approx. 20000 files, like clipart or source code, and I need a fast
method to pick out items according to a given index. I don't mind waiting
5-10 seconds to search 300-500 CDs (max. 20000 entries per CD).

Something like 

<cd_doc>
	... info about the cd ...
	<entries>
	  <entry no="1" path="/cdrom/stugg/long path/more text/python_stuff.tar.gz" ... more info .../>
	  ... 19999 or so more entries
	</entries>
</cd_doc>

I want to search by entry no. or sort by any other attribute, and perhaps
generate a word index to speed up the process. The reason I want to use
XML is that I use Java, Perl, Python and other programming languages on
several platforms. XML is readable for humans and easy to put on the web.
My main concern is speed. Storage space is not an issue. How fast is XSL?
How fast are the available Java packages? Any thoughts?
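
To make the idea concrete, here is a minimal sketch of the lookup, sorting and
word index described above, written in Python with the standard
xml.etree.ElementTree module. It is only an illustration under assumptions:
the sample paths, the function name build_indexes and the rule for splitting
a path into words are all made up for this example, and the real entries
would carry more attributes than "no" and "path".

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data in the <cd_doc> layout; the paths are invented.
SAMPLE = b"""<cd_doc>
  <entries>
    <entry no="1" path="/cdrom/stuff/python_stuff.tar.gz"/>
    <entry no="2" path="/cdrom/clipart/python_logo.gif"/>
    <entry no="3" path="/cdrom/source/perl_tools.zip"/>
  </entries>
</cd_doc>"""

def build_indexes(source):
    """Stream through the document once and build two in-memory indexes."""
    by_no = {}                    # entry no -> dict of that entry's attributes
    by_word = defaultdict(set)    # lowercase word -> set of entry numbers
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            no = int(elem.get("no"))
            by_no[no] = dict(elem.attrib)   # copy before clearing the element
            # Assumed word rule: runs of letters and digits in the path.
            for word in re.findall(r"[a-z0-9]+", elem.get("path", "").lower()):
                by_word[word].add(no)
            elem.clear()          # drop the element's attributes and children
    return by_no, by_word

by_no, by_word = build_indexes(io.BytesIO(SAMPLE))

entry_two = by_no[2]                           # search by entry no
python_hits = sorted(by_word["python"])        # search by word
sorted_by_path = sorted(by_no.values(), key=lambda attrs: attrs["path"])
```

Once the indexes exist, a lookup by entry no. or by word is a dictionary
access, so the cost is dominated by the single parsing pass. Whether that
pass is fast enough for 300-500 CDs would have to be measured.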

I don't even know if this is doable (generating indexes etc.), but I
would like to use XML at least for learning purposes.

As an experiment I created an XML document with the structure above,
containing 90000 entries, and searched for a given entry no. using XT and a
simple XSL stylesheet. The result was a little slow. Has anybody tested the
IBM Java tools for searching, i.e. not generating HTML but just looking up a
given element in a huge document? The result (in time) would be
interesting. XT probably has some overhead because it is written in
Java (starting the VM and so on). If a Java app is already running, how fast
can I locate several elements in a given XML document?
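
One way to separate the start-up and parsing cost from the lookup cost is to
time them independently inside a process that is already running. The sketch
below does this in Python rather than Java, so it says nothing about XT or
the IBM tools; it only shows the shape of such a measurement. The synthetic
document, the helper names make_document and load_index, and the three entry
numbers looked up are assumptions made for the example.

```python
import io
import time
import xml.etree.ElementTree as ET

def make_document(count):
    """Build a synthetic <cd_doc> with `count` entries (paths are made up)."""
    parts = ["<cd_doc><entries>"]
    for no in range(1, count + 1):
        parts.append('<entry no="%d" path="/cdrom/dir%d/file%d.dat"/>'
                     % (no, no % 100, no))
    parts.append("</entries></cd_doc>")
    return "".join(parts).encode("utf-8")

def load_index(data):
    """Parse the document once and map entry number -> path."""
    index = {}
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "entry":
            index[int(elem.get("no"))] = elem.get("path")
            elem.clear()          # drop the element's attributes and children
    return index

data = make_document(90000)       # same size as the experiment above

start = time.perf_counter()
index = load_index(data)
load_seconds = time.perf_counter() - start

wanted = [1, 45000, 90000]        # arbitrary entry numbers to look up
start = time.perf_counter()
found = [index[no] for no in wanted]
lookup_seconds = time.perf_counter() - start

print("parse and index: %.3f s" % load_seconds)
print("%d lookups:       %.6f s" % (len(wanted), lookup_seconds))
```

The two numbers printed will differ from machine to machine. The point is
only that the document is parsed once and every later lookup reuses the
in-memory index.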





----------------------------------------------
              Thomas Weholt
       eMail : weholt@xxxxxxxxxxxxxx
     HTTP://www.linuxfreak.com/~weholt
        Phone : +47 - 92 09 59 68
----------------------------------------------



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

