RE: Searching huge xml-documents

Subject: RE: Searching huge xml-documents
From: "Ed Nixon" <ed.nixon@xxxxxxxxxxxxxxxxx>
Date: Wed, 14 Apr 1999 20:37:09 -0400
This will get you there and supply you with some added value:
http://metalab.unc.edu/xql/


> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxx]On Behalf Of Rick Geimer
> Sent: Wednesday, April 14, 1999 8:05 PM
> To: xsl-list@xxxxxxxxxxxxxxxx
> Subject: Re: Searching huge xml-documents
>
>
> Could you post the URL, if there is one? I would be very interested in being
> able to store and query a DOM on secondary storage.
>
> Rick Geimer
> National Semiconductor
> rick.geimer@xxxxxxx
>
> Ed Nixon wrote:
>
> > There was a posting on Robin Cover's XML News site last week or the week
> > before about an implementation of XQL from folks in Darmstadt. They have
> > implemented a 'compilation' mechanism that takes the DOM tree, indexes it
> > and writes it to disk. At that point it's possible to run XQL against this
> > file, either in memory or cached to disk.
> >
> > Perhaps this would be worth a look?
> >     ...edN
> >
> > > -----Original Message-----
> > > From: owner-xsl-list@xxxxxxxxxxxxxxxx
> > > [mailto:owner-xsl-list@xxxxxxxxxxxxxxxx]On Behalf Of Thomas Weholt
> > > Sent: Wednesday, April 14, 1999 6:54 AM
> > > To: xsl-list@xxxxxxxxxxxxxxxx
> > > Subject: Searching huge xml-documents
> > >
> > >
> > > hi,
> > >
> > > > I was thinking of using XML as the file format for a CD database. Each CD can
> > > > contain approx. 20000 files, like clipart or source code, and I need a fast
> > > > method to pick out items according to a given index (I don't mind waiting
> > > > 5-10 secs. to search 300-500 CDs, max 20000 entries per CD).
> > >
> > > Something like
> > >
> > > <cd_doc>
> > >       ... info about the cd ...
> > >       <entries>
> > > >         <entry no="1" path="/cdrom/stugg/long path/more text/python_stuff.tar.gz" ... more info .../>
> > >         ... 199999 or so more entries
> > >       </entries>
> > > </cd_doc>
> > >
> > > > I want to search by entry no or sort by any other attribute, and perhaps
> > > > generate a word index to speed up the process. The reason I want to use
> > > > XML is that I use Java, Perl, Python and other programming languages on
> > > > several platforms. XML is readable for humans and easy to put on the web.
> > > > My main concern is speed. Storage space is not an issue. How fast is XSL?
> > > > How fast are the available Java packages? Any thoughts?
> > >
> > > > I don't even know if this is "doable", like generating indexes etc., but I
> > > > would like to use XML at least for learning purposes.
> > >
> > > > As an experiment I created an XML document with the structure above,
> > > > containing 90000 entries, and searched for a given entry no using XT and a
> > > > simple XSL stylesheet. The result was a little slow. Has anybody tested the
> > > > IBM Java tools for searching: not generating HTML, but just looking up a
> > > > given element in a huge document? The result (in time) would be
> > > > interesting. XT probably has some overhead because it's written in
> > > > Java (starting the VM and so on). If a Java app is already running, how fast
> > > > can I locate several elements in a given XML document?
> > >
> > >
> > >
> > >
> > >
> > > ----------------------------------------------
> > >               Thomas Weholt
> > >        eMail : weholt@xxxxxxxxxxxxxx
> > >      HTTP://www.linuxfreak.com/~weholt
> > >         Phone : +47 - 92 09 59 68
> > > ----------------------------------------------
> > >
> > >
> > >
> > > >  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
> > >
> >
>
>
>
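
For the lookup asked about in the original message (finding one entry by its
"no" attribute in a very large document), a streaming parse avoids building a
full DOM in memory. The following is only a minimal sketch in Python, one of
the languages mentioned in the thread. It is not the XQL implementation
referred to above. The function names find_entry and build_index and the
sample paths are hypothetical; the element and attribute names (cd_doc,
entries, entry, no, path) follow the example in the original message.

```python
import io
import xml.etree.ElementTree as ET


def find_entry(source, entry_no):
    """Stream through the document and return the attributes of the first
    <entry> whose "no" attribute equals entry_no, or None if none matches."""
    wanted = str(entry_no)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            if elem.get("no") == wanted:
                return dict(elem.attrib)
            # Drop the attributes of entries we do not need so that memory
            # use stays small even with many thousands of entries.
            elem.clear()
    return None


def build_index(source):
    """Make one pass over the document and map each entry no to its path,
    so that later lookups do not have to re-read the XML."""
    index = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            index[elem.get("no")] = elem.get("path")
            elem.clear()
    return index


# Hypothetical sample data in the shape described in the original message.
SAMPLE = b"""<cd_doc>
  <entries>
    <entry no="1" path="/cdrom/a/python_stuff.tar.gz"/>
    <entry no="2" path="/cdrom/b/clipart.gif"/>
  </entries>
</cd_doc>"""

if __name__ == "__main__":
    print(find_entry(io.BytesIO(SAMPLE), 2))
    print(build_index(io.BytesIO(SAMPLE)))
```

Both functions accept a file name or an open binary file. A single lookup
still reads the file up to the matching entry, so for repeated searches over
several hundred CDs it would probably pay to build the index once and save it
next to the XML file, which is the same idea as the index-and-write-to-disk
approach described above.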



