Re: [xsl] Searching for values in XML using XSL using Saxon

Subject: Re: [xsl] Searching for values in XML using XSL using Saxon
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Thu, 14 Oct 2010 13:54:41 +0100
Handling thousands of topics shouldn't be a problem; if there were millions I would consider an XML database.

Search time in the document shouldn't be a problem if you can keep it (and its indexes, whether xsl:key indexes or auto-generated Saxon indexes) in memory. But repeated loading of the document from disk every time it's needed could get very slow.
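To make the "keep it in memory" point concrete, here is a rough sketch in Python using only the standard library's xml.etree.ElementTree (an illustration of the idea, not Saxon code). The document is parsed once, and a keyword-to-topic index is built once, so each search is a dictionary lookup rather than a fresh read of the file. The <topics> wrapper element, the second sample topic and the TopicStore class are hypothetical. The sketch assumes keywords are comma-separated, as in the question.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample store: a <topics> wrapper (assumed name) around <topic>
# elements shaped like the one in the question.
SAMPLE = """<topics>
  <topic>
    <name>hamburger</name>
    <related-topics><topic-ref>food</topic-ref><topic-ref>health</topic-ref></related-topics>
    <keywords>burger, ketchup, mustard, hungry</keywords>
  </topic>
  <topic>
    <name>hot-dog</name>
    <related-topics><topic-ref>food</topic-ref></related-topics>
    <keywords>sausage, mustard, hungry</keywords>
  </topic>
</topics>"""


class TopicStore:
    """Parse the XML once and keep both the tree and a keyword index in memory."""

    def __init__(self, xml_text: str) -> None:
        self.root = ET.fromstring(xml_text)
        # keyword -> list of <topic> elements, built once (similar in spirit to an xsl:key index)
        self.by_keyword = defaultdict(list)
        for topic in self.root.iter("topic"):
            for kw in (topic.findtext("keywords") or "").split(","):
                kw = kw.strip().lower()
                if kw:
                    self.by_keyword[kw].append(topic)

    def find(self, keyword: str) -> list[str]:
        """Names of topics tagged with `keyword`: a dict lookup, no re-parsing."""
        return [t.findtext("name") for t in self.by_keyword.get(keyword.lower(), [])]


store = TopicStore(SAMPLE)  # in a real system: load the file once, at startup
print(store.find("mustard"))  # ['hamburger', 'hot-dog']
print(store.find("ketchup"))  # ['hamburger']
```

Inside an XSLT 2.0 stylesheet, the corresponding index would be something like <xsl:key name="by-keyword" match="topic" use="tokenize(keywords, ',\s*')"/>, queried with key('by-keyword', 'mustard') (case-sensitive, unlike the lowercasing above). The index belongs to the loaded document, so reloading the document from disk each time would also mean rebuilding the index.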

Michael Kay
Saxonica

On 14/10/2010 12:01 PM, Jacobus Reyneke wrote:
Good day,

I am trying to build a system on a pure XML data store. There are various reasons for doing this, but the most important are that I am always transforming the results, and that the system's data structure is dynamic and hierarchical, so XML is a lovely fit.

One part of my data will be large vocabularies, like dictionaries, and I would like to know from the experts whether I'm going to run into trouble in the long term and should rather move to a relational database with proper indexing, etc. I intend to use Saxon, simply because it's written in Java, it supports XSLT 2.0, and Michael has a good history of standing behind his product.

Another option would be an XML database, but the visibility that free-standing XML files give, compared with a database administration console, is nice.

The data will look something like this:

<topic>
  <name>hamburger</name>
  <related-topics>
    <topic-ref>food</topic-ref>
    <topic-ref>dead-cows</topic-ref>
    <topic-ref>health</topic-ref>
  </related-topics>
  <keywords>burger, ketchup, mustard, hungry</keywords>
  <description>Hamburgers are nice, but are not always good for your health. They are especially bad for the health of the cow, but this is o.k. if you don't know the cow</description>
</topic>

These topics will be built on the fly during chatroom conversations, so the related-topics and keywords will not be known beforehand. Yet it is the related-topics and keywords that will be used on the fly to find matching topics and format them into diagrams, charts, etc.
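One way to use the related-topics on the fly is a reverse index from each topic-ref value to the topics that carry it; two topics that share refs are then found without scanning the whole store. The following is a minimal Python sketch with the standard library. The sample topics "steak" and "jogging", the <topics> wrapper, and the rule "rank by number of shared topic-refs" are all assumptions chosen for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample topics; element names follow the question's example.
SAMPLE = """<topics>
  <topic><name>hamburger</name>
    <related-topics><topic-ref>food</topic-ref><topic-ref>dead-cows</topic-ref><topic-ref>health</topic-ref></related-topics></topic>
  <topic><name>steak</name>
    <related-topics><topic-ref>food</topic-ref><topic-ref>dead-cows</topic-ref></related-topics></topic>
  <topic><name>jogging</name>
    <related-topics><topic-ref>health</topic-ref></related-topics></topic>
</topics>"""

root = ET.fromstring(SAMPLE)

# topic name -> set of its topic-ref values
refs = {t.findtext("name"): {r.text for r in t.iter("topic-ref")} for t in root.iter("topic")}

# reverse index, built once: topic-ref value -> names of topics that carry it
by_ref = defaultdict(set)
for name, rs in refs.items():
    for r in rs:
        by_ref[r].add(name)


def related(name: str) -> list[tuple[str, int]]:
    """Other topics sharing topic-refs with `name`, most shared refs first."""
    shared = Counter()
    for r in refs.get(name, set()):
        for other in by_ref[r]:
            if other != name:
                shared[other] += 1
    # sort by shared count (descending), then by name for a stable order
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))


print(related("hamburger"))  # [('steak', 2), ('jogging', 1)]
```

When a new topic arrives during a conversation, only its own refs need to be added to the two dictionaries, so the index can grow incrementally instead of being rebuilt.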

In a couple of months' time there will be thousands of topics, so I am looking for a way to do this that will scale. Another problem is that topics may differ in structure: e.g. a topic on cars may have a <max-speed> element, while one on houses may have a <price>, which is another reason why a dynamic hierarchical data store makes more sense than a traditional relational database.
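Topics with differing structure are easy to query in a tree model: a search that needs an optional element simply skips topics that don't have it. A small Python sketch with ElementTree's limited path syntax, where topic[price] keeps only <topic> elements that have a <price> child. The sample topics and the helper names topics_with and priced_below are hypothetical, and prices are assumed to be plain numbers.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical topics whose structure varies: only some have <price> or <max-speed>.
SAMPLE = """<topics>
  <topic><name>sports-car</name><max-speed>250</max-speed></topic>
  <topic><name>beach-house</name><price>450000</price></topic>
  <topic><name>flat</name><price>120000</price></topic>
  <topic><name>hamburger</name></topic>
</topics>"""

root = ET.fromstring(SAMPLE)


def topics_with(element: str) -> list[str]:
    """Names of topics that contain a child element called `element`."""
    # the [tag] predicate keeps only <topic> elements that have such a child
    return [t.findtext("name") for t in root.findall(f"topic[{element}]")]


def priced_below(limit: float) -> list[str]:
    """Topics with a numeric <price> under `limit`; topics without <price> are ignored."""
    result = []
    for t in root.findall("topic[price]"):
        if float(t.findtext("price")) < limit:
            result.append(t.findtext("name"))
    return result


print(topics_with("max-speed"))  # ['sports-car']
print(priced_below(200000))      # ['flat']
```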

If someone could give me some advice, or suggest an efficient way to search on something like the keywords, I would be very grateful.

Kind regards,
Jacobus
