Re: [xsl] Searching for values in XML using XSL using Saxon

From: Jacobus Reyneke <jacobusreyneke@xxxxxxxxx>
Date: Thu, 14 Oct 2010 17:34:04 +0200
Thank you both, Michael and Вячеслав,

You answered my main question: from a design point of view, it appears that
XML as a datastore is viable, and that if Saxon someday starts feeling the
load, I can refactor the architecture to use exist-db for enormous data sets.

One question regarding keys and indexing: is it possible to index
comma-separated values such as my keywords list, or should each keyword be a
separate element?
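
To make the question concrete, here is a minimal sketch of what I imagine the
keyed lookup could look like. It assumes XSLT 2.0, where (as far as I
understand) the use expression of xsl:key may return a sequence and each item
becomes a separate key value. The key name topic-by-keyword, the $keyword
parameter and the <matches>/<match> output elements are names I made up for
illustration, and I am assuming at most one <keywords> element per topic:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: the keyword to look up -->
  <xsl:param name="keyword" select="'ketchup'"/>

  <!-- Index each topic under every keyword in its comma-separated list.
       tokenize() splits on commas; normalize-space() trims each token.
       Assumes at most one <keywords> child per <topic>. -->
  <xsl:key name="topic-by-keyword" match="topic"
           use="for $k in tokenize(keywords, ',') return normalize-space($k)"/>

  <xsl:template match="/">
    <matches>
      <!-- key() returns every <topic> indexed under $keyword -->
      <xsl:for-each select="key('topic-by-keyword', $keyword)">
        <match><xsl:value-of select="name"/></match>
      </xsl:for-each>
    </matches>
  </xsl:template>

</xsl:stylesheet>
```

If that is valid, the keywords could stay as a single comma-separated string
and the splitting would happen once, when the index is built.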

Thank you kindly for the help,
Jacobus

On 14 Oct 2010, at 2:58 PM, Вячеслав Седов wrote:

> 2010/10/14 Michael Kay <mike@xxxxxxxxxxxx>:
>> Handling thousands of topics shouldn't be a problem; if there were
>> millions I would consider an XML database.
>
> http://exist-db.org - very good case
>
>>
>> Search time in the document shouldn't be a problem if you can keep it (and
>> its indexes, whether xsl:key indexes or auto-generated Saxon indexes) in
>> memory. But repeated loading of the document from disk every time it's
>> needed could get very slow.
>>
>> Michael Kay
>> Saxonica
>>
>> On 14/10/2010 12:01 PM, Jacobus Reyneke wrote:
>>>
>>> Good day,
>>>
>>> I am trying to write a system on a pure XML data store. There are various
>>> reasons for doing this, but the most important are that I am always
>>> transforming the results and that the system's data structure is dynamic
>>> and hierarchical, so XML is a lovely fit.
>>>
>>> One part of my data will be large vocabularies of data, like dictionaries,
>>> and I would like to know from the experts if I'm going to run into trouble
>>> in the long term and should rather move to a relational database solution
>>> with proper indexing etc. I intend to use Saxon, simply because it's
>>> written in Java, it supports XSLT 2.0 and Michael has a good history of
>>> standing behind his product.
>>>
>>> Other options may be using XML databases, but the visibility provided by
>>> free-standing XML files compared to an administrator console to a database
>>> is nice.
>>>
>>> The data will look something like this:
>>>
>>> <topic>
>>>   <name>hamburger</name>
>>>   <related-topics>
>>>     <topic-ref>food</topic-ref>
>>>     <topic-ref>dead-cows</topic-ref>
>>>     <topic-ref>health</topic-ref>
>>>   </related-topics>
>>>   <keywords>burger, ketchup, mustard, hungry</keywords>
>>>   <description>Hamburgers are nice, but are not always good for your health.
>>>   They are especially bad for the health of the cow, but this is o.k. if you
>>>   don't know the cow</description>
>>> </topic>
>>>
>>> These topics will be built on the fly during chatroom conversations, so
>>> the related-topics and keywords will not be known beforehand. Yet it's the
>>> related-topics and keywords that will be used on the fly to find matching
>>> topics and format them into diagrams, charts etc.
>>>
>>> In a couple of months' time there will be thousands of topics, so I am
>>> looking for a way to do this that will scale. Another problem is that some
>>> topics may be different in structure, e.g. a topic on cars may have
>>> a <max-speed> element, while one on houses may have a <price>, which is
>>> another reason why a dynamic hierarchical data store makes more sense than
>>> a traditional relational database.
>>>
>>> If someone can give me some advice, or suggest an efficient search on
>>> something like the keywords, I will be very grateful.
>>>
>>> Kind regards,
>>> Jacobus
