[xsl] xslt, xml, and databases - how to combine?

Subject: [xsl] xslt, xml, and databases - how to combine?
From: Graham Seaman <graham@xxxxxxxxx>
Date: Tue, 9 Apr 2002 15:58:14 +0100 (BST)

Hi all,

This question may be a little off-topic; if anyone thinks it belongs
elsewhere, please let me know of a more suitable list!

I have a web site based on a number of relatively small xml documents
(currently stored as flat files). The documents contain a mix of simple
elements (strings, integers, etc.) and elements with longer stretches of
text.
Each HTML page is typically generated using xsl from a number of these
files. Given the filenames for the xml files, the xsl processing is
reasonably fast (it will never be a very high traffic site).

The problem comes in identifying the filenames in the first place.
Currently, minimal information (id, filename, date) on each file is held
in a database, and a database lookup is used to select a set of files;
the filenames are then passed to the xsl as a csv string. I only have
access to a standard relational db, not an xml-based one.
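
On the xsl side, handling that csv string looks roughly like this (a
simplified sketch, not the real stylesheet - the $files parameter name
is made up, and the recursive split is the usual xslt 1.0 idiom):

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- e.g. files="a.xml,b.xml,c.xml", passed in by the perl code -->
  <xsl:param name="files"/>

  <xsl:template match="/">
    <page>
      <xsl:call-template name="process-list">
        <xsl:with-param name="list" select="$files"/>
      </xsl:call-template>
    </page>
  </xsl:template>

  <!-- recursively peel one filename off the front of the csv string -->
  <xsl:template name="process-list">
    <xsl:param name="list"/>
    <xsl:variable name="head">
      <xsl:choose>
        <xsl:when test="contains($list, ',')">
          <xsl:value-of select="substring-before($list, ',')"/>
        </xsl:when>
        <xsl:otherwise><xsl:value-of select="$list"/></xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <xsl:if test="string-length($head) &gt; 0">
      <xsl:apply-templates select="document($head)"/>
    </xsl:if>
    <xsl:if test="contains($list, ',')">
      <xsl:call-template name="process-list">
        <xsl:with-param name="list" select="substring-after($list, ',')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```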

To select the set of files to process for a page by anything more
complicated than date, I could:

a) Use xsl to search through each file using document(). I believe this
would be impossibly slow (the number of files is currently only around
1,000, but it will grow).

b) Duplicate the simpler information from the xml files in the database
and search on that to identify files, combined with a search-engine
indexer for the longer text in the files. I would then have duplicated
data that must be kept consistent across every update, which makes the
site rather fragile.

c) Move the contents of the flat files into the database and generate
the xml from them on demand. All searches can then run directly against
the database, and there is no duplication of data. But it adds an extra
processing stage to generate the xml for the xsl to process.
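
To make a) concrete, the document() approach would look roughly like
this (a sketch only - index.xml and the element names are made up,
though item/admin/poster/email matches the style of path in my schema):

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- hypothetical: the email address being searched for -->
  <xsl:param name="email-wanted"/>

  <xsl:template match="/">
    <matches>
      <!-- one document() call per file listed in a hypothetical
           index.xml; this is the part I expect to be impossibly slow -->
      <xsl:for-each select="document('index.xml')/files/file">
        <xsl:variable name="doc" select="document(@href)"/>
        <xsl:if test="$doc/item/admin/poster/email = $email-wanted">
          <match href="{@href}"/>
        </xsl:if>
      </xsl:for-each>
    </matches>
  </xsl:template>
</xsl:stylesheet>
```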

c) sounds the most robust, but both b) and c) run into a problem: I'm
trying to keep the background site code (perl) and the xsl relatively
independent of the particular xml format - for example, generating forms
by using xsl to process an xsd schema, so that schema changes don't
require the site code to be rewritten. But if I force my xml into a
relational database, I lose this independence.
One way I can see to keep it is to assume a one-to-one correspondence
between field names and XPath expressions (e.g. fields with names like
item/admin/poster/email), so that I can regenerate the xml from the
database without one-off rules that break when the schema changes. But
this means I can only ever have one leaf node identified by a particular
XPath expression, which seems very limiting.
Another possibility might be to generate the xml from the database in
multiple stages: first a quite flat xml document which directly reflects
the db field names, then xsl applying a rule-set which converts that
first-stage xml into xml as defined in the schema. Yet another layer of
processing...
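
Something like this, to make the idea concrete (both documents are made
up - the element names just mirror hypothetical db column names):

```xml
<!-- stage 1: flat xml generated mechanically from the db columns,
     one element per column, named after the column -->
<row>
  <item.admin.poster.email>someone@example.org</item.admin.poster.email>
  <item.admin.date>2002-04-09</item.admin.date>
</row>

<!-- stage 2: a rule-set that turns the flat form into the schema form;
     only this stylesheet needs to change when the schema changes -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="row">
    <item>
      <admin>
        <poster>
          <email><xsl:value-of select="item.admin.poster.email"/></email>
        </poster>
        <date><xsl:value-of select="item.admin.date"/></date>
      </admin>
    </item>
  </xsl:template>
</xsl:stylesheet>
```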

Am I missing other ways round this? Is there some standard way to deal
with this problem? Or are pure xml databases the only way to avoid
kludges like this?

Thanks for any advice
Graham


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

