Re: [xsl] [xslt performance for big xml files]

Subject: Re: [xsl] [xslt performance for big xml files]
From: Robert Koberg <rob@xxxxxxxxxx>
Date: Sun, 26 Apr 2009 06:54:09 -0400
On Apr 25, 2009, at 10:05 PM, Liam Quin wrote:

On Sat, Apr 25, 2009 at 07:16:04PM -0400, Robert Koberg wrote:
Of all the real-world applications deployed that use XQuery (I suppose
I could be more specific and say as recommended by Liam, but that is
probably not necessary), how many do you think would work on more than
one XQuery processor?

I think quite a few, although yes, you generally will have to change the collection() and document() arguments. Try creating a SQL database and querying it in Oracle, DB2, MySQL, PostgreSQL and you'll generally find you have to change the code at least a little, but that does not make SQL completely non-interoperable. It's a case of managing expectations, and of "the application was ported in a week" vs "we would need to rewrite millions of lines of code from scratch".

Yes, but people aren't being encouraged to write webapps in SQL. And there are tools that allow you to abstract away the differences.




[...] XQuery as used/promoted by the XML DBs tends to favor their
own extensions in documentation and lists (though there seem to be
more caveats on the lists lately).

I don't actually remember which implementations I suggested -- most likely MarkLogic, Qizx and dbxml, since I've used them. I've not had major problems moving queries between them, though, once the files are indexed, which is a separate (although not unfair) question.

We didn't standardise collection() -- at some point you have to
say, "this is the scope of our spec" and stop.  Maybe for XQuery 1.1
we could consider an optional directory-of-files-as-collection()
function, but then people would say they needed options to say whether
to re-run indexes, what collation sequences and file encodings to
assume, whether to follow shortcuts and symbolic links... and pretty
soon it'd be a huge mess.  Or at least that's been a difficulty in
the past.  Relational database schemas aren't entirely portable
either, and neither are filenames (e.g. between MS Windows and
Solaris and OS X the character sets, lengths, and default encodings
differ).

You're right that extension functions are a problem -- that's true
for XSLT as well, of course, and XPath, and for that matter C and
Perl and Python....

But with XSL and XPath the extensions are really not needed and you are not steered away from the standard right away.


best,
-Rob




Liam

--
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/ * http://www.fromoldbooks.org/
