Re: [xsl] [xslt performance for big xml files]

Subject: Re: [xsl] [xslt performance for big xml files]
From: Robert Koberg <rob@xxxxxxxxxx>
Date: Sun, 26 Apr 2009 06:47:58 -0400

On Apr 26, 2009, at 6:08 AM, Florent Georges wrote:


Robert Koberg wrote:


I think considering only standard XQuery is disingenuous,
especially the way it is being presented by the vendors and
people who write about it.

I am not sure I understand what you mean by "the way it is being presented by the vendors and people who write about it."


Well, the XML DB vendors pitch the XML DB as a web application platform. There is also the XRX movement, whose advocates push the same idea. As soon as you read your first request parameter (or import your vendor's request namespace) you are into vendor-defined extensions. This happens almost right away. It seems to me that the XQuery standard is a hook for the user that is tossed aside as soon as they are on the line.


 In my
opinion, in every language I have had the opportunity to deal
with, you have standard functionality and what is
implementation-defined.  It is the responsibility of the
developer to architect their code so that what is vendor-specific
is isolated in a specific area of the code base.

 Whatever the vendor's samples look like, and however often they
use vendor extensions, it remains the responsibility of the
developer to know what is an extension and what is standard, and
to use this distinction wisely.
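
One possible way to keep a query standard while still reading request data can be sketched as follows. This is only a minimal sketch under stated assumptions: the host environment is assumed to bind the request parameters to an external variable, and the names $params, local:param, and the <params>/<param name="..."> structure are all hypothetical, not part of any vendor's API. How the variable gets bound is the one vendor-specific step, and it lives outside the query.

```xquery
xquery version "1.0";

(: Assumption: the host binds something like
   <params><param name="user">rob</param></params>
   to this external variable. The element names are hypothetical. :)
declare variable $params as element(params) external;

(: Returns the first value of the named parameter, or the default
   when no such parameter was supplied. Uses only standard XQuery 1.0. :)
declare function local:param($name as xs:string, $default as xs:string)
  as xs:string
{
  let $value := $params/param[@name = $name][1]
  return
    if (exists($value)) then string($value) else $default
};

<greeting>{ local:param('user', 'anonymous') }</greeting>
```

Binding external variables is part of standard XQuery, but the way each processor exposes that binding (command line, API, servlet configuration) differs, so that step still has to be written once per processor.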

Of course, but we are talking about a W3C standard. I can run HTML on any browser, parse XML with any XML parser, run XSL on any XSL processor (and there is a mechanism, e.g. function-available() and xsl:fallback, to check for vendor extensions and choose the correct one based on the processor).





Of all the real world applications deployed that use XQuery (I
suppose I could be more specific and say as recommended by
Liam, but that is probably not necessary), how many do you think
would work on more than one XQuery processor?

I really do not know (how could I know about "all the real world applications deployed using XQuery"?). But even if I could, that would not be evidence of the intrinsic interoperability of the language; it would only be a clue about how it is used in that regard.

I don't know, but from what I have seen posted on the pertinent
mailing lists, XQuery as used/promoted by the XML DB vendors
tends to favor their own extensions in documentation and on the lists

I agree. I think two different things about that fact: 1/ vendor extensions are required to cover areas not addressed by standard XQuery, and it is a good thing that vendors provide those features through extensions, as long as they stay within the scope of the standard extensibility mechanisms, and 2/ those extensions are, more often than not, overused in cases where standard XQuery expressions could have been used instead, and when they are needed, they are not isolated to ease porting the code to another processor.

Also, the vendor-specific ones usually perform better than their standard alternatives.




 I guess that's the "database-world syndrome."  I've rarely met a
DBA who tries to make her SQL code as portable as possible, and
who cares about (over)using vendor-specific extensions.

I am not a DBA, but I use Hibernate to keep the (generated) SQL vendor-agnostic. But again, XQuery is a W3C standard. I just can't help feeling there is a con going on. That is probably too strong, but I hear, "Look at my standards-compliant XML DB - you can use standards-based XQuery. However, to use it as we suggest, you will need to use our extensions."





(though there seem to be more caveats on the lists lately).

Good ;-) Seriously, I think that's an important point, and I think that during the next couple of years, the way developers write their XQuery expressions will determine whether vendors find it important to provide interoperability facilities or instead try to lock users in: if vendors see that this is a major concern of their users, they will provide it.

:)





Say path/to/col contains the documents 1.xml, 2.xml and
3.xml.
Across XQuery processors, what does collection('path/to/col')/*
provide?

Once you have defined exactly what "path/to/col contains the documents 1.xml, 2.xml and 3.xml" means on several processors, you will have the answer.

 I think you have to look at the problem the other way around: "I
have defined my application to use the collection mechanism to
organize my documents, how can I set up this processor to get the
correct documents when I access this or that collection?"
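
As a rough illustration of that question, a query like the one below could be run on each processor to see what it actually resolves for the collection URI. It is only a sketch: it uses standard XQuery 1.0 functions, but it assumes that 'path/to/col' has already been mapped to a set of documents in the processor's own configuration, and that mapping is exactly the implementation-defined part.

```xquery
xquery version "1.0";

(: Report the URI of every node the processor returns for the
   collection. The <doc> wrapper element is just for display. :)
for $doc in collection('path/to/col')
return
  <doc uri="{ document-uri($doc) }"/>
```

If two processors list the same three documents, collection('path/to/col')/* should then give the same three root elements on both, although the order in which they come back is not something this sketch guarantees.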

I don't understand the point you are making here.





 But I agree there is a set of database-related functions (such
as administration, inserts, etc.) that it would be valuable for
everyone to have defined in a standard way, across several XML
databases. I think EXQuery <http://www.exquery.org/> will play an
important role in that regard.

Yes, and thanks for taking the lead on this.


best,
-Rob



Regards,

--
Florent Georges
http://www.fgeorges.org/
