Re: [xsl] Proposed syntax for namespace binding in XPath

From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Thu, 19 Apr 2007 18:56:41 +0200
Michael Kay wrote:
>> The basic idea is simple:
>>
>> ("your special syntax") and rest/of/expr
>> ("your special syntax")[2] | rest/of/expr
>>
>> For example, with namespace binding syntax, this could become:
>>
>> ("xmlns(xmlns=http://mynamespace xmlns:you=http://yournamespace)")[2] | path/with/you:your-namespace
>
> An interesting idea. But I can think of some further drawbacks:
>
> 1. Not easy to recognize when the syntax is in use (applies to human readers
> as well as XPath processors)
>
> 2. Impossible to generate error messages: difficult to diagnose mistakes
>
> 3. Is there really such a thing as a no-op? Your examples aren't: the first
> example changes EXP to boolean(EXP) and the second causes an error if the
> expression delivers atomic values rather than nodes.
>
> An adaptation that eliminates these disadvantages would be:
>
> saxon:namespaces("xmlns=abc.uri xmlns:p=pqr.uri"), EXPR
>
> (I think that E1,E2 where E1 is an empty sequence really IS a no-op)
>
> But it feels a bit like an abuse.

Indeed.
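MK's point 3 can be made concrete with a toy model. The sketch below is Python only because the post has no host language of its own; it models XPath 2.0 sequences as Python lists and nodes as instances of a hypothetical `Node` class. It is a simplification (the real effective-boolean-value rules have more cases, and real unions sort into document order), not an XPath implementation, and all function names are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for an XML node; atomic values are plain Python values."""
    name: str

def nth(seq, n):
    """E[n] with an integer n: the item at 1-based position n, if any."""
    return [item for pos, item in enumerate(seq, start=1) if pos == n]

def ebv(seq):
    """Simplified effective boolean value of a sequence."""
    if not seq:
        return False
    if isinstance(seq[0], Node):
        return True
    if len(seq) == 1:
        return bool(seq[0])
    raise TypeError("FORG0006: no effective boolean value")

def op_and(left, right):
    # 'A and B' always yields one boolean, whatever B would have returned.
    return [ebv(left) and ebv(right)]

def op_union(left, right):
    # 'A | B' accepts only nodes; an atomic operand is a type error.
    if any(not isinstance(item, Node) for item in left + right):
        raise TypeError("XPTY0004: union operand contains an atomic value")
    # Duplicates removed; real XPath also sorts into document order.
    return list(dict.fromkeys(left + right))

def op_comma(left, right):
    # 'A, B' concatenates; with A empty the result is exactly B.
    return left + right

if __name__ == "__main__":
    nodes = [Node("a"), Node("b")]
    special = ["your special syntax"]
    print(op_and(special, nodes))            # [True]: the node sequence is gone
    print(op_union(nth(special, 2), nodes))  # the two nodes, unchanged
    print(op_comma([], [42]))                # [42]: the empty operand adds nothing
```

Only the comma form passes any result through untouched: `and` collapses it to a boolean, and `|` works for nodes but fails as soon as the rest of the expression returns atomic values.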


Summarizing so far, this thread has shown three basic ideas (not counting my own):

1. Use an extension function (main drawback: the extension function namespace must first be declared)
2. Introduce special syntax that current implementations automatically ignore (e.g., special comments)
3. Use the syntax of an existing standard, for instance the XPointer xmlns() scheme, i.e., xmlns(ns=http://namespace)
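To make option 3 concrete, here is a minimal sketch of how an extension-aware processor might strip leading xmlns() parts off an expression before compiling the rest. It assumes XPointer's convention of one prefix=URI binding per xmlns() part (slightly different from the multi-binding form in the quoted example) and ignores XPointer's ^-escaping, so a URI may not contain ')' or whitespace. `split_xmlns_parts` is a hypothetical name, not an existing API.

```python
import re

# One leading XPointer-style part: xmlns(prefix=uri). Simplified: no ^-escaping.
_XMLNS_PART = re.compile(r"\s*xmlns\(\s*([A-Za-z_][\w.-]*)\s*=\s*([^)\s]+)\s*\)")

def split_xmlns_parts(expr):
    """Hypothetical pre-parser: strip leading xmlns(p=uri) parts from an XPath
    expression and return (bindings, remaining_expression)."""
    bindings = {}
    pos = 0
    while True:
        m = _XMLNS_PART.match(expr, pos)
        if m is None:
            break
        # A later part for the same prefix simply overwrites the earlier one here.
        bindings[m.group(1)] = m.group(2)
        pos = m.end()
    return bindings, expr[pos:].strip()

if __name__ == "__main__":
    print(split_xmlns_parts("xmlns(you=http://yournamespace) path/with/you:your-namespace"))
```

The remaining expression and the bindings would then go to the processor's normal compiler; a processor without such a pre-parser would see `xmlns(...)` as a call to an unknown function and reject the expression.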


There were more approaches, but I believe each can be categorized under 1, 2 or 3 above. From what I've read, the tendency is (still) towards the second category: introducing special syntax that only processors aware of it would recognize, and that others would ignore without warnings or errors, thus keeping XPath expressions portable.

I have one general remark and one additional proposal (along the lines of my earlier, rather awkward, proposal, but with a different approach).

REMARK: A lot of the discussion is about letting non-understanding XPath processors accept this new syntax without errors. But, come to think of it, if you declare a namespace this way, I assume you want to use the namespace prefix in the rest of your expression. When non-understanding XPath processors encounter these then-unbound prefixes, they will raise errors about undeclared prefixes. And even without errors, nothing would match, because the prefixes are not bound. So this proposed extension will only be usable on extension-enabled processors, and other processors should *not* ignore it, but raise an error on it. Shouldn't they?

PROPOSAL: Based on the remark, a whole host of syntaxes comes to mind, and perhaps xmlns(....) is the easiest to adopt. However, in case the remark is not valid, I came up with an improvement to the no-op approach, based on MK's note above that a no-op really is a no-op if the no-op part evaluates to an empty sequence:

(xmlns.ns = 'http://www.test.com')[0] , ...xpathexpression...

The xmlns.ns step should never select anything, because XML reserves all names starting with [Xx][Mm][Ll], so no conforming document should contain such an element. Ideally the dot would be a colon, but that creates a non-conformant expression (the xmlns prefix may not be bound to anything, nor used); perhaps this can be overcome, I wouldn't know.

The rest of the expression speaks for itself: with non-understanding XPath processors, the parenthesized comparison yields a single false() (an empty node sequence compared with a string), from which the [0] predicate selects nothing, since positions start at 1, leaving an empty sequence. The comma operator makes sure the rest of the XPath expression is evaluated normally. Whatever its result, the first part won't interfere with it.
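The evaluation just described can be traced with a toy model in Python: sequences are lists, and elements are hypothetical `(name, children)` tuples. Atomization and type rules are omitted and every helper name is made up, so this only illustrates the reasoning; it is not an XPath evaluator.

```python
# Toy model: XPath 2.0 sequences as Python lists, elements as (name, children).

def children_named(element, name):
    """Child step 'name' from a context element (hypothetical representation)."""
    return [child for child in element[1] if child[0] == name]

def general_eq(left, right):
    """General comparison '=': true if ANY pair of items is equal, so an
    empty operand always gives false."""
    return any(a == b for a in left for b in right)

def nth(seq, n):
    """E[n] with an integer n: the item whose 1-based position equals n."""
    return [item for pos, item in enumerate(seq, start=1) if pos == n]

def prologue(context):
    # (xmlns.ns = 'http://www.test.com')[0]
    comparison = [general_eq(children_named(context, "xmlns.ns"),
                             ["http://www.test.com"])]  # [False] when no xmlns.ns child
    return nth(comparison, 0)                           # []: there is no position 0

def with_prologue(context, rest):
    # prologue , rest: the comma concatenates, so the empty prologue adds nothing
    return prologue(context) + rest(context)

if __name__ == "__main__":
    doc = ("root", [("item", []), ("item", [])])
    print(with_prologue(doc, lambda ctx: children_named(ctx, "item")))
```

Note that the [0] predicate alone guarantees the empty result, whatever the comparison returns. One caveat the model glosses over: xmlns.ns is a relative path, so a real XPath 2.0 processor evaluating the expression with no context item would raise a dynamic error (XPDY0002) rather than return false().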

I believe this syntax solves two problems that earlier solutions in category 2 did not:

1. Existing syntax checkers will show you when you have an error, because it uses plain XPath syntax
2. It is easily readable and parsable (though I am not sure of the latter), and has no side effects on non-understanding processors


Still, I'd opt for the extension function or something similar, because an error in non-understanding processors would be better than an expression that selects nothing or raises errors on the unbound namespace prefixes, methinks.

Cheers,
-- Abel Braaksma
