[xsl] XML Schema union type is evil (for XPath 2.0 processing)

Subject: [xsl] XML Schema union type is evil (for XPath 2.0 processing)
From: "Costello, Roger L." <costello@xxxxxxxxx>
Date: Thu, 10 Apr 2008 08:50:26 -0400

Hi Folks,

In Michael Kay's book, XPath 2.0 (p. 259 and 289), he gives 3 cases
where the use of a union type can lead to problems.  Here are the 3
cases:

1. Consider this <prices> element, which contains a list of prices
(decimal values); where no price is available, "N/A" is listed instead:

   <prices>40.99 19.00 N/A 23.80</prices>

Each list value is either a decimal or the string value, "N/A".  That
is, each list value is a union of:

    - xs:decimal
    - a simpleType with enumeration value of "N/A"
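
As a rough sketch, such a type might be declared along these lines.
The type names (NotAvailable, PriceOrNA, PriceList) are made up for
illustration, and I am assuming a schema with no target namespace:

    <xs:simpleType name="NotAvailable">
       <xs:restriction base="xs:string">
          <xs:enumeration value="N/A"/>
       </xs:restriction>
    </xs:simpleType>

    <!-- each list item is a decimal or the token N/A -->
    <xs:simpleType name="PriceOrNA">
       <xs:union memberTypes="xs:decimal NotAvailable"/>
    </xs:simpleType>

    <xs:simpleType name="PriceList">
       <xs:list itemType="PriceOrNA"/>
    </xs:simpleType>

    <xs:element name="prices" type="PriceList"/>

With a schema like this, a schema-aware processor atomizes <prices>
into a mixed sequence: decimals for the numeric items and a
string-derived value for "N/A".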

Now suppose that I want to write an XPath expression to see if there
are some prices over 30.00.  Here's one way to express it:

    if (some $i in data(prices) satisfies $i gt 30.00) then
       'Expensive stuff'
    else 'Cheap stuff'

I ran this XPath using SAXON and got this output: Expensive stuff.

Then I changed the input by swapping the first list value with the N/A
value:

   <prices>N/A 19.00 40.99 23.80</prices>

I ran the same XPath against this input, using the same SAXON
processor, and got an error message saying that the string "N/A"
cannot be compared with the decimal 30.00.

So, depending on the "order" of the input data I get a successful
result or an error!

Furthermore, even with the first version of the input:

   <prices>40.99 19.00 N/A 23.80</prices>

I may, or may not, get an error.  SAXON evaluates the list values from
left to right, and stops as soon as it finds a true value (40.99 gt
30.00 is true, so it stops before it ever reaches "N/A").  XPath
processors are free to evaluate the list values in any order.  So,
another XPath processor may evaluate the list values from right to
left, reach "N/A" first, and give an error.

Recap:

(a) You may, or may not, get an error depending on the order of the
list values.

(b) You may, or may not, get an error depending on the XPath processor
that you use.

The good news is that there is a way to protect yourself against this
problem:

    if (some $i in data(prices)[. instance of xs:decimal]
        satisfies $i gt 30.00) then
       'Expensive stuff'
    else 'Cheap stuff'

The predicate filters out the "N/A" list value, so the situation where
"N/A" is compared against 30.00 never arises.

2. There is the same problem when using the "every" expression, e.g.

    if (every $i in data(prices) satisfies $i lt 30.00) then
       'Buy at this store'
    else 'Shop elsewhere'

With this input:

   <prices>40.99 19.00 N/A 23.80</prices>

SAXON gives this output: Shop elsewhere

With this input (swap the first list value with "N/A"):

   <prices>N/A 19.00 40.99 23.80</prices>

SAXON generates an error.

Again, it is possible to protect yourself:

    if (every $i in data(prices)[. instance of xs:decimal]
        satisfies $i lt 30.00) then
       'Buy at this store'
    else 'Shop elsewhere'
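
One caveat with the filtered "every": a quantified "every" over an
empty sequence is true, so if all the list values are "N/A" the
expression above answers 'Buy at this store'.  If that is not what you
want, a possible variation (just a sketch, not something from the
book) is to also require that at least one decimal is present:

    if (exists(data(prices)[. instance of xs:decimal]) and
        (every $i in data(prices)[. instance of xs:decimal]
         satisfies $i lt 30.00)) then
       'Buy at this store'
    else 'Shop elsewhere'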

3. Next, consider a <quantity> element whose value is either a number
or the string "out-of-stock".  Here are two examples:

   <quantity>out-of-stock</quantity>

   <quantity>20</quantity>

The value is either a number or the string value "out-of-stock".  That
is, the value is a union of:

    - xs:nonNegativeInteger
    - a simpleType with enumeration value of "out-of-stock"
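
Sketched as a schema fragment (again, the type names OutOfStock and
Quantity are invented for illustration, and no target namespace is
assumed), this might look like:

    <xs:simpleType name="OutOfStock">
       <xs:restriction base="xs:string">
          <xs:enumeration value="out-of-stock"/>
       </xs:restriction>
    </xs:simpleType>

    <!-- a count, or the token out-of-stock -->
    <xs:simpleType name="Quantity">
       <xs:union memberTypes="xs:nonNegativeInteger OutOfStock"/>
    </xs:simpleType>

    <xs:element name="quantity" type="Quantity"/>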

Now, suppose I want to write an XPath expression to see if the quantity
is out-of-stock:

    if (data(quantity) eq 'out-of-stock') then
        'Bummer'
    else
        'Buy them all!'

With the first example above as input, the output is: Bummer

With the second example as input, an error is generated, because the
typed value is the integer 20 and "eq" cannot compare an integer with
a string.

Again, there is a way to protect yourself:

    if (data(quantity) instance of xs:string) then
        if (data(quantity) eq 'out-of-stock') then
            'Bummer'
        else
            'Something is screwed up in the input'
    else
        'Buy them all!'
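
If the extra sanity-check branch is not needed, another possible
approach (my own sketch, assuming there is at most one <quantity>
child) is to compare the string value rather than the typed value.
string() always returns an xs:string, whatever the schema type, so
"eq" never sees an integer:

    if (string(quantity) eq 'out-of-stock') then
        'Bummer'
    else
        'Buy them all!'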

SUMMARY

1. Input data that contains union values must be dealt with carefully.

2. If you don't design the XPath to protect yourself, then your XPath
may succeed with some inputs and fail with others; it may succeed with
some XPath processors and fail with others.

QUESTIONS

1. While it is possible to write XPath expressions to "protect
yourself" it is, I think, likely that people will either:

   - forget to do so
   - not know how to do so
   - not be aware of the problem with union types

What's Best Practice?  Perhaps Best Practice is: "Avoid using union
types."

What do you think?

2. Are there other cases where the union type presents a problem?  (I
haven't yet read all of Michael's book, so he may identify other cases
that I haven't come across yet.)

/Roger
