[xsl] XML Schema Design for Effective Utilization by XSLT and XPath

Subject: [xsl] XML Schema Design for Effective Utilization by XSLT and XPath
From: "Costello, Roger L." <costello@xxxxxxxxx>
Date: Fri, 29 Feb 2008 09:07:11 -0500
Hi Folks,

Below are some thoughts on how to design an XML Schema so that it can
be effectively utilized by XSLT and XPath.  I would appreciate your
comments, particularly on any errors, and on poor/incorrect wording.
Thanks!  /Roger

-------------------------------------------------------------
XML SCHEMA DESIGN FOR EFFECTIVE UTILIZATION BY XSLT AND XPATH

XSLT 2.0 and XPath 2.0, when run on a schema-aware processor, can
leverage the information in an XML Schema when processing input
documents.

However, this is possible only if the schema is designed in a certain
fashion.  Only globally declared elements and elements with named types
can be leveraged.  Elements that are locally declared and have
anonymous types cannot be leveraged.  (Note: these comments also apply
to attributes.)
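
As a rough sketch of how a stylesheet gets access to a schema in the
first place: XSLT 2.0 provides the xsl:import-schema declaration.  The
file name Book.xsd below is an assumption (the schema is not given a
file name in this note), and the test only succeeds if the processor is
schema-aware and the input document has been validated.

```xml
<?xml version="1.0"?>
<!-- Sketch only.  Assumes a schema-aware XSLT 2.0 processor and that
     the schema is saved as Book.xsd (an assumed file name). -->
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <!-- The schema has no target namespace, so the namespace
         attribute is omitted -->
    <xsl:import-schema schema-location="Book.xsd"/>

    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- true only if Date was validated and carries the type
             annotation xs:gYear; false for an unvalidated input -->
        <xsl:value-of
            select="Book/Date instance of element(Date, xs:gYear)"/>
    </xsl:template>

</xsl:stylesheet>
```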

For example, note the declaration for Publisher in this schema:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

    <xs:element name="Book">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="Title" type="xs:string"/>
                <xs:element name="Author" type="xs:string"/>
                <xs:element name="Date" type="xs:gYear"/>
                <xs:element name="ISBN" type="xs:string"/>
                <xs:element name="Publisher">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:enumeration value="O'Reilly Media, Inc." />
                            <xs:enumeration value="Simon &amp; Schuster" />
                            <xs:enumeration value="Three River Press" />
                            <xs:enumeration value="W. W. Norton &amp; Company, Inc." />
                            <xs:enumeration value="Harvard Business School Press" />
                            <xs:enumeration value="Random House Trade Paperbacks" />
                            <xs:enumeration value="Wrox Press" />
                            <xs:enumeration value="Prentice Hall" />
                            <xs:enumeration value="McMillan Publishing" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

</xs:schema>

Publisher is declared using an inline element declaration and has an
anonymous type.  XSLT and XPath will not be able to leverage this
element declaration.

All the other elements can be leveraged: the Book element is globally
declared, and Author, Title, Date, and ISBN all have named types.
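
To make this concrete, here is a hypothetical instance document for the
schema above (the values are invented for illustration).  After
validation, Title, Author, Date, and ISBN carry type annotations that
an expression can name (xs:string, xs:gYear), whereas the enumerated
type of Publisher has no name to refer to, and there is no global
Publisher declaration to refer to either.

```xml
<?xml version="1.0"?>
<!-- Hypothetical instance document; all values are made up -->
<Book>
    <Title>Example Title</Title>
    <Author>Example Author</Author>
    <!-- annotated as xs:gYear after validation -->
    <Date>2008</Date>
    <ISBN>0-000-00000-0</ISBN>
    <!-- must be one of the enumerated values; its type is anonymous -->
    <Publisher>Wrox Press</Publisher>
</Book>
```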

There are two ways to redesign the schema so that Publisher can be
leveraged by XSLT and XPath.

Design 1: make Publisher global.

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

    <xs:element name="Book">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="Title" type="xs:string"/>
                <xs:element name="Author" type="xs:string"/>
                <xs:element name="Date" type="xs:gYear"/>
                <xs:element name="ISBN" type="xs:string"/>
                <xs:element ref="Publisher"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="Publisher">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:enumeration value="O'Reilly Media, Inc." />
                <xs:enumeration value="Simon &amp; Schuster" />
                <xs:enumeration value="Three River Press" />
                <xs:enumeration value="W. W. Norton &amp; Company, Inc." />
                <xs:enumeration value="Harvard Business School Press" />
                <xs:enumeration value="Random House Trade Paperbacks" />
                <xs:enumeration value="Wrox Press" />
                <xs:enumeration value="Prentice Hall" />
                <xs:enumeration value="McMillan Publishing" />
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

</xs:schema>

Here is an example of an XPath 2.0 expression that leverages the
Publisher element declaration:

if ($p instance of schema-element(Publisher)) then
      concat('The publisher is: ', data($p))
else
      error()

Read as: "if the value of the 'p' variable is a Publisher element that
has been validated against the Publisher element declaration in the XML
Schema, then concatenate the string 'The publisher is: ' with the
variable's value; otherwise generate an error."
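
The same test can also be used as a template match pattern.  The
following is a minimal sketch, not a definitive implementation: it
assumes a schema-aware XSLT 2.0 processor, assumes the Design 1 schema
is saved as Book.xsd (a made-up file name), and uses explicit priority
attributes so that it does not depend on the default priority rules.

```xml
<?xml version="1.0"?>
<!-- Sketch only; Book.xsd is an assumed file name for Design 1 -->
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:import-schema schema-location="Book.xsd"/>

    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:apply-templates select="Book/Publisher"/>
    </xsl:template>

    <!-- Fires only for a Publisher element that was validated
         against the global Publisher declaration -->
    <xsl:template match="schema-element(Publisher)" priority="2">
        <xsl:value-of
            select="concat('The publisher is: ', data(.))"/>
    </xsl:template>

    <!-- Fallback for any other Publisher element, for example one
         from an input document that was not validated -->
    <xsl:template match="Publisher" priority="1">
        <xsl:message terminate="yes">Publisher was not validated against the schema</xsl:message>
    </xsl:template>

</xsl:stylesheet>
```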

Design 2: give Publisher a named type.

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

    <xs:element name="Book">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="Title" type="xs:string"/>
                <xs:element name="Author" type="xs:string"/>
                <xs:element name="Date" type="xs:gYear"/>
                <xs:element name="ISBN" type="xs:string"/>
                <xs:element name="Publisher" type="publishers"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <xs:simpleType name="publishers">
        <xs:restriction base="xs:string">
            <xs:enumeration value="O'Reilly Media, Inc." />
            <xs:enumeration value="Simon &amp; Schuster" />
            <xs:enumeration value="Three River Press" />
            <xs:enumeration value="W. W. Norton &amp; Company, Inc." />
            <xs:enumeration value="Harvard Business School Press" />
            <xs:enumeration value="Random House Trade Paperbacks" />
            <xs:enumeration value="Wrox Press" />
            <xs:enumeration value="Prentice Hall" />
            <xs:enumeration value="McMillan Publishing" />
        </xs:restriction>
    </xs:simpleType>

</xs:schema>

Here is an example of an XPath 2.0 expression that leverages the
Publisher element:

if ($p instance of element(Publisher, publishers)) then
      concat('The publisher is: ', data($p))
else
      error()

Read as: "if the value of the 'p' variable is a Publisher element and
has the type 'publishers', then concatenate the string 'The publisher
is: ' with the variable's value; otherwise generate an error."
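
A named type can also be used to declare what a function accepts.  The
sketch below is illustrative only: the function name f:publisher-label,
its namespace URI, and the file name Book.xsd (for the Design 2 schema)
are all assumptions, and a schema-aware XSLT 2.0 processor is required.
If the input was not validated, the call fails with a type error
instead of silently producing output.

```xml
<?xml version="1.0"?>
<!-- Sketch only; Book.xsd, the f prefix, and the function name are
     assumptions made for this example -->
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:f="http://example.org/functions">

    <xsl:import-schema schema-location="Book.xsd"/>

    <xsl:output method="text"/>

    <!-- Accepts only a Publisher element annotated with the named
         type 'publishers' (or a type derived from it) -->
    <xsl:function name="f:publisher-label" as="xs:string">
        <xsl:param name="p" as="element(Publisher, publishers)"/>
        <xsl:sequence
            select="concat('The publisher is: ', data($p))"/>
    </xsl:function>

    <xsl:template match="/">
        <xsl:value-of select="f:publisher-label(Book/Publisher)"/>
    </xsl:template>

</xsl:stylesheet>
```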

Lesson Learned: when designing an XML Schema, to facilitate its use by
XSLT and XPath, avoid anonymous, local element declarations.
