RE: [xsl] keys and idrefs - XSLT2 request?

Subject: RE: [xsl] keys and idrefs - XSLT2 request?
From: Joerg Pietschmann <joerg.pietschmann@xxxxxx>
Date: Wed, 10 Oct 2001 19:46:21 +0200
DPawson@xxxxxxxxxxx wrote
> [i wrote]
> > I'm proposing to drop even more complexity from the standards
> > instead of piling it up.
> ? Or put it in a more appropriate place?

That's one way to put it. There is no way to make complexity
magically go away, but rearranging concepts and developing higher
level patterns is quite a useful strategy for dealing with it.

>  The latter will make the software
> > also more complex, more expensive, late on market and buggy.
> Already happening with some of the xml standards. Schema for instance?

Yeah, XSchema is bloated. It's not easy to please everyone who
wants his view of what's a universal datatype to be included.
OTOH, it could be worse. In fact, much worse:
- SQL3
- ALGOL68
- CORBA Common Services
This is known as the Design-By-Committee anti-pattern...
If this weren't the XSL list, I would add HyTime and DSSSL... <duck/>

> > 4. The concept of lookup subsets of the data defined by a rule. This
> >   is xsl:key plus key(). This reminds more of relational databases,
> >  for a good reason.
> Is this where your 'bias' comes from?

What bias? That I interpret XML as more than a format for writing
down documents intended for human reading? No offence taken, I just
need clarification.
In our project we use XML for specifying the interfaces between components
at a high level. From this specification we derive both the human
readable documentation and source code for compilers and configuration
generators for various physical implementations of the data transport.
We use XML because
- We can tailor the structure and content constraints of the spec to
  our needs
- We can use XSL to implement the generators which create the HTML
  for the browser as well as the source code (IDL, DTDs). We could
  use XSL to create class diagrams of our business objects in SVG,
  or XMI for loading class and deployment diagrams into design tools
  like Rational Rose. Now that's REAL power. I like it. No longer
  being stuck with inferior tools :-)

When I'm teaching XML to web designers I tell them that XML+XSL
separates content (for example news stories) from layout (all the
navigation and blinking banner stuff around the text), and it
can also be used to automatically generate links and overview
pages.
When I'm telling e-Business students about XML, I say "You have
the essential data of a bill in XML, and then you can:
- feed it into your XML aware software for automatic accounting
- view it in a browser so that you can manually type it into
  your old accounting software
- make WML and transfer it to your mobile so that you can already
  get the hiccups while still on the way to the office"
Sometimes they get a grasp on what B2B really means.

> I've never used a data base
> directly, all my document processing has been from 'document' based
> content. I can see the differentiation.

You don't have to. But if you think of tree nodes as records in a DB
table, a key suddenly looks like a parameterized SQL statement.
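
To make the analogy a bit more concrete, here is a rough sketch. The
machine/formalname/machineref names are the ones from the snippet
further down; the table and column names in the SQL comments are made
up for illustration, and the mapping is only approximate.

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Declaring the key is roughly like defining an index:
         CREATE INDEX machine_idx ON machine (formalname); -->
    <xsl:key name="machine" match="machine" use="formalname"/>

    <!-- Calling key() is roughly like running a prepared statement
         with one parameter:
         SELECT * FROM machine WHERE formalname = ? -->
    <xsl:template match="machineref">
      <xsl:copy-of select="key('machine', .)"/>
    </xsl:template>

  </xsl:stylesheet>

As with a query, the result may hold zero, one or several nodes.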

>   Your 'validation' comes from the fact that the database is
>   rigorous in its checking.

You are reading too much into my words. In fact, I don't store any
data from the XML documents I'm working on in a DB.

For me, DTD/Schema validation is an optional step performed by
standard software which tells whether a document fits certain rules.
You can't express every rule in a DTD/Schema. For example, in
a bill you can define the elements "issuedOn" and "payableUntil"
to be dates (using XSchema), but you can't say that "issuedOn"
must be earlier than "payableUntil". Nevertheless, the validation step
could relieve some burden from the application which processes XML data,
even though in real business applications another verification layer
is needed.
DTD/XSchema also improves the productivity of programmers: instead
of writing a program which checks the structure and types, one can
declare what it should look like. Yeah, XSchema is almost
incomprehensible in its XML form, but there are already tools
where you can design a structure graphically.
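
The extra verification layer for the date rule mentioned above could
be sketched in XSLT 1.0 roughly as follows. This assumes a
hypothetical "bill" element whose "issuedOn" and "payableUntil"
children hold ISO dates (YYYY-MM-DD); the element name "bill" and the
date format are assumptions, not taken from any real schema.

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <!-- Stripping the hyphens from YYYY-MM-DD gives a number that
         sorts in date order, so a plain numeric comparison works. -->
    <xsl:template match="bill">
      <xsl:variable name="issued"
          select="number(translate(issuedOn, '-', ''))"/>
      <xsl:variable name="due"
          select="number(translate(payableUntil, '-', ''))"/>
      <!-- not(a &lt; b) is also true if either value is missing or
           malformed (NaN), so those bills are reported as well. -->
      <xsl:if test="not($issued &lt; $due)">
        <xsl:text>Error: issuedOn must be earlier than payableUntil&#10;</xsl:text>
      </xsl:if>
    </xsl:template>

    <!-- Suppress all other text so only error messages are output. -->
    <xsl:template match="text()"/>

  </xsl:stylesheet>

A valid bill produces no output; anything printed is a rule violation.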

>    I need the same for my source documents, mainly hand generated.

Yes, validation is intended to be a comfortable way to catch many
errors so that you can concentrate on higher levels of processing.
For human readable documents, these "higher levels of processing"
are essentially built into standard software, like a browser, so
you don't have to worry about the tedious writing of buggy programs
at all :-)

> > 6. The concept of assigning various semantics to the data. There are
> >   enough standards proposing vocabularies, however, some are more
> >   general than others. There are also many overlapping developments.
> 
> Possibly the least developed?

Possibly? Most certainly! I'm not aware of a formal standard of
how to write standards :-) There are many pieces used in various
contexts, for example, IETF RFCs have a certain structure of chapters
and a consistent use of verbs like SHOULD and MUST. I think they
finally agreed also on a single BNF grammar for defining languages.

As for overlapping developments in a specific area, there is our
beloved presentation of content for human reading. We have (X)HTML+CSS
on one hand and XSLFO on the other. If we take voice rendering into
account, which both standards also attempt to cover,
there are quite a few more initiatives.
Another thing that bothers me is that CSS is not XML syntax. This
makes reprocessing XHTML or SVG harder. It'd be interesting if
they came up with XCSS. Also, if CSS3 ever gets implemented
completely, you can write complete pages with div tags only :-)
(oops, still need html, head and body too :-)

> > keep the interfaces between them simple and slender.
> (Note David C's comments on namespaces earlier this week ;-)

Yeah. If you'd been in charge, and without even a commitment
to backwards compatibility, how would you have done it? :-)

> > It could be argued that the ID/IDREF/IDREFS combo is a datatype, a
> > reference, therefore it should be included into concept 2 and the
> > following concepts should get the tools to deal with it.
> 
> Today's situation?

The current model says it's part of the structure definition (DTD),
and XPath has a function, id(), to look up elements by ID, where the
IDs may come from IDREFS. The thread was started because of the
apparent lack of a function which would do the reverse lookup:
getting elements by IDREF/IDREFS.
Note that ID/IDREFS got into the spec before they even dreamt of
xsl:key.
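
For the single-valued IDREF case, the reverse lookup can be
approximated with a key. A minimal sketch, assuming hypothetical
attribute names "id" (the ID) and "ref" (the IDREF); the key matches
on the attribute name, not on the type declared in the DTD.

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <!-- Index every referring element by the ID it points to. -->
    <xsl:key name="referrers" match="*[@ref]" use="@ref"/>

    <!-- For each element with an id, count who points at it. -->
    <xsl:template match="*[@id]">
      <xsl:value-of select="@id"/>
      <xsl:text> is referenced by </xsl:text>
      <xsl:value-of select="count(key('referrers', @id))"/>
      <xsl:text> element(s)&#10;</xsl:text>
      <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="text()"/>

  </xsl:stylesheet>

This does not cover IDREFS: a whitespace-separated list is indexed as
one string, so it would have to be split into tokens first.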

> > to disagree. First, the reference concept is also incorporated in
> > concept 4, which is of course broader than ID/IDREFS. The
> > difference is where it is defined what actually is a "reference".
> 
> I'd add that I very much use references, internally and externally,

Hehe, IDs can't refer to external stuff. I too use references a lot.
I define them at the semantic level, we have objref and structref
and machineref elements. Not everybody's best solution, but this allows
us to implement a sort of type checking (IDs can refer to everything,
a machineref must refer to a machine). It is also less redundant (elements
which can be referred to must have a unique name for other reasons anyway),
and the references don't change if we split or merge documents. I want
to note that this is tailored to our problem space. Preparing
publications is another matter, where IDs fit quite nicely (text bits
don't naturally have unique names).

> hence I'm very eager to have valid cross document references using
> xpointer.

XLink also defines semantics for certain types of linking; it's
designed to be an extension of HTML links. It does, of course, fit
the problem of hyperlinking content quite nicely.

> > There is nothing which would forbid to introduce the concept of
> > keys in XSchema, and an XSLT program should be able to read it
> > from there. Of course, we still need xsl:key for
> > - ad-hoc semantics and references not expressed in a schema
> > - processing XML without an attached schema definition
> > - technical keys (Muenchian Grouping etc.)
> 
> And validation of such pointers via Schematron or a transform?

You'll have to write it yourself. We do so in the transformation:

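  <!-- Index machine elements by their formalname child -->
  <xsl:key name="machine" match="machine" use="formalname"/>

  <!-- $root is assumed to be a global parameter or variable
       declared elsewhere in the stylesheet -->
  <xsl:template match="machineref" mode="html-link">
    <xsl:variable name="target" select="key('machine',.)"/>
    <xsl:choose>
      <!-- exactly one machine found: emit the link -->
      <xsl:when test="count($target)=1">
        <a href="{$root}/spec/mach/{.}.html">
           <xsl:apply-templates select="$target" mode="html-link-title"/>
        </a>
      </xsl:when>
      <!-- no machine found: flag it in the output -->
      <xsl:when test="count($target)=0">
        <xsl:text>Dangling link to </xsl:text>
        <xsl:value-of select="."/>
      </xsl:when>
      <!-- more than one machine with this name -->
      <xsl:otherwise>
        <xsl:call-template name="referenceerror"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>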

We have also more structured references, consisting of scopes and names
therein.
If I omitted the "Dangling link" case, which is presentation specific,
I could easily declare a rule that there should be a one-to-one relation
between machinerefs and machines in a schema. I have not yet bothered
to examine whether XSchema could do the job for me, as mature schema
validating parsers are still in somewhat short supply. With the "Dangling
link" case, early validation doesn't buy much anyway.
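
Short of a schema, the same rule can be checked in a separate,
check-only transform. The following is a minimal sketch and not our
actual stylesheet; it reuses the "machine" key from above and only
prints a line for each machineref that does not resolve to exactly
one machine.

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:key name="machine" match="machine" use="formalname"/>

    <!-- Report dangling (0 hits) and ambiguous (2 or more hits)
         references; stay silent for the good ones. -->
    <xsl:template match="machineref">
      <xsl:variable name="hits" select="count(key('machine', .))"/>
      <xsl:if test="$hits != 1">
        <xsl:text>machineref '</xsl:text>
        <xsl:value-of select="."/>
        <xsl:text>' matches </xsl:text>
        <xsl:value-of select="$hits"/>
        <xsl:text> machines&#10;</xsl:text>
      </xsl:if>
    </xsl:template>

    <!-- Suppress all other text so only the report is output. -->
    <xsl:template match="text()"/>

  </xsl:stylesheet>

Empty output means every machineref has exactly one target.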

> Generally I can see where you are coming from, and must say I've
> never taken such a broad view.

I hope I haven't discouraged you from doing so.

> The pace of XML development (as noted on the xml-dev list) is rather
> fast. I guess the TAG group on W3C are the only ones who may have
> time to sit back and put items in their right place, but somehow
> I can't see it happening until something is really broken.

Talk to other standardisation guys. I was involved in STEP (CAD system
data exchange) and OSACA (EU project for defining an architecture
and interoperability for software components in Numeric Controls. An
abysmal failure). They always face overwhelming complexity (if it were
easy nobody would bother to set up a committee), and often company
politics. Wisdom is always late.

Regards
J.Pietschmann

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

