Re: [SML] Whether to support Attribute or not?

Subject: Re: [SML] Whether to support Attribute or not?
From: "Oren Ben-Kiki" <oren@xxxxxxxxxxxxx>
Date: Tue, 30 Nov 1999 12:48:53 +0200

Sean McGrath <digitome@xxxxxx> wrote:
> The notion that elements are for information whilst
> attributes are for meta-information is, IMO, bogus.

I'd like to second that. The distinction between "information" and
"meta-information" depends strictly on the application, even for the very
same document. For example, consider an HTML document.

For an audio-only browser, the texts - and only the texts - are the
"information". Note that some of this text is in attributes (alt="...").

For a graphic design tool, the layout is the "information". This is
definitely in attributes (width="...", not to mention style="..."). Note
that in this case, some tools might choose to ignore the text altogether,
for example when one is designing a template into which some text will be
"poured" later.

A text retrieval engine might have a different notion - maybe similar to the
audio-only browser, but it would probably be interested in more "semantics"
which might be available in yet another set of attributes...

And, of course, there is your regular browser, for which _everything_ is
"information".

I don't see that one can firmly say "this is meta-information" and "this is
information". Look at it another way - if some data isn't "information" for
_some_ application, then it wouldn't have been included in the document in
the first place...

As for whether SML should contain attributes: the only good reason given for
them was improved performance. Let's examine the case of a streaming SML XSLT
processor - presumably this is where the problems will surface. Suppose we
want to find all "tag"s with a given "id", and do something to their
"sub-tag" content. There are several alternatives:

1. No attributes, allow "forward rules" (Paul Tchistopolskii's terminology).
The document can be in any order, the stylesheet would look like:

<xsl:template match="tag[id='value']">
    <xsl:for-each select="sub-tag">
        ...
    </xsl:for-each>
</xsl:template>

This seems the cleanest approach, except that the processor would need to
buffer arbitrary amounts of data. Given advanced optimization, proper
document ordering, and knowledge of this ordering (as in the input DTD),
this could achieve the same effect as (3), greatly reducing buffering.
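
To make the buffering cost concrete, here is a minimal sketch in Python
rather than XSLT. It is not an SML processor; the function name, the flat
document shape (non-nested "tag" elements with "id" and "sub-tag" children)
and the sample input are all assumptions made for illustration. Because the
"id" may arrive last, every "sub-tag" has to be held until the "tag" closes.

```python
import io
import xml.etree.ElementTree as ET


def select_with_buffering(xml_text, wanted_id):
    """Return (sub-tag texts of matching <tag>s, peak buffer size).

    Hypothetical helper: assumes flat, non-nested <tag> elements whose
    <id> and <sub-tag> children may come in any order.
    """
    results = []
    peak = 0
    buffered = []  # sub-tag contents held until the current <tag> ends
    tag_id = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "tag":
                buffered, tag_id = [], None
        elif elem.tag == "id":
            tag_id = elem.text
        elif elem.tag == "sub-tag":
            # The id may still be ahead of us, so nothing can be emitted yet.
            buffered.append(elem.text)
            peak = max(peak, len(buffered))
        elif elem.tag == "tag":
            if tag_id == wanted_id:
                results.extend(buffered)
            elem.clear()  # release the subtree ElementTree kept for us
    return results, peak


DOC = (
    "<doc>"
    "<tag><sub-tag>a</sub-tag><sub-tag>b</sub-tag><id>value</id></tag>"
    "<tag><id>other</id><sub-tag>c</sub-tag></tag>"
    "</doc>"
)
matches, peak = select_with_buffering(DOC, "value")
```

The peak grows with the number of "sub-tag" elements in a single "tag",
which is the "arbitrary amounts of data" problem.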

2. No attributes, disallow all "forward rules". We'd have to assume that the
document author was polite enough to specify the id "attribute" element
before any "content" element. The stylesheet would look like:

<xsl:template match="tag/id[.='value']">
    <xsl:for-each select="following-sibling::sub-tag">
        ...
    </xsl:for-each>
</xsl:template>

The document designer is responsible for ensuring that the order of elements
in the document is such that all required processing is possible. Of course,
the stylesheet writer could do some buffering himself:

<xsl:template match="tag">
    <xsl:assign name="id-of-tag" expr=""/>
</xsl:template>
<xsl:template match="tag/id">
    <xsl:assign name="id-of-tag" expr="."/>
</xsl:template>
<xsl:template match="tag/sub-tag">
    <xsl:if test="$id-of-tag='value'">
        ...
    </xsl:if>
</xsl:template>

This allows matching on more than one attribute, but it is cumbersome and
there would still be "impossible" stylesheets - unless one allows matching on
result tree fragments.

3. A combination; allow forward rules but still rely on document order. In
this approach, the trick is to avoid needless buffering of "sub-tag"
elements when the "id" element is missing or has the wrong value. This can
be done as follows:

<xsl:template match="tag[id='value']">
    <xsl:for-each select="sub-tag">
        ...
    </xsl:for-each>
</xsl:template>
<xsl:template match="tag/*"/>

The processor is "greedy" - it will use the first template that matches,
preferring "higher" ones. Therefore, "tag/id" would cause the first template
to trigger, while "tag/sub-tag" would trigger the second one. Since only one
template may match each input element (streaming!), "tag/sub-tag" would also
disqualify the first template from being considered further, canceling the
buffering.

I haven't figured out how this interacts with a template priority mechanism,
but it is clear that regardless of the exact rules, writing an efficient
stylesheet would be much trickier this way - and it still relies on the
document writer using proper ordering of the elements.
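
As a rough illustration of the "greedy" behaviour, here is a Python sketch
of the state a processor might keep per "tag". The three state names and the
function are my own invention, and the document is again assumed to be flat.
The point is that nothing is ever buffered: a "sub-tag" is either emitted at
once or dropped, and seeing one before a matching "id" disqualifies the rule
for the rest of that "tag".

```python
import io
import xml.etree.ElementTree as ET


def select_greedy(xml_text, wanted_id):
    """Return sub-tag texts that follow a matching <id> inside each <tag>.

    Hypothetical helper: assumes flat, non-nested <tag> elements.
    """
    results = []
    state = "disqualified"  # outside any <tag>, nothing may match
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "tag":
                state = "pending"  # neither id nor sub-tag seen yet
        elif elem.tag == "id":
            if state == "pending":
                state = "matched" if elem.text == wanted_id else "disqualified"
        elif elem.tag == "sub-tag":
            if state == "matched":
                results.append(elem.text)  # emitted at once, never buffered
            else:
                state = "disqualified"  # the catch-all rule fired first
            elem.clear()
        elif elem.tag == "tag":
            state = "disqualified"
    return results


ORDERED = "<doc><tag><id>value</id><sub-tag>a</sub-tag><sub-tag>b</sub-tag></tag></doc>"
UNORDERED = "<doc><tag><sub-tag>a</sub-tag><id>value</id><sub-tag>b</sub-tag></tag></doc>"
ordered_result = select_greedy(ORDERED, "value")
unordered_result = select_greedy(UNORDERED, "value")
```

With the "id" first, both sub-tags come through; with a "sub-tag" first,
nothing does, which shows how much this approach leans on document order.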

4. Allow attributes, disallow "forward rules" except for matching on an
attribute. The stylesheet would look like:

<xsl:template match="tag[@id='value']">
    <xsl:for-each select="sub-tag">
        ...
    </xsl:for-each>
</xsl:template>

In this scheme, attributes are simply text-valued elements which will be
buffered by the processor. Note that this has nothing to do with semantics
(meta-information vs. real content). The document writer has to change the
type of some elements to attributes - a much worse pollution of modeling by
implementation issues than a simple reordering. Also note that this is a
weaker approach than (1) and (3), since only a single level of lookahead is
allowed.
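
For comparison, a Python sketch of the attribute case, under the same
assumptions as before (flat document, hypothetical function name). A
streaming parser hands over the attributes together with the start tag, so
the match is decided before any child arrives and child order stops
mattering.

```python
import io
import xml.etree.ElementTree as ET


def select_by_attribute(xml_text, wanted_id):
    """Return sub-tag texts of every <tag> whose id attribute matches.

    Hypothetical helper: assumes flat, non-nested <tag> elements.
    """
    results = []
    matched = False
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "tag":
                # Attributes are already available at the start event.
                matched = elem.get("id") == wanted_id
        elif elem.tag == "sub-tag":
            if matched:
                results.append(elem.text)  # emitted at once, never buffered
            elem.clear()
        elif elem.tag == "tag":
            matched = False
    return results


ATTR_DOC = (
    "<doc>"
    '<tag id="value"><sub-tag>a</sub-tag></tag>'
    '<tag id="other"><sub-tag>b</sub-tag></tag>'
    "</doc>"
)
attr_result = select_by_attribute(ATTR_DOC, "value")
```

The only thing the processor holds on to is the start tag itself, which is
the single level of lookahead mentioned above.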

5. Allow both attributes and "forward rules". The stylesheet could be either
like (1) or (4), depending on the document structure. In this scheme,
attributes are simply text-valued elements which (i) will be buffered by the
processor and, more importantly, (ii) let match patterns on attributes be
resolved before any content is seen. Here attributes are really just an
optimization hack, and there would be endless debates as to when one should
use them (that is, the existing situation).

I'm partial to (3), myself, while investigating the possibility that (1) can
be optimized to achieve the same effect. If we are going to rely on
optimization hacks, at least let's make them as unobtrusive as possible.

Have fun,

    Oren Ben-Kiki



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

