Subject: Re: [SML] Whether to support Attribute or not?
From: "Oren Ben-Kiki" <oren@xxxxxxxxxxxxx>
Date: Tue, 30 Nov 1999 12:48:53 +0200

Sean McGrath <digitome@xxxxxx> wrote:
> The notion that elements are for information whilst
> attributes are for meta-information is, IMO, bogus.

I'd like to second that. The distinction between "information" and "meta information" is strictly application dependent, even for the very same document. For example, consider an HTML document.

For an audio-only browser, the texts - and only the texts - are the "information". Note that some of this text is in attributes (alt="...").

For a graphic design tool, the layout is the "information". This is definitely in attributes (width="...", not to mention style="..."). Note that in this case, some tools might choose to ignore the text altogether, for example when one is designing a template into which some text will be "poured" later.

A text retrieval engine might have a different notion - maybe similar to the audio-only browser, but it would probably be interested in more "semantics" which might be available in yet another set of attributes...

And, of course, there is your regular browser, for which _everything_ is "information".

I don't see that one can firmly say "this is meta-information" and "this is information". Look at it another way - if some data isn't "information" for _some_ application, then it wouldn't have been included in the document in the first place...

As for whether SML should contain attributes: the only good reason given for them was improved performance. Let's examine the case of a streaming SML XSLT processor - presumably this is where the problems will surface. Suppose we want to find every "tag" with a given "id" and do something to its "sub-tag" content. There are several alternatives:

1. No attributes, allow "forward rules" (Paul Tchistopolskii's terminology). The document can be in any order, and the stylesheet would look like:

    <xsl:template match="tag[id='value']">
      <xsl:for-each select="sub-tag">
        ...
      </xsl:for-each>
    </xsl:template>

This seems the cleanest approach, except that the processor would need to buffer arbitrary amounts of data. Given advanced optimization, proper document ordering, and knowledge of this ordering (as in the input DTD), this could achieve the same effect as (3), greatly reducing buffering.

2. No attributes, disallow all "forward rules". We'd have to assume that the document author was polite enough to specify the id "attribute" element before any "content" element. The stylesheet would look like:

    <xsl:template match="tag/id[.='value']">
      <xsl:for-each select="following-sibling::sub-tag">
        ...
      </xsl:for-each>
    </xsl:template>

The document designer is responsible for ensuring that the order of elements in the document is such that all required processing is possible. Of course, the stylesheet writer could do some buffering himself:

    <xsl:template match="tag">
      <xsl:assign name="id-of-tag" expr=""/>
    </xsl:template>

    <xsl:template match="tag/id">
      <xsl:assign name="id-of-tag" expr="."/>
    </xsl:template>

    <xsl:template match="tag/sub-tag">
      <xsl:if test="$id-of-tag='value'">
        ...
      </xsl:if>
    </xsl:template>

This allows matching on more than one attribute, but it is cumbersome and there would still be "impossible" stylesheets - unless one allows matching on result tree fragments.

3. A combination: allow forward rules but still rely on document order. In this approach, the trick is to avoid needless buffering of "sub-tag" elements when the "id" element is missing or has the wrong value. This can be done as follows:

    <xsl:template match="tag[id='value']">
      <xsl:for-each select="sub-tag">
        ...
      </xsl:for-each>
    </xsl:template>

    <xsl:template match="tag/*"/>

The processor is "greedy" - it will use the first template that matches, preferring "higher" ones. Therefore, "tag/id" would cause the first template to trigger, while "tag/sub-tag" would trigger the second one. Since only one template may match each input element (streaming!), "tag/sub-tag" would also disqualify the first template from being considered further, canceling the buffering. I haven't figured out how this interacts with a template priority mechanism, but it is clear that regardless of the exact rules, writing an efficient stylesheet would be much trickier this way - and it still relies on the document writer using proper ordering of the elements.

4. Allow attributes, disallow "forward rules" except for matching on an attribute. The stylesheet would look like:

    <xsl:template match="tag[@id='value']">
      <xsl:for-each select="sub-tag">
        ...
      </xsl:for-each>
    </xsl:template>

In this scheme, attributes are simply text-valued elements which will be buffered by the processor. Note that this has nothing to do with semantics (meta-information vs. real content). The document writer has to change the type of some elements to attributes - a much worse pollution of modeling by implementation issues than a simple reordering. Also note that this is a weaker approach than (1) and (3), since only a single level of lookahead is allowed.

5. Allow both attributes and "forward rules". The stylesheet could be either like (1) or (4), depending on the document structure. In this scheme, attributes are simple text-valued elements which (i) will be buffered by the processor and, more importantly, (ii) arrive before any content, so that match patterns on attributes can be resolved before any content is seen. Here attributes are really just an optimization hack, and there would be endless debates as to when one should use them (that is, the existing situation).

I'm partial to (3), myself, while investigating the possibility that (1) can be optimized to achieve the same effect. If we are going to rely on optimization hacks, at least let's make them as unobtrusive as possible.

Have fun,

    Oren Ben-Kiki

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
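
To make the buffering trade-off between alternatives (1) and (3) concrete, here is a minimal Python sketch. It is not an XSLT processor and not anything proposed in the thread: the function name `stream_select` and the sample documents are made up for illustration, and it assumes attribute-free input in which "tag" elements are not nested. It walks the parse events in document order and reports how many "sub-tag" elements had to be held back before the "id" element settled whether the enclosing "tag" matches.

```python
import io
import xml.etree.ElementTree as ET


def stream_select(xml_text, wanted_id):
    """Return (texts of matching sub-tags, peak number of buffered sub-tags).

    Hypothetical helper: assumes <tag> elements are not nested and that
    <id> and <sub-tag> are direct children of <tag>.
    """
    results, peak = [], 0
    stack = []      # names of the currently open elements
    state = None    # None: id not seen yet; True/False: id matched or not
    pending = []    # sub-tag texts held back while the id is unknown
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            if elem.tag == "tag":
                state, pending = None, []
            continue
        stack.pop()
        parent = stack[-1] if stack else None
        if parent == "tag" and elem.tag == "id":
            state = (elem.text or "").strip() == wanted_id
            if state:
                results.extend(pending)   # flush what was held back
            pending = []                  # either way the buffer is released
        elif parent == "tag" and elem.tag == "sub-tag":
            text = (elem.text or "").strip()
            if state is None:
                pending.append(text)      # forced to buffer: id not seen yet
                peak = max(peak, len(pending))
            elif state:
                results.append(text)      # id already matched: emit at once
        elif elem.tag == "tag":
            pending = []                  # <tag> closed without any id: drop
        elem.clear()                      # release the finished element
    return results, peak


ordered = ("<doc><tag><id>value</id><sub-tag>a</sub-tag>"
           "<sub-tag>b</sub-tag></tag></doc>")
unordered = ("<doc><tag><sub-tag>a</sub-tag><sub-tag>b</sub-tag>"
             "<id>value</id></tag></doc>")

print(stream_select(ordered, "value"))
print(stream_select(unordered, "value"))
```

With the "id" element first, nothing is ever held in `pending`; with it last, every "sub-tag" of the "tag" is buffered until the "id" arrives. That is the cost a forward rule imposes when the document order does not cooperate.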