Re: Hypergroves

Subject: Re: Hypergroves
From: Eliot Kimber <eliot@xxxxxxxxxx>
Date: Tue, 23 Feb 1999 19:29:23 -0600

Russell Steven Shawn O'Connor wrote:
> I understand this, but it's not quite the answer to the question I have.
> I know that the grove data structure can contain urefnodes. This will link
> nodes in one grove to nodes in another grove.  Obviously the ENTITY
> reference you suggest links a node to the root of another grove.  I want
> to know if there is a natural way in SGML to link one node (element) to
> another specific node (element) in another grove (another SGML file).
> 
> I can fake this process by using two attributes: one for the entity of the
> other file (giving me the root of the grove) and another attribute of
> NMTOKEN type to give me the ID of the element in the other file (giving me
> a particular node in the other grove).  This process just seems less than
> optimal.

There is no "direct" way to do this, because you are operating in two
different domains. The SGML document is in the syntax domain, while
groves are in the abstract data object domain. The bridge between these
two domains is the grove constructor that takes the syntax (the SGML
document) and creates the abstraction (the grove).  From the syntax
domain, there is no way to refer to grove nodes directly--they must be
pointed to indirectly through their properties (either their structural
position or things like names).

All addressing is in terms of groves (that is, node-to-node). Thus, to
create addresses in documents (in the syntax domain) you have to define
a syntax that will be reliably interpreted into a node-to-node address
in the abstract domain. It doesn't matter what syntax you choose, as
long as you have software that will interpret it correctly. The whole
reason for creating the grove concept was to provide a standardized
abstraction in terms of which addresses (and other types of processing)
could be defined outside of any implementation.
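
As a rough illustration of that bridge, here is a minimal Python sketch.
It is not a real grove implementation: it uses the standard library's
xml.etree.ElementTree as a stand-in for a grove constructor, works on
XML rather than SGML, and the ENTITIES table, the build_grove and
resolve names, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical storage layer: entity name -> document text (the syntax domain).
ENTITIES = {
    "other.xml": '<doc><sec id="intro"><p id="p1">Hello</p></sec></doc>',
}

def build_grove(entity_name):
    """Stand-in for a grove constructor: takes syntax, returns a tree of nodes."""
    return ET.fromstring(ENTITIES[entity_name])

def resolve(entity_name, element_id):
    """Interpret a syntactic (entity, id) address as a node in the tree.

    The node is found indirectly, through one of its properties (its id
    attribute), because the syntax cannot name the node itself.
    """
    root = build_grove(entity_name)
    for node in root.iter():
        if node.get("id") == element_id:
            return node
    return None

node = resolve("other.xml", "p1")
print(node.tag, node.text)
```

The point is only that the address in the document is a pair of strings,
and it is the software that turns it into a reference to a node.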

No matter what syntax you choose, it will always have two parts, at a
minimum: the address of the grove as a whole (normally shortcutted to
the entity from which the grove is constructed) and the address of the
thing within the grove you want to address. [Either of these two parts
could, in turn, have multiple parts, but the first part is always
addressing storage objects (entities) while the second part is always
addressing semantic objects (nodes in groves).]

Using two attributes is one way to do this: one for the document entity,
one for the element within the document (e.g., the old doc= and refid=
attributes common in the DynaText world). Another way is to combine the
parts into a single specification, such as a URL (file.xml#element-id).
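
To show that the two syntaxes carry the same two-part address, here is a
small Python sketch that normalizes both into one structure. The Address
type and the from_attributes and from_url helpers are hypothetical names
for illustration; the doc= and refid= attribute names and the
file.xml#element-id form are the ones mentioned above. URL splitting
uses urllib.parse.urldefrag from the standard library.

```python
from typing import NamedTuple
from urllib.parse import urldefrag

class Address(NamedTuple):
    """Two-part address: the storage object, then the thing inside it."""
    entity: str       # which entity the grove is built from
    element_id: str   # which element within that grove

def from_attributes(attrs):
    """Two-attribute form, e.g. doc="file.xml" refid="element-id"."""
    return Address(attrs["doc"], attrs["refid"])

def from_url(url):
    """Single-specification form, e.g. file.xml#element-id."""
    entity, fragment = urldefrag(url)
    return Address(entity, fragment)

a = from_attributes({"doc": "file.xml", "refid": "element-id"})
b = from_url("file.xml#element-id")
print(a == b)
```

Either form can then be handed to whatever code resolves the address
against the grove.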

There are some relevant standards for these syntaxes. The HyTime
standard provides a robust, but verbose, syntax for representing
grove-based addresses syntactically in SGML or XML documents. The
XPointer spec provides a fairly robust (though not completely robust)
and very compact syntax for addressing XML groves. Because XLink is
currently defined in terms of the XML data model rather than formally in
terms of groves generally, it isn't defined for addressing non-XML
groves. URL syntax is more compact, but the least robust of the three.

Many SGML applications get a lot of mileage out of a two-attribute
approach. Note that the HyTime standard (2nd edition) provides
facilities so that almost any attribute-based approach can be understood
in terms of HyTime facilities without modifying any existing documents. 
This means you can do something pragmatic now without regard to HyTime
conformance and still use that data with generalized HyTime processors
(e.g., GroveMinder) in the future.

Cheers,

E.


 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist

