[xsl] Identity of Documents Puzzle

Subject: [xsl] Identity of Documents Puzzle
From: "W. Eliot Kimber" <eliot@xxxxxxxxxx>
Date: Fri, 06 Dec 2002 18:08:05 -0600
I am working on an implementation of xinclude that implements ID rewriting so that links are authored as references to elements in their base source locations but the links are resolved correctly in the transcluded result (as opposed to authoring the links as references to the elements in their transcluded locations).

In order to do this I must determine the set of documents that comprise a single compound document (so that I can tell whether a link to a particular file is in the same compound document or in a different one). I call this the "xinclude BOS (bounded object set)".

This all works great except for links from an included document back to the top-level document. At the start of xinclude processing, I calculate the BOS by constructing a node list of document nodes, one for each unique document in the document tree represented by the xinclude references, including the initial document. However, a subsequent reference to the initial document's file results in a new document node being constructed, a node different from the one initially added to the BOS list for the top-level document. Thus, the "is document in BOS?" check made during the rewriting of link pointers fails and my code treats the reference as a cross-compound-document link, so it fails in the transcluded result doc (because the pointer is not rewritten to reflect the location of the target in the transcluded result document).
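
For illustration, here is a minimal sketch in XSLT 1.0 of the kind of "is document in BOS?" test described above. The template name is-in-bos and its two parameters are hypothetical and are not taken from the actual implementation. The sketch only shows that the usual union-and-count idiom compares nodes by identity rather than by URI or content, which is why a second document node built for the same file is not recognized as a member.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical helper: emits 'true' if $candidate is one of the
       document nodes in $bos, otherwise 'false'. -->
  <xsl:template name="is-in-bos">
    <!-- /.. is the conventional XSLT 1.0 expression for an empty node-set -->
    <xsl:param name="bos" select="/.."/>
    <xsl:param name="candidate" select="/.."/>
    <xsl:choose>
      <!-- A union never contains duplicates, so adding a node that is
           already in $bos leaves the count unchanged. The comparison is
           by node identity, not by file name. -->
      <xsl:when test="$candidate and
                      count($bos | $candidate) = count($bos)">true</xsl:when>
      <xsl:otherwise>false</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

With a test of this shape, the initial document node and a node later returned by document() for the same file count as two distinct members unless the processor hands back the same node for both.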

This is all tested with Saxon 6.5.2 and depends, at a minimum, on multiple calls to document() with the same URL returning the same document node instance (and ideally, references to the same file system object (e.g., an inode in *nix file systems) would also return the same document node instance).

My question: is there any way, other than passing in the filename of the top-level file as a parameter to the style sheet, to ensure that the node for the document as created by the initial style sheet processing is the same as the one returned by a call to document() for the same file (it may or may not be the same filename depending on the relative locations of the files involved)? I can't think of one, but there are many subtleties of XSLT that I have yet to master.

If I pass in the top-level filename as a parameter, then I can use document() on that (and just ignore the initially-created document), but that seems sort of crude. It seems like there ought to be a more fundamental way to do this. [In HyTime, because everything being processed is formally grovified, it is possible for there to be a reliable identity relationship between input files and the document nodes created from them--I don't think anything in XSLT requires that level of precision of in-memory representation.]
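
A minimal sketch of the parameter workaround just described, in XSLT 1.0. The parameter name top-level-uri and the variable name top-doc are hypothetical. The sketch relies on the XSLT 1.0 rule that document() treats two documents as the same if they are identified by the same absolute URI, so it assumes every later reference to the top-level file resolves to exactly the URI passed in; it does nothing for two different URIs that happen to name the same file.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: URI of the top-level file, supplied by the
       caller. An absolute URI is safest, because a relative URI given as
       a string is resolved against the stylesheet's base URI, and an
       empty string would select the stylesheet itself. -->
  <xsl:param name="top-level-uri"/>

  <!-- Load the top-level document through document(), so that its node
       is the one later document() calls with the same URI return. -->
  <xsl:variable name="top-doc" select="document($top-level-uri)"/>

  <xsl:template match="/">
    <!-- Ignore the initially created document node and process the
         copy obtained through document() instead. -->
    <xsl:apply-templates select="$top-doc/*"/>
  </xsl:template>

</xsl:stylesheet>
```

The BOS list would then be seeded with $top-doc rather than with the node the processor built from the command-line source file.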

Or is there some other way to create a reliable identity for document nodes that doesn't depend on these sorts of implementation details (or on, for example, putting globally-unique identifiers on document elements)? I can't think of one off the top of my head.

Because files are objects and therefore have inherent identity, it shouldn't be necessary, in the abstract, to add identifying metadata to a document in order to know with certainty whether it is or is not the same as another document, regardless of how that document is referenced.

Thanks,

Eliot
--
W. Eliot Kimber, eliot@xxxxxxxxxx
Consultant, ISOGEN International

1016 La Posada Dr., Suite 240
Austin, TX  78752 Phone: 512.656.4139


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


