Re: feature request

Subject: Re: feature request
From: Eric van der Vlist <vdv@xxxxxxxxxxxx>
Date: Mon, 15 May 2000 20:57:04 +0200
David Carlisle wrote:
> 
> > An XML transformation language should (IMHO) be able to produce whatever
> > valid XML you want to produce and it's not the case with XSLT without
> > nasty kludges.
> 
> It comes down to the question "what is an XML document?" A question on
> which the XML specification is remarkably silent.
> 
> XSLT does not transform the concrete syntax for one XML document into
> the concrete syntax for another, and so the question is: what is
> an entity reference. The view taken by XSL is that (like CDATA marked
> section) it is just a flag to the parser to take some particular action,
> but the actual XML document (aka source tree aka infoset aka grove)
> just contains the resulting data, not the information about how that
> data came to be parsed from an external file.

Yes, this is a problem which has often been discussed but which, IMHO,
has not received a fully satisfying answer.

Substituting the entity references is, most of the time, the best way
to go, as &ref; may have a different meaning in the output document and
this can be confusing.
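
To make the substitution concrete, here is a minimal sketch. It is in
Python rather than XSLT, and it uses the standard xml.etree.ElementTree
module only as a stand-in for any parser that builds a tree; the sample
document, the entity name "alpha" and the helper roundtrip() are made up
for the illustration. It parses a document that uses an internal entity
and serialises the resulting tree again:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an internal entity "alpha" declared in the
# internal DTD subset and referenced once in the document element.
SAMPLE = (
    '<!DOCTYPE doc [<!ENTITY alpha "&#945;">]>'
    "<doc>&alpha;</doc>"
)


def roundtrip(xml_text):
    """Parse xml_text; return (root element text, re-serialised XML)."""
    root = ET.fromstring(xml_text)
    return root.text, ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    text, out = roundtrip(SAMPLE)
    print(repr(text))  # the character data held in the tree
    print(out)         # the serialised tree
```

The tree only holds the replacement character, so the serialised form
has no way to bring the &alpha; reference back; that is the behaviour
under discussion here.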

However, replacing special characters is not the worst issue since, as
you mention, XSLT manages the encoding perfectly well.

The problem comes when you use external entities as a way to write
modular XML files.

If you look, for instance, at a root DocBook document:

<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
<!ENTITY chap1 SYSTEM "chap1.sgm">
<!ENTITY chap2 SYSTEM "chap2.sgm">
<!ENTITY chap3 SYSTEM "chap3.sgm">
<!ENTITY appa SYSTEM "appa.sgm">
<!ENTITY appb SYSTEM "appb.sgm">
]>
<book><title>My First Book</title>
&chap1;
&chap2;
&chap3;
&appa;
&appb;
</book>

this is a document you can't easily transform using XSLT unless you want
to merge it into a single file.

To work around this, I have stopped using this kind of construct and
use the xt:document extension instead, which is not portable.
 
> So you can't ask `extend XSL so that it can write &foo;' without first
> asking to extend the data model so that it contains entity reference nodes,
> and extend the XML 1.0 parsing model so that entity references produce
> new nodes in the XML document tree.
> 
> So not being able to write &alpha; doesn't limit the XML documents that
> you can write (for this definition of "XML document") it just limits
> the number of ways in which that document can be serialised. So
> from this point of view it's no worse than not having control over
> whether " or ' is used around attribute values.

Yes, it is: in this case, you are losing the ease with which you can
maintain your documents.
 
> > unless you want to implement a mechanism as nasty as the one which is
> > inserting the entity references in the HTML output methods :(
> 
> that seems a perfectly clean solution to me. It doesn't require anything
> other than characters to be stored in the internal representation, it
> just has a list of special characters that are linearised using entity
> references that is longer than the list of special characters for
> which this is done in the XML output method.
> 

Yes, except that the DTD is hard-coded in this case...
Sorry, I shouldn't have mentioned this example, as the notion of clean or
nasty is quite subjective and it's not the worst issue (the "docbook"
one is worse IMO).

Eric
-- 
------------------------------------------------------------------------
Eric van der Vlist       Dyomedea                    http://dyomedea.com
http://xmlfr.org         http://4xt.org              http://ducotede.com
------------------------------------------------------------------------


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

