Re: [xsl] Access to unparsed entities

Subject: Re: [xsl] Access to unparsed entities
From: Gregory Murphy <Gregory.Murphy@xxxxxxxxxxx>
Date: Sun, 20 Oct 2002 16:32:43 -0700 (PDT)
On Fri, 18 Oct 2002, Jeni Tennison wrote:

> Hi Wendell, Greg,
> >>It would be nice to have such [unparsed] entities stored in a table
> >>when the document is first read in, such that an XSL transformation
> >>can read from and write to the table, and such that the table is
> >>again written out in the document's internal DTD subset after
> >>transformation is complete.
> >
> > Wouldn't it? This sounds like something very nice for XSLT 2.0.
> > Offhand, I don't know what they're planning, if anything. (Can
> > anyone speak to that? Jeni?)
> Hmm... Well, there's a "could" requirement for this in the XSLT 2.0
> requirements [1]:
>   2.16 Could Improve Support for Unparsed Entities
>   In XSLT 1.0 there is an asymmetry in support for unparsed entities.
>   They can be handled on input but not on output. In particular, there
>   is no way to do an identity transformation that preserves them. At a
>   minimum we need the ability to retrieve the Public ID of an unparsed
>   entity.
> The latest XSLT 2.0 WD has got a function to support the ability to
> retrieve the public ID of an unparsed entity, namely
> unparsed-entity-public-id() [2]. So there's enough information
> available in the stylesheet to let you build the table of unparsed
> entities yourself.

This is certainly an improvement, since at least no information from the
source document is inaccessible to the transformation.

> If you did build such a table, then you can use the set of elements
> described in Appendix G, "Representation of Lexical XML Constructs"
> [3] in order to create a DOCTYPE declaration in which you declare the
> entities that you want to declare. Something like:
>   <lex:doctype name="foo">
>     <xsl:for-each select="$entity">
>       <lex:unparsed-entity-declaration name="{.}"
>         system-id="{unparsed-entity-uri(.)}"
>         public-id="{unparsed-entity-public-id(.)}" />
>     </xsl:for-each>
>   </lex:doctype>
> (Hmm... I see that there's no way of getting the entity notation at
> the moment; we should probably address that, but that, of course,
> means also adding notation declarations, which aren't supported at
> all currently -- or is the notation something that's derivable from
> the public/system ID?)

Another possibility is to build the table using a SAX filter, and insert
the contents of the table into the document using elements defined in
Appendix G, as you demonstrate above. This has the advantage that it could
be made to work with XSLT 1.0, and wouldn't require any extensions.
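
To make the SAX idea concrete, here is a rough sketch of the table-building
half in Python, using the standard xml.sax DTDHandler callback. The sample
document, the entity and notation names, and the "notation" attribute on the
generated element are all invented for illustration; only the name, system-id
and public-id attributes come from the example quoted above. The namespace
binding for the lex prefix is left out, and a real filter would also have to
splice these elements into the event stream rather than just build strings.

```python
import io
import xml.sax
from xml.sax.handler import DTDHandler
from xml.sax.saxutils import quoteattr

# Hypothetical sample document; entity, notation, and file names are invented.
SAMPLE = b"""<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo PUBLIC "-//Example//ENTITY Logo//EN" "logo.gif" NDATA gif>
  <!ATTLIST doc img ENTITY #IMPLIED>
]>
<doc img="logo"/>
"""


class EntityTable(DTDHandler):
    """Collects the unparsed entity declarations reported by the parser."""

    def __init__(self):
        self.entities = {}

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        # SAX reports the notation name too, which XSLT does not expose.
        self.entities[name] = {
            "public-id": publicId,
            "system-id": systemId,
            "notation": ndata,  # assumption: no such attribute in the quoted example
        }


def read_entity_table(xml_bytes):
    """Parse a document and return its table of unparsed entities."""
    parser = xml.sax.make_parser()
    table = EntityTable()
    parser.setDTDHandler(table)
    parser.parse(io.BytesIO(xml_bytes))
    return table.entities


def as_lex_elements(entities):
    """Render the table as element strings, skipping attributes with no value."""
    lines = []
    for name, info in sorted(entities.items()):
        attrs = [("name", name)]
        attrs += [(key, value) for key, value in info.items() if value is not None]
        rendered = " ".join(f"{key}={quoteattr(value)}" for key, value in attrs)
        lines.append(f"<lex:unparsed-entity-declaration {rendered}/>")
    return lines


if __name__ == "__main__":
    for line in as_lex_elements(read_entity_table(SAMPLE)):
        print(line)
```

Since the callback receives the notation name along with the public and
system identifiers, a filter along these lines could carry the notation
through as well, which the stylesheet-only approach cannot do at the moment.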

I hadn't read Appendix G, but now that I have, I think its approach is preferable to
trying to reconstruct the document type internal subset in the result
document. It converts all those archaic SGML constructs to plain old XML,
which will make all subsequent processing easier to understand.

> If either or both of you could drop a line to
> public-qt-comments@xxxxxx giving an example of what you want to be
> able to do, that would be helpful, especially if what I've described
> above doesn't meet your requirements.

As long as nothing declared in the document is hidden from the
transformation, I think the standard is adequate. XSLT 2.0 has addressed
the lack of access to an entity's public identifier. It would be nice if a
future version would also provide access to the notation. Unparsed external
entities are _very_ SGML, and in a schema-enlightened world, will hopefully
go away, so I don't think a strong case could be made for providing extra
support for their construction in an XML result document. Most of them are
probably coming from SGML documents converted to XML.

// Gregory Murphy <Gregory.Murphy@xxxxxxx>
// Software Engineer
// Customer Network Platform, Sun Microsystems
