Re: [xsl] Access to unparsed entities

From: Jeni Tennison <jeni@xxxxxxxxxxxxxxxx>
Date: Fri, 18 Oct 2002 23:59:00 +0100
Hi Wendell, Greg,

>>It would be nice to have such [unparsed] entities stored in a table
>>when the document is first read in, such that an XSL transformation
>>can read from and write to the table, and such that the table is
>>again written out in the document's internal DTD subset after
>>transformation is complete.
>
> Wouldn't it? This sounds like something very nice for XSLT 2.0.
> Offhand, I don't know what they're planning, if anything. (Can
> anyone speak to that? Jeni?)

Hmm... Well, there's a "could" requirement for this in the XSLT 2.0
requirements [1]:

  2.16 Could Improve Support for Unparsed Entities

  In XSLT 1.0 there is an asymmetry in support for unparsed entities.
  They can be handled on input but not on output. In particular, there
  is no way to do an identity transformation that preserves them. At a
  minimum we need the ability to retrieve the Public ID of an unparsed
  entity.
 
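The latest XSLT 2.0 Working Draft adds a function that retrieves the
public ID of an unparsed entity, namely unparsed-entity-public-id()
[2]. Together with unparsed-entity-uri(), which returns the entity's
URI, that means there's enough information available in the
stylesheet to build the table of unparsed entities yourself, provided
you know the entity names (for example, from the attributes that
refer to them).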
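
For comparison, outside XSLT an XML parser sees every declaration in
the DTD, so it can build this table directly, notation included. Here
is a minimal sketch using Python's standard expat bindings (a
different tool from XSLT, shown only for illustration). It assumes the
declarations are in the document's internal subset, since expat
doesn't read an external DTD unless asked to; the function name
read_entity_table and the sample document are hypothetical.

```python
import xml.parsers.expat


def read_entity_table(xml_text):
    """Collect unparsed entity and notation declarations from the DTD.

    Returns (entities, notations), where
      entities  maps name -> (public_id, system_id, notation)
      notations maps name -> (public_id, system_id)
    Missing identifiers are reported as None.
    """
    entities, notations = {}, {}

    def on_entity(name, is_param, value, base, system_id, public_id, notation):
        # expat passes notation=None for parsed entities; only unparsed
        # entities (declared with NDATA) carry a notation name.
        if notation is not None:
            entities[name] = (public_id, system_id, notation)

    def on_notation(name, base, system_id, public_id):
        notations[name] = (public_id, system_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.NotationDeclHandler = on_notation
    parser.Parse(xml_text, True)
    return entities, notations


# Hypothetical sample: two unparsed entities and one internal entity.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
  <!ENTITY chart PUBLIC "-//Example//ENTITY Chart//EN" "chart.gif" NDATA gif>
  <!ENTITY copy "&#169;">
]>
<doc/>"""

if __name__ == "__main__":
    entities, notations = read_entity_table(SAMPLE)
    print(entities)
    print(notations)
```

The internal entity "copy" is skipped because it has no notation; only
logo and chart end up in the table.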

Once you have built such a table, you can use the elements described
in Appendix G, "Representation of Lexical XML Constructs" [3], to
create a DOCTYPE declaration that declares the entities you want.
Something like:

  <lex:doctype name="foo">
    <!-- $entity holds the names of the unparsed entities to declare -->
    <xsl:for-each select="$entity">
      <lex:unparsed-entity-declaration name="{.}"
        system-id="{unparsed-entity-uri(.)}"
        public-id="{unparsed-entity-public-id(.)}" />
    </xsl:for-each>
  </lex:doctype>

(Hmm... I see that there's currently no way of getting an entity's
notation; we should probably address that, but doing so would also
mean adding notation declarations, which aren't supported at all at
the moment -- or is the notation something that's derivable from the
public/system ID?)
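
Writing such a table back out as an internal subset is mostly string
formatting. The following Python sketch only illustrates the output
the lex:doctype construct above would need to produce. It assumes a
hypothetical table shape of {name: (public_id, system_id, notation)}
for entities and {name: (public_id, system_id)} for notations; in an
unparsed entity declaration the notation name appears in the NDATA
clause. The function names are hypothetical.

```python
def quote(literal):
    """Quote a literal, switching to single quotes if it contains '"'."""
    return "'%s'" % literal if '"' in literal else '"%s"' % literal


def external_id(public_id, system_id):
    """Format an XML ExternalID (or a bare PUBLIC id, legal for notations)."""
    if public_id is None:
        return "SYSTEM " + quote(system_id)
    if system_id is None:
        return "PUBLIC " + quote(public_id)
    return "PUBLIC %s %s" % (quote(public_id), quote(system_id))


def doctype_declaration(root_name, entities, notations):
    """Build a DOCTYPE declaration whose internal subset declares the
    given notations and unparsed entities."""
    lines = ["<!DOCTYPE %s [" % root_name]
    for name, (public_id, system_id) in sorted(notations.items()):
        lines.append("  <!NOTATION %s %s>"
                     % (name, external_id(public_id, system_id)))
    for name, (public_id, system_id, notation) in sorted(entities.items()):
        lines.append("  <!ENTITY %s %s NDATA %s>"
                     % (name, external_id(public_id, system_id), notation))
    lines.append("]>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(doctype_declaration(
        "doc",
        {"logo": (None, "logo.gif", "gif")},
        {"gif": (None, "image/gif")},
    ))
```

For the one-entity table in the usage, this prints a DOCTYPE with a
single NOTATION declaration for gif followed by
<!ENTITY logo SYSTEM "logo.gif" NDATA gif>.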

Does that all sound like too much work? I guess that with typed
attributes we should be able to automate this a bit better: if we
can spot the attributes in the result that have the type ENTITY, we
should be able to construct a DTD that includes declarations for
those entities. But, as you say, that requires a table somewhere that
stores information about the entities. You'd need to be able to
update it with information for the new document as well as copy over
entity information from the old document, and I can't immediately
think of an easy way to do that. The technique above is the safest
bet.
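
The bookkeeping described above could be sketched like this in
Python: keep the old document's entries, add or override entries for
the new document, and keep only the entities the result actually
references. The sketch does not attempt to detect typed attributes;
it simply assumes you know which attribute names are ENTITY (or
ENTITIES) typed, such as the hypothetical pic attribute in the usage.
All function names and data here are hypothetical.

```python
import xml.parsers.expat


def referenced_entities(result_xml, entity_attrs):
    """Return the entity names used in attributes assumed to be of type
    ENTITY or ENTITIES (the caller supplies the attribute names)."""
    used = set()

    def on_start(tag, attrs):
        for attr in entity_attrs:
            if attr in attrs:
                # split() handles both ENTITY (one name) and ENTITIES
                # (whitespace-separated names).
                used.update(attrs[attr].split())

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(result_xml, True)
    return used


def result_entity_table(old_table, new_entries, used):
    """Copy the old document's entries, let entries for the new document
    add to or override them, and keep only the names the result uses."""
    merged = dict(old_table)
    merged.update(new_entries)
    missing = used - set(merged)
    if missing:
        raise KeyError("no declaration for: " + ", ".join(sorted(missing)))
    return {name: merged[name] for name in used}


if __name__ == "__main__":
    old = {"logo": (None, "logo.gif", "gif"),
           "chart": (None, "chart.gif", "gif")}
    new = {"photo": (None, "photo.jpg", "jpeg")}
    result = '<doc><img pic="logo"/><img pic="photo"/></doc>'
    used = referenced_entities(result, {"pic"})
    print(result_entity_table(old, new, used))
```

Here chart is dropped because the result no longer refers to it. The
notations the kept entries name (gif and jpeg) would still need to be
declared alongside them.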

If either or both of you could drop a line to
public-qt-comments@xxxxxx giving an example of what you want to be
able to do, that would be helpful, especially if what I've described
above doesn't meet your requirements.
  
Cheers,

Jeni

[1] http://www.w3.org/TR/xslt20req#section-Requirements
[2] http://www.w3.org/TR/xslt20/#unparsed-entity-public-id
[3] http://www.w3.org/TR/xslt20/#lexical-representation

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

