Re: [xsl] Ascii end-of-file character output in an XSL file

Subject: Re: [xsl] Ascii end-of-file character output in an XSL file
From: Kevin Rodgers <kevin.rodgers@xxxxxxx>
Date: Wed, 1 Jun 2005 13:44:07 -0600
[I just came across this draft response to an old thread and finally
finished it.]

David Carlisle writes:
>   Is there a way in XSLT to output an external unparsed entity (which
>   would contain the disallowed character)?
> 
> In standard XSLT1 you can only write one output file anyway so you
> couldn't write the actual entity (even if you could generate the
> character). You can write a reference to such an entity as it's just a
> normal attribute value, but being an attribute value you can't put it
> anywhere near the end of file. 

I did not mean to ask how to generate the character within the
stylesheet (which isn't possible), but how to read it from an external
file and write it to the output.

Let's not constrain ourselves to XSLT 1, and let's assume there is a
UTF-8 file containing just a single character, ASCII SUB (Control-Z) aka
Unicode SUBSTITUTE (U+001A).  Since that's an ASCII character, its UTF-8
encoding is identical: the single byte 0x1A.

In XML, we can do something like:

<!NOTATION c0-controls SYSTEM "http://www.unicode.org/charts/PDF/U0000.pdf">
<!ENTITY substitute SYSTEM "substitute.utf-8" NDATA c0-controls>

<!ELEMENT sub EMPTY>
<!ATTLIST sub char ENTITY #FIXED "substitute">

<text>...<sub/>...</text>
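
For concreteness, those declarations could be assembled into one
document with an internal DTD subset, roughly as below.  This is only a
sketch: the content model for text and the sample words "before" and
"after" are my own placeholders, not part of any real vocabulary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE text [
  <!-- Notation and unparsed entity naming the external one-byte file. -->
  <!NOTATION c0-controls SYSTEM "http://www.unicode.org/charts/PDF/U0000.pdf">
  <!ENTITY substitute SYSTEM "substitute.utf-8" NDATA c0-controls>

  <!-- Assumed content model: character data mixed with sub elements. -->
  <!ELEMENT text (#PCDATA | sub)*>
  <!ELEMENT sub EMPTY>
  <!-- The #FIXED default means every sub carries char="substitute". -->
  <!ATTLIST sub char ENTITY #FIXED "substitute">
]>
<text>before<sub/>after</text>
```

Because the char attribute is defaulted from the DTD, the stylesheet
only sees it if the parser actually reads these declarations.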

A validating XML processor must inform the application of the system
identifier for the entity, and XSLT 2 supports that via the
unparsed-entity-uri function.  So we can do:

<xsl:template match="sub">
  <xsl:value-of select="unparsed-text(unparsed-entity-uri(@char))"/>
</xsl:template>
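
Wrapped in a complete stylesheet, that might look like the following.
Again just a sketch: the text output method and the explicit 'UTF-8'
encoding argument are my choices for illustration, and the attribute is
addressed as @char because the context node is the matched sub element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Text output, so the file's content is written without escaping. -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Look up the entity's system identifier, then read that file. -->
  <xsl:template match="sub">
    <xsl:value-of
        select="unparsed-text(unparsed-entity-uri(@char), 'UTF-8')"/>
  </xsl:template>

</xsl:stylesheet>
```

Text nodes outside sub are copied through by the built-in template
rules, so nothing else is needed for the surrounding content.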

But!  The spec says:

[ERR XTDE1180] It is a non-recoverable dynamic error if a resource contains
characters that are not permitted XML characters.

What is the rationale for that restriction?  It means that an XSLT
processor can't do anything with the content of files like
substitute.utf-8 -- not to mention binary files such as images or
compressed text.  Even if a processor implements an extension output
method (an <xsl:output method="..."> value that is a QName but not an
NCName) to write binary data, it would violate the spec to read such
data in the first place.

-- 
Kevin Rodgers
