[xsl] SEPM0004 ignores corrected XML when corrected by character-map

Subject: [xsl] SEPM0004 ignores corrected XML when corrected by character-map
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Fri, 16 Feb 2007 18:04:52 +0100
Hi List,

I have a specs question that I stumbled upon when helping someone out with his doctype problems earlier today (gmt+1). Whenever you use doctype-system and/or standalone (with a value other than 'omit') and you have a text node in your data model as a child of the root, this will raise an SEPM0004 error, as explained here: http://www.w3.org/TR/xslt-xquery-serialization/#ERRSEPM0004.
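
A minimal sketch of the doctype-system variant of that condition, assuming a conforming XSLT 2.0 serializer (example.dtd is a hypothetical system identifier, and some-root is just a placeholder element). The text node next to the root element is what should trigger SEPM0004:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- doctype-system is set, so the result may not have text nodes
       as children of the root -->
  <xsl:output method="xml" doctype-system="example.dtd"/>
  <xsl:template match="/" name="main">
    <!-- this text node becomes a child of the document node -->
    <xsl:text>stray text</xsl:text>
    <some-root/>
  </xsl:template>
</xsl:stylesheet>
```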

I found only one processor actually raising that error (Saxon, of course). One other processor (AltovaXML) either output nothing, or output the wrong XML without an error. According to the specs, a processor may either signal the error or recover by ignoring the doctype/standalone request and serializing as normal.

Of course, the reasons behind having this error are clear in most situations. But what should happen when the output, after character mapping, happens to become legal XHTML/XML? Example (though not a good use-case):

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"
      use-character-maps="testmap"
      standalone="yes"/>
  <xsl:character-map name="testmap">
    <xsl:output-character
        character="&#xE050;"
        string="&#xA;&#xA;" />
  </xsl:character-map>
  <xsl:template match="/" name="main">
    <xsl:text>&#xE050;</xsl:text>
    <some-root />
  </xsl:template>
</xsl:stylesheet>



If SEPM0004 did not exist, the result of this stylesheet would be the following perfectly legal XML (which is what AltovaXML, non-conforming in this respect, actually outputs):


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<some-root />
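
For comparison, the recovery route that the spec allows (ignoring the standalone request) should behave roughly like the variant below, where the standalone attribute is simply left off. This is only a sketch, not checked against any particular processor, and the exact whitespace in the output may differ:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- no standalone and no doctype-system, so the SEPM0004
       condition does not apply -->
  <xsl:output method="xml" indent="yes"
      use-character-maps="testmap"/>
  <xsl:character-map name="testmap">
    <!-- the private-use character is replaced by two newlines
         during serialization -->
    <xsl:output-character
        character="&#xE050;"
        string="&#xA;&#xA;" />
  </xsl:character-map>
  <xsl:template match="/" name="main">
    <xsl:text>&#xE050;</xsl:text>
    <some-root />
  </xsl:template>
</xsl:stylesheet>
```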


Most of the discussion on character-maps is about "the resulting serialized XML may be non-well-formed or non-validating". But this situation is the other way around: by applying a character-map, the resulting XML becomes correct. Shouldn't SEPM0004 be raised only after this phase? Something like: "if the serialized result does not contain non-whitespace text at the root level, SEPM0004 should not be raised"?


I know, it is all a bit of a corner case. I just stumbled upon it today and I am curious what the list's thoughts are on this matter, if any.

Cheers,
-- Abel Braaksma
  http://www.nuntia.nl
