[xsl] copy-of XHTML replicates end-tag when xsl:output is HTML

Subject: [xsl] copy-of XHTML replicates end-tag when xsl:output is HTML
From: Peter Flynn <pflynn@xxxxxx>
Date: Wed, 21 Dec 2011 15:18:46 +0000
I have to scrape an HTML page, not something I do very often, so I pass
it through Tidy -asxml and end up with (XHTML):

...
<h3>Committees</h3>
<table style="width:99%;">
  <tr class="colour2">
    <td width="100%">
      <b>Committee :</b> Association des Études françaises et
      francophones en Irlande<br/><b>From:</b>01-JAN-04
      <b>To:</b> 30-DEC-99
    </td>
  </tr>
</table>
...

In my XSLT2 (Saxon9 via Cocoon), with
xmlns:h="http://www.w3.org/1999/xhtml" and xsl:output method="html" I have:

<xsl:copy-of select="//h:table
     [preceding-sibling::*[1][local-name()='h3']]
     [preceding-sibling::*[1][local-name()='h3']='Committees']"/>

which produces:

<table xmlns="http://www.w3.org/1999/xhtml" style="width:99%;">
  <tr class="colour2">
    <td width="100%">
      <b>Committee :</b> Association des Études françaises et
      francophones en Irlande<br></br><b>From:</b>01-JAN-04
      <b>To:</b> 30-DEC-99
    </td>
  </tr>
</table>

The <br/> of Tidy's generated XHTML is being expanded by the copy-of to
<br></br> instead of being contracted to <br> as implied by the output
setting of HTML. If copy-of is able to detect the <br/> and perform an
implicit transform like that, I'm puzzled as to why it does it that way
round.

I'm sure there is a good reason for it (although it is opaque to me) but
it results in IE rendering two newlines, not one, and we can't go
upsetting IE users :-)

Is there a way to avoid this, or should I work around it by providing
suitable identity templates and using apply-templates instead of copy-of?
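
For concreteness, here is a minimal sketch of the identity-template
workaround I have in mind. It rests on an assumption I have not verified
against Saxon: that the HTML output method only applies its empty-element
rules (<br> with no end-tag) to elements in no namespace, and serializes
anything in the XHTML namespace as plain XML. The mode name "strip-ns" is
just an illustrative label, and the select is the same one used in the
copy-of above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">

  <xsl:output method="html"/>

  <!-- Select the same table as before, but process it with templates
       instead of copying it verbatim. -->
  <xsl:template match="/">
    <xsl:apply-templates mode="strip-ns" select="//h:table
         [preceding-sibling::*[1][local-name()='h3']]
         [preceding-sibling::*[1][local-name()='h3']='Committees']"/>
  </xsl:template>

  <!-- Rebuild each XHTML element under the same local name in no
       namespace (assumption: this is what lets the HTML serializer
       recognise br, img, etc. as empty elements). -->
  <xsl:template match="h:*" mode="strip-ns">
    <xsl:element name="{local-name()}" namespace="">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates mode="strip-ns"/>
    </xsl:element>
  </xsl:template>

  <!-- Text and comments pass through unchanged. -->
  <xsl:template match="text() | comment()" mode="strip-ns">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

If that assumption holds, the table should come out without the xmlns
declaration and the <br/> should be written as a single <br>. Switching
the output method to "xhtml" might be the other route, but I have not
tried that either.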

[It's probably blindingly obvious, but at this point in this week I'm
probably not seeing it :-]

///Peter
