Re: [xsl] Procesing XHTML files with DOCTYPE statements

Subject: Re: [xsl] Procesing XHTML files with DOCTYPE statements
From: "Mukul Gandhi" <gandhi.mukul@xxxxxxxxx>
Date: Wed, 12 Jul 2006 12:39:55 +0530
I wrote a small JAXP program (using the EntityResolver approach I
wrote about), which I think could be useful to you.

The XML file is (named dtdexp.xml):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>simple document</title>
</head>
<body>
<p>a simple paragraph</p>
</body>
</html>

The stylesheet used is (an identity transform) - named dtdexp.xsl:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:output method="html" indent="yes" />

<xsl:template match="node() | @*">
  <xsl:copy>
    <xsl:apply-templates select="node() | @*" />
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

When run with Saxon (or Xalan-J) normally, the following error message is produced:

Error  java.net.ConnectException: Connection timed out: connect
Transformation failed: Run-time errors were reported
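
The timeout comes from the XML parser trying to fetch the external DTD
named in the DOCTYPE. As a rough diagnostic sketch (the class name
DtdRequestLogger and the method requestedIds are made up for this
illustration, and it assumes the default JAXP parser of the JDK), an
EntityResolver can record which system identifiers the parser asks for
while handing back an empty entity, so nothing is downloaded:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: records every external entity the parser requests.
public class DtdRequestLogger implements EntityResolver {

    // System identifiers the parser asked for, in request order.
    final List<String> requested = new ArrayList<>();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requested.add(systemId);
        // Hand back an empty entity so nothing is fetched over the network.
        return new InputSource(new StringReader(""));
    }

    // Parses an XML string and returns the system IDs the parser requested.
    public static List<String> requestedIds(String xml) throws Exception {
        DtdRequestLogger logger = new DtdRequestLogger();
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(logger);
        builder.parse(new InputSource(new StringReader(xml)));
        return logger.requested;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
          + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
          + "<html><body><p>a simple paragraph</p></body></html>";
        for (String id : requestedIds(xml)) {
            System.out.println("parser requested: " + id);
        }
    }
}
```

Without a resolver installed, the parser would open a network connection
for each identifier this sketch records.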

Now I've written a utility Java class, using JAXP as below:

import java.io.*;

import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
import org.xml.sax.helpers.*;
import org.apache.xerces.jaxp.*;

import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.dom.*;

public class MyTransform implements EntityResolver {

  public static void main(String[] args)
  {
      String xmlfile = args[0];
      String xslfile = args[1];

      MyTransform obj = new MyTransform();

      try {
        // Parse the input with this class as the entity resolver,
        // so the external DTD is never fetched.
        DocumentBuilderFactoryImpl factory = new DocumentBuilderFactoryImpl();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(obj);

        Document document = builder.parse(xmlfile);

        // Apply the stylesheet to the parsed DOM and write the result to stdout.
        TransformerFactory tfactory = TransformerFactory.newInstance();
        Transformer transformer = tfactory.newTransformer(new StreamSource(xslfile));
        transformer.transform(new DOMSource(document),
                              new StreamResult(new OutputStreamWriter(System.out)));
      }
      catch(Exception ex) {
          ex.printStackTrace();
      }
  }

  // Resolve every external entity (including the DTD) to an empty stream.
  public InputSource resolveEntity(java.lang.String publicId,
                                   java.lang.String systemId)
  {
      InputSource is = new InputSource(new StringReader(""));

      return is;
  }

}

When this is run as:

java MyTransform dtdexp.xml dtdexp.xsl

The output produced is:

<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>simple document</title>
</head>
<body>
<p>a simple paragraph</p>
</body>
</html>

The Java class resolves the DTD reference to an empty stream, so the
parser never tries to fetch the DTD before the XSLT transformation is
applied.
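
The same idea can probably be applied without building a DOM first, by
handing the transformer a SAXSource whose XMLReader carries the resolver.
The following is only a sketch under that assumption: the class name
SaxSourceTransform and its transform method are invented for
illustration, and it takes the XML and the stylesheet as strings to stay
self-contained. Note that with an empty DTD, any entity the DTD would
have declared (for example &nbsp;) is left undefined.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical variant: feeds the transformer through a SAX reader
// instead of a DOM tree.
public class SaxSourceTransform {

    // Transforms xml with xsl, resolving every external entity
    // (including the DTD) to an empty stream.
    public static String transform(String xml, String xsl) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT needs namespace-aware input
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));

        SAXSource source =
            new SAXSource(reader, new InputSource(new StringReader(xml)));
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));

        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
          + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
          + "<html><head><title>simple document</title></head>"
          + "<body><p>a simple paragraph</p></body></html>";
        String xsl =
            "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
          + " version=\"1.0\">"
          + "<xsl:output method=\"html\" indent=\"yes\"/>"
          + "<xsl:template match=\"node() | @*\"><xsl:copy>"
          + "<xsl:apply-templates select=\"node() | @*\"/>"
          + "</xsl:copy></xsl:template></xsl:stylesheet>";
        System.out.println(transform(xml, xsl));
    }
}
```

Because the reader is created through the standard JAXP factories, this
variant does not depend on the Xerces implementation classes.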

Regards,
Mukul

On 7/11/06, dvint@xxxxxxxxx <dvint@xxxxxxxxx> wrote:
This is the first time I've had to process XHTML files with XSLT. I'm
using Saxon and getting an error that it can't find the DTD referenced in
the file that I'm processing. The file has:

<!DOCTYPE html
 PUBLIC "-//W3//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/tr/xhtml1/DTD/xhtml1-transitional.dtd">

Result is:

Error on line 4 column 107 of file:/C:/dev/LanguageDetection/RM0000010ZQ000X.html:
 Error reported by XML parser: Cannot read from http://www.w3.org/tr/xhtml1/DTD/xhtml1-transitional.dtd
Transformation failed: Run-time errors were reported

This problem goes away as soon as I delete the DOCTYPE info, but I don't
want to (can't) do this for every file. Is there some way around this
error? Note that the DTD does exist at the URL provided, but the default
setup in Saxon doesn't seem to find it.

This stylesheet is basically doing an identity transformation with one
change in the body element to insert a new comment. Here is the stylesheet
in case there might be a way to work around this problem:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
 version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns="http://www.w3.org/1999/xhtml"
 xmlns:t3="http://tms.toyota.com/t3"
>

<xsl:param name="language" select="'en'" />

<xsl:variable name="commentText">
<xsl:choose>
       <xsl:when test="$language='en'">
               text 1 goes here
       </xsl:when>
       <xsl:when test="$language='fr'">
               text 2 goes here
       </xsl:when>
       <xsl:when test="$language='sp'">
               Text 3 goes here
       </xsl:when>
       <xsl:otherwise>UNRECOGNIZED LANGUAGE SPECIFIED</xsl:otherwise>
</xsl:choose>
</xsl:variable>

<xsl:output method="html"
       omit-xml-declaration="no"
       doctype-public="-//W3//DTD XHTML 1.0 Transitional//EN"
       doctype-system="http://www.w3.org/tr/xhtml1/DTD/xhtml1-transitional.dtd"
       indent="no"/>


<xsl:template match="*">
  <xsl:choose>
    <xsl:when test="name(.)='body'">
      <xsl:element name="{name(.)}">
        <xsl:for-each select="@*">
          <xsl:attribute name="{name(.)}" namespace="{namespace-uri(.)}"><xsl:value-of select="."/></xsl:attribute>
        </xsl:for-each>
        <xsl:comment> <xsl:value-of select="$commentText"/> </xsl:comment>
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:when>
    <xsl:otherwise>
      <xsl:element name="{name(.)}">
        <xsl:for-each select="@*">
          <xsl:attribute name="{name(.)}" namespace="{namespace-uri(.)}"><xsl:value-of select="."/></xsl:attribute>
        </xsl:for-each>
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<xsl:template match="comment()">
       <xsl:comment><xsl:value-of select="."/></xsl:comment>
</xsl:template>

</xsl:stylesheet>
