[xsl] tidy / not tidy?: how to eliminate the first line <!DOCTYPE html PUBLIC....

Subject: [xsl] tidy / not tidy?: how to eliminate the first line <!DOCTYPE html PUBLIC....
From: "Smirnov, Anatoliy" <anatoliy.smirnov@xxxxxxxxxxx>
Date: Thu, 8 May 2003 10:10:29 -0400
Hello everybody.

Please forgive me if this is too simple a question for this forum.
I have 3 questions.

Question 1.

My html files have as the first line
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">

When converting such a file to XML with HTML-Kit from
http://www.chami.com/html-kit/
is there any way NOT to have this line in the resulting XML file?

Question 2.

Does anybody know whether the above-mentioned HTML-Kit allows batch
processing (or command-line processing), since I have to process quite a
few HTML files at a time?

Question 3.

I tried to use another TIDY tool (sorry, don't remember where I downloaded
it from). The tool allows command line processing with this syntax

tidy -asxml -indent <source html file name>   >   <result xml file name>

The good thing about this tool is that I can do batch processing with it,
though I still have to delete the first line
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
manually, either from the HTML file or from the XML result file produced by
TIDY, before I can run my stylesheet.
But there is another problem that I hope someone can clarify for me.

The second line in the generated XML file has this form:

<html xmlns="http://www.w3.org/1999/xhtml">

Before processing the file with my stylesheet I have to remove the string 
xmlns="http://www.w3.org/1999/xhtml"
from this line.
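
The manual removal described above could in principle be done by a small
pre-processing stylesheet instead. This is only a minimal sketch, under the
assumption that the xmlns declaration places every element of the file in
the XHTML namespace and that simply dropping that namespace is acceptable
for this data. It has not been tested against the real files.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Rebuild each element under its local name only, so the result
       carries no namespace (assumes no default namespace is declared
       on this stylesheet itself) -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- Copy text, comments and processing instructions unchanged -->
  <xsl:template match="text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

The output of such a pass would then be the input for the stylesheet that
builds the ROWSET.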
  
It is the xmlns attribute in this line that gets in the way and prevents
my stylesheet from generating the result.
My stylesheet looks like this:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="html">
  <ROWSET>
    <xsl:apply-templates select="body/h"/>
  </ROWSET>
</xsl:template>
<xsl:template match="body/h">
  <ROW>
    <HEADER>
      <xsl:apply-templates/></HEADER>
    <BODY>
      <xsl:apply-templates
        select="following-sibling::*[generate-id(following-sibling::h[1]) =
                generate-id(current()/following-sibling::h[1])]"/>
    </BODY>
  </ROW>
</xsl:template>
</xsl:stylesheet>

I tried in my stylesheet

<xsl:template match="html[@xmlns='http://www.w3.org/1999/xhtml']">

but this didn't help. What puzzles me is that if the attribute had any
other name instead of 'xmlns', for example 'xmls', this approach would
work. Could someone please clarify what the secret behind this 'xmlns'
attribute is, and/or whether there is any way I can change my stylesheet
to process the file?
For reference, the source XML looks like this:

<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h> Header </h>
    <p> A </p>
    <dl>
      <dt> B </dt>
      <dt> C </dt>
    </dl>
    <h>Header </h>
    <p> D </p>
    <dl>
      <dt>
        <dt> E </dt>
      </dt>
    </dl>
    <p> F </p>
  </body>
</html>
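
One direction that might avoid editing the file at all is to leave the
xmlns declaration in place and match the elements through a namespace
prefix. The following is only a sketch of my stylesheet above rewritten
that way, under the assumption that xmlns is a namespace declaration rather
than an ordinary attribute, so that html, body and h all belong to the
http://www.w3.org/1999/xhtml namespace. The prefix "x" is an arbitrary
choice made for this illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="x">

  <!-- "x" is bound to the same URI as the default namespace of the source;
       only the URI has to agree, the prefix itself is arbitrary -->
  <xsl:template match="x:html">
    <ROWSET>
      <xsl:apply-templates select="x:body/x:h"/>
    </ROWSET>
  </xsl:template>

  <xsl:template match="x:body/x:h">
    <ROW>
      <HEADER>
        <xsl:apply-templates/></HEADER>
      <BODY>
        <!-- Same grouping test as before, with h qualified by the prefix -->
        <xsl:apply-templates
          select="following-sibling::*[generate-id(following-sibling::x:h[1]) =
                  generate-id(current()/following-sibling::x:h[1])]"/>
      </BODY>
    </ROW>
  </xsl:template>

</xsl:stylesheet>
```

The exclude-result-prefixes attribute is there so that the declaration for
"x" is not copied onto ROWSET in the output.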

Thank you for your time.

Anatoliy Smirnov
Department of Veterans Affairs


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

