Re: [xsl] XHTML DTD aware transformation and indentation behaviour

From: Matthieu Ricaud-Dussarget <matthieu.ricaud@xxxxxxxxx>
Date: Thu, 02 Feb 2012 14:03:42 +0100
Hi Ganesh,

Thanks for your reply. I have already set up a local XML catalog which contains xhtml11.dtd; without it I could not even transform the input.
I'm afraid to set indent to "yes" on xsl:output in a production environment, because I don't want extra whitespace to be added between words anywhere.


I tested it anyway, and I don't get the desired output:
- many blank lines are generated (inside div elements, for example)
- only part of the <head> element is indented as desired; I still get
<link href="my.css" rel="stylesheet" type="text/css" /><script type="text/javascript" src="my.js"></script></head>
on the same line.


As Michael said, I think it has to do with the DTD content model itself.
I'll keep investigating.
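
One way to test the DTD hypothesis is a small diagnostic stylesheet. The sketch below is not part of the production XSL. It lists every element that has child elements and counts its whitespace-only text node children. If whitespace is being dropped in elements that the DTD declares as element-only (html, head, body, table, tr...), those elements should report 0 when the DOCTYPE is resolved and a non-zero count when it is commented out, while mixed-content elements such as div and td should keep theirs. Saxon's command line also has a -strip option (all, none or ignorable) that may be relevant; whether it changes this output has not been tested in this thread.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- plain-text report, one line per element -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- only elements that contain at least one child element -->
    <xsl:for-each select="//*[*]">
      <!-- count child text nodes that consist solely of whitespace -->
      <xsl:value-of select="concat(name(), ': ',
          count(text()[not(normalize-space())]),
          ' whitespace-only text node(s)&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Running it twice on the same source, once with the DOCTYPE active and once with it commented out, and comparing the two reports should show which elements lose their whitespace.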

Thanks,

Matthieu

On 02/02/2012 12:06, Ganesh Babu N wrote:
Dear Matthieu,

You can achieve this by downloading all the modules of xhtml11.dtd,
placing them locally, referencing them through a catalog, and changing
indent to "yes", which lays out your XHTML output as a tree structure.
There is no need to comment out the DOCTYPE in the source file.

Regards,
Ganesh


On Thu, Feb 2, 2012 at 4:18 PM, Matthieu Ricaud-Dussarget <matthieu.ricaud@xxxxxxxxx> wrote:
Hi all,

In my project I concatenate multiple XHTML files into one XML file. This
aggregate file has to be edited by hand, so indentation matters here for
convenience.

Before I discovered XML catalogs, I used to delete all DOCTYPE declarations
in the source XHTML files with a Perl script (which also replaced named
entities with their UTF-8 characters). This worked fine: the concatenated
files were indented exactly like the XHTML sources.

But this was risky, because the Perl script could miss an entity that
needed replacing. It was also not good XML practice.

I now use a local XML catalog and run my transformation with Saxon
on the command line with these options:
-r:org.apache.xml.resolver.tools.CatalogResolver
-x:org.apache.xml.resolver.tools.ResolvingXMLReader
-y:org.apache.xml.resolver.tools.ResolvingXMLReader
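
For reference, a minimal OASIS XML catalog that these resolver options could pick up might look like the sketch below. The relative path dtd/xhtml11.dtd is hypothetical, since the catalog actually used is not shown in this message. The Apache resolver usually locates the catalog through a CatalogManager.properties file on the classpath or the xml.catalog.files system property.

```xml
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- map the XHTML 1.1 public identifier to a local copy (hypothetical path) -->
  <public publicId="-//W3C//DTD XHTML 1.1//EN"
          uri="dtd/xhtml11.dtd"/>
  <!-- also map the system identifier, in case it is used for the lookup -->
  <system systemId="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"
          uri="dtd/xhtml11.dtd"/>
</catalog>
```

xhtml11.dtd pulls in its modules (the .mod files) by reference, so the local copy needs those modules to be resolvable as well, either stored alongside it or listed in the catalog.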

Now to the problem. My XSL is a simple identity template:

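<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xhtml" indent="no" encoding="UTF-8"
omit-xml-declaration="no" doctype-public="-//W3C//DTD XHTML 1.1//EN"
doctype-system="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"/>

<!-- identity copy of elements, attributes, PIs and comments;
text nodes are copied by the built-in template rule -->
<xsl:template match="* | @* | processing-instruction() | comment()"
mode="copy">
<xsl:copy copy-namespaces="no">
<xsl:apply-templates select="node()|@*" mode="copy"/>
</xsl:copy>
</xsl:template>

<!-- entry point: process the whole document in "copy" mode -->
<xsl:template match="/">
<xsl:apply-templates mode="copy"/>
</xsl:template>

</xsl:stylesheet>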

this is my XML source :
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>title</title>
<link href="my.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="my.js"></script>
</head>
<body>
<div class="body">
<div class="pageTitre_container">
<h1>
<span>Title 1</span>
</h1>
<p><span class="big">This</span>  is<span class="little">a
paragraphe</span></p>
<p><span class="big">This</span>  is<span class="little">a
paragraphe</span></p>
</div>
</div>
<table>
<caption>This is a table</caption>
<thead>
<tr>
<td>Col 1</td>
<td>Col 2</td>
<td>Col 3</td>
<td>Col 4</td>
<td>Col 5</td>
</tr>
</thead>
<tbody>
<tr>
<td>  </td>
<td colspan="3" rowspan="7">
<p class="entitre-en-savoir-">À savoir</p>
<p class="no">
<span class="no-style-override-5">Certains grands magasins proposent des
comparatifs très complets, prenez le temps de les parcourir. Vous pouvez
également chercher des infos sur Internet via les sites des fabricants, ou
sur les forums&#160;: rien ne vaut l’avis d’un consommateur pour se faire
une idée précise du produit&#160;!</span>
</p>
</td>
<td>  </td>
</tr>
<tr>
<td>  </td>
<td>  </td>
</tr>
<tr>
<td>  </td>
<td>  </td>
</tr>
<tr>
<td>  </td>
<td>  </td>
</tr>
<tr>
<td>  </td>
<td>  </td>
</tr>
<tr>
<td>  </td>
<td>  </td>
</tr>
<tr>
<td>  </td>
<td>  </td>
</tr>
<tr>
<td>  </td>
<td>  </td>
<td>  </td>
<td>  </td>
<td>  </td>
</tr>
</tbody>
</table>
</body>
</html>

Which gives as output :

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta
http-equiv="Content-Type" content="text/html; charset=UTF-8"
/><title>title</title><link href="my.css" rel="stylesheet" type="text/css"
/><script type="text/javascript" src="my.js"></script></head><body><div
class="body">
<div class="pageTitre_container">
<h1>
<span>Title 1</span>
</h1>
<p><span class="big">This</span>  is<span class="little">a
paragraphe</span></p>
<p><span class="big">This</span>  is<span class="little">a
paragraphe</span></p>
</div>
</div><table><caption>This is a table</caption><thead><tr><td>Col
1</td><td>Col 2</td><td>Col 3</td><td>Col 4</td><td>Col
5</td></tr></thead><tbody><tr><td>  </td><td colspan="3" rowspan="7">
<p class="entitre-en-savoir-">À savoir</p>
<p class="no">
<span class="no-style-override-5">Certains grands magasins proposent des
comparatifs très complets, prenez le temps de les parcourir. Vous pouvez
également chercher des infos sur Internet via les sites des fabricants, ou
sur les forums : rien ne vaut l’avis d’un consommateur pour se faire une
idée précise du produit !</span>
</p>
</td><td>  </td></tr><tr><td>  </td><td>  </td></tr><tr><td>  </td><td>
</td></tr><tr><td>  </td><td>  </td></tr><tr><td>  </td><td>  </td></tr><tr><td>
</td><td>  </td></tr><tr><td>  </td><td>  </td></tr><tr><td>  </td><td>
</td><td>  </td><td>  </td><td>  </td></tr></tbody></table></body></html>

If I comment the DOCTYPE in the source I get :

<?xml version="1.0" encoding="UTF-8"?><!--<!DOCTYPE html PUBLIC "-//W3C//DTD
XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">-->
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>title</title>
<link href="my.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="my.js"></script>
</head>
<body>
<div class="body">
<div class="pageTitre_container">
<h1>
<span>Title 1</span>
</h1>
<p><span class="big">This</span>  is<span class="little">a
paragraphe</span></p>
<p><span class="big">This</span>  is<span class="little">a
paragraphe</span></p>
</div>
</div>
<table>
<caption>This is a table</caption>
<thead>
<tr>
<td>Col 1</td>
<td>Col 2</td>
<td>Col 3</td>
<td>Col 4</td>
<td>Col 5</td>
</tr>
</thead>
<tbody>
<tr>
<td>  </td>
<td colspan="3" rowspan="7">
<p class="entitre-en-savoir-">À savoir</p>
<p class="no">
<span class="no-style-override-5">Certains grands magasins proposent des
comparatifs très complets, prenez le temps de les parcourir. Vous pouvez
également chercher des infos sur Internet via les sites des fabricants, ou
sur les forums : rien ne vaut l’avis d’un consommateur pour se faire une
idée précise du produit !</span>
</p>
</td>
<td>  </td>
</tr>
<tr>
<td>  </td>
<td>  </td>
</tr>
<tr>
<td>  </td>
<td>  </td>
</tr>
<tr>
<td>  </td>
<td>  </td>
</tr>
<tr>
<td>  </td>
<td>  </td>
</tr>
<tr>
<td>  </td>
<td>  </td>
</tr>
<tr>
<td>  </td>
<td>  </td>
</tr>
<tr>
<td>  </td>
<td>  </td>
<td>  </td>
<td>  </td>
<td>  </td>
</tr>
</tbody>
</table>
</body>
</html>


The head element is now indented, and so is the table. This is what I would like, but I don't want to comment out the DOCTYPE in the source.

Does it have something to do with the XHTML DTD content model? Any idea how
to achieve what I'd like?

Thanks,

Matthieu.




--
Matthieu Ricaud
05 45 37 08 90
NeoLibris
