Re: [xsl] XHTML DTD aware transformation and indentation behaviour

Subject: Re: [xsl] XHTML DTD aware transformation and indentation behaviour
From: Matthieu Ricaud-Dussarget <matthieu.ricaud@xxxxxxxxx>
Date: Thu, 02 Feb 2012 14:49:41 +0100
Hi Michael, thanks for your reply.

My apologies for the mail, I did not think about how different mail readers would handle it. Fortunately your guessing skills helped :-) !
I added -strip:none to the command line and it does the trick: I get exactly the same indentation as when I comment out the DOCTYPE in the source file!
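
For reference, here is a rough sketch of the resulting command line. The jar names and the source, stylesheet and output file names are placeholders (assumptions for illustration, not my real paths); the resolver options are the ones quoted below, with -strip:none added:

```shell
# Placeholder jar and file names (assumptions; adjust to your setup).
CP="saxon9he.jar:resolver.jar"
RES="org.apache.xml.resolver.tools"

# Catalog-aware resolvers as before, plus -strip:none to keep the
# whitespace that DTD validation reports as ignorable.
SAXON_CMD="java -cp $CP net.sf.saxon.Transform \
-r:$RES.CatalogResolver \
-x:$RES.ResolvingXMLReader \
-y:$RES.ResolvingXMLReader \
-strip:none -s:source.xhtml -xsl:concat.xsl -o:aggregate.xml"

# Print the command for inspection; run it with: eval "$SAXON_CMD"
echo "$SAXON_CMD"
```

The command is only assembled and printed here, so it can be checked before actually being run against real files.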


I'm using Saxon-HE 9.2.1.1J

I've just tried with the brand new "Saxon-HE 9.4.0.1J", and it is the same behaviour.
I guess -strip:none is not the default option.


Thanks a lot anyway, this is what I was looking for :-) !!

Just one more question about the output:
A <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> element is added to the <head> element, and it is the only element left unindented in the file. I'm not sure why Saxon adds it. It doesn't seem to depend on the xsl:output settings: I get the <meta> even when xsl:output is absent, and also when xsl:output has no encoding="UTF-8" attribute.



Regards,


Matthieu




On 02/02/2012 12:09, Michael Kay wrote:
On 02/02/2012 10:48, Matthieu Ricaud-Dussarget wrote:
Hi all,

In my project I concatenate multiple XHTML files into one XML file. This aggregate file has to be edited by hand, so indentation matters for convenience.

Before I discovered XML Catalog, I used to delete all DOCTYPE declarations from the source XHTML files with a Perl script (which also replaced named entities with their UTF-8 characters). This worked fine: the concatenated files were indented exactly like the XHTML sources.

But this was a bit dangerous in case the Perl script missed a special entity that needed replacing. And it was not really good XML practice.

Now that I'm using a local XML Catalog and run my transformation with Saxon on the command line with these options:
-r:org.apache.xml.resolver.tools.CatalogResolver -x:org.apache.xml.resolver.tools.ResolvingXMLReader -y:org.apache.xml.resolver.tools.ResolvingXMLReader


I can't see exactly what's happening here because your mail client and mine have conspired to ignore the whitespace which was critical to understanding your message.

Generally, if you validate against a DTD, then whitespace in elements whose content model is defined as element-only (for example head and body) will be treated as ignorable, which means it's liable to be lost in a copy operation. Perhaps this is what is happening.

Try the option -strip:none on the command line to prevent this behaviour. The documentation says this is the default, but I'm not convinced it is correct: I seem to remember it changing some time ago in response to a W3C change.

Michael Kay
Saxonica




--
Matthieu Ricaud
05 45 37 08 90
NeoLibris
