[xsl] odf2xhtml: Processing nested element content separately?

Subject: [xsl] odf2xhtml: Processing nested element content separately?
From: "Andreas M." <sfamix@xxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 27 Oct 2006 15:24:50 +0200
Hi,

I am trying to write an XSLT stylesheet that converts OASIS ODF to
XHTML. I want the mapping to be as close to 1:1 as possible. I have
run into some problems that I can find no way to solve.

I am using XSLT 1.0 and currently run the transformation with MSXML.NET in oXygen.

A quick outline of the problem:

ODF takes a different approach to laying out text than HTML. HTML is
sensible: within html:p there may be no block elements, only inline
elements. The same holds for inline elements (e.g. html:span,
html:img, html:a): they may not contain block elements (html:div,
html:h*, etc.).

ODF can intermix paragraphs with tables and frames (for which
html:div would be the most logical translation).

Now, if you have a source document with a paragraph, and inside
this paragraph you have a frame with an image, and this
image frame itself contains a paragraph of text
(a description of the image), then the problems start.

It seems, at least with my knowledge and skills, impossible to create a
clean ODF -> XHTML translation. Check out this horrible result. Of
course, it is completely invalid XHTML.


"content.xml" (the source):

<text:h text:style-name="Heading_20_1" text:outline-level="1">
	<draw:frame draw:style-name="fr2" draw:name="Grafik1"
	   text:anchor-type="paragraph" svg:x="2.27cm"
	   svg:y="2.057cm" svg:width="5.689cm" style:rel-width="22%"
	   svg:height="5.539cm" style:rel-height="scale"
	   draw:z-index="11">
		<draw:image 
		   xlink:href="Pictures/100000000000012C0000012CBED4AE2D.jpg"
		   xlink:type="simple"
		   xlink:show="embed" xlink:actuate="onLoad"/>
		</draw:frame>TITLE_TEXT
</text:h>
<text:p text:style-name="Text_20_body">
	<draw:frame draw:style-name="fr3"
	   draw:name="KaratekaPrincess" text:anchor-type="paragraph"
	   svg:x="15.727cm" svg:y="0.279cm"
	   svg:width="10.16cm" svg:height="7.17cm" draw:z-index="4">
	 	<draw:image xlink:href="Pictures/10000201000001800000010FE410B668.png"
		   xlink:type="simple"
		   xlink:show="embed" xlink:actuate="onLoad" />
	</draw:frame>
	SOME_PARAGRAPH_TEXT
		<text:span
		   text:style-name="Emphasis">THIS_WILL_BE_EMPHASIZED
		</text:span>.
	PARAGRAPH_TEXT_CONTINUES
</text:p>
			

"content.html" (the result):

<div
style="top:2.27cm;left:2.057cm;height:5.539cm;width:5.689cm;border:1px
solid black;">
         <img src="Pictures/100000000000012C0000012CBED4AE2D.jpg"
alt="Pictures/100000000000012C0000012CBED4AE2D.jpg"/>
</div>TITLE_TEXT
<p>
	<div
style="top:15.727cm;left:0.279cm;height:7.17cm;width:10.16cm;border:1px
solid black;">
            <img src="Pictures/10000201000001800000010FE410B668.png"
alt="Pictures/10000201000001800000010FE410B668.png"/>

</div>SOME_PARAGRAPH_TEXT<span>THIS_WILL_BE_EMPHASIZED</span>.PARAGRAPH_TEXT_CONTINUES.
</p>


This is completely crazy!

Please note that both images are anchored "to paragraph" in OpenOffice.
So, imo, the first image should not end up inside the <text:h>, since
there is clearly a new paragraph following the heading. I mean, the
title comes _before_ the image in the document, and the image is
aligned at the side of the paragraph following the heading.

I also have no clue what technique to use to get the <text:p> and
the <draw:frame> right. In HTML the only element that would match a
draw:frame is a <div>, but a <div> is not allowed within <p>. For
ODF this nesting is perfectly fine, and it is also perfectly legal to
have an <img> within a <p> in HTML, but as soon as the frame is
involved, there is a problem.

I would be very glad if someone knows of a solution, since right
now I turn everything into a <div>, and this is surely not how HTML
should be marked up.


I also checked the XSL FAQ, especially the entry about xsl:copy. I had
hoped that I could somehow rearrange the elements in question
programmatically. First I would extract all the text from the
text:p element and remember everything else contained within it, which
I would then process separately, after the text:p has been transformed
neatly into html:p. However, if I use text() I get only the first
fragment of the text and, since I also need to issue an
xsl:apply-templates, I even get the text twice.
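
The rearrangement described above can be sketched in XSLT 1.0 roughly as
follows. This is only a minimal sketch under some assumptions: the frames
are direct children of text:p or text:h, the outline level is between 1
and 6, and the CSS on the <div> is purely illustrative. The inline content
goes into the <p>, and the frames are emitted afterwards as sibling <div>
elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
    xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="text draw svg xlink">

  <xsl:output method="xml" indent="yes"/>

  <!-- Paragraph: everything except frames stays inside <p>;
       the frames follow the closed <p> as siblings. -->
  <xsl:template match="text:p">
    <p>
      <xsl:apply-templates select="node()[not(self::draw:frame)]"/>
    </p>
    <xsl:apply-templates select="draw:frame"/>
  </xsl:template>

  <!-- Heading: same idea; assumes text:outline-level is 1..6. -->
  <xsl:template match="text:h">
    <xsl:element name="h{@text:outline-level}">
      <xsl:apply-templates select="node()[not(self::draw:frame)]"/>
    </xsl:element>
    <xsl:apply-templates select="draw:frame"/>
  </xsl:template>

  <!-- Frame: becomes a block-level <div>; a caption paragraph nested
       inside the frame is handled by the text:p template above. -->
  <xsl:template match="draw:frame">
    <div style="width:{@svg:width};height:{@svg:height};">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- Image: the href doubles as alt text, as in the output shown earlier. -->
  <xsl:template match="draw:image">
    <img src="{@xlink:href}" alt="{@xlink:href}"/>
  </xsl:template>

  <!-- Inline span: stays inline inside the <p>. -->
  <xsl:template match="text:span">
    <span><xsl:apply-templates/></span>
  </xsl:template>

</xsl:stylesheet>
```

Selecting node()[not(self::draw:frame)] keeps all text fragments and inline
elements in document order, so nothing needs to be extracted with text()
first. As for the text() symptom: in XSLT 1.0, xsl:value-of with
select="text()" outputs only the first text node of the selected set, and a
following xsl:apply-templates copies every text node again through the
built-in template rule, which would explain both the truncated and the
duplicated text.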

Thanks.

-- 
Bye, 
Andreas M.
