[xsl] Transforming HTML to NITF

Subject: [xsl] Transforming HTML to NITF
From: Adam Van Den Hoven <Adam.Hoven@xxxxxxxxxxxx>
Date: Fri, 16 Feb 2001 15:23:26 -0800
Since the body of NITF (News Industry Text Format, a standard format for
news content) is a lot like HTML in its simplest form, I'm allowing my
users to create NITF using an HTML parser. I then pass the HTML through HTML
Tidy to make it well-formed XML, and then through an XSL stylesheet to turn
it into NITF.

I have come across a problem that I don't know how to fix, and I need the
community's help.

NITF has a <content.body> tag which is equivalent to HTML's <body> tag.
However, its children are far more rigidly defined, in that it only allows
elements as children (no bare text). For my purposes, I'm allowed <p>,
<table>, <ul> and <ol> tags (there are others, but we don't use them yet).

After passing the HTML through HTML Tidy, I might get something like:

<body>
<p> this is some text</p>
<ul>
<li>item 1</li>
</ul>
this is <em>emphasis</em> some more <b>text</b><br/><br/>
<p>This is a new paragraph</p>
</body>

This would occur if I started with:

<body>
<p> this is some text
<ul>
<li>item 1</li>
</ul>
this is <em>emphasis</em> some more <b>text</b></p>
<p>This is a new paragraph</p>
</body>

I need to get the line:

this is <em>emphasis</em> some more <b>text</b><br/><br/>

to end up wrapped in <p> tags (preferably without the <br>s).

For clarity, the children of the body are:

     p
     ul
|    text()
|    em
|    text()
|    b
|    br
|    br
     p

I need to treat the nodes marked with a | as a single block so that I can
wrap the whole thing in a <p> tag. I don't know the placement, the order or
even the frequency of such runs (there is no reason why I couldn't have
several blocks that need to be grouped together), so the solution needs to
be general.

I really don't want to have to use scripting, but if the best solution
requires it, I'm running MSXML 3.
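
One way this might be approached in plain XSLT 1.0, without scripting, is to
group each run of loose nodes by the block-level element that precedes it.
The following is only a sketch under several assumptions: the elements are in
no namespace (as in the samples above), the block-level set is just <p>,
<table>, <ul> and <ol>, every <br> inside a loose run is dropped, and an
identity template stands in for the real HTML-to-NITF mapping. The key name
"run" and the template name "wrap-run" are made up for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="no"/>

  <!-- Index every loose (non-block) child of body by the generated id of
       the nearest preceding block-level sibling. Loose nodes that come
       before the first block get the empty string as their key value. -->
  <xsl:key name="run"
           match="body/node()[not(self::p or self::table or self::ul or self::ol)]"
           use="generate-id(preceding-sibling::*[self::p or self::table or self::ul or self::ol][1])"/>

  <!-- Placeholder identity template (assumption): copies everything that
       has no more specific rule. The real NITF mapping would replace it. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="body">
    <content.body>
      <!-- Loose nodes that appear before the first block element. -->
      <xsl:call-template name="wrap-run">
        <xsl:with-param name="nodes" select="key('run', '')"/>
      </xsl:call-template>
      <!-- Each block element, followed by the loose run after it. -->
      <xsl:for-each select="p | table | ul | ol">
        <xsl:apply-templates select="."/>
        <xsl:call-template name="wrap-run">
          <xsl:with-param name="nodes" select="key('run', generate-id())"/>
        </xsl:call-template>
      </xsl:for-each>
    </content.body>
  </xsl:template>

  <!-- Wrap a run in <p>, but only if it holds more than whitespace and
       <br> elements. All <br> elements in the run are discarded. -->
  <xsl:template name="wrap-run">
    <xsl:param name="nodes"/>
    <xsl:if test="$nodes[self::*[not(self::br)] or self::text()[normalize-space()]]">
      <p>
        <xsl:apply-templates select="$nodes[not(self::br)]"/>
      </p>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

With the Tidy output shown earlier, this should put the loose line in its own
<p> between the <ul> and the final <p>, while runs that contain only
whitespace or <br> elements produce no <p> at all. It relies only on xsl:key
and generate-id(), both standard XSLT 1.0, so it should not need the MSXML
script extension. Note that key('run', '') looks across the whole document,
so the sketch assumes a single <body>.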

Adam van den Hoven
Internet Application Developer
Blue Zone
tel. 604.685.4310
fax. 604.685.4391
Blue Zone makes you interactive.(tm) http://www.bluezone.net/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

