Re: [xsl] Flat to Structured: Handling List Items with Subordinate Paragraphs

Subject: Re: [xsl] Flat to Structured: Handling List Items with Subordinate Paragraphs
From: Robert Koberg <rob@xxxxxxxxxx>
Date: Tue, 26 May 2009 17:17:25 -0400
On May 26, 2009, at 5:11 PM, Robert Koberg wrote:

Hi,

(I've gone through this many times.) Usually, I find the easiest thing to do is open the document in OpenOffice and export it as XHTML. That way you get the structure you want, and you can then whittle the rest of the junk away until it conforms to some schema, maybe splitting out content pieces based on H1s (we sometimes get whole websites written in Word).

I might not have been clear - I meant that we use XSL to remove the unnecessaries. (don't want to get yelled at :) )


Start out with the identity template and any obvious matches. Remove Ps that only contain whitespace. Remove pretty much all attributes. Remove many unnecessary SPANs. Doesn't take long: edit the xsl, run the transform, check validity, rinse, wash, repeat.
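Roughly, a first pass along those lines might look like the sketch below. It is only a starting point, not our actual stylesheet: it assumes the export is in the XHTML namespace (bound here to the prefix h), the attribute allow-list (href, src, alt) is made up for illustration, and it unwraps every SPAN rather than only the unnecessary ones.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">

  <!-- Identity template for nodes: copy each node, then process its
       attributes and children. Attributes are handled by the two
       attribute templates further down. -->
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Remove Ps that have no child elements and whose text is only
       whitespace. Non-breaking spaces (U+00A0) are mapped to ordinary
       spaces first, because normalize-space() does not strip them. -->
  <xsl:template
      match="h:p[not(*) and normalize-space(translate(., '&#160;', ' ')) = '']"/>

  <!-- Remove pretty much all attributes ... -->
  <xsl:template match="@*"/>

  <!-- ... except a short allow-list (assumed; adjust to your schema).
       These patterns have a higher default priority than @*. -->
  <xsl:template match="h:a/@href | h:img/@src | h:img/@alt">
    <xsl:copy/>
  </xsl:template>

  <!-- Unwrap SPANs: drop the element, keep its content. Narrow the
       match if some spans carry meaning you want to keep. -->
  <xsl:template match="h:span">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

From there you add or tighten one template at a time as the validator complains.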

best,
-Rob



Another thing we do is just paste the Word content into our web-based editor - Xopus - and it does the work of converting it to the current XML Schema. It does a really good job, but there is usually some cleanup left, which is done by the author.

best,
-Rob
