RE: [xsl] One texdocument in and several xmldocuments out?

Subject: RE: [xsl] One texdocument in and several xmldocuments out?
From: "Stuart Celarier" <stuart@xxxxxxxxxxx>
Date: Mon, 6 May 2002 09:33:20 -0700
You can convert a Word document to HTML using File / Save As... and
selecting HTML or Filtered HTML. The difference between the two is that
HTML preserves all of Word's information, such as <span> tags that mark
spelling and grammar issues, whereas Filtered HTML drops the
Word-specific tags. Then follow the advice already provided here (e.g.,
Tidy) to ensure that the HTML is well-formed XML.


-----Original Message-----
From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
[mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Robert
Sent: Monday, May 06, 2002 06:43
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] One texdocument in and several xmldocuments out?


Zack Brown wrote:

>On Mon, May 06, 2002 at 01:28:51PM +0200, Tove Nilstun wrote:
>>I am a total beginner when it comes to XML, but in order to start
>>with it, there are two things I need to sort out.
>>I have a user guide (written in MS Word) with both text and pictures. I
>>would like to 1. convert this document to several xml documents, one per
>>headline and 2. create an additional xml file containing an index of the
>>files created in step one.
>>Is this possible?
>Absolutely. Just create one XSLT file for each output file you desire.
>Then run the XML through your parser once for each XSLT file you've

You do not need an XSLT file for each page.

First you have to get the MSWord doc into XML. There are a few products
out there that convert Word to DocBook or some other XML. A neat trick
we found when building our MSIE-based editor was that you could paste an
MSWord doc into an element that has contentEditable="true". IE converts
this to HTML. We use JS to convert it to XML on the client, but you
could use Tidy to get well-formed HTML (XML). Then hopefully there are
clean separations to indicate where a new page should start. Apply
templates to (or loop over) each page division, and use the extension
functions built into Saxon or Xalan to create multiple output documents
from one source.
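
As a rough illustration of the "one stylesheet, many outputs" idea, here is
a minimal sketch. It swaps in the standard XSLT 2.0 instruction
xsl:result-document rather than the processor-specific extensions, so it
needs an XSLT 2.0 processor. The assumptions are mine, not from the thread:
the input is XHTML as produced by Tidy, every <h1> directly under <body>
starts a new page, and the file names (section-N.xml) and the <index>,
<entry> and <section> elements are made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <!-- Assumption: unprefixed names in paths and patterns refer to XHTML
       elements, which is what Tidy emits when asked for XML output. -->
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- The primary output is the index of all generated files. -->
    <index>
      <!-- Start a new group at every h1 that is a child of body.
           Any content before the first h1 forms a group of its own. -->
      <xsl:for-each-group select="/html/body/*" group-starting-with="h1">
        <!-- Hypothetical naming scheme: section-1.xml, section-2.xml, ... -->
        <xsl:variable name="file"
                      select="concat('section-', position(), '.xml')"/>

        <!-- One index entry per page, titled with the text of the
             group's first element (normally the h1). -->
        <entry href="{$file}">
          <xsl:value-of select="current-group()[1]"/>
        </entry>

        <!-- One secondary output document per page. -->
        <xsl:result-document href="{$file}">
          <section>
            <xsl:copy-of select="current-group()"/>
          </section>
        </xsl:result-document>
      </xsl:for-each-group>
    </index>
  </xsl:template>

</xsl:stylesheet>
```

With an XSLT 1.0 processor the structure would stay the same, but the
xsl:for-each-group grouping and xsl:result-document would have to be
replaced by 1.0 techniques and the processor's own multiple-output
extension, so check the Saxon or Xalan documentation for the exact element
and namespace.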


 XSL-List info and archive:
