Re: [xsl] Advice on dictionary conversion

Subject: Re: [xsl] Advice on dictionary conversion
From: Terry Badger <terry_badger@xxxxxxxxx>
Date: Wed, 19 Jan 2011 11:14:50 -0800 (PST)
If you would make your letter D Word file available, I would be
interested in taking a small crack at it. Everything being in Normal
may make it difficult, as most of the Word-to-XML work I have done had
lots of styles, so you could get a handle on the structure.

From: Ciaran S Duibhmn
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Sent: Wed, January 19, 2011 1:01:15 PM
Subject: Re: [xsl] Advice on dictionary conversion

Grateful thanks to all who responded to my enquiry on this subject.
I am encouraged to persevere.

Several people advised doing the conversion as a series of small
steps, and I will keep that in mind. However, I am developing my
conversion on a non-final version of a part of the dictionary (the
letter D), so it must be re-runnable on the other letters, as well as
on the final version of D!

I will see how far I can get with a simple series of "template match"
operations, before I think about anything more advanced. Whatever
can't be done that way may be feasible manually; and in any case I
expect to do manual tidying-up of odd cases and data errors which
would not be covered by a programmed conversion.

Although there are plenty of webpages about XSLT, it is difficult to
find the information you want. For example, I spent most of a day
putting this together:
  <xsl:template match="my_element[@font-weight='bold' and
contains(preceding-sibling::node()[1], '=')]">
But I suppose there are a few things in it that I will not need to
learn a second time.
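
For readers following the "template match" approach, here is a minimal
sketch of how a rule like the one above might sit in a complete
XSLT 1.0 stylesheet. The identity template and the output element name
"xref" are assumptions for illustration only and are not taken from
the original message; "my_element" and the match pattern are as quoted
above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies any node or attribute that no more
       specific rule matches, so each pass changes only what it targets. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Bold my_element whose immediately preceding sibling node
       contains '='. "xref" is a hypothetical output element name. -->
  <xsl:template match="my_element[@font-weight='bold' and
                       contains(preceding-sibling::node()[1], '=')]">
    <xref>
      <xsl:apply-templates/>
    </xref>
  </xsl:template>

</xsl:stylesheet>
```

Because the pattern carries a predicate, its default priority (0.5)
is higher than that of the identity template (-0.5), so no explicit
priority attribute is needed. Note that preceding-sibling::node()[1]
selects the nearest preceding node of any kind, which may be a
whitespace-only text node if the input is indented.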

There were some interesting comments on the initial doc/rtf to xml
conversion. I tried several, and here are some figures, for the
letter D.
  .doc file                       772 KB
  Word save as rtf                711 KB
  Word save as txt                286 KB
  Word 2003 save as xml          4592 KB
  Novosoft rtf-to-fo             2624 KB  (
  rtf-to-xml                      400 KB  (
  rtf-to-xml with -p parameter    536 KB
  Yawc doc-to-xml                 433 KB
Unfortunately (for me) the Walter and Yawc conversions both discarded
small-caps info. I haven't noticed anything important discarded in
the Novosoft conversion, but it is still a lot smaller than the MS
one.

Incidentally, the original Word files make no use of styles, but are
all in Normal.

Many thanks again,
Ciaran S Duibhmn
