Word to other XML conversion. [ Re: [xsl] where to look for xsl folk..]

Subject: Word to other XML conversion. [ Re: [xsl] where to look for xsl folk..]
From: "Steven D Majewski steve.majewski@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 20 Jun 2016 20:18:10 -0000
We have an application that was used to interactively convert Word document
finding aids into EAD XML.

  https://github.com/uvalib/transmog

and I believe it can be adapted to convert to TEI XML instead.


The templates here are a set of rules that use regular expressions on the
headings to guess what XML elements those paragraphs should be assigned to,
and it looks like it could probably be reconfigured to output TEI instead of
EAD.

  https://github.com/uvalib/transmog/tree/master/src/main/resources
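
As a rough illustration of the kind of rule described above, here is a minimal Python sketch. It is not taken from transmog: the patterns, the `RULES` table, the `guess_element` helper, and the fallback element are all assumptions made up for this example. Only the EAD element names themselves (bioghist, scopecontent, accessrestrict, odd) are real.

```python
import re

# Hypothetical heading rules, NOT copied from transmog:
# each entry pairs a regular expression with the EAD element it suggests.
RULES = [
    (re.compile(r"^\s*(biograph|historical\s+note)", re.IGNORECASE), "bioghist"),
    (re.compile(r"^\s*scope\s+and\s+content", re.IGNORECASE), "scopecontent"),
    (re.compile(r"^\s*(conditions\s+governing\s+)?access", re.IGNORECASE), "accessrestrict"),
]

# Assumed fallback when no rule matches ("other descriptive data" in EAD).
DEFAULT_ELEMENT = "odd"


def guess_element(heading):
    """Return the element name suggested by the first rule matching the heading."""
    for pattern, element in RULES:
        if pattern.search(heading):
            return element
    return DEFAULT_ELEMENT


if __name__ == "__main__":
    for heading in ["Biographical Note", "Scope and Content", "Miscellaneous"]:
        print(heading, "->", guess_element(heading))
```

Pointing such a table at TEI element names instead of EAD ones is the sort of reconfiguration I have in mind, though the real templates are organized differently.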


The webapp displays those guesses and allows you to rearrange or reassign
them.
So it doesn't solve the problem of writing XSLT conversion rules, but it
does help with conversion of documents that may not exactly follow those
rules.

Typically the converted documents still require some manual QA and editing.


-- Steve Majewski / UVA Alderman Library





> On Jun 20, 2016, at 3:30 PM, G. Ken Holman g.ken.holman@xxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Indeed hard does not mean impossible.  The Inera folks have a strong product
> named eXtyles for going from Word to various JATS derivatives including ISOSTS
> that I am personally interested in:
>
>  http://www.inera.com/resources/extyles-related-technologies
>
> I haven't heard much of any other Word-based products ... but I post this to
> point out that it has been done successfully commercially.
>
> . . . . . . . Ken
>
> At 2016-06-20 18:58 +0000, Wendell Piez wapiez@xxxxxxxxxxxxxxx wrote:
>
>> Hi,
>>
>> On Mon, Jun 20, 2016 at 10:36 AM, Christopher R. Maden crism@xxxxxxxxx
>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> > On 06/19/2016 04:17 PM, adam adam@xxxxxxxxxxxxxxx wrote:
>> >>
>> >> We are working with docx files that need to be translated into HTML. The
>> >> docx files are chapters of scholarly content that constitute a book. We
>> >> need to translate the docx into a tidy HTML version with direct
>> >> translation of semantic elements but with the elimination of styles.
>> >
>> > There are a few tools to do this kind of thing.  The Public Knowledge
>> > Project is working on integrating them into a pipeline; it's not ready for
>> > prime time *quite* yet, but it's getting there, and the individual
>> > components may be useful to you on their own.  Check out <URL:
>> > https://github.com/pkp/xmlps > for source and more info.
>>
>> Indeed there are a number of different such initiatives some of them
>> including XSLT and so on topic. :-)
>>
>> (In fact didn't Eliot recently mention his thing for a Word -> DITA pathway?)
>>
>> Whether using XSLT (and on topic) or not -- converting from Word (what
>> I like to call a 'paintbrush' application) into strong markup is going
>> to be a hard problem, largely because its boundaries are not in an
>> obvious place, plus they move. It will always be contested what is in
>> scope vs what is not, and there will be a tradeoff between generic and
>> specialized capabilities.
>>
>> Hard doesn't mean impossible, however, and what would be nice would be
>> a toolkit that could be adapted for local use....
>>
>> Cheers, Wendell
>>
>> --
>> Wendell Piez | http://www.wendellpiez.com
>> XML | XSLT | electronic publishing
>> Eat Your Vegetables
>> _____oo_________o_o___ooooo____ooooooo_^
>>
>
>
> --
> Check our site for free XML, XSLT, XSL-FO and UBL developer resources |
> Streaming hands-on XSLT/XPath 2 training @US$45: http://goo.gl/Dd9qBK |
> Crane Softwrights Ltd. _ _ _ _ _ _ http://www.CraneSoftwrights.com/s/ |
> G Ken Holman _ _ _ _ _ _ _ _ _ _ mailto:gkholman@xxxxxxxxxxxxxxxxxxxx |
> Google+ blog _ _ _ _ _ http://plus.google.com/+GKenHolman-Crane/posts |
> Legal business disclaimers: _ _ http://www.CraneSoftwrights.com/legal |
>
>
