Re: [xsl] how to extract text, translate and re-insert it in XHTML

Subject: Re: [xsl] how to extract text, translate and re-insert it in XHTML
From: Ken Starks <ken@xxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 08 Jan 2009 12:49:30 +0000
Well, here is the approach used by the internationalisation ('I18n') transformer of the Apache Cocoon project.
It usually uses the namespace prefix I18n:. This 'transformer' is part of the Cocoon pipeline and works on the fly.


Stage One. You wrap any fragments of text that you want translated in an <I18n:text> element, which has
an optional attribute I18n:key. This is the hard part, and very difficult to automate. It needs a certain amount of
judgement about how large to make the fragments, and whether to include punctuation, among other things.

<I18n:text>Good morning</I18n:text>, <I18n:text I18n:key="everyone">Ladies and Gentlemen</I18n:text>
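
To give an idea of what a translator would actually be handed, here is a minimal Python sketch that pulls the translatable fragments out of a marked-up document. It is not part of Cocoon. The namespace URI, the wrapping <p> element and the function name extract_messages are assumptions made for the illustration. The key falls back to the element content when no I18n:key attribute is present, as described above.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the I18n prefix; check the one your Cocoon version uses.
I18N_NS = "http://apache.org/cocoon/i18n/2.1"

# Hypothetical sample: the fragment from the post, wrapped in a <p> so it parses.
SAMPLE = f"""<p xmlns:I18n="{I18N_NS}"><I18n:text>Good morning</I18n:text>, <I18n:text I18n:key="everyone">Ladies and Gentlemen</I18n:text></p>"""


def extract_messages(xml_source):
    """Return (key, source_text) pairs for every I18n:text element, in document order."""
    root = ET.fromstring(xml_source)
    pairs = []
    for elem in root.iter(f"{{{I18N_NS}}}text"):
        text = "".join(elem.itertext()).strip()
        # Use the explicit I18n:key if there is one, otherwise the text itself.
        key = elem.get(f"{{{I18N_NS}}}key", text)
        pairs.append((key, text))
    return pairs


if __name__ == "__main__":
    for key, text in extract_messages(SAMPLE):
        print(f"{key!r}: {text!r}")
```

The resulting list of key/text pairs contains no XHTML tagging at all, which is the "stripped of clutter" view the translator needs.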

Stage Two. Your translators, for whatever language, must create a catalog file, which is a kind
of phrase book. The key used here corresponds either to the content above ('Good morning') or to the
I18n:key already specified.

<catalog xml:lang="fr">
<message key="Good morning">Bonjour</message>
<message key="everyone">Mesdames et Messieurs</message>
</catalog>

You would have one catalog file for Klingon and one for each of the other languages you want.
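
The lookup that puts the document back together can be sketched in a few lines of Python. This is only an illustration of the idea, not the Cocoon transformer itself. The namespace URI, the sample page and the names load_catalog and translate are assumptions. Fragments whose key is missing from the catalog are left in the source language.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the I18n prefix; check the one your Cocoon version uses.
I18N_NS = "http://apache.org/cocoon/i18n/2.1"

# Hypothetical catalog and page, modelled on the fragments in the post.
CATALOG_FR = """<catalog xml:lang="fr">
  <message key="Good morning">Bonjour</message>
  <message key="everyone">Mesdames et Messieurs</message>
</catalog>"""

PAGE = f"""<p xmlns:I18n="{I18N_NS}"><I18n:text>Good morning</I18n:text>, <I18n:text I18n:key="everyone">Ladies and Gentlemen</I18n:text></p>"""


def load_catalog(xml_source):
    """Build a key -> translation dictionary from a catalog document."""
    root = ET.fromstring(xml_source)
    return {m.get("key"): (m.text or "") for m in root.iter("message")}


def translate(page_source, catalog):
    """Replace each I18n:text element with its translation and return the XML as a string."""
    root = ET.fromstring(page_source)
    text_tag = f"{{{I18N_NS}}}text"
    key_attr = f"{{{I18N_NS}}}key"
    for parent in list(root.iter()):
        previous = None  # last sibling that stays in the output
        for child in list(parent):
            if child.tag != text_tag:
                previous = child
                continue
            original = "".join(child.itertext())
            key = child.get(key_attr, original.strip())
            # Unknown keys fall back to the untranslated text.
            replacement = catalog.get(key, original) + (child.tail or "")
            if previous is None:
                parent.text = (parent.text or "") + replacement
            else:
                previous.tail = (previous.tail or "") + replacement
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(translate(PAGE, load_catalog(CATALOG_FR)))
```

The sketch does not handle a document whose root element is itself an I18n:text element; for real XHTML pages that case should not arise.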

The transformation system has other elements to help: translation of stock
phrases with parameters, translation of dates.
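
As a rough idea of what "stock phrases with parameters" means: the translated phrase carries numbered placeholders, and the values are filled in after the phrase has been looked up. The Python sketch below assumes placeholders of the form {0}, {1} and a hypothetical helper called fill_parameters; check the Cocoon documentation for the actual elements and syntax.

```python
import re


def fill_parameters(template, params):
    """Replace {0}, {1}, ... in a translated phrase with the given values.

    Placeholders with no matching value are left untouched.
    """
    def lookup(match):
        index = int(match.group(1))
        return str(params[index]) if index < len(params) else match.group(0)

    return re.sub(r"\{(\d+)\}", lookup, template)


if __name__ == "__main__":
    # Hypothetical catalog entry with two parameters.
    print(fill_parameters("Bonjour {0}, il est {1}", ["Ken", "midi"]))
```

Keeping the placeholders inside the catalog entry lets each language put the parameters in whatever word order it needs.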

The various files also have to follow a naming convention and be saved in a specific location
in the Cocoon setup.

Robert P. J. Day wrote:
  it's been a while since i've written anything in XSLT so i'm going
to try to explain what a colleague is trying to do, assuming *i*
understand it.

  1) start with an involved XHTML document
  2) "extract" just those (english) parts that involve translatable
     text, and hand it to a translator
  3) translator translates english to, say, klingon
  4) rebuild original document with klingon content instead of english

as i understand it, the point of the extraction is that no one wants
to burden the translator with all of the XHTML tagging -- the
translator wants to get the text stripped of all the "clutter", at
which point, after translation, someone needs to be able to put the
document back together.

  is this even a reasonable thing to ask?  in order to reassemble the
document, i'm assuming one is going to have to ID every single bit of
text to have a reference to build backwards.

  thoughts on this?  has anyone done something like this?  or are you
all too busy laughing hysterically by now?

