[xsl] hyphenator in xsl implementing LIANG's algorithm

Subject: [xsl] hyphenator in xsl implementing LIANG's algorithm
From: Bruno Mascret <bmascret@xxxxxxx>
Date: Wed, 26 Aug 2009 22:27:30 +0200

Hello,

I have just finished the first version of a hyphenator written entirely in
XSLT 2.0 (not XSL-FO) that implements Liang's algorithm (the one used in TeX). (1)
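
For readers who have not met Liang's algorithm, here is a minimal Python
sketch of the idea. It is not a port of hyphenation.xsl. Each pattern is a
letter sequence with digits in the gaps; wherever a pattern matches, each gap
keeps the highest digit seen, and an odd final value allows a hyphen. The
`left_min`/`right_min` defaults of 2 are an assumption (TeX's own minimums,
\lefthyphenmin and \righthyphenmin, depend on the language).

```python
import re

def parse_pattern(pattern):
    """Split a Liang pattern such as '1bo' into its letters ('bo') and the
    values of the gaps around them ([1, 0, 0]); a missing digit means 0."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Insert '-' at every gap whose highest matching pattern value is odd.
    left_min/right_min (assumed defaults) must be at least 1."""
    padded = "." + word.lower() + "."   # '.' marks word boundaries, as in TeX
    points = [0] * (len(padded) + 1)    # points[k]: value of the gap before padded[k]
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:              # apply the pattern at every match
            for j, v in enumerate(values):
                points[start + j] = max(points[start + j], v)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    # The gap before word[i] is points[i + 1]; respect the minimum piece lengths.
    for i in range(left_min, len(word) - right_min + 1):
        if points[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


# The five patterns that appear in the debug output of this message.
PATTERNS = ["1bo", "1j", "1le", "1de", "1mo"]
for w in ["bonjour", "le", "monde"]:
    print(hyphenate(w, PATTERNS))   # prints bon-jour, le, mon-de (one per line)
```

With only these five patterns the sketch reproduces the sample output; a real
run would load the full French pattern list instead.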

It requires two files: the main code file (hyphenation.xsl), which contains
the hyphenate function, and the pattern file (hyphens.xsl), which contains
the sequence of patterns.

Files can be found here:
https://svn.liris.cnrs.fr/nat/trunk/xsl/hyphenation.xsl
http://liris.cnrs.fr/~bmascret/nat/xsl/hyphens.xsl (French rules) (2)

I also added a sample test file here:
http://liris.cnrs.fr/~bmascret/nat/xsl/testHyph.xml (3)

I used Saxon 9 as the XSLT processor.

hyphenation.xsl has a boolean parameter, debug, which switches on a debug
mode that details each step of the hyphenation process.

Sample outputs:
$~> java -jar saxon9.jar -s:testHyph.xml -xsl:hyphenation.xsl

 bon-jour
 le
 mon-de

$~> java -jar saxon9.jar -s:testHyph.xml -xsl:hyphenation.xsl  debug=true

{word: bonjour} -------
 * 564: {pattern used: 1bo} {result:1b0o0n0j0o0u0r0}
 * 657: {pattern used: 1j} {result:b0o0n01j0o0u0r0}
after LIANG: b0o0n1j0o0u0r0 : bon-jour
{word: le} -------
 * 674: {pattern used: 1le} {result:1l0e0}
after LIANG: l0e0 : le
{word: monde} -------
 * 595: {pattern used: 1de} {result:m0o0n01d0e0}
 * 693: {pattern used: 1mo} {result:1m0o0n0d0e0}
after LIANG: m0o0n1d0e0 : mon-de

    bon-jour
    le
    mon-de
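
The "after LIANG" lines appear to list each letter followed by the value of
the gap after it, so an odd digit marks an allowed break. Assuming that
reading, the last step (trace to hyphenated word) can be sketched in Python
as follows; `apply_liang_values` is a hypothetical name, not a function from
hyphenation.xsl.

```python
import re

def apply_liang_values(trace):
    """Turn a merged trace such as 'b0o0n1j0o0u0r0' into a hyphenated word.

    Assumes each letter is followed by one digit: the value of the gap after
    it. Odd values allow a break, even values forbid it; the gap after the
    last letter is ignored since nothing follows it.
    """
    pairs = re.findall(r"(\D)(\d)", trace)   # (letter, digit) pairs
    out = []
    for i, (letter, value) in enumerate(pairs):
        out.append(letter)
        if int(value) % 2 == 1 and i < len(pairs) - 1:
            out.append("-")
    return "".join(out)


for trace in ["b0o0n1j0o0u0r0", "l0e0", "m0o0n1d0e0"]:
    print(trace, ":", apply_liang_values(trace))
# b0o0n1j0o0u0r0 : bon-jour
# l0e0 : le
# m0o0n1d0e0 : mon-de
```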

I hope this helps. If you have any suggestions or comments for improving
it, please let me know.
The code comments are in French, but I can translate them into English if
you wish.

Best regards,
Bruno Mascret

NAT's project: a free universal Braille translator
http://natbraille.free.fr

Footnotes:
(1) My goal was to use it in a more complex setting: Braille
transcription (http://natbraille.free.fr; source code at
https://svn.liris.cnrs.fr/nat).

(2) The pattern file is generated automatically by Java code from a
compatible hyphenation dictionary (TeX, OpenOffice).
If you find it useful, you can use the following files:
https://svn.liris.cnrs.fr/nat/trunk/outils/HyphenationToolkit.java
(model) and https://svn.liris.cnrs.fr/nat/trunk/ui/ConfDictCoup.java
(view), or ask me for a standalone tool if needed.
Non-French dictionaries can also be used.
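
The Java converter is not reproduced here. To give a rough idea of what such
a conversion involves, the Python sketch below reads the text of an
OpenOffice-style hyphenation dictionary and prints an XSLT 2.0 variable
holding the patterns as a sequence of strings. It assumes the usual .dic
layout (character-set name on the first line, `%` comments, upper-case
directives such as LEFTHYPHENMIN, one pattern per line) and leaves file
decoding to the caller. The variable name `patterns` and the output layout
are hypothetical, not necessarily what hyphens.xsl contains.

```python
from xml.sax.saxutils import escape

def parse_dic(text):
    """Extract patterns from the text of an OpenOffice-style .dic file."""
    patterns = []
    for line in text.splitlines()[1:]:        # line 1 names the character set
        line = line.split("%", 1)[0].strip()  # drop '%' comments and whitespace
        if not line or line.split()[0].isupper():
            continue                          # skip blanks and LEFTHYPHENMIN-style directives
        patterns.append(line)
    return patterns


def to_xsl_variable(patterns, name="patterns"):
    """Render patterns as an XSLT 2.0 variable (hypothetical layout)."""
    # XPath 2.0 escapes a quote inside a '...' literal by doubling it.
    items = ", ".join("'" + p.replace("'", "''") + "'" for p in patterns)
    # Escape &, < and > plus the double quote that delimits the attribute.
    select = escape("(" + items + ")", {'"': "&quot;"})
    return f'<xsl:variable name="{name}" select="{select}"/>'


sample = "UTF-8\n% sample\nLEFTHYPHENMIN 2\n1bo\n1j\n1l'\n"
print(to_xsl_variable(parse_dic(sample)))
# <xsl:variable name="patterns" select="('1bo', '1j', '1l''')"/>
```

Any non-standard pattern syntax a dictionary may use is passed through
unchanged.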

(3) The hyphenate function can, of course, be used with a different XML
structure ;-)