Re: [xsl] Hyphenation in XSL FO

Subject: Re: [xsl] Hyphenation in XSL FO
From: David Carlisle <davidc@xxxxxxxxx>
Date: Thu, 11 Jan 2001 13:13:20 GMT
> I wouldn't say English is the hardest. In Swedish, there are many compound
> words, like "sidlayout" (page layout) or "dokumenthantering" (document
> management). In these two examples, there is only one natural place for a
> hyphen.

Ah yes, languages with compound words, they are always fun :-)
TeX actually doesn't cope with them too well; you have to rely on the
patterns being able to spot such words. However, the TeX algorithm
could cope: several people have proposed modifying it so that the
algorithm runs in two passes, the first with a set of patterns
designed to allow breaks at compound-word boundaries, the second (if the
language allows it) to allow less desirable hyphens obtained by
hyphenating each of the constituent parts. (I'd report what hyphenation
TeX gives for your examples, but strangely enough I don't have the
Swedish patterns loaded into my TeX setup.)
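A minimal Python sketch of that two-pass idea. To stay self-contained it substitutes small lookup tables for the two pattern sets (a real implementation would run Liang-style patterns in each pass), and the tables, the break positions in them and the name two_pass_breaks are all hypothetical, not the output of any real Swedish pattern file. Each break is tagged with a rank, so a line breaker could prefer compound boundaries and fall back to second-pass breaks only when it has to.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the two pattern sets. A real implementation
# would run Liang-style patterns in each pass instead of looking words up.
COMPOUND_PARTS: Dict[str, List[str]] = {
    "dokumenthantering": ["dokument", "hantering"],
    "sidlayout": ["sid", "layout"],
}
PART_BREAKS: Dict[str, List[int]] = {
    "dokument": [2, 4],   # do-ku-ment (illustrative)
    "hantering": [3, 5],  # han-te-ring (illustrative)
}


def two_pass_breaks(word: str, second_pass: bool = True) -> List[Tuple[int, int]]:
    """Return sorted (offset, rank) pairs for word.

    rank 1: boundary between compound constituents (first pass)
    rank 2: less desirable break inside a constituent (second pass)
    """
    parts = COMPOUND_PARTS.get(word, [word])
    result: List[Tuple[int, int]] = []
    offset = 0
    for idx, part in enumerate(parts):
        if idx > 0:
            result.append((offset, 1))  # pass 1: compound boundary
        if second_pass:
            for b in PART_BREAKS.get(part, []):
                result.append((offset + b, 2))  # pass 2: inside the constituent
        offset += len(part)
    return sorted(result)


print(two_pass_breaks("dokumenthantering"))
# [(2, 2), (4, 2), (8, 1), (11, 2), (13, 2)]
```

For "sidlayout" this yields the single rank-1 break after "sid", and with second_pass=False "dokumenthantering" only offers its compound boundary, which is how a language that forbids the weaker breaks would behave.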

The reason I said English was hard is that, unlike some other
languages, it is almost all exceptions to rules, especially under
UK conventions, where hyphenation is based largely on etymology
rather than on spelling or pronunciation.

> Smart, I didn't think of that. Hope implementors do. There's no obvious
> reason for special treatment of no-break spaces or soft hyphens in FO, is
> there?

One assumes they will (the xmltex implementation certainly does). It doesn't
need special mention in the FO spec, as FO files can contain any
character data, and the meaning of those characters is specified by Unicode.
So to help with breaking you have no-break space (U+00A0), zero-width
space (U+200B), hyphen (U+2010), non-breaking hyphen (U+2011) and soft
hyphen (U+00AD).
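For instance, a stylesheet or preprocessing step that already knows where a compound splits could put a soft hyphen there before writing the FO file, leaving the formatter free to break at that point. A small Python sketch, assuming the break offsets are supplied by the caller; the helper name insert_soft_hyphens is hypothetical, while the code points are the standard Unicode ones.

```python
from typing import List

# Unicode characters that affect line breaking.
NBSP = "\u00A0"       # NO-BREAK SPACE: a space that never allows a break
ZWSP = "\u200B"       # ZERO WIDTH SPACE: an invisible break opportunity
HYPHEN = "\u2010"     # HYPHEN: a visible hyphen; a break may follow it
NB_HYPHEN = "\u2011"  # NON-BREAKING HYPHEN: a visible hyphen, no break
SHY = "\u00AD"        # SOFT HYPHEN: shown only if the line breaks there


def insert_soft_hyphens(word: str, breaks: List[int]) -> str:
    """Return word with a soft hyphen inserted before each offset in breaks."""
    pieces = []
    prev = 0
    for b in sorted(breaks):
        pieces.append(word[prev:b])
        prev = b
    pieces.append(word[prev:])
    return SHY.join(pieces)


# The resulting string can be written straight into fo:block content.
text = insert_soft_hyphens("dokumenthantering", [8])
```

Because the soft hyphen is ordinary character data, nothing FO-specific is needed: a formatter that honours Unicode line-breaking behaviour will treat it as a discretionary break.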

> But generally, I
> hope that some implementation of FO will be able to do the same work in the
> near future.

Well, clearly the xmltex implementation has access to (and uses) TeX's
hyphenation rules.

Some people have asked whether Liang's algorithm is available online.
His PhD thesis is not, I believe.
The algorithm is described in Appendix H of The TeXbook, the source of
which is available at any TeX archive, e.g.

It might be a bit hard to read (it's TeX source), but it is just a
plain ASCII file.

(You won't find the typeset version online, as that would break
the distribution conditions; you are supposed to get the book :-)
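For the gist of Liang's method: each pattern interleaves letters with digits; every pattern that matches a substring of the word (with "." marking the word edges) contributes its digits; the maximum value wins at each inter-letter position; and an odd value permits a hyphen while an even one forbids it. Below is a brute-force Python sketch under those rules. TeX itself keeps its patterns in a packed trie loaded from language files, whereas the nine patterns here are a hand-picked illustrative subset, and the minimum fragment lengths of 2 and 3 (plain TeX's \lefthyphenmin and \righthyphenmin) are simply parameters.

```python
from typing import Dict, List, Tuple

# Hand-picked, illustrative patterns; a real language needs thousands.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern: str) -> Tuple[str, List[int]]:
    """Split e.g. 'hy3ph' into letters 'hyph' and values [0, 0, 3, 0, 0],
    where values[k] is the value just before letter k."""
    letters: List[str] = []
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word: str, patterns: List[str],
              left_min: int = 2, right_min: int = 3) -> List[str]:
    """Split word at the points allowed by Liang-style patterns."""
    table: Dict[str, List[int]] = dict(parse_pattern(p) for p in patterns)
    dotted = "." + word.lower() + "."
    # points[k] holds the strongest value seen just before dotted[k].
    points = [0] * (len(dotted) + 1)
    for i in range(len(dotted)):
        for j in range(i + 1, len(dotted) + 1):
            values = table.get(dotted[i:j])
            if values is None:
                continue
            for k, v in enumerate(values):
                points[i + k] = max(points[i + k], v)
    # A break before word[m] corresponds to points[m + 1]; odd means allowed.
    pieces: List[str] = []
    start = 0
    for m in range(left_min, len(word) - right_min + 1):
        if points[m + 1] % 2 == 1:
            pieces.append(word[start:m])
            start = m
    pieces.append(word[start:])
    return pieces


print("-".join(hyphenate("hyphenation", PATTERNS)))  # hy-phen-ation
```

Here "hen5at" outweighs "n2at" to allow the break before the "a", while "hena4" suppresses the tempting break before "tion" that "1tio" alone would permit.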

