Subject: Re: [xsl] Hyphenation in XSL FO
From: David Carlisle <davidc@xxxxxxxxx>
Date: Thu, 11 Jan 2001 13:13:20 GMT
> I wouldn't say English is the hardest. In Swedish, there are many compound
> words, like "sidlayout" (page layout) or "dokumenthantering" (document
> management). In these two examples, there is only one natural place for a
> hyphen.

Ah yes, languages with compound words, they are always fun :-) TeX actually doesn't cope with them too well; you have to rely on the patterns being able to spot such words. However, the TeX algorithm could cope: several people have proposed modifying things so that the algorithm runs in two passes, the first with a set of patterns designed to allow breaks at compound-word boundaries, the second (if the language allows it) to allow less desirable hyphens obtained by hyphenating each of the constituent parts.

(I'd report what hyphenation TeX gives for your examples, but strangely enough I don't have the Swedish patterns loaded into my TeX setup.)

The reason I said English was hard is that, unlike some other languages, it is almost all exceptions to rules, especially using UK traditions, where hyphenation is based largely on etymology rather than spelling or punctuation.

> Smart, I didn't think of that. Hope implementors do. There's no obvious
> reason for a special treatment of no-break spaces or soft hyphens in FO,
> is there?

One assumes they will (the xmltex implementation certainly does). It doesn't need special mention in the FO spec, as FO files can contain any character data and the meaning of those characters is specified by Unicode. So to help with breaking you have no-break space (U+00A0), zero width space (U+200B), hyphen (U+2010), non-breaking hyphen (U+2011), soft hyphen (U+00AD), etc.

> But generally, I
> hope that some implementation of FO will be able to do the same work in the
> near future.

Well, clearly the xmltex implementation has access to (and uses) TeX's hyphenation rules.

Some people have asked about Liang's algorithm being online. His PhD thesis is not available, I believe.
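For readers who just want the gist of Liang's method, here is a minimal Python sketch, not TeX's actual code. Each pattern is a letter string with digits between letters (e.g. `hy3ph`); the word is wrapped in dots, every matching pattern contributes its digits, the maximum value at each inter-letter position wins, and odd values mark allowed breaks. The names `parse_pattern`, `hyphenate` and `PATTERNS` are illustrative assumptions, and the nine patterns are just the small set commonly used to demonstrate "hy-phen-ation", not a real pattern file. It also omits the exception list and the packed trie a real implementation uses.

```python
import re

# Illustrative pattern set (assumption): enough to hyphenate "hyphenation".
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pat: str) -> tuple[str, list[int]]:
    """Split 'hy3ph' into letters 'hyph' and values [0, 0, 3, 0, 0]:
    one value per gap, including before the first and after the last letter."""
    letters = re.sub(r"\d", "", pat)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pat:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word: str, patterns: list[str], left_min: int = 2, right_min: int = 3) -> list[str]:
    """Return the pieces of `word` between allowed hyphenation points."""
    table = dict(parse_pattern(p) for p in patterns)
    dotted = "." + word.lower() + "."  # dots mark the word boundaries
    points = [0] * (len(dotted) + 1)   # points[k] = value in front of dotted[k]
    for start in range(len(dotted)):
        for end in range(start + 1, len(dotted) + 1):
            values = table.get(dotted[start:end])
            if values is not None:
                for j, v in enumerate(values):
                    points[start + j] = max(points[start + j], v)
    # A break before word[i] corresponds to points[i + 1]; odd means "allowed".
    # left_min/right_min keep very short fragments off either end of the word.
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if points[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


print("-".join(hyphenate("hyphenation", PATTERNS)))  # hy-phen-ation
```

The defaults `left_min=2` and `right_min=3` match plain TeX's `\lefthyphenmin` and `\righthyphenmin` for English. The two-pass scheme for compound words mentioned above would, under the same assumptions, call such a function first with a compound-boundary pattern set and then on each constituent with the ordinary patterns.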
The algorithm is described in Appendix H of The TeXbook, the source of which is available at any TeX archive, e.g.
ftp://ftp.tex.ac.uk/tex-archive/systems/knuth/tex/texbook.tex
It might be a bit hard to read (it's TeX source), but it is just a straight ASCII file. (You won't find the typeset version online, as that would break the distribution conditions; you are supposed to get the book :-)

David

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list