Re: [xsl] BIDI problem in XSL-FO

Subject: Re: [xsl] BIDI problem in XSL-FO
From: "Geert Bormans geert@xxxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 29 Apr 2016 21:48:25 -0000
Oh, I forgot this one:
If you are tempted to add control characters to your data... don't
(https://www.w3.org/International/questions/qa-bidi-unicode-controls#basedirection)


----- Original message -----
From: "Michael Müller-Hillebrand mmh@xxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
To: "xsl-list@xxxxxxxxxxxxxxxxxxxxxx" <XSL-List@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Friday, 29 April 2016 20:05:07
Subject: [xsl] BIDI problem in XSL-FO

Dear experts,

The processing done by an FO formatter for right-to-left (RTL) languages is
nearly magic, considering what happens if you just set

writing-mode="rl-tb"

I am really enjoying my first project with Arabic text. Interestingly, the
problem at hand is English words. In the glossary of an RTL document I suddenly
have a paragraph consisting entirely of Latin characters:

<fo:block>Brand name (Former name)</fo:block>

This is visually rendered like this:

(Brand name (Former name

I have looked at

* Unicode BIDI Processing <http://www.w3.org/TR/xsl/#d0e4879>
* Unicode BIDI algorithm <http://www.unicode.org/reports/tr9/>

I now understand that there are strong, weak and neutral characters. The
sequence of strong Latin characters with embedded neutral spacing and
punctuation is rendered LTR, but the trailing closing parenthesis, which has
no strong character after it, is treated as RTL, because that is the base
direction of the paragraph.

My first idea is to add <fo:bidi-override direction="ltr"> to each block, or
maybe only to each text node that consists solely of non-Arabic characters. I
guess this could be done using a regular expression like

not(matches($text, '\p{IsArabic}'))
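
A minimal sketch of that idea, assuming XSLT 2.0 and a second pass over the
generated FO (the same templates could instead live in the stylesheet that
produces the FO). Two details here are assumptions rather than anything
confirmed for this project: the block escape is written \p{IsArabic}, because
XPath regular expressions name Unicode blocks with an "Is" prefix, and
unicode-bidi="embed" is set explicitly, because with the initial value
"normal" the direction property would have no effect on the inline.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Identity transform: copy everything unchanged by default. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes directly inside fo:block that contain at least one
       letter but no character from the Unicode "Arabic" block
       (U+0600 to U+06FF). Other Arabic blocks, such as the
       presentation forms, are NOT covered by this test. -->
  <xsl:template match="fo:block/text()[matches(., '\p{L}')]
                                      [not(matches(., '\p{IsArabic}'))]">
    <!-- Open an LTR embedding so that trailing punctuation is
         resolved left-to-right together with the Latin text. -->
    <fo:bidi-override direction="ltr" unicode-bidi="embed">
      <xsl:value-of select="."/>
    </fo:bidi-override>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the glossary entry above, this would wrap "Brand name (Former
name)" in an LTR embedding, so the closing parenthesis should no longer be
resolved against the RTL paragraph direction. Text nodes that mix Arabic and
Latin are left untouched.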

Do you have any other recommendations or best practices?

Thanks,

- Michael
