Subject: Re: [xsl] BIDI problem in XSL-FO
From: "Tony Graham tgraham@xxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 3 May 2016 12:20:32 -0000
oh, I forgot this one: if you are tempted to add control characters in your data... don't (https://www.w3.org/International/questions/qa-bidi-unicode-controls#basedirection)
Yes, and no. (Why should anything about the Unicode Bidi Algorithm ever be simple?)
The 'embedding controls' that set direction for a run of text really shouldn't be mixed with something, such as XML, that the Unicode Standard considers a 'higher-level protocol' and that also has its own mechanism for setting text direction. In "Unicode in XML and other Markup Languages", Section 3, "Characters not Suitable for use With Markup", includes Section 3.3, "Bidi Embedding Controls (LRE, RLE, LRO, RLO, PDF), U+202A..U+202E". However, the 'implicit directional controls' (U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK) are listed in Section 4, "Format Characters Suitable for Use with Markup".
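A minimal sketch of the difference, assuming the text sits in an fo:block (the fo:bidi-override wrapper is only an illustration of the markup mechanism, not something taken from Michael's stylesheet): rather than wrapping a run in U+202A LRE ... U+202C PDF, set the direction with markup.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Avoid: &#x202A;Brand name (Former name)&#x202C; (LRE ... PDF embedding controls) -->
  <!-- Markup equivalent of a left-to-right embedding: -->
  <fo:bidi-override direction="ltr" unicode-bidi="embed">Brand name (Former name)</fo:bidi-override>
</fo:block>
```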
As Michael notes below, some characters, such as Latin letters, have a 'strong' directionality, and some have a 'weak' or 'neutral' directionality. The closing ')' is a 'neutral', and because it's at the end of the string, it takes the 'embedding direction', which is RTL in Michael's example. You can see this with the bidi utility at http://www.unicode.org/cldr/utility/bidi.jsp?a=Brand+name+%28Former+name%E2%80%8E%29&p=RTL
The 'implicit directional controls' are zero-width characters that have strong directionality but that 'should be ignored for other text processes, such as sorting and searching'. 'They are intended to be used to resolve cases of ambiguous directionality in the context of bidirectional texts; they are not paired.' UAX #9, Unicode Bidirectional Algorithm, introduces them and then ignores them 'because their effect on bidirectional ordering is exactly the same as a corresponding strong directional character; the only difference is that they do not appear in the display.' So, if you put &#x200E; (U+200E LEFT-TO-RIGHT MARK) after the ')', then the ')' is between two strong left-to-right characters (even though you can't see one of them), and it will display to the right of the character to its left. See FO below.
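If the FO is generated by XSLT, one way to append U+200E is sketched below. The 'name' element is a hypothetical source element; adjust the match pattern to your own data.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- 'name' is a hypothetical source element holding e.g. "Brand name (Former name)" -->
  <xsl:template match="name">
    <fo:block>
      <xsl:value-of select="."/>
      <!-- U+200E LEFT-TO-RIGHT MARK: invisible, but strongly left-to-right -->
      <xsl:text>&#x200E;</xsl:text>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

Appending the mark unconditionally like this only makes sense when the content of the element is known to be Latin text; that is an assumption of the sketch.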
...
----- Original message -----
From: "Michael Müller-Hillebrand mmh@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
<fo:block>Brand name (Former name)</fo:block>
This is visually rendered like this:
(Brand name (Former name
I have looked at
* Unicode BIDI Processing <http://www.w3.org/TR/xsl/#d0e4879>
* Unicode BIDI algorithm <http://www.unicode.org/reports/tr9/>
I now understand that there are strong and weak characters. The sequence of strong Latin characters with embedded 'weak' spacing and punctuation is rendered LTR; the closing 'weak' parenthesis is treated as RTL, because this is the default orientation of the paragraph.
My first idea is to add <fo:bidi-override direction="ltr"> to each block or maybe only each text node that consists solely of non-Arabic characters. I guess this could be done using a regular expression like
--
Senior Architect
XML Division
Antenna House, Inc.
----
Skerries, Ireland
tgraham@xxxxxxxxxxxxx
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"
         xmlns:axf="http://www.antennahouse.com/names/XSL/Extensions"
         xml:lang="ab" writing-mode="rl-tb">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="a">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="a">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Brand name (Former name)</fo:block>
      <fo:block direction="ltr">Brand name (Former name)</fo:block>
      <fo:block writing-mode="lr">Brand name (Former name)</fo:block>
      <fo:block>Brand name (Former name)&#x200E;</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>