Re: [xsl] BIDI problem in XSL-FO

Subject: Re: [xsl] BIDI problem in XSL-FO
From: "Tony Graham tgraham@xxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 3 May 2016 12:20:32 -0000
tl;dr: Put &#x200E; (U+200E LEFT-TO-RIGHT MARK) after the ')'.

On 29/04/2016 22:48, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx wrote:
oh, I forgot this one
If you are tempted to add control characters in your data... don't
(https://www.w3.org/International/questions/qa-bidi-unicode-controls#basedirection)

Yes and no. (Why should anything about the Unicode Bidi Algorithm ever be simple?)

The 'embedding controls' that set the direction of a run of text really
shouldn't be mixed with something, such as XML, that the Unicode
Standard considers a 'higher-level protocol' and that has its own
mechanism for setting text direction. In "Unicode in XML and other
Markup Languages" [1], Section 3, "Characters not Suitable for use With
Markup", includes Section 3.3, "Bidi Embedding Controls (LRE, RLE, LRO,
RLO, PDF), U+202A..U+202E" [2]. However, the 'implicit directional
controls' (U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK) are
listed in Section 4, "Format Characters Suitable for Use with Markup"
[3].
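Python's standard `unicodedata` module exposes each code point's Bidi_Class, which makes the split between the two groups easy to check. The sketch below is only an illustration: `find_embedding_controls` is a hypothetical helper, not part of any library, that flags the U+202A..U+202E controls the W3C note advises against in markup while leaving U+200E and U+200F alone.

```python
import unicodedata

# Explicit embedding/override controls, U+202A..U+202E, in code point order:
# LRE, RLE, PDF, LRO, RLO ("not suitable for use with markup").
EMBEDDING_CONTROLS = [chr(cp) for cp in range(0x202A, 0x202F)]
# Implicit directional marks: LRM (U+200E) and RLM (U+200F).
IMPLICIT_MARKS = ["\u200E", "\u200F"]


def find_embedding_controls(text):
    """Hypothetical helper: list (index, character name) for every embedding control in text."""
    return [(i, unicodedata.name(ch))
            for i, ch in enumerate(text)
            if ch in EMBEDDING_CONTROLS]


if __name__ == "__main__":
    # Show the Bidi_Class and name of each control character.
    for ch in EMBEDDING_CONTROLS + IMPLICIT_MARKS:
        print(f"U+{ord(ch):04X}  {unicodedata.bidirectional(ch):4}  {unicodedata.name(ch)}")
```

The embedding controls have their own Bidi_Class values (LRE, RLE, PDF, LRO, RLO), while LRM and RLM are simply classed as strong L and R, which is why they behave like invisible letters.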

As Michael notes below, some characters, such as Latin letters, have a
'strong' directionality, and others have a 'weak' or 'neutral'
directionality. The closing ')' is a 'neutral'. Because it's at the
end of the string, it sits between a strong left-to-right 'e' and the
end of the paragraph, which counts as the 'embedding direction' [5]
(RTL in Michael's example). Since those two directions disagree, the ')'
takes the embedding direction. You can see this with the bidi utility at
http://www.unicode.org/cldr/utility/bidi.jsp?a=Brand+name+%28Former+name%E2%80%8E%29&p=RTL
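The strong/weak/neutral grouping comes from the Bidi_Class property (UAX #9, Table 4, which also lists the explicit formatting types). A minimal sketch, assuming Python's `unicodedata` is an acceptable stand-in for the Unicode Character Database (its data follows the Unicode version Python was built with); `bidi_category` and `describe` are hypothetical helper names:

```python
import unicodedata

# Bidi_Class groupings from UAX #9, Table 4 "Bidirectional Character Types".
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}
EXPLICIT = {"LRE", "LRO", "RLE", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def bidi_category(ch):
    """Return 'strong', 'weak', 'neutral', 'explicit', or 'unknown' for one character."""
    bc = unicodedata.bidirectional(ch)
    for name, group in (("strong", STRONG), ("weak", WEAK),
                        ("neutral", NEUTRAL), ("explicit", EXPLICIT)):
        if bc in group:
            return name
    return "unknown"  # e.g. unassigned code points, for which bidirectional() returns ''


def describe(text):
    """List (character, Bidi_Class, category) for each character of text."""
    return [(ch, unicodedata.bidirectional(ch), bidi_category(ch)) for ch in text]


if __name__ == "__main__":
    for row in describe("name)"):
        print(row)
```

The letters come out as strong 'L', while both parentheses and the space are neutral ('ON' and 'WS'), so a trailing ')' has no direction of its own and takes whatever its surroundings dictate.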

The 'implicit directional controls' are zero-width characters that have
strong directionality but that 'should be ignored for other text
processes, such as sorting and searching'. 'They are intended to be
used to resolve cases of ambiguous directionality in the context of
bidirectional texts; they are not paired.' [6] UAX #9, Unicode
Bidirectional Algorithm, introduces them [4] and then ignores them 'because
their effect on bidirectional ordering is exactly the same as a
corresponding strong directional character; the only difference is that
they do not appear in the display.' So if you put &#x200E; after the
')', the ')' sits between two strong left-to-right characters (the final
'e' of 'name' and the invisible LRM, even though you can't see it),
resolves to left-to-right, and is displayed to the right of the 'e'.
See FO below.
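The effect of the mark can be reproduced with a heavily simplified model of the algorithm: strong letters keep their direction, runs of neutrals are resolved per rules N1/N2, levels are assigned per I1/I2, runs are reversed per L2, and characters at odd levels are mirrored per L4. This is a sketch under stated assumptions, not a conforming UBA implementation: it handles one paragraph with no explicit embeddings, ignores weak types such as numbers, and uses a hypothetical `MIRROR` table covering only a few brackets (the real data is in BidiMirroring.txt).

```python
import unicodedata

# Hypothetical mirroring table: a tiny subset of BidiMirroring.txt.
MIRROR = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}
MARKS = {"\u200E", "\u200F"}  # LRM and RLM: zero width, not displayed


def resolve_directions(text, paragraph_level):
    """Resolve each character to 'L' or 'R' (simplified N1/N2, neutrals only)."""
    embedding = "R" if paragraph_level % 2 else "L"
    dirs = []
    for ch in text:
        bc = unicodedata.bidirectional(ch)
        dirs.append("L" if bc == "L" else "R" if bc in ("R", "AL") else None)
    resolved = list(dirs)
    i = 0
    while i < len(dirs):
        if dirs[i] is not None:
            i += 1
            continue
        j = i
        while j < len(dirs) and dirs[j] is None:
            j += 1
        # The start and end of the paragraph (sos/eos) count as the embedding direction.
        before = dirs[i - 1] if i > 0 else embedding
        after = dirs[j] if j < len(dirs) else embedding
        # N1: same direction on both sides wins; N2: otherwise use the embedding direction.
        resolved[i:j] = [before if before == after else embedding] * (j - i)
        i = j
    return resolved


def visual_order(text, paragraph_level=1):
    """Return text in display order (simplified I1/I2, L2 and L4)."""
    if not text:
        return ""
    dirs = resolve_directions(text, paragraph_level)
    rtl_paragraph = paragraph_level % 2 == 1
    # I1/I2: characters whose direction disagrees with the paragraph go up one level.
    levels = [paragraph_level if (d == "R") == rtl_paragraph else paragraph_level + 1
              for d in dirs]
    chars = list(text)
    # L2: from the highest level down to the lowest odd level,
    # reverse every contiguous run at that level or higher.
    lowest_odd = min(levels) if min(levels) % 2 else min(levels) + 1
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    # L4: mirror characters at odd (RTL) levels; drop the invisible marks.
    return "".join(MIRROR.get(c, c) if lvl % 2 else c
                   for c, lvl in zip(chars, levels) if c not in MARKS)


if __name__ == "__main__":
    print(visual_order("Brand name (Former name)"))
    print(visual_order("Brand name (Former name)\u200E"))
```

In a right-to-left paragraph (level 1), the first call yields the "(Brand name (Former name" rendering from Michael's message: the trailing ')' stays at level 1, moves to the left end, and is mirrored. With the LRM, every character ends up at level 2, so nothing is reversed and the text displays as "Brand name (Former name)".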

----- Original message -----
From: "Michael Müller-Hillebrand mmh@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
...
<fo:block>Brand name (Former name)</fo:block>

This is visually rendered like this:

(Brand name (Former name

I have looked at

* Unicode BIDI Processing <http://www.w3.org/TR/xsl/#d0e4879>
* Unicode BIDI algorithm <http://www.unicode.org/reports/tr9/>

I now understand that there are strong and weak characters. The
sequence of strong Latin characters with embedded 'weak' spacing and
punctuation is rendered LTR; the closing 'weak' parenthesis is treated
as RTL because that is the default orientation of the paragraph.

My first idea is to add <fo:bidi-override direction="ltr"> to each
block, or maybe only to each text node that consists solely of non-Arabic
characters. I guess this could be done using a regular expression like

That will 'un-mirror' the ')' but not change its position. See FO below.

Regards,


Tony Graham.


--
Senior Architect
XML Division
Antenna House, Inc.
----
Skerries, Ireland
tgraham@xxxxxxxxxxxxx

[1] A joint W3C Note and a Unicode Technical Report that has been
withdrawn on the Unicode side (http://www.unicode.org/reports/tr20/)
because of 'complications of joint publication' but that hasn't been
republished since becoming disjoint.
[2] https://www.w3.org/TR/unicode-xml/#Bidi
[3] https://www.w3.org/TR/unicode-xml/#Format
[4] http://www.unicode.org/reports/tr9/tr9-33.html#Implicit_Directional_Marks
[5] http://unicode.org/reports/tr9/#N2
[6] Unicode 8.0, Section 23.2, "Layout Controls", page 820
http://www.unicode.org/versions/Unicode8.0.0/ch23.pdf




<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:axf="http://www.antennahouse.com/names/XSL/Extensions"
    xml:lang="ab" writing-mode="rl-tb">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="a">
            <fo:region-body/>
        </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="a">
        <fo:flow flow-name="xsl-region-body">
            <!-- 1. As in Michael's example: the trailing ')' takes the RTL embedding direction -->
            <fo:block>Brand name (Former name)</fo:block>
            <!-- 2. direction="ltr" on the block -->
            <fo:block direction="ltr">Brand name (Former name)</fo:block>
            <!-- 3. writing-mode="lr" on the block -->
            <fo:block writing-mode="lr">Brand name (Former name)</fo:block>
            <!-- 4. The fix: U+200E LEFT-TO-RIGHT MARK after the ')' -->
            <fo:block>Brand name (Former name)&#x200E;</fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>
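If the text comes from elsewhere and the mark has to be added automatically, one option is to insert U+200E only after a ')' whose nearest preceding strong character is left-to-right. The helper below is hypothetical (both the name and the rule are assumptions, not something prescribed by XSL-FO or the UBA); the same logic could equally be written in XSLT when generating the FO.

```python
import unicodedata

LRM = "\u200E"  # LEFT-TO-RIGHT MARK, written as &#x200E; in the FO


def add_lrm_after_parens(text):
    """Hypothetical helper: put an LRM after each ')' whose nearest preceding
    strong character is left-to-right, unless an LRM is already there."""
    out = []
    last_strong = None
    for i, ch in enumerate(text):
        out.append(ch)
        bc = unicodedata.bidirectional(ch)
        if bc == "L":
            last_strong = "L"
        elif bc in ("R", "AL"):
            last_strong = "R"
        if ch == ")" and last_strong == "L" and text[i + 1:i + 2] != LRM:
            out.append(LRM)
    return "".join(out)


if __name__ == "__main__":
    fixed = add_lrm_after_parens("Brand name (Former name)")
    # xmlcharrefreplace writes the mark as the decimal reference &#8206;,
    # which is equivalent to &#x200E;.
    print(fixed.encode("ascii", "xmlcharrefreplace").decode())
```

A ')' that follows Hebrew or Arabic text is left untouched, and running the helper twice does not add a second mark.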
