Re: [xsl] BIDI problem in XSL-FO

Subject: Re: [xsl] BIDI problem in XSL-FO
From: "Michael Müller-Hillebrand mmh@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 1 May 2016 19:39:38 -0000

Hi Geert and Ken,

Thanks a lot for the reminder to look for some context.

We are using Antenna House, so we get a large number of correct results out
of the box. As we are doing automated publishing, there is no good way to add
markup later, just for publishing reasons.

But we happen to have an element <nt> available, which is used to tag
non-translatable content. In my shortened example

<fo:block>Brand name (Former name)</fo:block>

this element was used to tag both brand names in the source, similar to this:

<p><nt>Brand name</nt> (<nt>Former name</nt>)</p>

If I now use <fo:bidi-override direction="ltr"> for all those <nt>, i.e.
excluding the parentheses, I get it rendered like this:

(Former name) Brand name

This - as far as I am concerned - makes a lot of sense, as the general reading
direction is from right to left, and this way the less important information
in parentheses comes 'after' the main information. It is fun to know that
both parentheses are now mirrored glyphs.
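
A minimal sketch of how this could look in the stylesheet, assuming <p> maps
to fo:block and that <nt> has no other special handling (both templates are
illustrative, not our production code):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Assumed mapping: a source paragraph becomes an fo:block -->
  <xsl:template match="p">
    <fo:block>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- Wrap only the non-translatable content in an LTR override.
       The parentheses are ordinary text children of p, so they stay
       outside the override and follow the paragraph direction. -->
  <xsl:template match="nt">
    <fo:bidi-override direction="ltr">
      <xsl:apply-templates/>
    </fo:bidi-override>
  </xsl:template>

</xsl:stylesheet>
```

Because the override is tied to an element that already exists in the source,
no publishing-only markup has to be added.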

I have to wait for some feedback from my customer's proofreaders.

Thanks for being able to discuss this.

- Michael


> On 29.04.2016 at 23:42, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Michael,
>
> It is late on a Friday here, so I'll keep my post very brief
> (an excuse for it being not exactly brief but rather unstructured :-)
> Over the past couple of months I have been tackling quite a few issues
> similar to what you describe.
> The visual rendering "(Brand name (Former name" is often the correct
> behaviour, but I learned from Arabic proofreaders that it is not pleasing
> anyway.
>
> The brackets around English text are one infamous known issue.
> Another one I had issues with is registered trademarks appearing randomly
> before or after an English word.
> And keeping 1-25 as the page number in the footer for chapter-page
> numbering instead of 25-1 has been hard too.
>
> The algorithms for switching between RTL and LTR are complex, and although
> there is very good support in some FO processors,
> behaviour is not always predictable.
>
> We learned there is an advantage in creating inner context to tune the
> algorithms our way.
> Our documents are DITA, so I pulled the DITA files out of the CMS and added
> a <term> element as context around English text in Arabic,
> but ONLY when there are potential issues (cases are rather isolated, so if
> there are no brackets, for example, just leave the English text as it is
> in order not to make mistakes).
> We had cases like this:
> "A arrow B"
> with no Arabic characters in there.
> In Arabic the result needed to show
> "B flipped arrow A"
> That will not happen correctly if you make your bidi override too large
> (your suggested regex would break this example).
>
> From learning the hard way, advice no. 1: be conservative in creating bidi
> overrides, because most often the FO processor does the right thing.
>
> I am revising my regular expressions over the next week or two because of
> toolset version changes.
>
> From my experience you are doing the right thing using bidi override (happy
> to learn otherwise from this thread).
> I am confident that, depending on the tools you use AND the proofreaders
> (opinions differ), you should experiment to find your own best-matching
> regex.
>
> So advice no. 2: test different versions of your toolset and test them well.
> When I first started working on Arabic manuals with lots of English terms
> in them, Antenna House was my best option and already did a very good job.
> For fixing the issues left in the manuals we added a bidi override context
> (<term> element).
> In the 6.3 release and the 6.3 Maintenance Release 1, Antenna House largely
> improved the handling of bidi overrides.
> We soon realised that Antenna House did fix some of our issues for us, so
> we are in the process of undoing some of our context fixes.
>
> Hope this helps at least a little.
> Depending on the popularity of this topic, I am happy to discuss more
> details of my findings on this forum or outside of it.
>
> Best regards
>
> Geert
>
> ----- Original message -----
> From: "Michael Müller-Hillebrand mmh@xxxxxxxxx"
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
> To: "xsl-list@xxxxxxxxxxxxxxxxxxxxxx" <XSL-List@xxxxxxxxxxxxxxxxxxxxxx>
> Sent: Friday, 29 April 2016 20:05:07
> Subject: [xsl] BIDI problem in XSL-FO
>
> Dear experts,
>
> The processing done by an FO formatter for right-to-left (RTL) languages is
> nearly magic, considering what happens if you just set
>
> writing-mode="rl-tb"
>
> I really enjoy my first project with Arabic text. Interestingly, the problem
> at hand is English words. In the glossary of an RTL document I suddenly have
> an entire paragraph of Latin characters:
>
> <fo:block>Brand name (Former name)</fo:block>
>
> This is visually rendered like this:
>
> (Brand name (Former name
>
> I have looked at
>
> * Unicode BIDI Processing <http://www.w3.org/TR/xsl/#d0e4879>
> * Unicode BIDI algorithm <http://www.unicode.org/reports/tr9/>
>
> I now understand that there are strong and weak characters. The sequence of
> strong Latin characters with embedded 'weak' spacing and punctuation is
> rendered LTR, the closing 'weak' parenthesis is treated as RTL, because this
> is the default orientation of the paragraph.
>
> My first idea is to add <fo:bidi-override direction="ltr"> to each block or
> maybe only to each text node that consists solely of non-Arabic characters.
> I guess this could be done using a regular expression like
>
> not(matches($text, '\p{IsArabic}'))
>
> Do you have any other recommendations or best practices?
>
> Thanks,
>
> - Michael
