Subject: Re: [xsl] BIDI problem in XSL-FO
From: "Michael Müller-Hillebrand mmh@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 1 May 2016 19:39:38 -0000
Hi Geert and Ken,

Thanks a lot for the reminder to look for some context. We are using Antenna House, so we get a lot of correct behaviour out of the box.

As we are doing automated publishing, there is no good way to add markup later just for publishing purposes. But we happen to have an element <nt> available, which is used to tag non-translatable content. In my shortened example

<fo:block>Brand name (Former name)</fo:block>

this element was used to tag both brand names in the source, similar to this:

<p><nt>Brand name</nt> (<nt>Former name</nt>)</p>

If I now use <fo:bidi-override direction="ltr"> for each of those <nt> elements, i.e. excluding the parentheses, it is rendered like this:

(Former name) Brand name

As far as I am concerned, this makes a lot of sense: the general reading direction is right to left, so the less important information in parentheses comes 'after' the main information. It is fun to note that both parentheses are now mirrored glyphs.

I have to wait for some feedback from my customer's proofreaders.

Thanks for the discussion.

- Michael

> On 29.04.2016 at 23:42, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Michael,
>
> It is late on a Friday here, so I'll keep my post very brief
> (an excuse for not exactly brief but rather unstructured :-)
> Over the past couple of months I have been tackling quite a few issues similar to what you describe.
> The visual rendering "(Brand name (Former name" is often the correct behaviour, but I learned from Arabic proofreaders that it is not pleasing anyway.
>
> The brackets around English text are one infamous known issue;
> another one I had issues with is registered trademarks appearing randomly before or after an English word.
> And keeping 1-25 as the page number in the footer for chapter-page numbering, instead of 25-1, has been hard too.
>
> The algorithms for switching between rtl and ltr are complex, and although there is very good support in some FO processors,
> the behaviour is not always predictable.
>
> We learned there is an advantage in creating an inner context to tune the algorithms our way.
> Our documents are DITA, so I pulled the DITA files out of the CMS and added a <term> element as context around English text in Arabic,
> but ONLY where there are potential issues (the cases are rather isolated, so if there are no brackets, e.g., just leave the English text as it is in order not to introduce mistakes).
> We had cases like this:
> "A arrow B"
> with no Arabic characters in there;
> in Arabic the result needed to show
> "B flipped arrow A"
> That will not happen correctly if you make your bidi override too large (your suggested regex would break this example).
>
> From learning the hard way, advice no. 1: be conservative in creating bidi overrides, because most often the FO processor does the right thing.
>
> I am revising my regular expressions over the next week or two because of toolset version changes.
>
> From my experience you are doing the right thing using bidi override (happy to learn otherwise from this thread).
> I am confident that, depending on the tools you use AND the proofreaders (opinions differ), you will need to experiment to find your own best-matching regex.
>
> So advice no. 2: test different versions of your toolset, and test them well.
> When I first started working on Arabic manuals with lots of English terms in them, Antenna House was my best option and already did a very, very good job.
> To fix the issues left in the manuals we added a bidi override context (<term> element).
> In the 6.3 release and the 6.3 Maintenance Release 1, Antenna House largely improved the handling of bidi overrides.
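Geert's advice to add context only where there are potential issues, such as brackets or trademark signs around English text, could be sketched as a conservative filter over text nodes. This is a hypothetical Python illustration, not the regular expressions Geert or Michael actually use. It reads bidi classes and the Bidi_Mirrored property from the standard `unicodedata` module and wraps a run in an LTR `fo:bidi-override` only when the run has strong left-to-right letters, no strong right-to-left ones (classes R or AL), and at least one mirrored character (such as a bracket) or a trademark sign. The function names, the `TRADEMARK_SIGNS` set and the trigger rule itself are assumptions.

```python
import unicodedata
from xml.sax.saxutils import escape

# Hypothetical trigger list: trademark signs, which Geert reports drifting to
# the wrong side of English words. Brackets are caught via Bidi_Mirrored.
TRADEMARK_SIGNS = {"\u00ae", "\u2122"}  # REGISTERED SIGN, TRADE MARK SIGN


def has_strong_ltr(text: str) -> bool:
    """True if any character is strong left-to-right (bidi class L)."""
    return any(unicodedata.bidirectional(ch) == "L" for ch in text)


def has_strong_rtl(text: str) -> bool:
    """True if any character is strong right-to-left (bidi class R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def has_trouble_char(text: str) -> bool:
    """True for a mirrored character (e.g. a bracket) or a trademark sign."""
    return any(unicodedata.mirrored(ch) or ch in TRADEMARK_SIGNS for ch in text)


def wrap_if_needed(text: str) -> str:
    """Return XSL-FO markup for one text node of an RTL document.

    Deliberately conservative: only Latin-only runs that contain a
    trouble character get an LTR override; everything else is left
    to the formatter's own bidi handling.
    """
    if has_strong_ltr(text) and not has_strong_rtl(text) and has_trouble_char(text):
        return '<fo:bidi-override direction="ltr">' + escape(text) + "</fo:bidi-override>"
    return escape(text)
```

With this rule, `"Brand name"` passes through untouched, `"Brand name (Former name)"` is wrapped, and a run that already contains Arabic letters is never wrapped, so the override stays as small and as rare as the advice suggests.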
> We soon realised that Antenna House did fix some of our issues for us, so we are in the process of undoing some of our context fixes.
>
> Hope this helps at least a little.
> Depending on the popularity of this topic, I am happy to discuss more details of our findings on this forum or outside of it.
>
> Best regards
>
> Geert
>
> ----- Original message -----
> From: "Michael Müller-Hillebrand mmh@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
> To: "xsl-list@xxxxxxxxxxxxxxxxxxxxxx" <XSL-List@xxxxxxxxxxxxxxxxxxxxxx>
> Sent: Friday 29 April 2016 20:05:07
> Subject: [xsl] BIDI problem in XSL-FO
>
> Dear experts,
>
> The processing done by an FO formatter for right-to-left (RTL) languages is nearly magic, considering what happens if you just set
>
> writing-mode="rl-tb"
>
> I really enjoy my first project with Arabic text. Interestingly, the problem at hand is English words. In the glossary of an RTL document I suddenly have a whole paragraph of Latin characters:
>
> <fo:block>Brand name (Former name)</fo:block>
>
> This is visually rendered like this:
>
> (Brand name (Former name
>
> I have looked at
>
> * Unicode BIDI Processing <http://www.w3.org/TR/xsl/#d0e4879>
> * Unicode BIDI algorithm <http://www.unicode.org/reports/tr9/>
>
> I now understand that there are strong and weak characters. The sequence of strong Latin characters with embedded 'weak' spacing and punctuation is rendered LTR; the closing 'weak' parenthesis is treated as RTL, because that is the default orientation of the paragraph.
>
> My first idea is to add <fo:bidi-override direction="ltr"> to each block, or maybe only to each text node that consists solely of non-Arabic characters. I guess this could be done using a regular expression like
>
> not(matches($text, '\p{Arabic}'))
>
> Do you have any other recommendations or best practices?
>
> Thanks,
>
> - Michael
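Michael's observation can be reproduced with a heavily reduced version of the Unicode Bidirectional Algorithm (UAX #9). The Python below is a hypothetical sketch, not how Antenna House or any other FO formatter works. It handles one line in an RTL paragraph containing only strong letters and neutrals, takes bidi classes from `unicodedata.bidirectional`, and applies the neutral rules N1/N2, the level rules I1/I2, reordering rule L2 and mirroring rule L4. In UAX #9 terms, spaces (class WS) and parentheses (class ON) are neutral rather than weak characters. The names `visual_rtl` and `strong_type` and the small `MIRROR` table are assumptions of this sketch. Separately, XPath regular expressions follow the XML Schema syntax, which as far as I know names Unicode blocks (`\p{IsArabic}`) rather than scripts, so the test in the question may need that spelling; this is worth checking against the processor in use.

```python
import unicodedata

# Hypothetical mini mirror table for this sketch; the complete mapping is
# published by Unicode as BidiMirroring.txt.
MIRROR = {"(": ")", ")": "(", "[": "]", "]": "[", "{": "}", "}": "{", "<": ">", ">": "<"}


def strong_type(ch: str):
    """'L' or 'R' for strong characters (AL counts as R), None otherwise."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return None  # treated as neutral here; digits and weak types are NOT handled


def visual_rtl(text: str) -> str:
    """Display order (left to right) of one line in an RTL paragraph.

    Simplified UBA at paragraph level 1: no explicit embeddings or
    overrides, no weak-type rules, no paired-bracket rule N0, no rule L1.
    """
    n = len(text)
    types = [strong_type(ch) for ch in text]

    # N1/N2: a run of neutrals takes the direction of its strong neighbours
    # when both agree, otherwise the embedding direction R. sos/eos are R.
    i = 0
    while i < n:
        if types[i] is not None:
            i += 1
            continue
        j = i
        while j < n and types[j] is None:
            j += 1
        before = types[i - 1] if i > 0 else "R"
        after = types[j] if j < n else "R"
        for k in range(i, j):
            types[k] = before if before == after else "R"
        i = j

    # I1/I2 at an odd paragraph level: L rises to level 2, R stays at 1.
    levels = [2 if t == "L" else 1 for t in types]

    # L2: from the highest level down to 1, reverse each contiguous run
    # (in the current visual order) of characters at that level or higher.
    order = list(range(n))
    for level in range(max(levels, default=1), 0, -1):
        k = 0
        while k < n:
            if levels[order[k]] < level:
                k += 1
                continue
            m = k
            while m < n and levels[order[m]] >= level:
                m += 1
            order[k:m] = order[k:m][::-1]
            k = m

    # L4: characters at odd (RTL) levels with Bidi_Mirrored=Y show their mirror glyph.
    out = []
    for idx in order:
        ch = text[idx]
        if levels[idx] % 2 == 1 and unicodedata.mirrored(ch):
            ch = MIRROR.get(ch, ch)
        out.append(ch)
    return "".join(out)
```

For the question's example, `visual_rtl("Brand name (Former name)")` gives `"(Brand name (Former name"`: the final `)` sits between a strong L and the RTL end of the paragraph, so it resolves to R, moves to the left edge and is drawn mirrored. As I read UAX #9, the paired-bracket rule N0 (added in Unicode 6.3) would instead resolve both brackets to L here, because the text inside and before them is L. So whether a formatter shows this effect can depend on the version of the algorithm it implements.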