Re: [xsl] BIDI problem in XSL-FO

Subject: Re: [xsl] BIDI problem in XSL-FO
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 3 May 2016 15:58:52 -0000
As it happens I just implemented some code to generate text-level analysis
based on configured character ranges.

The generated template looks like this:

<xsl:template match="text()" mode="epub:textToCharSet-ja_jp">
      <xsl:param name="doDebug" as="xs:boolean" tunnel="yes" select="false()"/>
      <!-- Handle language ja_jp-->
      <xsl:if test="$doDebug">
         <xsl:message>+ [DEBUG] epub:textToCharSet-ja_jp: text="<xsl:value-of select="."/>"</xsl:message>
      </xsl:if>
      <xsl:analyze-string select="." regex="([c-o>]+)">
         <xsl:matching-substring>
            <xsl:sequence select="."/>
         </xsl:matching-substring>
         <xsl:non-matching-substring>
            <span class="non-native-text">
               <xsl:sequence select="."/>
            </span>
         </xsl:non-matching-substring>
      </xsl:analyze-string>
   </xsl:template>

In this case I'm identifying text *not* in the national language in
question but the same approach can be applied to other business logic of
course.

In an earlier version of this code I had multiple groups in the regular
expression and used xsl:choose to determine which group had matched,
checking each group to see if it was empty and using the one that was not.
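
A minimal sketch of that multi-group variant follows. It is not the original code: the mode name, the two character ranges (kana and Basic Latin letters), and the class values are all made up for illustration. The only thing it relies on is that regex-group(N) returns a zero-length string for a group that did not take part in the match.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical mode name and character ranges, for illustration only.
       Group 1: Hiragana and Katakana. Group 2: Basic Latin letters. -->
  <xsl:template match="text()" mode="classify-runs">
    <xsl:analyze-string select="."
        regex="([&#x3040;-&#x30FF;]+)|([A-Za-z]+)">
      <xsl:matching-substring>
        <!-- Exactly one alternative matched; the other group is empty. -->
        <xsl:choose>
          <xsl:when test="regex-group(1) != ''">
            <span class="kana"><xsl:value-of select="."/></span>
          </xsl:when>
          <xsl:when test="regex-group(2) != ''">
            <span class="latin"><xsl:value-of select="."/></span>
          </xsl:when>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Text in neither range passes through unchanged. -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Each alternative requires at least one character, so a match is never empty and exactly one of the xsl:when branches fires for every matching substring.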

Cheers,

Eliot


----
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com




On 5/3/16, 10:42 AM, "Michael Müller-Hillebrand mmh@xxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>Hi Tony,
>
>Wow, what an interesting tool this is:
>http://www.unicode.org/cldr/utility/bidi.jsp
>
>Unfortunately, in my case the parentheses are likely to be just regular
>text and I have no direct way of knowing whether they surround Arabic or
>Western text (other than trying to find some all-purpose magic XPath
>analyzing basically every text() node). But the content inside the
>parentheses is tagged as non-translatable and I can take advantage of
>that.
>
><p>ARABIC <nt>Brand name</nt> (<nt>Former name</nt>) TEXT.</p>
>
>By playing around with the tool (and without proper understanding of the
>rules) I find some options that would make the parentheses correct, but
>the preceding or following Arabic text will be ordered in the wrong way.
>
>I have the impression that direction control characters in this situation
>do not work as well as <fo:bidi-override> would. Unfortunately I have not
>heard back whether the presentation as
>
>.TXET (Former name) Brand name CIBARA
>
>is accepted by the client.
>
>- Michael
>
>BTW: I hope this is still on topic enough. That's why I mentioned XPath.
>
>
>> Am 03.05.2016 um 14:21 schrieb Tony Graham tgraham@xxxxxxxxxxxxx
>><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>:
>>
>> tldr: Put &#x200E; after the ')'.
>
>> As Michael notes below, some characters, such as Latin letters, have a
>> 'strong' directionality, and some have a 'weak' or 'neutral'
>> directionality. The closing ')' is a 'neutral', and because it's at the
>> end of the string, it takes the 'embedding direction' [5], which is RTL
>> in Michael's example. You can see this with the bidi utility at
>>
>>http://www.unicode.org/cldr/utility/bidi.jsp?a=Brand+name+%28Former+name%E2%80%8E%29&p=RTL
