Subject: Re: [xsl] Breaking paragraphs one linebreaks
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 9 May 2019 14:25:21 -0000

The DITA Community org.dita-community.i18n project provides general Saxon extensions for doing locale-aware word and line breaking. It requires either Saxon PE/EE or custom Java code to register the extension functions for use with HE (you can do this with DITA Open Toolkit automatically, starting with version 3.3.1).

https://github.com/dita-community/org.dita-community.i18n

Cheers,

Eliot
--
Eliot Kimber
http://contrext.com

On 5/9/19, 9:01 AM, "Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

Hi Manuel,

You can use XSLT. It will be easier if
a) you can use at least XSLT 2.0 and
b) the text nodes with the escaped breaks are immediately below the <seg> elements, without any other highlighting etc. elements around them.

Are these two conditions satisfied?

Gerrit

On 09.05.2019 15:44, Manuel Souto Pico terminolator@xxxxxxxxx wrote:
> Dear all,
>
> I have a bilingual TMX file containing many tu elements like this,
> containing full paragraphs:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <tmx version="1.4">
>   <header segtype="paragraph" adminlang="en"/>
>   <body>
>     <tu tuid="1">
>       <tuv xml:lang="es">
>         <seg>El PSOE ganaría en 10 de las 12 comunidades donde
> habrá elecciones autonómicas el 26 de mayo, según el último barómetro
> del CIS. &lt;br&gt;Las excepciones serían Cantabria, donde el PRC, el
> partido de Miguel Ángel Revilla, sería primera fuerza.
> &lt;br&gt;&lt;br&gt;Navarra Suma, la coalición de PP, Ciudadanos y UPN,
> sería primera fuerza en la comunidad foral.</seg>
>       </tuv>
>       <tuv xml:lang="uz">
>         <seg>PSOE, MDHning eng so'nggi barometri bo'yicha 26 mayda
> bo'lib o'tadigan mintaqaviy saylovlarda 12 ta jamoaning 10tasida g'olib
> chiqadi.&lt;br&gt;Istisnolarga ko'ra, Cantabria, XXR, Migel Anxel
> Revilla partiyasi birinchi kuch bo'ladi.&lt;br&gt;&lt;br&gt;"Navarra
> Suma", PP, Cuudadanos va UPN koalitsiyasi mintaqaviy hamjamiyatning
> birinchi kuchi bo'ladi.</seg>
>       </tuv>
>     </tu>
>   </body>
> </tmx>
>
> As you can see there are a few (escaped) line break tags between sentences.
>
> I would like to transform that into something like this, where every tu
> element contains only sentences:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <tmx version="1.4">
>   <header segtype="paragraph" adminlang="en"/>
>   <body>
>     <tu tuid="1">
>       <tuv xml:lang="es">
>         <seg>El PSOE ganaría en 10 de las 12 comunidades donde habrá elecciones
> autonómicas el 26 de mayo, según el último barómetro del CIS.</seg>
>       </tuv>
>       <tuv xml:lang="uz">
>         <seg>PSOE, MDHning eng so'nggi barometri bo'yicha 26 mayda bo'lib
> o'tadigan mintaqaviy saylovlarda 12 ta jamoaning 10tasida g'olib
> chiqadi.</seg>
>       </tuv>
>     </tu>
>     <tu tuid="2">
>       <tuv xml:lang="es">
>         <seg>Las excepciones serían Cantabria, donde el PRC, el partido de
> Miguel Ángel Revilla, sería primera fuerza. </seg>
>       </tuv>
>       <tuv xml:lang="uz">
>         <seg>Istisnolarga ko'ra, Cantabria, XXR, Migel Anxel Revilla partiyasi
> birinchi kuch bo'ladi.</seg>
>       </tuv>
>     </tu>
>     <tu tuid="3">
>       <tuv xml:lang="es">
>         <seg>Navarra Suma, la coalición de PP, Ciudadanos y UPN, sería primera
> fuerza en la comunidad foral.</seg>
>       </tuv>
>       <tuv xml:lang="uz">
>         <seg>"Navarra Suma", PP, Cuudadanos va UPN koalitsiyasi mintaqaviy
> hamjamiyatning birinchi kuchi bo'ladi.</seg>
>       </tuv>
>     </tu>
>   </body>
> </tmx>
>
> Do you think I can use XSLT to do this more or less easily?
>
> I wrote a few XSLT stylesheets years ago but I'm far from being a savvy
> user.
>
> Thanks in advance for any tips.
>
> Cheers, Manuel
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/225679>

--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930
Geschäftsführer / Managing Directors:
Gerrit Imsieke, Svea Jelonek, Thomas Schmidt
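
For illustration only, here is a minimal XSLT 2.0 sketch of the kind of transformation discussed above. It is not taken from any reply in this thread. It assumes both of Gerrit's conditions hold (an XSLT 2.0 processor, and <seg> elements that contain nothing but text), that the breaks appear in the text as the literal string <br> (written &lt;br&gt; in the file), that every tuv inside a tu splits into the same number of pieces, and that it is acceptable to renumber tuid sequentially and to normalize whitespace inside each piece. The mode name "split" and the variable names are made up for this example.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Assumption: a break is one or more literal "<br>" strings,
       optionally surrounded by whitespace. -->
  <xsl:variable name="break" as="xs:string"
      select="'(\s*&lt;br&gt;\s*)+'"/>

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Pass 1 builds the split tu elements in a variable;
       pass 2 copies them out with sequential tuid values. -->
  <xsl:template match="body">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:variable name="split" as="element(tu)*">
        <xsl:apply-templates select="tu" mode="split"/>
      </xsl:variable>
      <xsl:for-each select="$split">
        <tu tuid="{position()}">
          <xsl:copy-of select="node()"/>
        </tu>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <!-- Emit one tu per piece; piece i of each language goes into tu i. -->
  <xsl:template match="tu" mode="split">
    <xsl:variable name="tu" select="."/>
    <!-- Largest number of non-empty pieces found in any seg of this tu. -->
    <xsl:variable name="count" as="xs:integer?"
        select="max(for $s in tuv/seg
                    return count(tokenize($s, $break)[normalize-space(.) ne '']))"/>
    <xsl:for-each select="1 to $count">
      <xsl:variable name="i" as="xs:integer" select="."/>
      <tu>
        <xsl:for-each select="$tu/tuv">
          <xsl:copy>
            <xsl:copy-of select="@*"/>
            <seg>
              <xsl:value-of
                  select="normalize-space(
                            tokenize(seg, $break)[normalize-space(.) ne ''][$i])"/>
            </seg>
          </xsl:copy>
        </xsl:for-each>
      </tu>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

If one language has fewer pieces than the other, this sketch emits an empty <seg> for the missing positions rather than trying to realign the text, so mismatched units would still need manual review. The header's segtype attribute is copied unchanged.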