Re: [xsl] Breaking paragraphs on linebreaks

Subject: Re: [xsl] Breaking paragraphs on linebreaks
From: "Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 9 May 2019 14:01:04 -0000
Hi Manuel,

You can use XSLT. It will be easier if

a) you can use at least XSLT 2.0 and

b) the text nodes with the escaped breaks are direct children of the <seg> elements, not wrapped in any other elements (highlighting markup etc.).

Are these two conditions satisfied?
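
If both conditions hold, the approach could be sketched roughly as follows. This is only a minimal XSLT 2.0 sketch, not a tested solution for your real data. It assumes that the TMX elements are in no namespace, that each <tuv> has exactly one <seg>, and that both languages have their escaped breaks in corresponding positions. The mode name "split" and the variable names ($break, $pieces, $n, $i) are just names picked for illustration.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Regex for one or more escaped breaks plus surrounding whitespace.
       After parsing, the escaped &lt;br&gt; is the literal string "<br>". -->
  <xsl:variable name="break" as="xs:string" select="'(\s*&lt;br&gt;\s*)+'"/>

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Split all tu elements first, then renumber them sequentially. -->
  <xsl:template match="body">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:variable name="pieces" as="element(tu)*">
        <xsl:apply-templates select="tu" mode="split"/>
      </xsl:variable>
      <xsl:for-each select="$pieces">
        <xsl:copy>
          <xsl:attribute name="tuid" select="position()"/>
          <xsl:copy-of select="@*, node()"/>
        </xsl:copy>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <!-- Emit one tu per sentence position (without tuid, added above). -->
  <xsl:template match="tu" mode="split">
    <xsl:variable name="tu" select="."/>
    <!-- Largest number of non-empty pieces found in any language. -->
    <xsl:variable name="n"
        select="max(for $s in tuv/seg
                    return count(tokenize($s, $break)[normalize-space()]))"/>
    <xsl:for-each select="1 to $n">
      <xsl:variable name="i" select="."/>
      <tu>
        <xsl:copy-of select="$tu/@* except $tu/@tuid"/>
        <!-- Carry over any non-tuv children (e.g. notes) unchanged. -->
        <xsl:copy-of select="$tu/(* except tuv)"/>
        <xsl:for-each select="$tu/tuv">
          <xsl:copy>
            <xsl:copy-of select="@*"/>
            <seg>
              <xsl:value-of
                  select="tokenize(seg, $break)[normalize-space()][$i]"/>
            </seg>
          </xsl:copy>
        </xsl:for-each>
      </tu>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run with any XSLT 2.0 processor, this should turn the sample tu with tuid="1" into three tu elements numbered 1 to 3. If one language has fewer pieces than the other, its <seg> comes out empty at the missing positions, so misaligned pairs would still need a manual check.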

Gerrit

On 09.05.2019 15:44, Manuel Souto Pico terminolator@xxxxxxxxx wrote:
Dear all,

I have a bilingual TMX file with many tu elements like this one, each holding full paragraphs:

<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header segtype="paragraph" adminlang="en"/>
  <body>
    <tu tuid="1">
      <tuv xml:lang="es">
        <seg>El PSOE ganaría en 10 de las 12 comunidades donde habrá elecciones autonómicas el 26 de mayo, según el último barómetro del CIS. &lt;br&gt;Las excepciones serían Cantabria, donde el PRC, el partido de Miguel Ángel Revilla, sería primera fuerza. &lt;br&gt;&lt;br&gt;Navarra Suma, la coalición de PP, Ciudadanos y UPN, sería primera fuerza en la comunidad foral.</seg>
      </tuv>
      <tuv xml:lang="uz">
        <seg>PSOE, MDHning eng so'nggi barometri bo'yicha 26 mayda bo'lib o'tadigan mintaqaviy saylovlarda 12 ta jamoaning 10tasida g'olib chiqadi.&lt;br&gt;Istisnolarga ko'ra, Cantabria, XXR, Migel Anxel Revilla partiyasi birinchi kuch bo'ladi.&lt;br&gt;&lt;br&gt;"Navarra Suma", PP, Cuudadanos va UPN koalitsiyasi mintaqaviy hamjamiyatning birinchi kuchi bo'ladi.</seg>
      </tuv>
    </tu>
  </body>
</tmx>


As you can see there are a few (escaped) line break tags between sentences.

I would like to transform that into something like this, where every tu element contains only sentences:

<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header segtype="paragraph" adminlang="en"/>
  <body>
    <tu tuid="1">
      <tuv xml:lang="es">
        <seg>El PSOE ganaría en 10 de las 12 comunidades donde habrá elecciones autonómicas el 26 de mayo, según el último barómetro del CIS.</seg>
      </tuv>
      <tuv xml:lang="uz">
        <seg>PSOE, MDHning eng so'nggi barometri bo'yicha 26 mayda bo'lib o'tadigan mintaqaviy saylovlarda 12 ta jamoaning 10tasida g'olib chiqadi.</seg>
      </tuv>
    </tu>
    <tu tuid="2">
      <tuv xml:lang="es">
        <seg>Las excepciones serían Cantabria, donde el PRC, el partido de Miguel Ángel Revilla, sería primera fuerza. </seg>
      </tuv>
      <tuv xml:lang="uz">
        <seg>Istisnolarga ko'ra, Cantabria, XXR, Migel Anxel Revilla partiyasi birinchi kuch bo'ladi.</seg>
      </tuv>
    </tu>
    <tu tuid="3">
      <tuv xml:lang="es">
        <seg>Navarra Suma, la coalición de PP, Ciudadanos y UPN, sería primera fuerza en la comunidad foral.</seg>
      </tuv>
      <tuv xml:lang="uz">
        <seg>"Navarra Suma", PP, Cuudadanos va UPN koalitsiyasi mintaqaviy hamjamiyatning birinchi kuchi bo'ladi.</seg>
      </tuv>
    </tu>
  </body>
</tmx>


Do you think I can use XSLT to do this more or less easily?

I wrote a few XSLT stylesheets years ago but I'm far from being a savvy user.

Thanks in advance for any tips.

Cheers, Manuel

--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer / Managing Directors:
Gerrit Imsieke, Svea Jelonek, Thomas Schmidt
