Re: [xsl] Breaking paragraphs one linebreaks

Subject: Re: [xsl] Breaking paragraphs one linebreaks
From: "Manuel Souto Pico terminolator@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 14 May 2019 00:46:58 -0000
Thanks, Terry.

I get: Stylesheet compilation failed with 1 error(s):
Error 1 at line 27:48 : xsl:result-document is disabled when extension
functions are disabled

https://xsltfiddle.liberty-development.net/ej9EGcD/10
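
The message itself points to the cause: the fiddle appears to run with extension functions disabled, and that setting also rules out xsl:result-document. A minimal sketch of a workaround, assuming the result can simply go to the primary output instead of output.xml, is to let the root template apply templates directly. The identity template below is only a placeholder for the other templates of the stylesheet and is not part of the original.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output encoding="utf-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Root template without xsl:result-document:
         the result is written to the primary output. -->
    <xsl:template match="/">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Placeholder (assumption): identity copy standing in for the
         tmx/body/header and tu templates of the real stylesheet. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Since the built-in template for the document node already applies templates to its children, deleting the match="/" template altogether should have the same effect.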

Cheers, Manuel


Terry Badger terry_badger@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
wrote on Saturday, 11/05/2019 at 17:12:

> Try this. It is easier for me to understand.
> <?xml version="1.0"?>
> <!-- terry badger 2019-05-11 use regex to separate types of text then
> repackage in new collection order -->
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> version="2.0">
>     <xsl:output encoding="utf-8" indent="yes"/>
>     <xsl:strip-space elements="*"/>
>     <!-- ========================================================================== -->
>
>     <!--variable with content regrouped into multiple parts for each seg
> -->
>     <xsl:variable name="packaged">
>         <xsl:element name="wrapper">
>             <xsl:for-each select="//seg">
>                 <xsl:copy>
>                     <xsl:attribute name="xml:lang"
> select="parent::*/@xml:lang"/>
>                     <xsl:analyze-string select="." regex="&lt;br&gt;">
>                         <xsl:non-matching-substring>
>                             <xsl:element name="part">
>                                 <xsl:copy-of select="."/>
>                             </xsl:element>
>                         </xsl:non-matching-substring>
>                     </xsl:analyze-string>
>                 </xsl:copy>
>             </xsl:for-each>
>         </xsl:element>
>     </xsl:variable>
>     <!-- ========================================================================== -->
>
>     <!-- start at root and output a result document to make it easier to
> see -->
>     <xsl:template match="/">
>         <xsl:result-document href="output.xml">
>             <xsl:apply-templates/>
>         </xsl:result-document>
>     </xsl:template>
>     <!-- ========================================================================== -->
>
>     <xsl:template match="tmx | body | header">
>         <xsl:copy>
>             <xsl:copy-of select="@*"/>
>             <xsl:apply-templates/>
>         </xsl:copy>
>     </xsl:template>
>     <!-- ========================================================================== -->
>
>     <xsl:template match="tu">
>         <xsl:for-each select="$packaged/wrapper/seg[1]/part">
>             <xsl:variable name="part-order" select="position()"/>
>             <xsl:element name="tu">
>                 <xsl:attribute name="tuid" select="position()"/>
>                 <xsl:for-each select="$packaged/wrapper/seg">
>                     <xsl:element name="tuv">
>                         <xsl:attribute name="xml:lang"
> select="@xml:lang"/>
>                         <xsl:element name="seg">
>                             <xsl:value-of
> select="normalize-space(part[position() = $part-order])"/>
>                         </xsl:element>
>                     </xsl:element>
>                 </xsl:for-each>
>             </xsl:element>
>         </xsl:for-each>
>     </xsl:template>
> </xsl:stylesheet>
>
> Terry
>
>
>
>
>
>
> On Thursday, May 9, 2019, 04:16:36 PM EDT, Martin Honnen
> martin.honnen@xxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>
>
>
>
> On 09.05.2019 at 21:55, Martin Honnen martin.honnen@xxxxxx wrote:
> > On 09.05.2019 at 21:42, Manuel Souto Pico terminolator@xxxxxxxxx wrote:
> >>
> >>
> >> @Martin, your example works really well. I had to edit the expression,
> >> as in my real files sometimes they have used lists instead of
> >> linebreaks:
> >>
> >> <xsl:param name="lb"
> >> as="xs:string">&lt;/?(li|ul|br)\s*/?&gt;</xsl:param>
> >>
> >> However, I can see that I would also need to split at the end of
> >> sentences when there's no escaped tag but just final punctuation. To
> >> avoid the transformation eating the punctuation, I have tried a
> >> lookbehind assertion, but it seems it's not supported:
> >>
> >> <xsl:param name="lb"
> >> as="xs:string">(?<=[.!?])\s|&lt;/?(li|ul|br)\s*/?&gt;</xsl:param>
> >>
> >> Any ideas?
> >>
> >
> > In general, if there is markup, it might be better to try to parse it.
> > Your initial sample seemed to have simple HTML empty element syntax
> > with <br> elements; with the adapted regular expression, it now seems
> > you expect opening and closing tags.
> >
> > If you know the escaped markup is an XML fragment then I would try to
> > parse it with the "parse-xml-fragment" function, if it is HTML, then I
> > would look into using David Carlisle's HTML parser implementation done
> > in pure XSLT 2 or use an extension function like the commercial editions
> > of Saxon offer.
> >
> > After parsing, you can then apply normal templates or grouping
> > constructs.
> >
> An adaptation of the previous suggestion, but now with escaped XML syntax
> in the sample input so that parse-xml-fragment can be used, is at
>
> https://xsltfiddle.liberty-development.net/ej9EGcD/5
>
> and does
>
>   <xsl:template match="tu">
>       <xsl:variable name="split">
>           <xsl:apply-templates mode="split"/>
>       </xsl:variable>
>       <xsl:for-each-group select="$split/tuv/seg" group-by="position()
> mod count($split/tuv[1]/seg)">
>           <tu tuid="{position()}">
>               <xsl:apply-templates select="current-group()/snapshot()/.."/>
>           </tu>
>       </xsl:for-each-group>
>   </xsl:template>
>
>   <xsl:mode name="split" on-no-match="shallow-copy"/>
>
>   <xsl:template match="seg" expand-text="yes" mode="split">
>       <xsl:for-each-group select="parse-xml-fragment(.)/node()"
> group-ending-with="br">
>           <seg>{.}</seg>
>       </xsl:for-each-group>
>   </xsl:template>
>
> For HTML parsing you would need to use an extension or David Carlisle's
> HTML parser available on GitHub, but the approach is then the same. Of
> course, handling different elements like various list constructs needs
> more code, but once you have a tree you can process it the "normal" XSLT
> way: you can write more templates and/or more modes for various
> processing steps to address more complex input structures.
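
On the lookbehind question quoted above: the regular expression dialect of XPath 2.0 has no lookahead or lookbehind assertions, so (?<=...) is rejected. One possible workaround, sketched here under assumptions and not taken from the thread, is to capture the punctuation, put it back with $1 and replace the following whitespace with a marker, and then tokenize on that marker. The marker character U+E000 and the part element name are illustrative choices; the marker is assumed never to occur in the data.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs" version="2.0">
    <xsl:output encoding="utf-8" indent="yes"/>

    <!-- Hypothetical split marker: a private-use character
         assumed to be absent from the segment text. -->
    <xsl:variable name="marker" as="xs:string" select="'&#xE000;'"/>

    <xsl:template match="seg">
        <!-- Step 1: keep the final punctuation ($1) and turn the
             whitespace after it into the marker. -->
        <xsl:variable name="after-punctuation" as="xs:string"
            select="replace(string(.), '([.!?])\s+', concat('$1', $marker))"/>
        <!-- Step 2: turn the escaped tags into the marker as well. -->
        <xsl:variable name="marked" as="xs:string"
            select="replace($after-punctuation,
                            '&lt;/?(li|ul|br)\s*/?&gt;', $marker)"/>
        <seg>
            <!-- Step 3: split on the marker and drop empty pieces. -->
            <xsl:for-each select="tokenize($marked, $marker)[normalize-space()]">
                <part>
                    <xsl:value-of select="normalize-space(.)"/>
                </part>
            </xsl:for-each>
        </seg>
    </xsl:template>
</xsl:stylesheet>
```

For a seg whose text is "First sentence. Second one!<br>Third" (with the tag escaped in the source), this should give three part elements with the punctuation preserved. Abbreviations such as "e.g. " would also trigger a split, so the punctuation pattern may need tightening for real data.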
