Subject: Re: [xsl] Breaking paragraphs one linebreaks
From: "Manuel Souto Pico terminolator@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 14 May 2019 00:46:58 -0000

Thanks, Terry. I get:

Stylesheet compilation failed with 1 error(s):
Error 1 at line 27:48 : xsl:result-document is disabled when extension functions are disabled

https://xsltfiddle.liberty-development.net/ej9EGcD/10

Cheers, Manuel

Terry Badger terry_badger@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote on Saturday, 11/05/2019 at 17:12:

> Try this. It is easier for me to understand.
>
> <?xml version="1.0"?>
> <!-- terry badger 2019-05-11 use regex to separate types of text then
>      repackage in new collection order -->
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     version="2.0">
>   <xsl:output encoding="utf-8" indent="yes"/>
>   <xsl:strip-space elements="*"/>
>   <!-- ========================================================================== -->
>
>   <!-- variable with content regrouped into multiple parts for each seg -->
>   <xsl:variable name="packaged">
>     <xsl:element name="wrapper">
>       <xsl:for-each select="//seg">
>         <xsl:copy>
>           <xsl:attribute name="xml:lang" select="parent::*/@xml:lang"/>
>           <xsl:analyze-string select="." regex="&lt;br&gt;">
>             <xsl:non-matching-substring>
>               <xsl:element name="part">
>                 <xsl:copy-of select="."/>
>               </xsl:element>
>             </xsl:non-matching-substring>
>           </xsl:analyze-string>
>         </xsl:copy>
>       </xsl:for-each>
>     </xsl:element>
>   </xsl:variable>
>   <!-- ========================================================================== -->
>
>   <!-- start at root and output a result document to make it easier to see -->
>   <xsl:template match="/">
>     <xsl:result-document href="output.xml">
>       <xsl:apply-templates/>
>     </xsl:result-document>
>   </xsl:template>
>   <!-- ========================================================================== -->
>
>   <xsl:template match="tmx | body | header">
>     <xsl:copy>
>       <xsl:copy-of select="@*"/>
>       <xsl:apply-templates/>
>     </xsl:copy>
>   </xsl:template>
>   <!-- ========================================================================== -->
>
>   <xsl:template match="tu">
>     <xsl:for-each select="$packaged/wrapper/seg[1]/part">
>       <xsl:variable name="part-order" select="position()"/>
>       <xsl:element name="tu">
>         <xsl:attribute name="tuid" select="position()"/>
>         <xsl:for-each select="$packaged/wrapper/seg">
>           <xsl:element name="tuv">
>             <xsl:attribute name="xml:lang" select="@xml:lang"/>
>             <xsl:element name="seg">
>               <xsl:value-of select="normalize-space(part[position() = $part-order])"/>
>             </xsl:element>
>           </xsl:element>
>         </xsl:for-each>
>       </xsl:element>
>     </xsl:for-each>
>   </xsl:template>
> </xsl:stylesheet>
>
> Terry
>
> On Thursday, May 9, 2019 04:16:36 PM EDT, Martin Honnen
> martin.honnen@xxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On 09.05.2019 at 21:55, Martin Honnen martin.honnen@xxxxxx wrote:
> > On 09.05.2019 at 21:42, Manuel Souto Pico terminolator@xxxxxxxxx wrote:
> >>
> >> @Martin, your example works really well.
> >> I had to edit the expression, as in my real files sometimes they
> >> have used lists instead of linebreaks:
> >>
> >> <xsl:param name="lb"
> >>     as="xs:string">&lt;/?(li|ul|br)\s*/?&gt;</xsl:param>
> >>
> >> However, I can see that I would also need to split at the end of
> >> sentences when there's no escaped tag but just final punctuation. To
> >> avoid the transformation eating the punctuation, I have tried with a
> >> lookbehind assertion but it seems it's not supported:
> >>
> >> <xsl:param name="lb"
> >>     as="xs:string">(?&lt;=[.!?])\s|&lt;/?(li|ul|br)\s*/?&gt;</xsl:param>
> >>
> >> Any ideas?
> >>
> >
> > In general, if there is markup, it might be better to try to parse it.
> > In your initial sample you seemed to have simple HTML empty element
> > syntax with <br> elements; now, with the adapted regular expression, it
> > seems you expect opening and closing tags.
> >
> > If you know the escaped markup is an XML fragment, then I would try to
> > parse it with the "parse-xml-fragment" function; if it is HTML, then I
> > would look into using David Carlisle's HTML parser implementation done
> > in pure XSLT 2 or use an extension function like the commercial editions
> > of Saxon offer.
> >
> > After parsing, you can then apply normal templates or grouping
> > constructs.
>
> An adaptation of the previous suggestion, but now with escaped XML syntax
> in the sample input, to then use parse-xml-fragment, is at
>
> https://xsltfiddle.liberty-development.net/ej9EGcD/5
>
> and does
>
> <xsl:template match="tu">
>   <xsl:variable name="split">
>     <xsl:apply-templates mode="split"/>
>   </xsl:variable>
>   <xsl:for-each-group select="$split/tuv/seg"
>       group-by="position() mod count($split/tuv[1]/seg)">
>     <tu tuid="{position()}">
>       <xsl:apply-templates select="current-group()/snapshot()/.."/>
>     </tu>
>   </xsl:for-each-group>
> </xsl:template>
>
> <xsl:mode name="split" on-no-match="shallow-copy"/>
>
> <xsl:template match="seg" expand-text="yes" mode="split">
>   <xsl:for-each-group select="parse-xml-fragment(.)/node()"
>       group-ending-with="br">
>     <seg>{.}</seg>
>   </xsl:for-each-group>
> </xsl:template>
>
> For HTML parsing you would need to use an extension or David Carlisle's
> HTML parser available on GitHub, but the approach then is the same. Of
> course, handling different elements like various list constructs needs
> more code, but once you have a tree you can process it the "normal" XSLT
> way: you can write more templates and/or more modes for various
> processing steps to address more complex input structures.
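
On the compilation error reported at the top of this message: the message text suggests the fiddle runs the processor with secondary output disabled, so `xsl:result-document` cannot be compiled there. A minimal sketch of one way around it, assuming the rest of Terry's stylesheet stays as quoted, is to drop the `xsl:result-document` wrapper and let the root template write to the principal output. The identity template below is only a placeholder for the remaining templates so the sketch is complete on its own.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output encoding="utf-8" indent="yes"/>

  <!-- Root template without xsl:result-document: the result is written
       to the principal output, which the fiddle displays directly. -->
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Placeholder (not part of the quoted stylesheet): copy everything
       unchanged. Replace with the tmx/body/header and tu templates. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Run from the command line, where secondary output is normally allowed, the original `xsl:result-document href="output.xml"` version should still compile.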
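
On the lookbehind question quoted above: XPath regular expressions have no lookaround assertions, so `(?<=[.!?])\s` is rejected. If parsing the escaped markup is not an option, a common workaround is to keep the punctuation with a capture group and split on a marker instead. The following is a minimal XSLT 2.0 sketch under two assumptions that are not in the thread: the marker character U+E000 never occurs in the data, and every `seg` contains only text.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

  <!-- Escaped tags that count as breaks (the pattern from the thread). -->
  <xsl:param name="tag" as="xs:string">&lt;/?(li|ul|br)\s*/?&gt;</xsl:param>

  <!-- Hypothetical marker: a private-use character assumed absent from the data. -->
  <xsl:variable name="marker" as="xs:string" select="'&#xE000;'"/>

  <xsl:template match="seg">
    <!-- 1. Replace every escaped tag with the marker. -->
    <xsl:variable name="tags-marked" as="xs:string"
                  select="replace(., $tag, $marker)"/>
    <!-- 2. Keep the final punctuation ($1) and replace only the
            whitespace that follows it with the marker. -->
    <xsl:variable name="all-marked" as="xs:string"
                  select="replace($tags-marked, '([.!?])\s+', concat('$1', $marker))"/>
    <!-- 3. Split at the marker and skip empty or whitespace-only pieces. -->
    <xsl:for-each select="tokenize($all-marked, $marker)[normalize-space()]">
      <seg><xsl:value-of select="normalize-space(.)"/></seg>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Elements other than `seg` fall through to the built-in template rules in this sketch. Splitting on `[.!?]` followed by whitespace will also break after abbreviations such as "e.g.", so real data may need a stricter pattern.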