Subject: Re: [xsl] Seeking a smarter tokenize for augmented text
From: "Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 7 May 2021 09:41:15 -0000

I have made some progress on this. It's not at a working point yet, but I'm more confident than I was, so thanks to all for the suggestions, which have been helpful. I also found some hints in a Stack Overflow answer of Martin Honnen's which reinforced the advice to work on this by adding a line marker element and using grouping.

The original statement of the requirement was a bit vague, and the content model currently in use is a bit too flexible. So I think I can stipulate that inline elements will not run across line breaks (and if they do, I should be able to run a pre-fix which splits them), nor will the content include any nested inline elements.

At the moment I'm assuming that in the step where I insert line marker elements, I also have to use modal templates to insert inline element markers, then run another pass to restore the inline elements. Something like this, correct?

    <xsl:variable name="brokenlines">
      <xsl:element name="textlines">
        <xsl:element name="linemarker"/>
        <xsl:analyze-string select="." regex="(\r\n?|\n\r?)">
          <xsl:matching-substring>
            <xsl:element name="linemarker"/>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <xsl:apply-templates mode="break"/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:element>
    </xsl:variable>
    <xsl:variable name="textlines">
      <xsl:call-template name="rebuild">
        <xsl:with-param name="lines" select="$brokenlines"/>
      </xsl:call-template>
    </xsl:variable>
    <!-- $textlines/textlines is now the original textlines with line children -->
    ...

    <xsl:template match="textlines/*/text()" mode="break">
      <xsl:value-of select="concat('[[{', name(..), '}', ., ']]')"/>
    </xsl:template>

    <xsl:template name="rebuild">
      <xsl:param name="lines" as="document-node()"/>
      <xsl:element name="textlines">
        <xsl:for-each select="$lines/textlines">
          <xsl:for-each-group select="node()" group-starting-with="linemarker">
            <xsl:element name="line">
              <xsl:apply-templates select="current-group()[not(self::linemarker)]" mode="rebuild"/>
            </xsl:element>
          </xsl:for-each-group>
        </xsl:for-each>
      </xsl:element>
    </xsl:template>

    <xsl:template match="text()" mode="rebuild">
      <xsl:analyze-string select="." regex="something matching [[{name}content]]">
        <xsl:matching-substring>
          <xsl:element name="the name in the regex">
            the content in the regex
          </xsl:element>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:template>

Am I going along the right lines? I'd prefer to be set straight sooner rather than later!

Cheers
T

-----Original Message-----
From: Michael Müller-Hillebrand mmh@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Friday, 7 May 2021 20:26
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Seeking a smarter tokenize for augmented text

Hi,
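
The text() template in mode "rebuild" leaves its regex and the two regex groups as placeholders. One way they might be filled in is sketched below. It is only a sketch, under these assumptions: an XSLT 2.0 processor, marker names that are valid element names, and marked content that never itself contains "]]". The regex attribute of xsl:analyze-string is an attribute value template, so literal curly braces have to be doubled there.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Sketch only: turns [[{name}content]] markers back into elements.
       After attribute-value-template expansion the regex reads
       \[\[\{([^\}]+)\}(.*?)\]\]
       group 1 = the element name, group 2 = the element content. -->
  <xsl:template match="text()" mode="rebuild">
    <xsl:analyze-string select="."
                        regex="\[\[\{{([^\}}]+)\}}(.*?)\]\]">
      <xsl:matching-substring>
        <!-- Assumption: group 1 is always a usable element name -->
        <xsl:element name="{regex-group(1)}">
          <xsl:value-of select="regex-group(2)"/>
        </xsl:element>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Text outside any marker is copied through unchanged -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

The reluctant quantifier (.*?) stops at the first "]]", which is why the sketch depends on inline elements being neither nested nor split across lines, as stipulated above.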