Subject: Re: [xsl] Seeking a smarter tokenize for augmented text
From: "Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 7 May 2021 09:41:15 -0000

I have made some progress on this. It isn't working yet, but I'm more
confident than I was, so thanks to all for the helpful suggestions. I also
found some hints in a Stack Overflow answer of Martin Honnen's, which
reinforced the advice to tackle this by adding a line marker element and
using grouping.
The original statement of the requirement was a bit vague, and the content
model currently in use is a bit too flexible. So I think I can stipulate that
inline elements will not run across line breaks (and if they do I should be
able to run a pre-fix which splits them), nor will the content include any
nested inline elements.
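
A rough sketch of what that splitting pre-fix could look like, assuming XSLT 2.0, that every element child of textlines is an inline element, and that inline elements contain only text (no nesting). It is untested against the real content model:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- An inline element whose text contains a line break: emit one copy
       of the element per line and move the line break outside it -->
  <xsl:template match="textlines/*[matches(., '[\r\n]')]">
    <xsl:variable name="inline" select="."/>
    <xsl:analyze-string select="string(.)" regex="\r\n?|\n\r?">
      <xsl:matching-substring>
        <!-- the line break itself, now a sibling of the split pieces -->
        <xsl:value-of select="."/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- same name, namespace and attributes as the original element -->
        <xsl:element name="{name($inline)}"
                     namespace="{namespace-uri($inline)}">
          <xsl:copy-of select="$inline/@*"/>
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

For a made-up input such as `<textlines>one <b>two&#10;three</b> four</textlines>`, this should leave `<b>two</b>`, the line break, then `<b>three</b>` inside textlines, so the later line-splitting step never sees an inline element spanning two lines.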
At the moment I'm assuming that in the step where I insert line marker
elements, I also have to use modal templates to insert inline element markers,
then run another pass to restore the inline elements. Something like this,
correct?
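<xsl:variable name="brokenlines">
  <xsl:element name="textlines">
    <xsl:element name="linemarker"/>
    <!-- Turn inline elements into marker text first: inside analyze-string
         the context item is a string, so templates cannot be applied there -->
    <xsl:variable name="marked">
      <xsl:apply-templates mode="break"/>
    </xsl:variable>
    <xsl:analyze-string select="string($marked)" regex="\r\n?|\n\r?">
      <xsl:matching-substring>
        <xsl:element name="linemarker"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:element>
</xsl:variable>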
<xsl:variable name="textlines">
  <xsl:call-template name="rebuild">
    <xsl:with-param name="lines" select="$brokenlines"/>
  </xsl:call-template>
</xsl:variable>
<!-- $textlines/textlines is now the original textlines with line children -->
...
<xsl:template match="textlines/*/text()" mode="break">
  <xsl:value-of select="concat('[[{', name(..), '}', ., ']]')"/>
</xsl:template>
<xsl:template name="rebuild">
  <xsl:param name="lines" as="document-node()"/>
  <xsl:element name="textlines">
    <xsl:for-each select="$lines/textlines">
      <xsl:for-each-group select="node()" group-starting-with="linemarker">
        <xsl:element name="line">
          <xsl:apply-templates
              select="current-group()[not(self::linemarker)]" mode="rebuild"/>
        </xsl:element>
      </xsl:for-each-group>
    </xsl:for-each>
  </xsl:element>
</xsl:template>
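<xsl:template match="text()" mode="rebuild">
  <!-- Matches [[{name}content]]. The regex attribute is an attribute value
       template, so literal braces have to be doubled -->
  <xsl:analyze-string select="." regex="\[\[\{{([^}}]+)\}}(.*?)\]\]">
    <xsl:matching-substring>
      <!-- group 1 is the element name, group 2 is its text content -->
      <xsl:element name="{regex-group(1)}">
        <xsl:value-of select="regex-group(2)"/>
      </xsl:element>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>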
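
To make the intent concrete, these are the three stages I would expect for a small made-up input. The `b` element and the text are purely illustrative, and `&#10;` stands for the line break:

```xml
<!-- 1. Input (hypothetical) -->
<textlines>one <b>two</b>&#10;three</textlines>

<!-- 2. Intended content of $brokenlines: line markers inserted,
        inline element flattened to marker text -->
<textlines><linemarker/>one [[{b}two]]<linemarker/>three</textlines>

<!-- 3. Intended content of $textlines after the rebuild pass -->
<textlines><line>one <b>two</b></line><line>three</line></textlines>
```

Note that the marker text only carries the element name and its text, so any attributes on the inline elements would not survive this round trip.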
Am I going along the right lines? I'd prefer to be set straight sooner rather
than later!
Cheers
T
-----Original Message-----
From: Michael Müller-Hillebrand mmh@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Friday, 7 May 2021 20:26
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Seeking a smarter tokenize for augmented text
Hi,