[xsl] whitespace normalization around keyword-like phrases

Subject: [xsl] whitespace normalization around keyword-like phrases
From: "Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 6 Mar 2017 14:10:38 -0000
Hi Folks,

Over the weekend I've been writing a whitespace normalization stylesheet that transforms input like
<emphasis>Nested<glossterm> phrases </glossterm>with whitespace </emphasis>
into
<emphasis>Nested <glossterm>phrases</glossterm> with whitespace</emphasis>


This is often useful when converting content that was created by word processing or DTP applications. Probably for typographic reasons, they tend to include a trailing space in the wrapping element (or rather, they suggest that you include trailing spaces when you select words). When converting such a styled phrase to a keyword, a glossary term, or another semantically significant element, this extra whitespace should be moved away from its original location and placed after the inline element.
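
For the simplest case only (trailing whitespace in the last text node of an inline element), the idea can be sketched roughly as follows. This is a minimal XSLT 2.0 illustration and not the code from the stylesheet announced here; the choice of emphasis and glossterm as the inline elements is an assumption taken from the sample above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- An inline element (assumed names) whose last child is a text node
       ending in whitespace: copy it, then emit one space after it. -->
  <xsl:template match="*[self::emphasis or self::glossterm]
                        [node()[last()][self::text()][matches(., '\s$')]]">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
    <xsl:text> </xsl:text>
  </xsl:template>

  <!-- That last text node itself: drop its trailing whitespace. -->
  <xsl:template match="*[self::emphasis or self::glossterm]
                        /text()[not(following-sibling::node())]
                               [matches(., '\s$')]">
    <xsl:value-of select="replace(., '\s+$', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample input, this moves the space after "phrases" and the one after "whitespace" behind their closing tags, but it leaves the leading space inside glossterm untouched and does nothing sensible when the trailing text node sits in a nested inline element.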

The challenges have been:

• Dealing with nested elements.
• Not only dealing with whitespace on the right-hand side, but also on the left-hand side and on both sides of an inline element.
• Also considering punctuation and space-like characters in addition to whitespace.
• Making sure that any trailing punctuation is not extracted from the footnote paragraph (and placed into the surrounding paragraph) if the footnote is wrapped in a styling phrase. DTP applications often put footnote markers (and with them the whole footnote) in styled phrases.
• Making it customizable for different vocabularies (it currently supports DocBook, TEI, and JATS).
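
One way to approach the punctuation and customization points is to keep the set of movable characters in an overridable parameter and to split each string into a core and a trailing part. The following is only a sketch under assumptions: the parameter name movable, the my: function namespace, and the particular character class are all made up for illustration and are not taken from the stylesheet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:x-example:normalize"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical character class: XML whitespace, other space
       separators (e.g. no-break space), and some punctuation.
       A caller or an importing stylesheet may supply another value. -->
  <xsl:param name="movable" as="xs:string"
             select="'[\s\p{Zs}\.,;:!\?]'"/>

  <!-- The string without its trailing run of movable characters.
       The pattern ends in '+$', so it never matches a zero-length
       string, which replace() would reject. -->
  <xsl:function name="my:core" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence select="replace($s, concat($movable, '+$'), '')"/>
  </xsl:function>

  <!-- The trailing run of movable characters (possibly empty). -->
  <xsl:function name="my:trailing" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence
        select="substring($s, string-length(my:core($s)) + 1)"/>
  </xsl:function>

</xsl:stylesheet>
```

With the default class, my:core('phrases, ') yields 'phrases' and my:trailing('phrases, ') yields ', '. The leading side could be handled the same way with a pattern anchored at '^'.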


The XSLT has some features that may be of general interest, in particular passing the relevant text nodes as tunneled parameters and the footnote scoping.
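
To give an impression of these two techniques, here is a reduced XSLT 2.0 sketch. It is not an excerpt from the stylesheet: the parameter name trim-right and the element names glossterm and footnote (DocBook-like) are assumptions, and nested inline elements are not handled properly.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template. Tunnel parameters pass through it although
       it does not declare them. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumed inline element. It selects its last descendant text node
       that is not inside a footnote nested in this element, and
       announces that node to all descendants as a tunnel parameter. -->
  <xsl:template match="glossterm">
    <xsl:variable name="last-text" as="text()?"
        select="(.//text()[not(ancestor::footnote
                                 [. &gt;&gt; current()])])[last()]"/>
    <xsl:copy>
      <xsl:apply-templates select="@* | node()">
        <xsl:with-param name="trim-right" select="$last-text"
                        tunnel="yes"/>
      </xsl:apply-templates>
    </xsl:copy>
    <!-- Re-emit the removed whitespace as one space after the element. -->
    <xsl:if test="matches($last-text, '\s$')">
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:template>

  <!-- Only the announced text node loses its trailing whitespace.
       The explicit priority avoids a conflict with node() above. -->
  <xsl:template match="text()" priority="1">
    <xsl:param name="trim-right" as="text()?" tunnel="yes"/>
    <xsl:choose>
      <xsl:when test=". is $trim-right">
        <xsl:value-of select="replace(., '\s+$', '')"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

The point of tunnelling is that the text node template can sit arbitrarily deep below the inline element without every intermediate template having to declare and forward the parameter. The predicate on ancestor::footnote is the scoping part: text that belongs to a footnote inside the phrase is never selected, so nothing is pulled out of the footnote.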

But read for yourselves: https://github.com/gimsieke/emphasis-normalize-space

Gerrit
