[xsl] Looking for a cleaner way of auditing table cell data than this

Subject: [xsl] Looking for a cleaner way of auditing table cell data than this
From: "Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 29 Aug 2022 14:36:49 -0000
Hi

I have a substantial library of XML documents which include a great number
of tables. As it happens, the content model for table cells is promiscuous:
a table cell may contain "block" data:

<td>
  <para>blah blah.</para>
</td>

even to the extent of nested tables:

<td>
  <para>..</para>
  <table>
    <tb>
      ..
    </tb>
  </table>
</td>

or, in the case of very many simple tables, just simple text content:

<td>Y</td>
<td>N</td>

I would like to identify cases where table cells have exploited the
promiscuous schema and mixed both text and block content, for example:

<td>For example:<para>This is a bad table cell.</para></td>

I can't construct the schema so that this is illegal while the earlier
examples remain valid; at least, I don't think I can. But I would like to
identify these cells (and eventually correct them, though for now just
reporting them is sufficient).

This is the XSLT fragment I have come up with (using XSLT 2.0), but I
imagine there is a much cleaner way of doing it, and I might learn a useful
technique by asking.

<xsl:template name="mixed-cells">
  <xsl:for-each select="//table">
    <xsl:for-each select="descendant::td[child::text()[normalize-space() != '']]">
      <xsl:if test="count(*[self::para | self::note | self::cnote |
          self::critical | self::headline | self::error | self::define |
          self::qanda | self::inset | self::ihead | self::steps | self::list |
          self::ol | self::inlist | self::syntax | self::fragment |
          self::table]) &gt; 0">
        <xsl:text>Table cell with mixed content: </xsl:text>
        <xsl:call-template name="get-source" />
        <xsl:value-of select="$nl" />
        <xsl:text> content=</xsl:text>
        <xsl:value-of select="normalize-space(.)" />
        <xsl:value-of select="$nl" />
      </xsl:if>
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>

The normalize-space() test in the third line is necessary because otherwise
the whitespace-only text nodes (newlines and indentation) between a sequence
of block children would be picked up as text.
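
A possible alternative to the normalize-space() test, sketched here as a
standalone XSLT 2.0 stylesheet, is to declare xsl:strip-space for td. That
removes whitespace-only text node children of td elements from the source
tree before any template sees them, so any text() child that survives is
genuine content. For brevity the sketch tests only for a para child rather
than the full block list.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop whitespace-only text nodes whose parent is a td (the newlines
       and indentation between block children) when the source is loaded. -->
  <xsl:strip-space elements="td"/>

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- After stripping, any text() child of a td is real character data,
         so no normalize-space() test is needed. Only para is checked as a
         block child here, for brevity. -->
    <xsl:for-each select="//td[text()][para]">
      <xsl:text>Table cell with mixed content: </xsl:text>
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Stripping applies to the whole transformation, so if the same stylesheet
also copies documents to its output, whitespace inside td elements would be
lost there too; it is also suppressed under an xml:space="preserve" ancestor.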

The list of "block" elements in the fourth line is incomplete, and should
probably be held in a variable rather than written out as a literal
condition the way I have done it here.
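
One way to do that in XSLT 2.0, sketched below, is a global variable
holding the element names as a sequence of strings, tested with
local-name() = $block-names. Because = is a general comparison, the test is
true if the name matches any item in the sequence. The variable name
block-names is illustrative only, and the list is simply copied from the
fragment above, so it is equally incomplete. The sketch also selects //td
directly instead of looping over tables, which as a side effect visits each
cell once.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <xsl:variable name="nl" select="'&#10;'"/>

  <!-- Hypothetical variable: the "block" element names, kept in one place. -->
  <xsl:variable name="block-names" as="xs:string*"
      select="('para', 'note', 'cnote', 'critical', 'headline', 'error',
               'define', 'qanda', 'inset', 'ihead', 'steps', 'list', 'ol',
               'inlist', 'syntax', 'fragment', 'table')"/>

  <xsl:template match="/">
    <xsl:call-template name="mixed-cells"/>
  </xsl:template>

  <xsl:template name="mixed-cells">
    <!-- Mixed = a non-whitespace text child AND at least one child whose
         local name appears in $block-names. -->
    <xsl:for-each select="//td[text()[normalize-space()]]
                              [*[local-name() = $block-names]]">
      <xsl:text>Table cell with mixed content: </xsl:text>
      <!-- The original's <xsl:call-template name="get-source"/> would go here. -->
      <xsl:value-of select="$nl"/>
      <xsl:text> content=</xsl:text>
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:value-of select="$nl"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```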

The get-source template outputs the input document name and the current
line number, and $nl is what you would expect it to be (a newline).

As it stands, this template will report the cells of nested tables multiple
times (once for each enclosing table, since such a cell is a descendant of
every one of them); there might be a clever fix for this, but at the moment
my focus is on the best way to identify these troublesome cells in the first
place.
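
If the outer loop over tables is worth keeping (for instance, to report per
table), one minimal fix, assuming XSLT 2.0, is to keep only the cells whose
nearest enclosing table is the one currently being processed.
ancestor::table[1] is the closest table ancestor, because positional
predicates on a reverse axis count outward from the context node; current()
inside the predicate is the table from the outer xsl:for-each; and the
XPath 2.0 "is" operator compares node identity. As before, only para is
tested as a block child, for brevity.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//table">
      <!-- Keep only cells that belong directly to this table. Cells of a
           nested table are handled when the loop reaches that nested table
           itself, so each cell is reported once. -->
      <xsl:for-each select="descendant::td[ancestor::table[1] is current()]
                                          [text()[normalize-space()]]
                                          [para]">
        <xsl:text>Table cell with mixed content: </xsl:text>
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Dropping the outer loop altogether and selecting //td[...] directly also
avoids the duplicates, because a path expression never returns the same node
twice.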

 

cheers

T
