Re: [xsl] Looking for a cleaner way of auditing table cell data than this

Subject: Re: [xsl] Looking for a cleaner way of auditing table cell data than this
From: "Bauman, Syd s.bauman@xxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 10 Mar 2023 00:38:19 -0000
> I can't construct the schema so that this is illegal while the earlier
> examples are valid. At least I don't think I can.

Why not? (Because you are not allowed to change the schema, or because your
schema language cannot express this constraint?)

You can't do this in the XML DTD language, of course, but it seems easy enough
in RELAX NG:

 start = element table { row+ }
 row = element row { cell+ }
 cell = element cell { text | ( block | bigBlock | littleBlock )* }
 block = element block { text }
 bigBlock = element bigBlock { ( text | block )* }
 littleBlock = element littleBlock { empty }


________________________________


I have a substantial library of XML documents which include a great number of
tables. As it happens the content model for table cells is promiscuous; a
table cell may contain "block" data:



<td>
  <para>blah blah.</para>
</td>



even to the extent of nested tables:



<td>
  <para>..</para>
  <table>
    <tb>
      ..
    </tb>
  </table>
</td>



or, in the case of very many simple tables, just simple text content:



<td>Y</td>
<td>N</td>



I would like to identify cases where table cells have exploited the
promiscuous schema and mixed both text and block content, for example:



<td>For example:<para>This is a bad table cell.</para></td>



I can't construct the schema so that this is illegal while the earlier
examples are valid. At least I don't think I can. But I would like to identify
these cells (and correct them, but at the moment just reporting them is
sufficient).



This is the XSLT fragment I have come up with (using XSLT 2.0), but I imagine
there is a much cleaner way of doing it, and I might learn a useful technique
if I ask.



<xsl:template name="mixed-cells">
  <xsl:for-each select="//table">
    <xsl:for-each select="descendant::td[child::text()[normalize-space() != '']]">
      <xsl:if test="count(*[self::para | self::note | self::cnote |
          self::critical | self::headline | self::error | self::define |
          self::qanda | self::inset | self::ihead | self::steps | self::list |
          self::ol | self::inlist | self::syntax | self::fragment |
          self::table]) &gt; 0">
        <xsl:text>Table cell with mixed content: </xsl:text>
        <xsl:call-template name="get-source" />
        <xsl:value-of select="$nl" />
        <xsl:text> content=</xsl:text>
        <xsl:value-of select="normalize-space(.)" />
        <xsl:value-of select="$nl" />
      </xsl:if>
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>



The normalize-space() in the third line is necessary because otherwise the
test also matches the whitespace-only text nodes (newlines and indentation)
that sit between a sequence of block children.

The list of "block" elements in the fourth line above is incomplete, and
should probably be sourced from a variable rather than given as a literal
condition the way I have done it here.

The get-source template outputs the input document name and current line
number, and $nl is what you would expect it to be.
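
The idea of sourcing the block list from a variable could be sketched roughly
as follows. This is only an illustration under stated assumptions: XSLT 2.0,
elements in no namespace, and a hypothetical variable name block-names whose
contents are copied from the literal test above (so it is just as incomplete).
The get-source call is left out to keep the sketch self-contained.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical variable: names of the "block" elements, copied from
       the literal test in the original template (still incomplete). -->
  <xsl:variable name="block-names" as="xs:string+"
      select="('para', 'note', 'cnote', 'critical', 'headline', 'error',
               'define', 'qanda', 'inset', 'ihead', 'steps', 'list', 'ol',
               'inlist', 'syntax', 'fragment', 'table')"/>

  <xsl:template match="/">
    <!-- A cell counts as mixed if it has a child text node that is not
         whitespace-only AND a child element whose name is in the list. -->
    <xsl:for-each select="//td[text()[normalize-space()]]
                              [*[local-name() = $block-names]]">
      <xsl:text>Table cell with mixed content: </xsl:text>
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The general comparison local-name() = $block-names is true if the name equals
any string in the sequence, so extending the list only means editing the
variable. Comparing on local-name() ignores namespaces, which is an assumption
about the documents rather than something stated here.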



As it stands this template is going to report cells in nested tables multiple
times, because such a cell is a descendant of every enclosing table; there
might be a clever fix for this but at the moment my focus is on the best way
to identify these troublesome cells in the first place.
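
One possible fix for the repeated reports, keeping the outer loop over tables,
is to restrict each table to the cells whose nearest enclosing table is that
table. The sketch below assumes XSLT 2.0 (for the "is" operator) and abbreviates
the block list to para and table purely for illustration; the reporting is also
reduced to the cell content.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//table">
      <!-- current() is the table being processed by the outer for-each.
           ancestor::table[1] is the cell's nearest enclosing table, so a
           cell inside a nested table is selected only when the inner
           table is processed. The block list is abbreviated here. -->
      <xsl:for-each select="descendant::td[ancestor::table[1] is current()]
                                          [text()[normalize-space()]]
                                          [para | table]">
        <xsl:text>Table cell with mixed content: </xsl:text>
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

If grouping the report by table is not needed, selecting //td with the same
two content predicates in a single xsl:for-each would visit each cell once
without the ancestor test.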
