Re: [xsl] Looking for a cleaner way of auditing table cell data than this

Subject: Re: [xsl] Looking for a cleaner way of auditing table cell data than this
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 10 Mar 2023 21:10:38 -0000
Hi,

Just noting, this is crying out for Schematron, which takes care of the
infrastructure and leaves the test.

Indeed, thinking about how the Schematron looks may also lead to a nicer
XSLT solution.
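
To make that concrete, here is a minimal sketch of what such a Schematron
rule might look like. It assumes the elements are in no namespace, uses the
XSLT 2 query binding, and reuses the list of block element names from the
original post (which was described there as incomplete). The pattern id is
only illustrative.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
  <sch:pattern id="mixed-cells">
    <!-- Context: any td with at least one non-whitespace text node child -->
    <sch:rule context="td[text()[normalize-space()]]">
      <!-- Report when such a cell also has a block-level element child.
           Element names are copied from the original post; extend as needed. -->
      <sch:report test="para | note | cnote | critical | headline | error |
                        define | qanda | inset | ihead | steps | list | ol |
                        inlist | syntax | fragment | table">
        Table cell with mixed content: <sch:value-of select="normalize-space(.)"/>
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Since the rule context matches each td once, a cell inside a nested table
should be reported only once, and locating the offending node in the source
is typically left to the Schematron processor.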

Cheers, Wendell

On Thu, Mar 9, 2023 at 7:38 PM Bauman, Syd s.bauman@xxxxxxxxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> > I can't construct the schema so that this is illegal while the earlier
> > examples are valid. At least I don't think I can.
>
> Why not? (Because you are not allowed to change the schema, or because
> your schema language cannot express this constraint?)
>
> Can't do this in the XML DTD language, of course, but it seems easy enough
> in RELAX NG:
>
>  start = element table { row+ }
>  row = element row { cell+ }
>  cell = element cell { text | ( block | bigBlock | littleBlock )* }
>  block = element block { text }
>  bigBlock = element bigBlock { ( text | block )* }
>  littleBlock = element littleBlock { empty }
>
>
> ------------------------------
>
>
> I have a substantial library of XML documents which include a great number
> of tables. As it happens the content model for table cells is promiscuous;
> a table cell may contain "block" data:
>
>
>
> <td>
>   <para>blah blah.</para>
> </td>
>
>
>
> even to the extent of nested tables:
>
>
>
> <td>
>   <para>..</para>
>   <table>
>     <tb>
>       ..
>     </tb>
>   </table>
> </td>
>
>
>
> or, in the case of very many simple tables, just simple text content:
>
>
>
> <td>Y</td>
> <td>N</td>
>
>
>
> I would like to identify cases where table cells have exploited the
> promiscuous schema and mixed both text and block content, for example:
>
>
>
> <td>For example:<para>This is a bad table cell.</para></td>
>
>
>
> I can't construct the schema so that this is illegal while the earlier
> examples are valid. At least I don't think I can. But I would like to
> identify these cells (and correct them, but at the moment just reporting
> them is sufficient).
>
>
>
> This is the XSL fragment I have come up with (using XSL 2), but I imagine
> there is a much cleaner way of doing it and I might learn a useful
> technique if I ask.
>
>
>
> <xsl:template name="mixed-cells">
>   <xsl:for-each select="//table">
>     <xsl:for-each select="descendant::td[child::text()[normalize-space() != '']]">
>       <xsl:if test="count(*[self::para | self::note | self::cnote |
>           self::critical | self::headline | self::error | self::define |
>           self::qanda | self::inset | self::ihead | self::steps | self::list |
>           self::ol | self::inlist | self::syntax | self::fragment |
>           self::table]) &gt; 0">
>         <xsl:text>Table cell with mixed content: </xsl:text>
>         <xsl:call-template name="get-source" />
>         <xsl:value-of select="$nl" />
>         <xsl:text> content=</xsl:text>
>         <xsl:value-of select="normalize-space(.)" />
>         <xsl:value-of select="$nl" />
>       </xsl:if>
>     </xsl:for-each>
>   </xsl:for-each>
> </xsl:template>
>
>
>
> The normalize-space() in the third line is necessary because otherwise it
> picks up newlines in a sequence of block children.
>
> The list of "block" elements in the fourth line above is incomplete, and
> should probably be sourced from a variable rather than given as a literal
> condition the way I have done it here.
>
> The get-source template outputs the input document name and current line
> number, and $nl is what you would expect it to be.
>
>
>
> As it stands this template is going to report nested table cells multiple
> times; there might be a clever fix for this but at the moment my focus is
> on the best way to identify these troublesome cells in the first place.
>
>
>


--
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...
