Subject: [xsl] Looking for a cleaner way of auditing table cell data than this
From: "Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 29 Aug 2022 14:36:49 -0000

Hi

I have a substantial library of XML documents which include a great number of tables. As it happens the content model for table cells is promiscuous; a table cell may contain "block" data:

    <td>
      <para>blah blah.</para>
    </td>

even to the extent of nested tables:

    <td>
      <para>..</para>
      <table>
        <tb> .. </tb>
      </table>
    </td>

or, in the case of very many simple tables, just simple text content:

    <td>Y</td>
    <td>N</td>

I would like to identify cases where table cells have exploited the promiscuous schema and mixed both text and block content, for example:

    <td>For example:<para>This is a bad table cell.</para></td>

I can't construct the schema so that this is illegal while the earlier examples are valid. At least I don't think I can. But I would like to identify these cells (and correct them, but at the moment just reporting them is sufficient).

This is the XSL fragment I have come up with (using XSL 2), but I imagine there is a much cleaner way of doing it and I might learn a useful technique if I ask.

    <xsl:template name="mixed-cells">
      <xsl:for-each select="//table">
        <xsl:for-each select="descendant::td[child::text()[normalize-space() != '']]">
          <xsl:if test="count(*[self::para | self::note | self::cnote | self::critical | self::headline | self::error | self::define | self::qanda | self::inset | self::ihead | self::steps | self::list | self::ol | self::inlist | self::syntax | self::fragment | self::table]) > 0">
            <xsl:text>Table cell with mixed content: </xsl:text>
            <xsl:call-template name="get-source" />
            <xsl:value-of select="$nl" />
            <xsl:text> content=</xsl:text>
            <xsl:value-of select="normalize-space(.)" />
            <xsl:value-of select="$nl" />
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:template>

The normalize-space() in the third line is necessary because otherwise the test picks up the whitespace-only text nodes (newlines) between a sequence of block children.

The list of "block" elements in the fourth line above is incomplete, and should probably be sourced from a variable rather than given as a literal condition the way I have done it here. The get-source template outputs the input document name and current line number, and $nl is what you would expect it to be.

As it stands this template is going to report nested table cells multiple times; there might be a clever fix for this but at the moment my focus is on the best way to identify these troublesome cells in the first place.

cheers
T
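
For illustration only, the two ideas mentioned above (taking the block element names from a variable, and not reporting nested cells more than once) might be sketched as follows. This is a minimal sketch, not a definitive answer. It assumes an XSLT 2.0 processor and documents with no namespaces. The variable name `block-names` is hypothetical, the definition of `$nl` as a single newline is an assumption, and the call to the get-source template is left out so that the stylesheet stands on its own.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical variable holding the names of the "block" elements.
       The list is copied from the original test and is still incomplete. -->
  <xsl:variable name="block-names"
      select="('para', 'note', 'cnote', 'critical', 'headline', 'error',
               'define', 'qanda', 'inset', 'ihead', 'steps', 'list', 'ol',
               'inlist', 'syntax', 'fragment', 'table')"/>

  <!-- Assumption: $nl is a single newline character. -->
  <xsl:variable name="nl" select="'&#10;'"/>

  <xsl:template match="/">
    <!-- //td selects each cell exactly once, however deeply its table is
         nested, so a cell in a nested table is not reported once per
         ancestor table. The first predicate keeps cells with a
         non-whitespace text child; the second keeps cells with at least
         one child element whose name is in $block-names. -->
    <xsl:for-each select="//td[text()[normalize-space() != '']]
                              [*[local-name() = $block-names]]">
      <xsl:text>Table cell with mixed content: </xsl:text>
      <!-- The original get-source call would go here. -->
      <xsl:value-of select="$nl"/>
      <xsl:text> content=</xsl:text>
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:value-of select="$nl"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The comparison `local-name() = $block-names` relies on the XPath 2.0 general comparison, which is true when the left-hand value equals any item in the sequence on the right, so extending the list only means editing the variable. If the documents do use namespaces, `local-name()` will match on the local part alone, which may or may not be what is wanted.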