Subject: Re: [xsl] Looking for a cleaner way of auditing table cell data than this
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 10 Mar 2023 21:10:38 -0000

Hi,

Just noting, this is crying out for Schematron, which takes care of the
infrastructure and leaves the test.

Indeed, thinking about how the Schematron looks may also lead to a nicer
XSLT solution.

Cheers, Wendell

On Thu, Mar 9, 2023 at 7:38 PM Bauman, Syd s.bauman@xxxxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> > I can't construct the schema so that this is illegal while the earlier
> > examples are valid. At least I don't think I can.
>
> Why not? (Because you are not allowed to change the schema, or because
> your schema language cannot express this constraint?)
>
> Can't do this in the XML DTD language, of course, but it seems easy
> enough in RELAX NG:
>
>   start = element table { row+ }
>   row = element row { cell+ }
>   cell = element cell { text | ( block | bigBlock | littleBlock )* }
>   block = element block { text }
>   bigBlock = element bigBlock { ( text | block )* }
>   littleBlock = element littleBlock { empty }
>
> ------------------------------
>
> > I have a substantial library of XML documents which include a great
> > number of tables. As it happens the content model for table cells is
> > promiscuous; a table cell may contain "block" data:
> >
> >   <td>
> >     <para>blah blah.</para>
> >   </td>
> >
> > even to the extent of nested tables:
> >
> >   <td>
> >     <para>..</para>
> >     <table>
> >       <tb>
> >         ..
> >       </tb>
> >     </table>
> >   </td>
> >
> > or, in the case of very many simple tables, just simple text content:
> >
> >   <td>Y</td>
> >   <td>N</td>
> >
> > I would like to identify cases where table cells have exploited the
> > promiscuous schema and mixed both text and block content, for example:
> >
> >   <td>For example:<para>This is a bad table cell.</para></td>
> >
> > I can't construct the schema so that this is illegal while the earlier
> > examples are valid. At least I don't think I can. But I would like to
> > identify these cells (and correct them, but at the moment just
> > reporting them is sufficient).
> >
> > This is the XSL fragment I have come up with (using XSL 2), but I
> > imagine there is a much cleaner way of doing it and I might learn a
> > useful technique if I ask.
> >
> > <xsl:template name="mixed-cells">
> >   <xsl:for-each select="//table">
> >     <xsl:for-each select="descendant::td[child::text()[normalize-space() != '']]">
> >       <xsl:if test="count(*[self::para | self::note | self::cnote | self::critical | self::headline | self::error | self::define | self::qanda | self::inset | self::ihead | self::steps | self::list | self::ol | self::inlist | self::syntax | self::fragment | self::table]) > 0">
> >         <xsl:text>Table cell with mixed content: </xsl:text>
> >         <xsl:call-template name="get-source" />
> >         <xsl:value-of select="$nl" />
> >         <xsl:text> content=</xsl:text>
> >         <xsl:value-of select="normalize-space(.)" />
> >         <xsl:value-of select="$nl" />
> >       </xsl:if>
> >     </xsl:for-each>
> >   </xsl:for-each>
> > </xsl:template>
> >
> > The normalize-space() in the third line is necessary because otherwise
> > it picks up newlines in a sequence of block children.
> >
> > The list of "block" elements in the fourth line above is incomplete,
> > and should probably be sourced from a variable rather than given as a
> > literal condition the way I have done it here.
> >
> > The get-source template outputs the input document name and current
> > line number, and $nl is what you would expect it to be.
> >
> > As it stands this template is going to report nested table cells
> > multiple times; there might be a clever fix for this but at the moment
> > my focus is on the best way to identify these troublesome cells in the
> > first place.

-- 
...Wendell Piez... ...wendell -at- nist -dot- gov...
...wendellpiez.com... ...pellucidliterature.org... ...pausepress.org...
...github.com/wendellpiez... ...gitlab.coko.foundation/wendell...
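
For illustration only, the Schematron suggestion made at the top of this reply might be sketched roughly as below. This is not taken from the thread: the pattern id and the message wording are invented, the list of block elements is simply copied from the (admittedly incomplete) list in the quoted template, and the elements are assumed to be in no namespace.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
  <!-- "mixed-cells" is a hypothetical id chosen for this sketch -->
  <sch:pattern id="mixed-cells">
    <!-- Context: any td with at least one non-whitespace text node child -->
    <sch:rule context="td[text()[normalize-space()]]">
      <!-- Report when that same cell also has a block-level child.
           The element list is copied from the quoted template and is incomplete. -->
      <sch:report test="para | note | cnote | critical | headline | error
                        | define | qanda | inset | ihead | steps | list | ol
                        | inlist | syntax | fragment | table">
        Table cell with mixed content: <sch:value-of select="normalize-space(.)"/>
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Because the rule context is the td itself rather than a walk down from each table, each cell should be tested once, which would also sidestep the repeated reports for nested tables mentioned in the quoted message. Reporting the document name and line number is left to whatever the Schematron processor provides.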