Subject: Re: [xsl] Looking for a cleaner way of auditing table cell data than this
From: "Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 29 Aug 2022 18:09:01 -0000

Thanks Chris, that looks like something I can work with; I hadn't thought of functions.

You say your stylesheet was intended to modify <li> elements; funnily enough that was something I asked them to do to the schema earlier in the project and now simple content list items are not allowed. But tables are a bigger ask, apparently.

I am sure I can use something adapted from your code to fix the input before my other stylesheets have to work with it.

cheers
T

From: Chris Papademetrious christopher.papademetrious@xxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Tuesday, 30 August 2022 03:13
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Looking for a cleaner way of auditing table cell data than this

Hi Trevor,

I have an existing stylesheet for DITA source that wraps plaintext in <li> elements in <p>; perhaps you can adjust it to your needs:

    <!-- define what elements are considered inline elements -->
    <xsl:function name="mine:is-inline" as="xs:boolean">
      <xsl:param name="node" as="node()"/>
      <xsl:sequence select="exists($node[
          self::text()[mine:is-not-whitespace(.)] or
          self::cite or self::codeph or self::command or self::default or
          self::emphasis or self::filename or self::fn or self::foreign or
          self::image[not(@placement='break')] or self::imagemap or
          self::infotip or self::keyword or self::mathml or
          self::menucascade or self::message or self::ph or
          self::required-cleanup or self::sub or self::sup or self::term or
          self::text or self::uicontrol or self::unknown or
          self::user-defined or self::user-input or self::variable or
          self::xref])"/>
    </xsl:function>

    <!-- define what elements do not allow inline elements -->
    <xsl:function name="mine:disallow-inline" as="xs:boolean">
      <xsl:param name="node" as="element()"/>
      <xsl:sequence select="exists($node[
          self::arguments-section or self::command-group-section or
          self::datatypes-section or self::description-section or
          self::description-subsection or self::example-section or
          self::example-subsection or self::explain or self::glossdef or
          self::gui-section or self::instruct or self::li or
          self::library-section or self::license-section or self::result or
          self::returns-section or self::short-description or
          self::shortcut-section or self::step or self::syntax-default or
          self::syntax-section or self::usageerrors-section or
          self::whatnext-section])"/>
    </xsl:function>

    <!-- wrap disallowed inline elements in <p> -->
    <xsl:template match="*[mine:disallow-inline(.)][node()[mine:is-inline(.)]]">
      <xsl:variable name="indent">
        <xsl:for-each select="1 to count(ancestor-or-self::*)-1"><xsl:text> </xsl:text></xsl:for-each>
      </xsl:variable>
      <xsl:variable name="results">
        <xsl:next-match/>  <!-- apply other templates first, just in case -->
      </xsl:variable>
      <xsl:variable name="grouped-contents">
        <!-- prefer to include whitespace text() and other elements in inline groups -->
        <xsl:for-each-group select="$results/*/node()"
            group-adjacent="mine:is-inline(.) or self::text() or self::indexterm or self::draft-comment">
          <xsl:choose>
            <xsl:when test="current-grouping-key() and not(exists(current-group()[not(self::text()[mine:is-whitespace(.)])]))">
              <!-- skip whitespace-only text between block elements -->
            </xsl:when>
            <xsl:when test="current-grouping-key() and not(exists(current-group()[mine:is-inline(.)]))">
              <!-- if nothing *strictly* requires <p>, pass through as-is -->
              <xsl:text>&#xA; </xsl:text>
              <xsl:value-of select="$indent"/>
              <xsl:copy-of select="current-group()[self::*]"/>
            </xsl:when>
            <xsl:when test="current-grouping-key()">
              <!-- wrap inline content in <p> -->
              <xsl:text>&#xA; </xsl:text>
              <xsl:value-of select="$indent"/>
              <p><xsl:copy-of select="current-group()"/></p>
            </xsl:when>
            <xsl:otherwise>
              <xsl:for-each select="current-group()">
                <xsl:text>&#xA; </xsl:text>
                <xsl:value-of select="$indent"/>
                <xsl:copy-of select="."/>
              </xsl:for-each>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </xsl:variable>
      <xsl:copy select="$results/*">
        <xsl:apply-templates select="$results/*/@*"/>
        <!-- apply templates again (mostly for whitespace trimming on new <p> elements) -->
        <xsl:apply-templates select="$grouped-contents/node()"/>
        <xsl:text>&#xA;</xsl:text>
        <xsl:value-of select="$indent"/>
      </xsl:copy>
    </xsl:template>

The first function defines what are considered to be inline elements. The second function defines what elements do not permit inline elements. I should probably rewrite these to use templates with function accessors at some point.

The template matches elements that disallow inline elements but contain them, and wraps the inline content in <p>. It attempts to add XML indenting, which is something you might or might not want to keep.

This template is part of a larger stylesheet that performs all sorts of formatting and structural updates, so feel free to keep or remove anything you want. I'm sure I have some embarrassing XSLT code constructs in there, but hopefully this is close enough to help you make progress.

- Chris

From: Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Monday, August 29, 2022 10:37 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] Looking for a cleaner way of auditing table cell data than this

Hi

I have a substantial library of XML documents which include a great number of tables. As it happens the content model for table cells is promiscuous; a table cell may contain "block" data:

    <td>
      <para>blah blah.</para>
    </td>

even to the extent of nested tables:

    <td>
      <para>..</para>
      <table>
        <tb>
          ..
        </tb>
      </table>
    </td>

or, in the case of very many simple tables, just simple text content:

    <td>Y</td>
    <td>N</td>

I would like to identify cases where table cells have exploited the promiscuous schema and mixed both text and block content, for example:

    <td>For example:<para>This is a bad table cell.</para></td>

I can't construct the schema so that this is illegal while the earlier examples are valid. At least I don't think I can. But I would like to identify these cells (and correct them, but at the moment just reporting them is sufficient).

This is the XSL fragment I have come up with (using XSL 2), but I imagine there is a much cleaner way of doing it and I might learn a useful technique if I ask.

    <xsl:template name="mixed-cells">
      <xsl:for-each select="//table">
        <xsl:for-each select="descendant::td[child::text()[normalize-space() != '']]">
          <xsl:if test="count(*[self::para | self::note | self::cnote | self::critical | self::headline | self::error | self::define | self::qanda | self::inset | self::ihead | self::steps | self::list | self::ol | self::inlist | self::syntax | self::fragment | self::table]) > 0">
            <xsl:text>Table cell with mixed content: </xsl:text>
            <xsl:call-template name="get-source" />
            <xsl:value-of select="$nl" />
            <xsl:text> content=</xsl:text>
            <xsl:value-of select="normalize-space(.)" />
            <xsl:value-of select="$nl" />
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:template>

The normalize-space() in the third line is necessary because otherwise it picks up newlines in a sequence of block children. The list of "block" elements in the fourth line above is incomplete, and should probably be sourced from a variable rather than given as a literal condition the way I have done it here. The get-source template outputs the input document name and current line number, and $nl is what you would expect it to be.

As it stands this template is going to report nested table cells multiple times; there might be a clever fix for this but at the moment my focus is on the best way to identify these troublesome cells in the first place.

cheers
T

XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
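
For what it is worth, the two ideas raised in this thread (a function that classifies elements, and a block list held in a variable rather than a literal condition) could be combined into a shorter audit along the following lines. This is only a sketch under stated assumptions: the `local` namespace URI, the `local:is-block` function, the `$block-names` variable and the `audit` mode are all made-up names, the block list is a deliberately partial sample of the one in the original template, and `base-uri(.)` stands in for the `get-source` template, whose definition was not posted. Selecting `//td` directly, instead of iterating over `//table` and then `descendant::td`, means each cell is visited once even when tables are nested.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:local"
    exclude-result-prefixes="xs local">

  <xsl:output method="text"/>

  <!-- Hypothetical, partial list of block element names;
       extend it to match the real schema. -->
  <xsl:variable name="block-names" as="xs:string*"
      select="('para', 'note', 'steps', 'list', 'ol', 'table')"/>

  <!-- True when the element's name appears in $block-names. -->
  <xsl:function name="local:is-block" as="xs:boolean">
    <xsl:param name="e" as="element()"/>
    <xsl:sequence select="local-name($e) = $block-names"/>
  </xsl:function>

  <!-- Select every td that has both a non-whitespace text child and a
       block child. Each td appears once in //td, so cells inside nested
       tables are not reported repeatedly. -->
  <xsl:template match="/">
    <xsl:apply-templates mode="audit"
        select="//td[text()[normalize-space()]][*[local:is-block(.)]]"/>
  </xsl:template>

  <!-- Report one offending cell. -->
  <xsl:template match="td" mode="audit">
    <xsl:text>Table cell with mixed content: </xsl:text>
    <xsl:value-of select="base-uri(.)"/>
    <xsl:text>&#xA; content=</xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run against a document containing `<td>For example:<para>This is a bad table cell.</para></td>`, the predicate should select that cell, while `<td>Y</td>` (text only) and a cell holding only `<para>` children separated by newlines (whitespace-only text) should be skipped.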