Re: [xsl] Looking for a cleaner way of auditing table cell data than this

Subject: Re: [xsl] Looking for a cleaner way of auditing table cell data than this
From: "Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 29 Aug 2022 18:09:01 -0000
Thanks Chris, that looks like something I can work with; I hadn't thought of
functions.

 

You say your stylesheet was intended to modify <li> elements; funnily enough
that was something I asked them to do to the schema earlier in the project
and now simple content list items are not allowed. But tables are a bigger
ask, apparently. I am sure I can use something adapted from your code to fix
the input before my other stylesheets have to work with it.

 

cheers

T

 

From: Chris Papademetrious christopher.papademetrious@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> 
Sent: Tuesday, 30 August 2022 03:13
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Looking for a cleaner way of auditing table cell data
than this

 

Hi Trevor,

 

I have an existing stylesheet for DITA source that wraps plain text inside <li>
elements in <p>; perhaps you can adjust it to your needs:

 

 

<!-- define what elements are considered inline elements -->

  <xsl:function name="mine:is-inline" as="xs:boolean">

    <xsl:param name="node" as="node()"/>

    <xsl:sequence
select="exists($node[self::text()[mine:is-not-whitespace(.)] or self::cite
or self::codeph or self::command or self::default or self::emphasis or
self::filename or self::fn or self::foreign or
self::image[not(@placement='break')] or self::imagemap or self::infotip or
self::keyword or self::mathml or self::menucascade or self::message or
self::ph or self::required-cleanup or self::sub or self::sup or self::term
or self::text or self::uicontrol or self::unknown or self::user-defined or
self::user-input or self::variable or self::xref])"/>

  </xsl:function>

 

  <!-- define what elements do not allow inline elements -->

  <xsl:function name="mine:disallow-inline" as="xs:boolean">

    <xsl:param name="node" as="element()"/>

    <xsl:sequence select="exists($node[self::arguments-section or
self::command-group-section or self::datatypes-section or
self::description-section or
self::description-subsection or self::example-section or
self::example-subsection or self::explain or self::glossdef or
self::gui-section or self::instruct or self::li or self::library-section or
self::license-section or self::result or self::returns-section or
self::short-description or self::shortcut-section or self::step or
self::syntax-default or self::syntax-section or self::usageerrors-section or
self::whatnext-section])"/>

  </xsl:function>

 

  <!-- wrap disallowed inline elements in <p> -->

  <xsl:template
match="*[mine:disallow-inline(.)][node()[mine:is-inline(.)]]">

    <xsl:variable name="indent">

      <xsl:for-each select="1 to count(ancestor-or-self::*)-1"><xsl:text>  </xsl:text></xsl:for-each>

    </xsl:variable>

    <xsl:variable name="results">

      <xsl:next-match/>  <!-- apply other templates first, just in case -->

    </xsl:variable>

    <xsl:variable name="grouped-contents">

      <xsl:for-each-group select="$results/*/node()"
group-adjacent="mine:is-inline(.) or self::text() or self::indexterm or
self::draft-comment">  <!-- prefer to include whitespace text() and other
elements in inline groups -->

        <xsl:choose>

          <xsl:when test="current-grouping-key() and
not(exists(current-group()[not(self::text()[mine:is-whitespace(.)])]))">

            <!-- skip whitespace-only text between block elements -->

          </xsl:when>

         <xsl:when test="current-grouping-key() and
not(exists(current-group()[mine:is-inline(.)]))">

            <xsl:text>&#xa;  </xsl:text>

            <xsl:value-of select="$indent"/>

            <xsl:copy-of select="current-group()[self::*]"/>  <!-- if
nothing *strictly* requires <p>, pass through as-is -->

          </xsl:when>

          <xsl:when test="current-grouping-key()">

            <xsl:text>&#xa;  </xsl:text>

            <xsl:value-of select="$indent"/>

            <p><xsl:copy-of select="current-group()"/></p>  <!-- wrap inline
content in <p> -->

          </xsl:when>

          <xsl:otherwise>

            <xsl:for-each select="current-group()">

              <xsl:text>&#xa;  </xsl:text>

              <xsl:value-of select="$indent"/>

              <xsl:copy-of select="."/>

            </xsl:for-each>

          </xsl:otherwise>

        </xsl:choose>

      </xsl:for-each-group>

    </xsl:variable>

    <xsl:copy select="$results/*">

      <xsl:apply-templates select="$results/*/@*"/>

      <xsl:apply-templates select="$grouped-contents/node()"/>  <!-- apply
templates again (mostly for whitespace trimming on new <p> elements) -->

      <xsl:text>&#xa;</xsl:text>

      <xsl:value-of select="$indent"/>

    </xsl:copy>

  </xsl:template>

 

 

The first function defines what are considered to be inline elements. The
second function defines what elements do not permit inline elements. I
should probably rewrite these to use templates with function accessors at
some point.
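
The functions above call mine:is-whitespace and mine:is-not-whitespace, which are not included in the posted code. A minimal sketch of what they might look like follows. The definitions and the "urn:example:mine" namespace URI are assumptions for illustration, not the originals from the larger stylesheet.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mine="urn:example:mine"
    exclude-result-prefixes="xs mine">

  <!-- Assumed helper: true when the node's string value is empty or
       consists only of XML whitespace (space, tab, CR, LF) -->
  <xsl:function name="mine:is-whitespace" as="xs:boolean">
    <xsl:param name="node" as="node()"/>
    <xsl:sequence select="normalize-space($node) = ''"/>
  </xsl:function>

  <!-- Assumed helper: the negation of the function above -->
  <xsl:function name="mine:is-not-whitespace" as="xs:boolean">
    <xsl:param name="node" as="node()"/>
    <xsl:sequence select="not(mine:is-whitespace($node))"/>
  </xsl:function>

</xsl:stylesheet>
```

Note that normalize-space() does not treat non-breaking spaces (U+00A0) as whitespace, so text consisting only of those would count as real content under this sketch.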

 

The template matches elements that disallow inline elements but contain
them, and wraps the inline content in <p>. It attempts to add XML indenting,
which is something you might or might not want to keep. This template is
part of a larger stylesheet that performs all sorts of formatting and
structural updates, so feel free to keep or remove anything you want.
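
As a rough illustration of how the same grouping idea might be adapted to the table cell case from the original question, here is a stripped-down sketch without the indenting logic. The function name mine:is-block, the shortened block list, and the namespace URI are assumptions; the real list would have to come from the schema. It relies on XSLT 3.0 for xsl:mode.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mine="urn:example:mine"
    exclude-result-prefixes="xs mine">

  <!-- Copy everything that no other template handles -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Assumption: abbreviated block list; extend it to match the schema -->
  <xsl:function name="mine:is-block" as="xs:boolean">
    <xsl:param name="node" as="node()"/>
    <xsl:sequence select="exists($node[self::para or self::note or
        self::list or self::ol or self::table])"/>
  </xsl:function>

  <!-- Cells that mix non-whitespace text with block children -->
  <xsl:template match="td[text()[normalize-space()]][*[mine:is-block(.)]]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each-group select="node()"
          group-adjacent="mine:is-block(.)">
        <xsl:choose>
          <!-- block elements pass through unchanged -->
          <xsl:when test="current-grouping-key()">
            <xsl:apply-templates select="current-group()"/>
          </xsl:when>
          <!-- whitespace-only text between blocks is left alone -->
          <xsl:when test="every $n in current-group() satisfies
              $n/self::text()[not(normalize-space())]">
            <xsl:apply-templates select="current-group()"/>
          </xsl:when>
          <!-- any other run of non-block content gets wrapped in <para> -->
          <xsl:otherwise>
            <para><xsl:apply-templates select="current-group()"/></para>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With this sketch, a cell such as <td>For example:<para>This is a bad table cell.</para></td> should come out with "For example:" wrapped in its own <para>, while cells that contain only text or only blocks are copied as they are.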

 

I'm sure I have some embarrassing XSLT code constructs in there, but
hopefully this is close enough to help you make progress.

 

- Chris

 

 

From: Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Monday, August 29, 2022 10:37 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] Looking for a cleaner way of auditing table cell data than
this

 

Hi

 

I have a substantial library of XML documents which include a great number
of tables. As it happens the content model for table cells is promiscuous; a
table cell may contain "block" data:

 

<td>

  <para>blah blah.</para>

</td>

 

even to the extent of nested tables:

 

<td>

  <para>..</para>

  <table>

    <tb>

      ..

    </tb>

  </table>

</td>

 

or, in the case of very many simple tables, just simple text content:

 

<td>Y</td>

<td>N</td>

 

I would like to identify cases where table cells have exploited the
promiscuous schema and mixed both text and block content, for example:

 

<td>For example:<para>This is a bad table cell.</para></td>

 

I can't construct the schema so that this is illegal while the earlier
examples are valid. At least I don't think I can. But I would like to
identify these cells (and correct them, but at the moment just reporting
them is sufficient).

 

This is the XSL fragment I have come up with (using XSL 2), but I imagine
there is a much cleaner way of doing it and I might learn a useful technique
if I ask.

 

<xsl:template name="mixed-cells">

  <xsl:for-each select="//table">

    <xsl:for-each select="descendant::td[child::text()[normalize-space() !=
'']]">

      <xsl:if test="count(*[self::para | self::note | self::cnote |
self::critical | self::headline | self::error | self::define | self::qanda |
self::inset | self::ihead | self::steps | self::list | self::ol |
self::inlist | self::syntax| self::fragment | self::table]) &gt; 0">

        <xsl:text>Table cell with mixed content: </xsl:text>

        <xsl:call-template name="get-source" />

        <xsl:value-of select="$nl" />

        <xsl:text> content=</xsl:text>

        <xsl:value-of select="normalize-space(.)" />

        <xsl:value-of select="$nl" />

      </xsl:if>

    </xsl:for-each>

  </xsl:for-each>

</xsl:template>

 

The normalize-space() in the third line is necessary because otherwise it
picks up newlines in a sequence of block children.

The list of "block" elements in the fourth line above is incomplete, and
should probably be sourced from a variable rather than given as a literal
condition the way I have done it here.
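
One possible way to source the block list from a variable, as suggested above, is a sequence of element names compared with local-name(). This is only a sketch: the variable name $block-names is made up, the list is the same incomplete one as in the template, the get-source call is left out because that template is not shown, and it assumes the elements are in no namespace.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Assumption: same (incomplete) block list as the literal condition -->
  <xsl:variable name="block-names" as="xs:string*"
      select="('para', 'note', 'cnote', 'critical', 'headline', 'error',
               'define', 'qanda', 'inset', 'ihead', 'steps', 'list', 'ol',
               'inlist', 'syntax', 'fragment', 'table')"/>

  <xsl:template match="/">
    <!-- cells with non-whitespace text AND at least one block child -->
    <xsl:for-each select="//td[text()[normalize-space()]]
                              [*[local-name() = $block-names]]">
      <xsl:text>Table cell with mixed content: </xsl:text>
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The general comparison local-name() = $block-names is true if the name equals any item in the sequence, so adding a new block element means adding one string to the variable.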

The get-source template outputs the input document name and current line
number, and $nl is what you would expect it to be.

 

As it stands this template is going to report nested table cells multiple
times; there might be a clever fix for this but at the moment my focus is on
the best way to identify these troublesome cells in the first place.
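
The repeated reports come from the two nested loops: //table selects nested tables as well as outer ones, and descendant::td from an outer table reaches the cells of every table nested inside it, so a cell at nesting depth n is visited n times. One possible fix, sketched below with match templates instead of for-each, is to visit each cell exactly once. The block test here is shortened to para and table purely for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Walk the whole tree once; by default nothing is output -->
  <xsl:template match="node()">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Assumption: abbreviated block test, extend as needed -->
  <xsl:template match="td[text()[normalize-space()]][para or table]">
    <xsl:text>Table cell with mixed content: </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- keep descending so cells in nested tables are audited too -->
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

Because every node is processed by exactly one template, each offending cell is reported once, with an outer cell reported before any offending cells nested inside it.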

 

cheers

T

XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>

EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/1349719> (by email)
