Re: [xsl] Looking for a cleaner way of auditing table cell data than this

Subject: Re: [xsl] Looking for a cleaner way of auditing table cell data than this
From: "Chris Papademetrious christopher.papademetrious@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 11 Mar 2023 13:21:33 -0000
Hi Trevor,

Since my August 2022 reply to you, I've been refining my code that deals
with this sort of thing.

I now define "outer-tag-type" and "inner-tag-type" values for each
DITA element that specify what type of element it expects at its exterior
and interior boundaries:

<xsl:variable name="outer-tag-type" as="map(xs:string, xs:string*)"
select="map {
  'entry': 'block',
  'image': '',
  'li': 'block',
  'note': 'block',
  'p': 'block',
  'ph': 'inline'
}"/>
<xsl:variable name="inner-tag-type" as="map(xs:string, xs:string*)"
select="map {
  'entry': '',
  'image': 'block',
  'li': 'block',
  'note': 'block',
  'p': 'inline',
  'ph': 'inline'
}"/>

Blank values indicate cases where mixed content is possible. I derive a first
cut from the output of content_model.pl
<https://github.com/chrispy-snps/DITA-plugin-utilities#content_modelpl>,
put it in a spreadsheet, then make manual adjustments. For example, I change
the inner-tag-type of <li> and <note> to "block" because I don't want mixed
block/inline content there. Then I convert the spreadsheet to XSLT3 maps.
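
As a side note on how the blank values behave, here is a minimal
stand-alone sketch (not part of my production code). It uses a trimmed copy
of the outer-tag-type map, and 'table' is just an example of a name that is
not listed. A blank value and a missing key both test as false, which is
what lets a lookup be used directly as a condition:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all">

  <!-- Trimmed copy of the outer-tag-type map, for illustration only -->
  <xsl:variable name="outer-tag-type" as="map(xs:string, xs:string*)"
    select="map {
      'image': '',
      'p': 'block',
      'ph': 'inline'
    }"/>

  <!-- Invoke via the default initial template; no source document is needed -->
  <xsl:template name="xsl:initial-template">
    <!-- Listed element with a type: the lookup returns the string 'block' -->
    <xsl:message select="'p:', $outer-tag-type('p')"/>
    <!-- Listed element with a blank value: '' has an effective boolean value of false -->
    <xsl:message select="'image:', boolean($outer-tag-type('image'))"/>
    <!-- Name missing from the map: the lookup returns the empty sequence, also false -->
    <xsl:message select="'table:', boolean($outer-tag-type('table')),
                         empty($outer-tag-type('table'))"/>
  </xsl:template>

</xsl:stylesheet>
```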

I define accessor functions to compute the outer/inner tag type for an
element, with logic to infer tag contexts in mixed-content scenarios (block
takes precedence):

<xsl:function name="mine:outer-tag-type" as="xs:string*">
  <xsl:param name="node" as="node()*"/>
  <xsl:choose>
    <xsl:when test="$node[self::text()]">
      <xsl:sequence select="''"/>
    </xsl:when>
    <xsl:when test="$node/@placement = 'break'">
      <xsl:sequence select="'block'"/>
    </xsl:when>
    <xsl:when test="$outer-tag-type(local-name($node))">
      <xsl:sequence select="$outer-tag-type(local-name($node))"/>
    </xsl:when>
    <xsl:when test="$node/../*/$outer-tag-type(local-name(.)) = 'block'">
      <xsl:sequence select="'block'"/>
    </xsl:when>
    <xsl:when test="$node/../*/$outer-tag-type(local-name(.)) = 'inline'">
      <xsl:sequence select="'inline'"/>
    </xsl:when>
    <xsl:when test="$node/../text()[matches(., '\S')]">
      <xsl:sequence select="'inline'"/>
    </xsl:when>
    <xsl:when test="$node/*[$outer-tag-type(local-name(.)) = 'block']">
      <xsl:sequence select="'block'"/>
    </xsl:when>
    <xsl:when test="$node/*[$outer-tag-type(local-name(.)) = 'inline']">
      <xsl:sequence select="'inline'"/>
    </xsl:when>
  </xsl:choose>
</xsl:function>

<xsl:function name="mine:inner-tag-type" as="xs:string*">
  <xsl:param name="node" as="node()*"/>
  <xsl:choose>
    <xsl:when test="$inner-tag-type(local-name($node))">
      <xsl:sequence select="$inner-tag-type(local-name($node))"/>
    </xsl:when>
    <xsl:when test="$node/*/$outer-tag-type(local-name(.)) = 'block'">
      <xsl:sequence select="'block'"/>
    </xsl:when>
    <xsl:when test="$node/*/$outer-tag-type(local-name(.)) = 'inline'">
      <xsl:sequence select="'inline'"/>
    </xsl:when>
    <xsl:when test="$node/*/*[$outer-tag-type(local-name(.)) = 'block']">
      <xsl:sequence select="'block'"/>
    </xsl:when>
    <xsl:when test="$node/*/*[$outer-tag-type(local-name(.)) = 'inline']">
      <xsl:sequence select="'inline'"/>
    </xsl:when>
    <xsl:when test="$node/text()[matches(., '\S')]">
      <xsl:sequence select="'inline'"/>
    </xsl:when>
  </xsl:choose>
</xsl:function>

Now these functions can be used to deal with content scenarios at a higher
level. For example, to match elements that contain a mix of block and inline
content:

*[node()[mine:outer-tag-type(.) = 'block']]
[node()[mine:outer-tag-type(.) = 'inline']]

To match elements that expect block content but contain inline content:

*[mine:inner-tag-type(.) = 'block']
[node()[mine:outer-tag-type(.) = 'inline']]
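
To show how such a pattern might be wired into an audit pass, here is a
rough sketch. The mode name "mine:audit" and the message text are made up
for illustration, and the fragment assumes the two functions above and the
"mine" namespace prefix are already declared in the same stylesheet:

```xml
<!-- Hypothetical audit mode: anything without a matching rule is skipped,
     but its children are still visited -->
<xsl:mode name="mine:audit" on-no-match="shallow-skip"/>

<!-- Report elements that expect block content but have a child
     whose outer tag type is 'inline' -->
<xsl:template mode="mine:audit"
    match="*[mine:inner-tag-type(.) = 'block']
            [node()[mine:outer-tag-type(.) = 'inline']]">
  <xsl:message select="'Inline content in block-only element: ' || path(.)"/>
  <!-- Keep descending so nested problems are reported too -->
  <xsl:apply-templates mode="#current"/>
</xsl:template>
```

Applying templates to the document in that mode would then emit one
message per offending element and no result output.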

I build more functions from these basic functions. For example, to build a
function that matches elements at the top of a non-preformatted inline content
node tree where extraneous whitespace can be trimmed (like <author>, <title>,
<p>, etc.):

<!-- mine:is-trim-element() - excludes preformatted text elements (like <pre>) -->
<xsl:template match="pre" mode="mine:match-is-trim-element" as="xs:boolean"
priority="10">
  <xsl:sequence select="false()"/>
</xsl:template>
<xsl:template match="*
  [mine:outer-tag-type(.) = 'block']
  [mine:inner-tag-type(.) = 'inline']
  [not(mine:is-pre-element(.))]
  [not(descendant::*[mine:is-trim-element(.) or mine:is-pre-element(.)])]"
mode="mine:match-is-trim-element" as="xs:boolean">
  <xsl:sequence select="true()"/>
</xsl:template>
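
The wrapper function that dispatches to these templates is not shown above.
A minimal sketch of one way to write it follows; the fallback template and
its priority are assumptions for this illustration, and
mine:is-pre-element() is assumed to be defined elsewhere in the same way:

```xml
<!-- Sketch: ask the mode's templates whether this element is a trim element -->
<xsl:function name="mine:is-trim-element" as="xs:boolean">
  <xsl:param name="node" as="element()"/>
  <xsl:apply-templates select="$node" mode="mine:match-is-trim-element"/>
</xsl:function>

<!-- Assumed fallback: any element not matched by a more specific rule is
     not a trim element, so the function always gets exactly one boolean -->
<xsl:template match="*" mode="mine:match-is-trim-element" as="xs:boolean"
    priority="-10">
  <xsl:sequence select="false()"/>
</xsl:template>
```

The appeal of the template-based form is that special cases (like the <pre>
rule above) can be added as new templates without editing the function.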

I also use these functions in XSLT templates that clean up indenting where
refactoring modifies content, so it understands where to modify whitespace at
inner/outer tag boundaries.

If our schema changes, I update the spreadsheet, update the XSLT3 maps, and
everything works without touching any of the actual templates.

It's a work in progress, but this is where I'm at right now.


  - Chris

From: Wendell Piez wapiez@xxxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Friday, March 10, 2023 4:11 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] Looking for a cleaner way of auditing table cell data than
this

Hi,

Just noting, this is crying out for Schematron, which takes care of the
infrastructure and leaves the test.

Indeed, thinking about how the Schematron looks may also lead to a nicer XSLT
solution.

Cheers, Wendell