[xsl] Help With Homework: HTML Tables to CALS

Subject: [xsl] Help With Homework: HTML Tables to CALS
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 23 Jan 2020 15:29:56 -0000
I have XSLT 1-style code that converts HTML tables to CALS tables. I
discovered that this code fails for certain patterns of HTML tables in that it
miscalculates column spans in the face of row spans earlier in the table. It
doesn't fail for all tables, just specific ones (which is why we didn't notice
this bug earlier). I haven't been able to determine the cause of the bug in
the short time I've had to debug it (found the bug in the course of trying to
prepare a rush publishing job that has about 50 complex tables in it, of
course).

Rather than try to debug and fix the XSLT 1 solution, it seemed easier and
better to re-implement the processing using XSLT 3. I took a stab at doing
it using arrays last night, but quickly got bogged down in my own lack of
facility with such things. The procedural solution in, e.g., Java, would be
easy: just populate the two-dimensional matrix that represents the table grid
to reflect row and column spans as you process the table cells left to right
and top to bottom, using cells projected from earlier rows to determine the
starting column of cells in subsequent rows that get pushed over by
row-spanning cells.
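
To make the procedural approach concrete, here is a minimal sketch in Python
(standing in for Java). It assumes each table row has already been reduced to
a list of (label, colspan, rowspan) tuples; that input shape and the name
build_grid are illustrative assumptions, not part of any existing code. It
also does not clip a rowspan that runs past the last row, which a real
implementation would probably want to do.

```python
def build_grid(rows):
    """Build the table grid from rows of (label, colspan, rowspan) tuples.

    Hypothetical input shape: one list per table row, one tuple per cell.
    Returns a list of grid rows; each slot holds the label of the cell
    that covers it, or None if no cell covers it.
    """
    grid = {}  # (row_index, col_index) -> label of the covering cell
    for r, row in enumerate(rows):
        col = 0
        for label, colspan, rowspan in row:
            # Skip slots already claimed by cells projected from earlier rows.
            while (r, col) in grid:
                col += 1
            # Claim every slot this cell covers, including slots in later rows.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, col + dc)] = label
            col += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]


# "A" spans two rows, so "C" in the second row is pushed over to column 1.
example = [
    [("A", 1, 2), ("B", 2, 1)],
    [("C", 1, 1), ("D", 1, 1)],
]
print(build_grid(example))
```

The starting column of each cell falls out of the "skip occupied slots" loop,
which is exactly the step that goes wrong when earlier row spans are not
taken into account.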

However, I couldn't quickly see how to do this using arrays or maps in XSLT
3--the immutability of arrays and thus the coding patterns that take existing
arrays or maps and return new ones threw me and my feeble brain just wasn't
landing on the right algorithmic pattern.
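
For illustration, the "take the existing map and return a new one" pattern
can be sketched as a fold, again in Python so that it runs on its own. The
names place_cell, place_row, and build_grid_map and the (label, colspan,
rowspan) input shape are assumptions made for this sketch. Nothing is
modified in place: each step receives the grid built so far and returns a new
one, which is roughly the shape that fold-left or xsl:iterate with map:put
would take in XSLT 3.

```python
from functools import reduce


def place_cell(state, cell):
    """Return a NEW (grid, row, col) state with one cell placed.

    The incoming grid is never modified; a merged copy is returned instead.
    """
    grid, r, col = state
    label, colspan, rowspan = cell
    # Skip slots already claimed by cells projected from earlier rows.
    while (r, col) in grid:
        col += 1
    covered = {
        (r + dr, col + dc): label
        for dr in range(rowspan)
        for dc in range(colspan)
    }
    return ({**grid, **covered}, r, col + colspan)


def place_row(grid, indexed_row):
    """Fold over one row's cells, starting at column 0 of that row."""
    r, row = indexed_row
    new_grid, _, _ = reduce(place_cell, row, (grid, r, 0))
    return new_grid


def build_grid_map(rows):
    """Fold over all rows; the accumulator is the grid built so far."""
    return reduce(place_row, enumerate(rows), {})


example = [
    [("A", 1, 2), ("B", 2, 1)],
    [("C", 1, 1), ("D", 1, 1)],
]
print(build_grid_map(example))
```

The accumulator carries everything a later cell needs to know (which slots
are taken, and the current row and column), so no mutable state is required.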

I know there must be a general pattern for this type of processing but none of
the examples I could find were helpful.

So my request: can someone help me with this challenge and outline how to
solve this kind of problem, where you take as input an HTML table in which any
cell may span two or more columns and two or more rows, and produce a
two-dimensional array representing the table's grid, where every grid cell
reflects the HTML table cell that covers it?
