Re: [xsl] Dynamically determining line wraps in HTML table cell output

Subject: Re: [xsl] Dynamically determining line wraps in HTML table cell output
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 23 Apr 2019 14:30:09 -0000
The DITA Community i18n plugin provides general locale-aware code for doing
line breaking, word breaking, and rendered size estimation in XSLT using Java
extensions with Saxon.

The project is here:

While the XSLT has been set up for use in the DITA Open Toolkit, the core bits
are general and it shouldn't be too hard to adapt to other XSLT contexts. The
extension functions will work with Saxon HE if you use the Java API to
register the extension functions per the Saxon documentation (the Open Toolkit
starting with version 3.3 does this automatically). If you are using licensed
Saxon versions you can use the Java reflection support to access the extension
functions.

The code includes a general dictionary-based solution for Simplified Chinese
sorting and grouping using an open-source Chinese dictionary.

For the purpose of generating Word documents you may also be interested in my
Wordinator project:

The Wordinator provides a general solution for going from arbitrary XML to
DOCX by using a general "simple word processing" XML that is then converted to
DOCX using the Apache POI library.

Out of the box the Wordinator is optimized for going from HTML to DOCX but it
can be adapted to any source markup of course. To customize it you implement
an XSLT transform that generates the simple word processing XML that is then
used by the Wordinator Java code to generate the DOCX.

For the use case of producing Word tables with formatted text flowed into them
you could adapt the i18n size estimation code along with the word and line
breaking to generate the Word table cells.
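
As a rough illustration of that idea (this is not code from the i18n plugin), here is a minimal Java sketch that combines the JDK's locale-aware `java.text.BreakIterator` with a crude width estimate. The class name `RoughLineBreaker`, its method names, and the assumption that wide East Asian characters take about 1 em while everything else takes about 0.5 em are all hypothetical; a real implementation would want actual font metrics.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of the i18n plugin. */
final class RoughLineBreaker {

    /**
     * Crude width estimate in ems. ASSUMPTION: Han, Hiragana, Katakana and
     * Hangul characters are about 1 em wide, everything else about 0.5 em.
     */
    static double estimateEms(String text) {
        double ems = 0.0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            boolean wide = script == Character.UnicodeScript.HAN
                    || script == Character.UnicodeScript.HIRAGANA
                    || script == Character.UnicodeScript.KATAKANA
                    || script == Character.UnicodeScript.HANGUL;
            ems += wide ? 1.0 : 0.5;
            i += Character.charCount(cp);
        }
        return ems;
    }

    /**
     * Greedily packs segments between legal line-break opportunities into
     * lines whose estimated width stays within maxEms. A single segment that
     * is wider than maxEms is kept whole on its own line.
     */
    static List<String> breakLines(String text, Locale locale, double maxEms) {
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Trailing whitespace is trimmed before measuring the candidate line.
            if (current.length() > 0
                    && estimateEms((current + segment).trim()) > maxEms) {
                lines.add(current.toString().trim());
                current.setLength(0);
            }
            current.append(segment);
        }
        if (current.length() > 0) {
            lines.add(current.toString().trim());
        }
        return lines;
    }
}
```

Each returned line could then become one table cell, for example via `RoughLineBreaker.breakLines(sentence, Locale.ENGLISH, 30.0)`. To call something like this from XSLT it would still need to be exposed as a Saxon extension function as described above.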

The i18n code was originally implemented to support the creation of EPUBs
where each EPUB page was a single HTML page but input was of arbitrary size,
so I had to implement page layout in XSLT (it's not how I would do it today if
I had to do it over again but it did result in some useful general facilities
for doing rough text layout directly in XSLT).



Eliot Kimber

On 4/22/19, 8:47 PM, "Larry Hayashi lhtrees@xxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

    I have a problem that I am not sure how to tackle. I need to transform
    long sentences into multiple HTML tables for inclusion into a
    Microsoft Word document. With short sentences I have no issues, and
    the HTML tables are formatted for inclusion in Word without any
    problems. But with longer sentences, I have to divvy up the sentence
    into fragments. The issue for me is figuring out how to know when to
    divide a longer text sentence into multiple tables so each table fits
    width-wise in the Word document. Are there ways to calculate width
    using XSL other than just string length? The reason I am creating
    separate tables is because each of these will ultimately be
    interlinearized with morphemes and glosses underneath. Refer to
    Leipzig glossing rules
    ( The problem is
    actually much more complex as the glosses in subsequent rows may be
    longer than the words themselves, and the glosses align with the start
    of each word, but I thought I would start with this initial problem
    and see what ideas folks might recommend. I was also wondering if this
    is the kind of thing that XSL-FO might be useful for. I have very
    limited familiarity with XSL-FO.

    I suspect that my easiest course of action is to:
    a. pre-determine the left and right margins, indents, etc. for the
    Word document and define a style for the example sentences.
    b. determine the maximum width of a line based on above.
    c. determine the number of m characters (m being the max width
    character possible) at a specified font-size that can fit within that
    width in (b)
    d. use the number in (c) in the XSLT to ensure that sentence fragments
    are always shorter than this number of characters.
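
Steps (b) through (d) can be sketched in Java as follows. This is only a minimal illustration under stated assumptions: the class `FragmentSplitter`, its method names, and the `mWidthFactor` parameter (the width of an "m" as a fraction of the font size) are hypothetical, and the page dimensions in the usage note are example values, not measurements from any real Word template.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only. */
final class FragmentSplitter {

    /**
     * Steps (b) and (c): usable line width divided by the width of one "m".
     * ASSUMPTION: an "m" is mWidthFactor times the font size (all in points).
     */
    static int maxCharsPerLine(double pageWidthPt, double leftMarginPt,
                               double rightMarginPt, double fontSizePt,
                               double mWidthFactor) {
        double usableWidthPt = pageWidthPt - leftMarginPt - rightMarginPt;
        return (int) Math.floor(usableWidthPt / (fontSizePt * mWidthFactor));
    }

    /**
     * Step (d): greedy split on whitespace so that no fragment exceeds
     * maxChars. A single word longer than maxChars is kept whole.
     */
    static List<String> split(String sentence, int maxChars) {
        List<String> fragments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : sentence.trim().split("\\s+")) {
            // Start a new fragment if adding " " + word would overflow.
            if (current.length() > 0
                    && current.length() + 1 + word.length() > maxChars) {
                fragments.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            fragments.add(current.toString());
        }
        return fragments;
    }
}
```

For example, with a 612 pt wide page, 72 pt margins, a 12 pt font, and an "m" assumed to be one full em wide, `maxCharsPerLine(612, 72, 72, 12, 1.0)` gives the limit that `split` then enforces. Because every character is budgeted as if it were an "m", the fragments will usually be shorter than the line could actually hold.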

    The above strategy will work most of the time for Roman-based
    orthographies, but I suspect it will be an issue for non-Roman
    orthographies.  So, another thought: I suppose one could call an
    external function fDetermineWrappedText(cell_width, font, font-size,
    string) that would populate a table cell and then determine the
    portion that wraps, then return that fragment back to the XSLT. The
    XSLT could then put that returned fragment into its own table. I found
    some suggestions on how to find the line wraps here: I have
    minimal experience using external functions in XSLT but I think this
    strategy may be more helpful in the long run.

    Simplified source example:
    <sentence>John went to the store.</sentence>
    <sentence>Lorem ipsum dolor sit amet, ac et et inceptos eget
    sollicitudin, in urna velit et consectetuer eget cras, dictum erat
    turpis sed velit donec blandit, integer volutpat at dictum nullam
    nunc.</sentence>

    XSLT process.

    Output example:
    <html xmlns="">
                <tr><td>John went to the store.</td></tr>
                <tr><td>Lorem ipsum dolor sit amet, ac et et inceptos eget</td></tr>
                <tr><td>sollicitudin, in urna velit et consectetuer eget cras,</td></tr>
                <tr><td>dictum erat turpis sed velit donec blandit, integer</td></tr>
                <tr><td>volutpat at dictum nullam nunc.</td></tr>

    Any suggestions in the overall approach to the problem and what you
    would do if using XSLT?

