Re: [xsl] Dynamically determining line wraps in HTML table cell output

Subject: Re: [xsl] Dynamically determining line wraps in HTML table cell output
From: "Larry Hayashi lhtrees@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 23 Apr 2019 22:14:22 -0000
Looks very promising! Thank you!

On Tue, Apr 23, 2019 at 7:30 AM Eliot Kimber ekimber@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> The DITA Community i18n plugin provides general locale-aware code for doing
> line breaking, word breaking, and rendered size estimation in XSLT using Java
> extensions with Saxon.
>
> The project is here:
> https://github.com/dita-community/org.dita-community.i18n
>
> While the XSLT has been set up for use in the DITA Open Toolkit, the core
> bits are general and it shouldn't be too hard to adapt to other XSLT contexts.
> The extension functions will work with Saxon HE if you use the Java API to
> register the extension functions per the Saxon documentation (the Open Toolkit
> starting with version 3.3 does this automatically). If you are using licensed
> Saxon versions you can use the Java reflection support to access the extension
> functions.
>
> The code includes a general dictionary-based solution for Simplified Chinese
> sorting and grouping using an open-source Chinese dictionary.
>
> For the purpose of generating Word documents you may also be interested in
> my Wordinator project: https://github.com/drmacro/wordinator
>
> The Wordinator provides a general solution for going from arbitrary XML to
> DOCX by using a general "simple word processing" XML that is then converted to
> DOCX using the Apache POI library.
>
> Out of the box the Wordinator is optimized for going from HTML to DOCX but
> it can be adapted to any source markup of course. To customize it you
> implement an XSLT transform that generates the simple word processing XML that
> is then used by the Wordinator Java code to generate the DOCX.
>
> For the use case of producing Word tables with formatted text flowed into
> them you could adapt the i18n size estimation code along with the word and
> line breaking to generate the Word table cells.
>
> The i18n code was originally implemented to support the creation of EPUBs
> where each EPUB page was a single HTML page but input was of arbitrary size,
> so I had to implement page layout in XSLT (it's not how I would do it today if
> I had to do it over again but it did result in some useful general facilities
> for doing rough text layout directly in XSLT).
>
> Cheers,
>
> E.
>
> --
> Eliot Kimber
> http://contrext.com
>
>
> On 4/22/19, 8:47 PM, "Larry Hayashi lhtrees@xxxxxxxxx"
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>     I have a problem that I am not sure how to tackle. I need to transform
>     long sentences into multiple HTML tables for inclusion into a
>     Microsoft Word document. With short sentences I have no issues, and
>     the HTML tables are formatted for inclusion in Word without any
>     problems. But with longer sentences, I have to divvy up the sentence
>     into fragments. The issue for me is figuring out how to know when to
>     divide a longer text sentence into multiple tables so each table fits
>     width-wise in the Word document. Are there ways to calculate width
>     using XSL other than just string length? The reason I am creating
>     separate tables is because each of these will ultimately be
>     interlinearized with morphemes and glosses underneath. Refer to
>     Leipzig glossing rules
>     (https://www.eva.mpg.de/lingua/pdf/Glossing-Rules.pdf). The problem is
>     actually much more complex as the glosses in subsequent rows may be
>     longer than the words themselves, and the glosses align with the start
>     of each word, but I thought I would start with this initial problem
>     and see what ideas folks might recommend. I was also wondering if this
>     is the kind of thing that XSL-FO might be useful for. I have very
>     limited familiarity with XSL-FO.
>
>     I suspect that my easiest course of action is to:
>     a. pre-determine the left and right margins, indents, etc. for the
>     Word document and define a style for the example sentences.
>     b. determine the maximum width of a line based on above.
>     c. determine the number of m characters (m being the max width
>     character possible) at a specified font-size that can fit within that
>     width in (b)
>     d. use the number in (c) in the XSLT to ensure that sentence fragments
>     are always shorter than this number of characters.
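
Steps (c) and (d) above can be sketched roughly as follows. This is only a minimal Java sketch, not code from this thread or from any library: the class name CharBudgetSplitter and both method names are hypothetical, and it assumes that an "m" is about one em wide (that is, about as wide as the font size), which is normally a conservative estimate. The same greedy loop could be written directly in XSLT, or the method could be exposed to XSLT as an extension function.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating steps (c) and (d); all names are illustrative.
class CharBudgetSplitter {

    // Step (c): how many "m"-width characters fit on a line.
    // Assumption: one "m" is about one em wide, i.e. about fontSizePt points.
    static int maxChars(double lineWidthPt, double fontSizePt) {
        return (int) Math.floor(lineWidthPt / fontSizePt);
    }

    // Step (d): greedily pack whole words into fragments of at most maxChars
    // characters. A single word longer than maxChars becomes its own fragment.
    static List<String> split(String sentence, int maxChars) {
        List<String> fragments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : sentence.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for an empty or all-whitespace sentence
            }
            // Flush the current fragment if adding " " + word would exceed the budget.
            if (current.length() > 0
                    && current.length() + 1 + word.length() > maxChars) {
                fragments.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            fragments.add(current.toString());
        }
        return fragments;
    }
}
```

For example, with a 6.5 inch (468 pt) text width and a 12 pt font, maxChars(468, 12) gives a budget of 39 characters, and each fragment returned by split() would then go into its own table. Because the budget is based on the widest character, most lines will come out noticeably shorter than the available width.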
>
>     The above strategy will work most of the time for roman-based
>     orthographies but I suspect will be an issue for other non-Roman
>     orthographies.  So, another thought: I suppose one could call an
>     external function fDetermineWrappedText(cell_width, font, font-size,
>     string) that would populate a table cell and then determine the
>     portion that wraps, then return that fragment back to the XSLT. The
>     XSLT could then put that returned fragment into its own table. I found
>     some suggestions on how to find the line wraps here:
>     https://stackoverflow.com/questions/3738490/finding-line-wraps. I have
>     minimal experience using external functions in XSLT but I think this
>     strategy may be more helpful in the long run.
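
An external function along the lines of fDetermineWrappedText could be approximated in Java without actually rendering a table cell, by measuring candidate lines with font metrics. The following is a rough sketch under stated assumptions, not an existing API: the class WidthWrapper and its methods are hypothetical names, the measured widths only approximate what Word will lay out, and breaking on whitespace is an assumption that does not hold for scripts written without spaces.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of width-based wrapping; all names are illustrative.
class WidthWrapper {

    // Returns a function that measures a string's advance width using AWT
    // font metrics. The result is in the same units as the font size
    // (points, assuming one user-space unit equals one point).
    static ToDoubleFunction<String> awtMeasurer(String fontName, float sizePt) {
        Font font = new Font(fontName, Font.PLAIN, 1).deriveFont(sizePt);
        FontRenderContext frc = new FontRenderContext(null, true, true);
        return text -> font.getStringBounds(text, frc).getWidth();
    }

    // Greedily packs whitespace-separated words into lines whose measured
    // width does not exceed cellWidth. A single word wider than the cell
    // is emitted on a line of its own.
    static List<String> wrap(String text, double cellWidth,
                             ToDoubleFunction<String> measure) {
        List<String> lines = new ArrayList<>();
        String current = "";
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            String candidate = current.isEmpty() ? word : current + " " + word;
            if (!current.isEmpty() && measure.applyAsDouble(candidate) > cellWidth) {
                lines.add(current);   // the current line is full
                current = word;       // the word starts the next line
            } else {
                current = candidate;
            }
        }
        if (!current.isEmpty()) {
            lines.add(current);
        }
        return lines;
    }
}
```

A call such as WidthWrapper.wrap(sentence, 468.0, WidthWrapper.awtMeasurer("Serif", 12f)) would return the fragments, each of which the XSLT could place in its own table. Passing the measurer in as a parameter keeps the wrapping logic independent of any particular font backend. For orthographies without spaces between words, java.text.BreakIterator.getLineInstance could supply the break opportunities in place of the whitespace split.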
>
>     Simplified source example:
>     <document>
>     <sentence>John went to the store.</sentence>
>     <sentence>Lorem ipsum dolor sit amet, ac et et inceptos eget
>     sollicitudin, in urna velit et consectetuer eget cras, dictum erat
>     turpis sed velit donec blandit, integer volutpat at dictum nullam
>     nunc.</sentence>
>     </document>
>
>     XSLT process.
>
>     Output example:
>     <html xmlns="http://www.w3.org/1999/xhtml">
>         <head>
>             <title></title>
>         </head>
>         <body>
>             <table>
>                 <tr><td>John went to the store.</td></tr>
>             </table>
>     <hr/>
>             <table>
>                 <tr><td>Lorem ipsum dolor sit amet, ac et et inceptos eget
>     </td></tr>
>             </table>
>             <table>
>                 <tr><td>sollicitudin, in urna velit et consectetuer eget
>     cras,</td></tr>
>             </table>
>             <table>
>                 <tr><td>dictum erat turpis sed velit donec blandit,
>     integer</td></tr>
>             </table>
>             <table>
>                 <tr><td>volutpat at dictum nullam nunc.</td></tr>
>             </table>
>     <hr/>
>         </body>
>     </html>
>
>     Any suggestions in the overall approach to the problem and what you
>     would do if using XSLT?
>
>     Thanks!
>     Larry
