[xsl] Re: Line Breaking for Text

Subject: [xsl] Re: Line Breaking for Text
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Sun, 16 Dec 2001 09:27:03 -0800 (PST)
> I'm searching for a template that will let me break a long line into several
> shorter ones.  Specifically,
> 
> * The output is for plain text, not HTML.
> * Each new line not exceed a parameterized breakpoint (e.g., 64 characters)
> 
> Any suggestions?  Thanks.

My first attempt was to compute the number of 64-character chunks and to produce
them by simple iteration (using my "buildListWhile" template). While this was very
simple to implement, the result is not very appealing, because the words at the line
ends get split in the middle...
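
To make the problem concrete, here is a minimal sketch of such fixed-width chunking.
It is not the "buildListWhile" version mentioned above; "naive-chunks" is a
hypothetical name and the template is a plain recursive one:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: emits $pStr in fixed-width chunks, one per line -->
  <xsl:template name="naive-chunks">
    <xsl:param name="pStr"/>
    <xsl:param name="pLineLength" select="64"/>

    <xsl:if test="string-length($pStr) &gt; 0">
      <!-- The first $pLineLength characters, wherever that cut happens to fall -->
      <xsl:value-of select="substring($pStr, 1, $pLineLength)"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Recurse on the remainder of the string -->
      <xsl:call-template name="naive-chunks">
        <xsl:with-param name="pStr" select="substring($pStr, $pLineLength + 1)"/>
        <xsl:with-param name="pLineLength" select="$pLineLength"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="naive-chunks">
      <xsl:with-param name="pStr" select="normalize-space(/*)"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>

With the sample text used later in this message (whitespace normalized), the first
output line would end in "security and s" and the second would begin with
"urveillance", which is exactly the kind of split that should be avoided.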

An intelligent solution will not split a word that overlaps the line's ending
position. Instead, this word will become the first word of the next line.

Below I'm presenting such a solution. I'm re-using my functional tokenizer,
published last month:

http://aspn.activestate.com/ASPN/Mail/Message/XSL-List/914654

The idea is to parse the text and obtain a result structured like the following:

<line><word>My</word><word>first</word><word>attempt</word><word>was</word></line>
<line><word>to</word><word>find</word><word>the</word><word>number</word></line>
<line><word>of</word><word>64</word><word>byte</word><word>chunks</word></line>

where the string() of any "line" is the maximum that would fit the given line-length
if words are not split apart.

I'm using once again the str-foldl template, which on every character instantiates
the template matching "str-split2lines-func:*".
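
Since str-foldl.xsl itself is not reproduced in this message, here is a minimal
stand-in that follows the calling convention visible in the stylesheet below (pFunc,
pStr and pA0 going in; arg1 and arg2 handed to the function template). It is a
sketch under those assumptions, not the published str-foldl:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:msxsl="urn:schemas-microsoft-com:xslt"
 exclude-result-prefixes="msxsl">

  <!-- Stand-in for illustration only; not the author's str-foldl.xsl -->
  <xsl:template name="str-foldl">
    <xsl:param name="pFunc" select="/.."/>
    <xsl:param name="pStr"/>
    <xsl:param name="pA0"/>

    <xsl:choose>
      <!-- No characters left: the accumulator is the final result -->
      <xsl:when test="not(string($pStr))">
        <xsl:copy-of select="$pA0"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Apply the function template to (accumulator, first character) -->
        <xsl:variable name="vStep">
          <xsl:apply-templates select="$pFunc[1]">
            <xsl:with-param name="arg1" select="$pA0"/>
            <xsl:with-param name="arg2" select="substring($pStr, 1, 1)"/>
          </xsl:apply-templates>
        </xsl:variable>
        <!-- Fold the rest of the string, starting from the new accumulator -->
        <xsl:call-template name="str-foldl">
          <xsl:with-param name="pFunc" select="$pFunc"/>
          <xsl:with-param name="pStr" select="substring($pStr, 2)"/>
          <xsl:with-param name="pA0" select="msxsl:node-set($vStep)"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

The stand-in recurses once per character, so a very long string may run into a
processor's recursion limit. All the real work is left to the function template
matching "str-split2lines-func:*", which is instantiated once per character.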

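The template matching "str-split2lines-func:*" recognises every new word and
accumulates the result, which is a list of "line" elements, each having a list of
"word" children (preceded by the two parameter elements, "delimiters" and
"lineLength", which are carried along on every step). After the last "line" element
there's a single "word", in which the "current word" is being accumulated.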
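
For illustration, assuming a line-length of 20 and the default delimiters, the
accumulated result after folding the characters "My first atte" would look roughly
like this (the first two elements are the parameters, which the stylesheet below
carries along on every step):

<delimiters> &#9;&#10;&#13;</delimiters>
<lineLength>20</lineLength>
<line><word>My</word><word>first</word></line>
<word>atte</word>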

Whenever the current character is one of the specified delimiters, this signals the
formation of a new word. This word is either added to the last line (if the total
line length will not exceed the specified line-length), or a new line is started and
this word becomes the first in the new line.
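
The fit test can be tried in isolation. The following sketch assumes a line-length
of 20; the "demo" namespace and its sample data are made up for the illustration. It
counts one space after every word already in the line, the same way the "fillLine"
template below does:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:demo="f:demo"
 exclude-result-prefixes="demo">

  <xsl:output method="text"/>

  <!-- Hypothetical sample data: the last line so far, kept on one physical
       line so that no whitespace text nodes add to its string value -->
  <demo:line><word>My</word><word>first</word><word>attempt</word></demo:line>

  <xsl:template match="/">
    <xsl:variable name="vLine" select="document('')/*/demo:line"/>
    <!-- 14 characters in the words + 3 words (one space after each) = 17 -->
    <xsl:variable name="vUsed"
                  select="string-length($vLine) + count($vLine/word)"/>
    <!-- 17 + 3 = 20, which is not greater than 20, so "was" still fits -->
    <xsl:value-of select="not($vUsed + string-length('was') &gt; 20)"/>
  </xsl:template>

</xsl:stylesheet>

Applied to any source document this should output "true". Once the line holds "was"
as well, the same test for the next word "to" gives 17 + 4 + 2 = 23, which exceeds
20, so "to" starts a new line.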

There are two possible improvements, which are left as an exercise for you:

 1. If a single word exceeds the specified line-length, then it must be split apart.

 2. Lines could be justified both to the left and to the right.
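
As a starting point for the second improvement, here is a rough sketch of full
justification. It assumes the line/word structure has been serialized into a source
document under a hypothetical "lines" wrapper element, and it is untested against
the stylesheet in this message. The blanks missing from each line are spread over
the gaps between its words, the leftmost gaps receiving one extra blank where the
division is not exact; the last line is left ragged:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <!-- Ignore indentation in the (hypothetical) lines/line/word input -->
  <xsl:strip-space elements="*"/>

  <xsl:param name="pLineLength" select="64"/>

  <xsl:template match="line">
    <xsl:variable name="vGaps" select="count(word) - 1"/>
    <!-- Blanks to distribute: the width minus the characters in the words -->
    <xsl:variable name="vBlanks" select="$pLineLength - string-length(.)"/>
    <xsl:for-each select="word">
      <xsl:value-of select="."/>
      <xsl:if test="position() != last()">
        <!-- Equal share per gap, plus one for the leftmost gaps -->
        <xsl:call-template name="blanks">
          <xsl:with-param name="pN" select="floor($vBlanks div $vGaps)
                                   + (position() &lt;= $vBlanks mod $vGaps)"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- The final line keeps single blanks between its words -->
  <xsl:template match="line[not(following-sibling::line)]">
    <xsl:for-each select="word">
      <xsl:value-of select="."/>
      <xsl:if test="position() != last()"><xsl:text> </xsl:text></xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Outputs $pN blanks -->
  <xsl:template name="blanks">
    <xsl:param name="pN" select="0"/>
    <xsl:if test="$pN &gt; 0">
      <xsl:text> </xsl:text>
      <xsl:call-template name="blanks">
        <xsl:with-param name="pN" select="$pN - 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

For a non-final line holding the words "to", "find", "the" and "number", and with
pLineLength set to 20, there are 5 blanks for 3 gaps, so the line should come out as
"to  find  the number".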

And here's the complete stylesheet for the line-splitting solution:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:str-split2lines-func="f:str-split2lines-func"
exclude-result-prefixes="xsl msxsl str-split2lines-func"
>

   <xsl:import href="str-foldl.xsl"/>

   <str-split2lines-func:str-split2lines-func/>

   <xsl:output indent="yes" omit-xml-declaration="yes"/>
   
    <xsl:template name="str-split-to-lines">
      <xsl:param name="pStr"/>
      <xsl:param name="pLineLength" select="60"/>
      <xsl:param name="pDelimiters" select="' &#9;&#10;&#13;'"/>
      
      <xsl:variable name="vsplit2linesFun"
                    select="document('')/*/str-split2lines-func:*[1]"/>
                    
      <xsl:variable name="vrtfParams">
       <delimiters><xsl:value-of select="$pDelimiters"/></delimiters>
       <lineLength><xsl:copy-of select="$pLineLength"/></lineLength>
      </xsl:variable>

      <xsl:variable name="vResult">
        <xsl:call-template name="str-foldl">
          <xsl:with-param name="pFunc" select="$vsplit2linesFun"/>
          <xsl:with-param name="pStr" select="$pStr"/>
          <xsl:with-param name="pA0" select="msxsl:node-set($vrtfParams)"/>
        </xsl:call-template>
      </xsl:variable>
      
      <xsl:for-each select="msxsl:node-set($vResult)/line">
        <xsl:for-each select="word">
          <xsl:value-of select="concat(., ' ')"/>
        </xsl:for-each>
        <xsl:value-of select="'&#10;'"/>
      </xsl:for-each>
    </xsl:template>

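    <!-- Folding function: arg1 is the result accumulated so far,
         arg2 is the current character -->
    <xsl:template match="str-split2lines-func:*">
      <xsl:param name="arg1" select="/.."/>
      <xsl:param name="arg2"/>

      <!-- Carry the two parameter elements (delimiters, lineLength) along -->
      <xsl:copy-of select="$arg1/*[position() &lt; 3]"/>
      <!-- All lines but the last are complete and are copied unchanged -->
      <xsl:copy-of select="$arg1/line[position() != last()]"/>

      <xsl:choose>
        <!-- The current character is a delimiter: the current word ends -->
        <xsl:when test="contains($arg1/*[1], $arg2)">
          <xsl:choose>
            <xsl:when test="string($arg1/word)">
              <xsl:call-template name="fillLine">
                <xsl:with-param name="pLine" select="$arg1/line[last()]"/>
                <xsl:with-param name="pWord" select="$arg1/word"/>
                <xsl:with-param name="pLineLength" select="$arg1/*[2]"/>
              </xsl:call-template>
            </xsl:when>
            <!-- No pending word (e.g. two delimiters in a row):
                 keep the last line, which would otherwise be lost -->
            <xsl:otherwise>
              <xsl:copy-of select="$arg1/line[last()]"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:when>
        <!-- Any other character extends the current word -->
        <xsl:otherwise>
          <xsl:copy-of select="$arg1/line[last()]"/>
          <word><xsl:value-of select="concat($arg1/word, $arg2)"/></word>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>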
      
    <!-- Test if the new word fits into the last line -->
    <xsl:template name="fillLine">
      <xsl:param name="pLine" select="/.."/>
      <xsl:param name="pWord" select="/.."/>
      <xsl:param name="pLineLength" />
      
      <!-- Length of the line as output: every word is followed by one space -->
      <xsl:variable name="vnWordsInLine" select="count($pLine/word)"/>
      <xsl:variable name="vLineLength" select="string-length($pLine) 
                                             + $vnWordsInLine"/>
      <xsl:choose>
        <xsl:when test="not($vLineLength + string-length($pWord) > $pLineLength)">
          <line>
            <xsl:copy-of select="$pLine/*"/>
            <xsl:copy-of select="$pWord"/>
          </line>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="$pLine"/>
          <line>
            <xsl:copy-of select="$pWord"/>
          </line>
          <word/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>

</xsl:stylesheet>


When instantiated as follows:

    <xsl:template match="/">
      <xsl:call-template name="str-split-to-lines">
        <xsl:with-param name="pStr" select="/*"/>
        <xsl:with-param name="pLineLength" select="64"/>
        <xsl:with-param name="pDelimiters" select="' &#9;&#10;&#13;'"/>
      </xsl:call-template>
    </xsl:template>

and the source xml document is:

<text>
Dec. 13 ? As always for a presidential inaugural, security and surveillance were
extremely tight in Washington, DC, last January. But as George W. Bush prepared to
take the oath of office, security planners installed an extra layer of protection: a
prototype software system to detect a biological attack. The U.S. Department of
Defense, together with regional health and emergency-planning agencies, distributed
a special patient-query sheet to military clinics, civilian hospitals and even aid
stations along the parade route and at the inaugural balls. Software quickly
analyzed complaints of seven key symptoms ? from rashes to sore throats ? for
patterns that might indicate the early stages of a bio-attack. There was a brief
scare: the system noticed a surge in flulike symptoms at military clinics.
Thankfully, tests confirmed it was just that ? the flu.</text>

with the text occupying just one line, the result of the transformation is:

Dec. 13 ? As always for a presidential inaugural, security and 
surveillance were extremely tight in Washington, DC, last 
January. But as George W. Bush prepared to take the oath of 
office, security planners installed an extra layer of 
protection: a prototype software system to detect a biological 
attack. The U.S. Department of Defense, together with regional 
health and emergency-planning agencies, distributed a special 
patient-query sheet to military clinics, civilian hospitals and 
even aid stations along the parade route and at the inaugural 
balls. Software quickly analyzed complaints of seven key 
symptoms ? from rashes to sore throats ? for patterns that might 
indicate the early stages of a bio-attack. There was a brief 
scare: the system noticed a surge in flulike symptoms at 
military clinics. Thankfully, tests confirmed it was just that ? 
the flu. 

As can be seen, the longest line is 64 characters long (not counting the trailing
space that the template appends after every word), as specified.

I hope that this really helped.

Cheers,
Dimitre Novatchev.
