Re: (dsssl) DSSSL engines for ASCII output

From: "John R. Sheets" <dusk@xxxxxxxxxxxxx>
Date: Wed, 18 Apr 2001 10:14:33 -0500
On Apr 17, 2001, jany.quintard@xxxxxxxxxx wrote:
> 
> On Sun, 15 Apr 2001, John R. Sheets wrote:
> 
> > What's the best way to output straight ASCII, based on a DSSSL
> > stylesheet?  I didn't see any Jade backends that would help.  Any
> > other tools that I can use for SGML->ASCII transformations?  Am I
> > better off sidestepping DSSSL altogether and hacking together a custom
> > perl or python script (or whatever) for ASCII output?
> For this I use the SGML backend with an entity to build the file and I
> output the text using (literal ...).
> I think the result is easy to transform using perl, python.

For posterity, here's what I came up with.  I'm sure my Scheme is a
bit crufty, and could have been done much more elegantly (suggestions
encouraged).  Aside from a little Elisp hacking, this is my first
major Scheme effort.

My biggest hurdle was reformatting multi-line blocks of text (for
the <para> element).  I ended up using the (words ...) procedure to
tokenize the text, then hacked it back together in splice-words,
wrapping at a configurable column width.  I'm pretty happy with the
results.

The one thing I couldn't get to work is child node processing inside
those paragraphs.  I used (data (current-node)) to grab the <para>
contents, which slurps up the child text without processing it.  The
first time a name is used in a screenplay, you want to display it in
ALL CAPS.  I got this working with the TeX backend, but can't figure
out how to capitalize names before (data ...) grabs them.  Would that
have to take place in the transformation phase, or would it be
possible to hack something together in the style phase?  Could I force
processing of (PARA NAME) inside the construction rule of (PARA),
_before_ the call to (data ...)?

Anyway, I hope this will help someone trying to hack together ASCII
output.  I tried to keep the helper procedures as generic as possible.
(Sorry for the long post.)

------------------------------------------------------------

;; Columns for paragraph line wrapping
(define %paragraph-wrap% 40)

;; Hard-coded indentations
(define %speaker-indent%   "                       ")
(define %direction-indent% "                 ")
(define %speech-indent%    "      ")
(define %title-indent% (string-append %direction-indent% %direction-indent%))
;; Hack for newline character
(define %newline% "
")
(define %blankline% (string-append %newline% %newline%))

;;
;; Helper functions
;;

;; How many leading spaces in str?
(define (count-whitespace str)
  (let loop ((count 0))
    (if (and (< count (string-length str))  ;; Don't run off the end
             (string=? (substring str count (+ count 1)) " ")) ;; Found whitespace?
        (loop (+ count 1))
        count
    )
  ))

;; Returns string, with leading spaces trimmed off
(define (trim-leading string)
  (substring string
             (count-whitespace string)
             (string-length string)))

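;; How many trailing spaces in str?
(define (count-trailing-whitespace str)
  (let loop ((end (string-length str)))
    (if (and (> end 0)
             (string=? (substring str (- end 1) end) " "))
        (loop (- end 1))
        (- (string-length str) end))))

;; Returns string, with trailing spaces trimmed off
(define (trim-trailing string)
    (substring string
               0
               (- (string-length string) (count-trailing-whitespace string))))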

;; Returns string, with whitespace on both ends trimmed off
(define (trim-string string)
  (trim-leading (trim-trailing string))
  )

;; Predicate for (words ...) invocation; takes a character, not a string
(define (is-whitespace ch)
  (or (char=? ch #\space)
      ;; Hack to check for newline character (is there a better way?)
      ;; Scheme's #\newline doesn't seem to be available.
      (char=? ch #\
)))

;; Concatenate list of words into a single string, wrapped to roughly
;; 'width' columns.  A line is broken after the word that pushes it
;; past 'width', so lines can run slightly over.
(define (splice-words wordlist width)
  (let loop ((remaining-words wordlist)  ;; Shrinking word list
             (str "")  ;; Growing string
             (current-width 0))  ;; Word chars on this line so far (spaces not counted)
    (if (null? remaining-words)
        str
        (let* ((nextword (car remaining-words))
               (new-width (+ (string-length nextword) current-width)))
          (loop (cdr remaining-words)
                (if (<= new-width width)
                    (string-append str nextword " ")  ;; Fits in current line
                    (string-append str nextword " "
                                   ;; Only add newline if we're not at the
                                   ;; end of the paragraph
                                   (if (null? (cdr remaining-words))
                                        "" %newline% )))
                (if (> new-width width)
                    0
                    new-width))
          )
    )))

;; Strip out whitespace and tokenize into a list of words, then splice
;; them back together in paragraph form.
(define (format-paragraph str)
  (splice-words (words is-whitespace str) %paragraph-wrap%))
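
;; Sketch only, not part of the stylesheet as originally written: one
;; way format-paragraph might be given the indentation that the
;; (DIALOGUE SPEECH) rule wants.  It mirrors splice-words but starts
;; every line with 'indent'.  The names splice-words-indented and
;; format-paragraph-indented are made up for this illustration.
(define (splice-words-indented wordlist width indent)
  (let loop ((remaining-words wordlist)  ;; Shrinking word list
             (str indent)                ;; First line starts indented
             (current-width 0))          ;; Word chars on this line so far
    (if (null? remaining-words)
        str
        (let* ((nextword (car remaining-words))
               (new-width (+ (string-length nextword) current-width))
               (last-word? (null? (cdr remaining-words)))
               (line-full? (> new-width width)))
          (loop (cdr remaining-words)
                (string-append str nextword " "
                               ;; Break and re-indent, except after the
                               ;; last word of the paragraph
                               (if (and line-full? (not last-word?))
                                   (string-append %newline% indent)
                                   ""))
                (if line-full? 0 new-width))))))

;; Same as format-paragraph, with every line prefixed by 'indent'
(define (format-paragraph-indented str indent)
  (splice-words-indented (words is-whitespace str)
                         %paragraph-wrap%
                         indent))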

;; Convert SGML speech direction entities into displayable text
(define (convert-speech-direction abbrev)
    (cond
     ((equal? "CONT" abbrev) " (cont.)")
     ((equal? "VO" abbrev) " (V.O.)")
     ((equal? "OS" abbrev) " (O.S.)")
     ((equal? "OC" abbrev) " (O.C.)")
     ((equal? "FILTER" abbrev) " (filter)")
     ((equal? "PAGEBREAK" abbrev) " (cont'd)")
     (else "")
     ))

;;
;; Element formatting rules
;;

(element TITLE
  (make sequence
    (literal %title-indent%)
    (process-children)
    (literal %blankline%)
    ))

;; Generate slug line (needs to be in all caps)
(element SLUG
  (make sequence
    (literal (attribute-string "where" (current-node)))
    (literal " - ")
    (literal (case-fold-up (data (current-node))))
    (literal " - ")
    (literal (attribute-string "when" (current-node)))
    (literal %blankline%)
    ))

(element (SCENE)
  (make sequence
    ;; Only include "CUT TO:" if slug is missing from scene element
    ;; The DTD requires slug to always be the first child element,
    ;; so we're safe here with first-child-gi.
    (if (equal? (first-child-gi) "SLUG")
        (empty-sosofo)
        (make sequence
          (literal "CUT TO:")
          (literal %blankline%)))
    ;; Children are processed whether or not "CUT TO:" was emitted
    (process-children)
    ))

(element PARA
  (make sequence
    ;; FIXME: Because we use (data ...) here, we skip over any
    ;; processing we might have done in child nodes, e.g., (PARA NAME)
    ;; below.  Thus, names in PARA's are not capitalized.  Not sure
    ;; how to get around this.  Perhaps a transform rule?
    (literal (format-paragraph (data (current-node))))
    (literal %blankline%)
    ))
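
;; Sketch only, untested: one possible way around the FIXME above
;; without leaving the style phase.  A sosofo can't be turned back
;; into a string, so instead of processing the children this walks
;; them by hand and builds the paragraph text directly, upper-casing
;; any NAME child whose USAGE is "FIRST".  The name para-text is made
;; up for this illustration; it assumes gi and attribute values come
;; back upper-cased, as the rules in this stylesheet already expect.
(define (para-text nd)
  (let loop ((kids (children nd))  ;; Shrinking node list
             (str ""))             ;; Growing string
    (if (node-list-empty? kids)
        str
        (let ((kid (node-list-first kids)))
          (loop (node-list-rest kids)
                (string-append
                 str
                 ;; gi returns #f for character data, so only NAME
                 ;; elements ever reach the attribute test
                 (if (and (equal? (gi kid) "NAME")
                          (equal? (attribute-string "USAGE" kid) "FIRST"))
                     (case-fold-up (data kid))
                     (data kid))))))))

;; The PARA rule could then call, for example:
;;   (literal (format-paragraph (para-text (current-node))))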

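(element (PARA NAME)
  ;; Upper case if first usage
  (make sequence
    (if (equal? "FIRST"
                (inherited-attribute-string "USAGE" (current-node)))
        (literal (case-fold-up (data (current-node))))
        (process-children))
    ))

;; [Text is missing from the archived copy at this point: the opening
;; of the next rule is lost.  Only its tail survives; it upper-cases
;; the node's text and reads the "STATUS" attribute.]
;;    (literal (case-fold-up (data (current-node))))
;;    ... "STATUS" (current-node))))
;;    (literal %newline%)
;;    ))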

(element (DIALOGUE DIRECTION)
  (make sequence
    (literal %direction-indent%)
    (literal "(")
    (process-children)
    (literal ")")
    (literal %newline%)
    ))

(element (DIALOGUE SPEECH)
  (make sequence
    (literal %speech-indent%)
    ;; FIXME: Should really provide the ability to indent with
    ;; format-paragraph.  Until then, we'll have to quote it verbatim

------------------------------------------------------------
    
John

-- 
dusk@xxxxxxxxxxxxx                            http://www.gnome.org
jsheets@xxxxxxxxxxxxxxx                  http://www.worldforge.org
jsheets@xxxxxxxxxxxxxxxxxxxxx     http://openbooks.sourceforge.net
               http://advogato.org/person/jsheets

                   Writing GNOME Applications
          http://www.aw.com/cseng/titles/0-201-65791-0/

 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist
