Re: processing character entities

Subject: Re: processing character entities
From: Chris Maden <crism@xxxxxxxxxxx>
Date: Fri, 23 Jul 1999 11:15:16 -0400 (EDT)
[Boris Goldowsky]
> What are the various solutions people on this list use for
> processing character entities in SGML->SGML or SGML->HTML
> conversions? In my work I translate a lot of SGML containing
> entities for foreign characters, math symbols, etc. into HTML.  Some
> get turned into HTML entities, some are dumbed down to ASCII, and
> others get turned into inline graphics.
> 
> In the absence of QUERY, there's no obvious way to write a rule to
> deal with these.  I know of two workaround solutions:
> 
> 2. Write a function that does the equivalent of process-children,
> except it also scans PCDATA for entities and processes them.  Use this
> function everywhere you would normally use process-children.

This is what I do.  It's really not that bad, just verbose.

(define (process-text #!optional (snl (current-node)))
  ;; this part is inefficient; I need to rewrite this to carry a node
  ;; index instead of an actual list of nodes
  (let p-t-loop ((this-node (node-list-first (children snl)))
                 (other-nodes (node-list-rest (children snl))))
    (if (node-list-empty? this-node)
        (empty-sosofo)
        (sosofo-append
         (case (node-property 'class-name this-node)
           ;; handle special characters
           ((data-char)
            (case (node-property 'char this-node)
              ;; quotation mark
              ((#\") (make entity-ref name: "quot"))
              ;;; etc.
              ;; assumed fallback: any other character passes through
              (else (process-node-list this-node))))
           ;; handle SDATA entity references
           ((sdata)
            (case (node-property 'system-data this-node)
              ;; a with acute accent
              (("[aacute]") (make entity-ref name: "aacute"))
              ;;; etc.
              ;; assumed fallback: unlisted SDATA gets default handling
              (else (process-node-list this-node))))
           ;; child elements
           ((element) (process-node-list this-node))
           (else (process-node-list this-node)))
         (if (node-list-empty? other-nodes)
             (empty-sosofo)
             (p-t-loop (node-list-first other-nodes)
                       (node-list-rest other-nodes)))))))
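
To use it, call process-text wherever a rule would otherwise call
process-children.  A rough sketch, assuming Jade's SGML
transformation back end (jade -t sgml); PARA and "P" are made-up
names for illustration, not taken from any particular DTD:

```
;; Jade's SGML back end needs these declared once per stylesheet
(declare-flow-object-class element
  "UNREGISTERED::James Clark//Flow Object Class::element")
(declare-flow-object-class entity-ref
  "UNREGISTERED::James Clark//Flow Object Class::entity-ref")

;; PARA is a hypothetical source element, "P" the HTML element to emit
(element PARA
  (make element gi: "P"
        (process-text)))
```

Child elements still go through their own construction rules, so each
of those rules needs to call process-text as well if its text should
get the same treatment.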

-Chris
-- 
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>


 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist

