Subject: Re: character substitution.
From: Tom Moertel <tmoertel@xxxxxxxxxx>
Date: Tue, 11 Aug 1998 11:36:40 -0400

Pawson, David wrote:
>
> In looking at text->audio preparation using Jade.
>
> I need to 'clean up' some of the text, such that a
> text to speech engine will speak the element content more clearly.
> e.g. <tel>(44) 1733-378-777 </tel>
> becomes <tel>44, 1733,378,777 </tel>
>
> I'm looking to examine an element content (CDATA)
> and map a function over the output of the data (current-node).
>
> However.
>
> What I can't get my tiny mind around is the
> sosofo->character->sosofo transformation needed.

Sosofos are opaque. You can't go from sosofo->character because you
can't inspect the sosofo's contents, and thus your above chain is
broken. However, you can still do what you want; you just need to
manipulate the datachar nodes in the *grove*, not the character flow
object specifications in the output.

When I first started working with DSSSL, I was confused between nodes
in the grove and flow objects, and this confusion caused me much
difficulty. Maybe this explanation will help you avoid some of the
struggles I had.

Think of your original SGML document as a hierarchy, with each element
in the document as a node in the hierarchy and each character within
each element attached as a smaller node to the underbelly of its
parent element's node. Loosely speaking, that hierarchy becomes the
"grove" that Jade hands you after parsing your document. It
represents everything Jade knows about your document and everything
you can ask Jade.

When it comes time to process your document into something more
useful, you can ask Jade all about the nodes in the grove -- what kind
of nodes they are, what property values they have, and so on. And,
based on the answers to what you ask, you can provide Jade with a
recipe of sorts for building a sequence of pages or (with some
trickery) even another SGML document. This recipe takes the form of a
specification of a sequence of flow objects, a sosofo.

Flow objects are things like paragraphs, graphics, and so on.
Basically, what you're trying to do is build up a sequence of them
that represents a decent presentation of the information contained
within the grove. In building this sequence (or, more precisely, a
"recipe" for this sequence), you'll use construction rules like
"(make paragraph ...)" to make tiny flow-object specifications, and
these in turn you'll knit into a big sosofo representing what you hope
will be a decent presentation of your original document's entire
content.

Now, with that picture in mind, here are a few key points:

  1. Nodes in the grove represent your input content.

  2. Flow-Object Specifications (FOSi) represent your recipe for
     building an output document.

  3. Nodes aren't FOSi: The former are there for your inspection;
     the latter aren't.

  4. Once a sosofo is created, you can't change it; you can either
     include it in a larger sosofo or throw it away, but you can't
     modify its properties.

(I suspect that you already know most of what I've just written, and
so please accept my apologies for not coming up with a more elegant
and concise summary.)

Now, with that in mind, let's return to your problem. (I'll assume
that what you really want is to generate a sosofo corresponding to the
content within the TEL element and not, say, use Perl to scrub the
source SGML into a richer format.)

Let's say you've got the following code in your style sheet:

    (element tel
      (let ((tel-children (children (current-node))))
        ...))

When it gets called, (current-node) points to a node in the grove that
represents the TEL element that's being processed. The children of
this node are themselves nodes, most likely representing characters.
So, to use your example above, "<tel>(44) 1733-378-777</tel>", the
picture from the TEL element down looks like this:

    [element: gi: "TEL"]
       |
    [dchar: #\(]--[dchar: #\4]--[dchar: #\4]-- ... --[dchar: #\7]

where dchar is short for "datachar".
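
To make that picture concrete, here is a rough sketch in Python rather
than DSSSL. The Node class and the element_from_text helper are
hypothetical stand-ins of my own, not anything Jade provides; they only
model an element node with one datachar child per character, each of
which can be inspected.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a grove node (not Jade's API)."""
    classnm: str                  # e.g. "element" or "data-char"
    gi: Optional[str] = None      # element name, set on element nodes
    char: Optional[str] = None    # character value, set on data-char nodes
    children: List["Node"] = field(default_factory=list)


def element_from_text(gi: str, text: str) -> Node:
    """Build an element node with one data-char child per character."""
    kids = [Node(classnm="data-char", char=c) for c in text]
    return Node(classnm="element", gi=gi, children=kids)


# The TEL element from the example, seen as a node with datachar children.
tel = element_from_text("TEL", "(44) 1733-378-777")

# Every child is open to inspection, much like asking Jade about a node.
chars = [child.char for child in tel.children if child.classnm == "data-char"]

print(len(tel.children))  # 17
print(chars[:4])          # ['(', '4', '4', ')']
```

The point of the model is only that the input side is inspectable:
each character sits in its own node and carries a value you can read.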

Thus, in the DSSSL snippet I provided above, tel-children is bound to
a nodelist that contains the children of the TEL element, which is the
nodelist of datachar nodes corresponding to "(44) 1733-378-777".

So, if you want to process the character data, you just need to work
with the datachar nodes, in particular their "char" properties. For
example, the following DSSSL code generates a sosofo corresponding to
"44, 1733,378,777" from the markup "<tel>(44) 1733-378-777</tel>".

    (element tel
      (let ((tel-children (children (current-node))))
        (tel-char-nodes-to-cleaned-up-sosofo tel-children)))

    (define (tel-char-nodes-to-cleaned-up-sosofo nl)
      (let loop ((charnodes nl)
                 (result (empty-sosofo)))
        (let* ((firstchar (node-list-first charnodes)))
          (cond
           ;; are we done?
           ((node-list-empty? firstchar)
            result)
           ;; is the next node really a character?
           ;; if not, process it with the normal rules, since
           ;; sosofo-append takes sosofos, not nodes
           ((not (equal? 'data-char (node-property 'classnm firstchar)))
            (loop (node-list-rest charnodes)
                  (sosofo-append result (process-node-list firstchar))))
           ;; it's a character, let's process it
           (#t
            (let* ((charval (node-property 'char firstchar))
                   ;; determine replacement:
                   ;;   - -> ,
                   ;;   ) -> ,
                   ;;   ( -> (nothing)
                   (replacement (cond ((equal? charval #\-) #\,)
                                      ((equal? charval #\)) #\,)
                                      ((equal? charval #\() #f)
                                      (#t charval))))
              (loop (node-list-rest charnodes)
                    (sosofo-append result
                                   (if (char? replacement)
                                       (make character char: replacement)
                                       (empty-sosofo))))))))))

I hope that this (rather lengthy) explanation helps.

Cheers,
Tom

--
Tom Moertel <tmoertel@xxxxxxxxxx>
Agnew Moyer Smith Inc.
412.322.6333 tel
412.322.6350 fax

 DSSSList info and archive: http://www.mulberrytech.com/dsssl/dssslist