Subject: Re: processing character entities
From: MARK.WROTH@xxxxxxxxxxx (Wroth, Mark)
Date: Wed, 21 Jul 1999 17:04:48 -0700

Date: Thu, 22 Jul 1999 00:42:56 +0200
From: Matthias Clasen <clasen@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: processing character entities

> Boris Goldowsky <boris@xxxxxxxxxxxxxxxxxxxx> asked:
>
> >> What are the various solutions people on this list use for processing
> >> character entities in SGML->SGML or SGML->HTML conversions? In my work
>
> >>>>> Matthias Clasen answered:
> MC> I don't know if this really helps here, but OpenJade implements the
> MC> map-sdata-entity declaration, so you can control the
> MC> sdata entity --> character map from within your stylesheet now.
>
> and Boris asked for clarification:
> B>Can someone give me an example of how to use this? The spec suggests
> B> <map-sdata-entity name="Alpha" text="[Alpha]">greekA</map-sdata-entity>
> B>but then what? How do I get from there to
> B> <IMG src="/images/alpha.gif">?

> I (Mark Wroth) commented
> I can't give any clarification, but I'll second his question; I primarily
> output in RTF and HTML, but the documents can contain just about any
> character defined in the various ISO entity sets. So dealing with the
> limitations of the various backends is in my future :-)

And Matthias answered

MC>It is the job of the backends to choose suitable representations in its
MC>output format for any Unicode character it meets. Maybe some backends
MC>need improvement in this area.

Mark:

Chuckle. I suppose you could look at it that way. But it doesn't answer my (or, I think, Boris') question. Nor is it a good answer, IMHO, as what "suitable representation" means may well be application specific.

Let me phrase a more specific example. Suppose I have an entity &CHI;, mapped to the greek letter chi. This can be reasonably defined with

<!ENTITY khgr "χ" ><!--small chi, Greek, U03C7 -->

I think. Now I do a transform on this to the SGML backend. What should the backend do? If it's another SGML document, it may well be to insert the appropriate Unicode character.
But if it's HTML (an instance of SGML output), this character is likely to vanish (as unrecognized). For one of my applications, I would prefer to bind this entity, in the case of the HTML-style output, to "[chi]"; for another (different audience, different conventions) I would prefer to bind it to "{chi}"; for a third, to "Chi" (and "χ<SUP>2</SUP>" to "Chi-squared").

In the RTF backend, it can sort of be handled as a font switch (to Symbol) and an appropriate character code -- but not all versions of MSWord/WordView handle this reasonably.

At the moment, my kludge involves changing the entity definitions when I change output formats. While this is not a big deal at the moment, I am not a happy camper on this subject.

DSSSList info and archive: http://www.mulberrytech.com/dsssl/dssslist
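
The per-output binding described in the message above (the same entity rendered as "[chi]", "{chi}", or "Chi" depending on the audience) can be illustrated with a small sketch. It is written in Python rather than DSSSL, and everything in it is an assumption for illustration: the BINDINGS table, the profile names, and the expand_entities helper are hypothetical and are not part of OpenJade or any backend. It only shows the idea of keeping one entity definition and choosing the replacement text per output profile, instead of editing the entity declarations for each format.

```python
import re

# Hypothetical binding table: one entry per output profile.
# The profile names and replacement strings are illustrative only.
BINDINGS = {
    "sgml": {"khgr": "\u03c7"},          # pass the Unicode character through
    "html-brackets": {"khgr": "[chi]"},  # first audience
    "html-braces": {"khgr": "{chi}"},    # second audience
    "html-words": {"khgr": "Chi"},       # third audience
}

# Matches a general entity reference such as &khgr;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(text, profile):
    """Replace known entity references using the table for `profile`.

    References with no binding in the chosen profile are left untouched,
    so nothing silently vanishes from the output.
    """
    table = BINDINGS[profile]

    def replace(match):
        return table.get(match.group(1), match.group(0))

    return ENTITY_REF.sub(replace, text)


if __name__ == "__main__":
    for profile in BINDINGS:
        print(profile, "->", expand_entities("The &khgr; statistic", profile))
```

Switching output formats then means selecting a different profile rather than changing the entity definitions. A compound case such as "Chi-squared" would need a rule that looks at the surrounding markup, which this sketch does not attempt.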