Re: processing character entities

Subject: Re: processing character entities
From: MARK.WROTH@xxxxxxxxxxx (Wroth, Mark)
Date: Wed, 21 Jul 1999 17:04:48 -0700
Date: Thu, 22 Jul 1999 00:42:56 +0200
From: Matthias Clasen <clasen@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: processing character entities


> Boris Goldowsky <boris@xxxxxxxxxxxxxxxxxxxx> asked:
> >> What are the various solutions people on this list use for processing
> >> character entities in SGML->SGML or SGML->HTML conversions? In my work
> 
> >>>>> Matthias Clasen answered:
> MC> I don't know if this really helps here, but OpenJade implements the
> MC> map-sdata-entity declaration, so you can control the 
> MC> sdata entity --> character map from within your stylesheet now.
>  
> and Boris asked for clarification:
> B>Can someone give me an example of how to use this?  The spec suggests
> B>   <map-sdata-entity name="Alpha" text="[Alpha]">greekA</map-sdata-entity>
> B>but then what?  How do I get from there to
> B>   <IMG src="/images/alpha.gif">?
> 
I (Mark Wroth) commented:
> I can't give any clarification, but I'll second his question; I primarily
> output in RTF and HTML, but the documents can contain just about any
> character defined in the various ISO entity sets.  So dealing with the
> limitations of the various backends is in my future :-)

And Matthias answered:
MC>It is the job of the backends to choose suitable representations in its
MC>output format for any Unicode character it meets. Maybe some backends 
MC>need improvement in this area.

Mark:
Chuckle.  I suppose you could look at it that way.  But it doesn't answer my
(or, I think, Boris') question.  Nor is it a good answer, IMHO, since what
"suitable representation" means may well be application-specific.

	Let me phrase a more specific example.  Suppose I have an entity
&khgr;, mapped to the Greek letter chi.  I think this can reasonably be
defined with

	<!ENTITY khgr    "&#967;" ><!--small chi, Greek, U03C7 -->

Now I run a transform on this through the SGML backend.  What should the
backend do?
If the output is another SGML document, the right thing may well be to insert
the appropriate Unicode character.  But if it's HTML (an instance of SGML
output), this character is likely to vanish (as unrecognized).  For one of my
applications, I would prefer to bind this entity, in the case of HTML-style
output, to "[chi]"; for another (different audience, different conventions) I
would prefer to bind it to "{chi}"; for a third, to "Chi" (and
"&#967;<SUP>2</SUP>" to "Chi-squared").  In the RTF backend, it can sort of
be handled as a font switch (to Symbol) plus the appropriate character code,
but not all versions of MSWord/WordView handle this reasonably.
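
To make the kind of binding I have in mind concrete, here is a rough sketch in
Python (not DSSSL, and not anything Jade or OpenJade provides).  The profile
names, the BINDINGS table, and the render function are all made up for
illustration; the only thing taken from the example above is the set of
replacement strings.

```python
import re

# Hypothetical per-application bindings (names are illustrative only).
# Keys are character sequences in the output stream; values are the text
# a given application wants to see instead.
BINDINGS = {
    "bracketed": {"\u03c7": "[chi]"},
    "braced": {"\u03c7": "{chi}"},
    "spelled": {
        "\u03c7<SUP>2</SUP>": "Chi-squared",
        "\u03c7": "Chi",
    },
}


def render(text, profile):
    """Replace every bound sequence in text using the chosen profile."""
    table = BINDINGS[profile]
    # Try longer keys first so "chi squared" wins over plain "chi".
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)


if __name__ == "__main__":
    sample = "The \u03c7<SUP>2</SUP> test"
    for profile in BINDINGS:
        print(profile, "->", render(sample, profile))
```

The point of the sketch is only that the choice of table belongs to the
application (or the stylesheet), not to the backend.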

At the moment, my kludge involves changing the entity definitions when I
change output formats.  While this is not a big deal yet, I am not a happy
camper on this subject.


 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist

