RE: processing character entities

Subject: RE: processing character entities
From: "Steffen Heinrich" <heinrich@xxxxxxxxxxxx>
Date: Tue, 20 Jul 1999 13:39:04 +0100
Boris Goldowsky  asked:
>
>What are the various solutions people on this list use for 
>processing character entities in SGML->SGML or SGML->HTML 
>conversions? In my work I translate a lot of SGML containing 
>entities for foreign characters, math symbols, etc. into HTML.  Some 
>get turned into HTML entities, some are dumbed down to ASCII, and 
>others get turned into inline graphics.
>

Hello Boris, 

The approach I take to this very problem has three parts:

1. Top of the DTD, before any other entity declarations
a) when validating:
<!ENTITY % qartchars SYSTEM "qartchars">
<!--%qartchars;-->
<!ENTITY % e.sup "" >
...
<!ENTITY % ISOlat1 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN">
%ISOlat1;
...
<!ENTITY %	b.float	"fachidx | fussnt | verw | xverw 
      | f | dfref |produkt | unklar %e.sup;" -- BODY.floats -->

Here the reference to %qartchars; is commented out, so only the
ordinary float elements are allowed.

b) for the DSSSL transformation:
<!ENTITY % qartchars SYSTEM "qartchars">
%qartchars;
<!ENTITY % e.sup "" >
...
<!ENTITY % ISOlat1 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN">
%ISOlat1; 
... 
<!ENTITY %	b.float	"fachidx | fussnt | verw | xverw 
      | f | dfref |produkt | unklar %e.sup;" -- BODY.floats -->

Here %qartchars; is included. Because SGML honors only the first
declaration of an entity, its declarations take precedence over any
that follow.
Instead of editing the DTD, you could also switch catalogs: use one
catalog file for validation and another, pointing to different entity
files, for the transformation.
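
A minimal sketch of the catalog variant, assuming an SP-style catalog
(SP is the parser Jade uses) that supports SYSTEM entries. The catalog
and entity file names are hypothetical. In this variant the DTD always
contains the active %qartchars; reference, and the catalog decides what
the system identifier "qartchars" resolves to:

```
-- catalog.validate (hypothetical name), used when validating --
SYSTEM "qartchars" "empty.ent"      -- hypothetical empty file, overrides nothing --

-- catalog.transform (hypothetical name), used for the DSSSL transformation --
SYSTEM "qartchars" "qartchars.ent"  -- hypothetical file holding the overrides of part 2 --
```

The two halves are meant to live in two separate catalog files; which
one the parser reads is then chosen per run.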

2. The qartchars entity:
<!-- The following elements are appended to the 'anywhere' float
content. -->
<!ENTITY %	e.sup	" | FONT | CHARREF | IMG" >

<!-- Using the HTML FONT-tag. --> 
<!ELEMENT FONT - - (#PCDATA) > 
<!ATTLIST FONT size NUMBER #IMPLIED
               face CDATA  #IMPLIED>

<!-- Mapping character references to themselves or to character 
codes. --> 
<!ELEMENT CHARREF - o EMPTY >
<!ATTLIST CHARREF cname CDATA  #IMPLIED   -- used if present --
                  value NUMBER #REQUIRED  -- else uses code reference -->

<!-- Using the HTML IMG-tag. --> 
<!ELEMENT IMG            - o  EMPTY>
<!ATTLIST IMG   src            CDATA       #REQUIRED
                alt            CDATA       #IMPLIED
                align          CDATA       #IMPLIED
                border         NUMBER      #IMPLIED          >

<!--Examples of SDATA entities to be overridden. --> 
     <!ENTITY aring  "<CHARREF CNAME='aring' VALUE='229'>" >
     <!ENTITY bdquo  "<CHARREF VALUE='132'>" >
     <!ENTITY ldquo  "<CHARREF VALUE='147'>" >
     <!ENTITY quot   "<CHARREF CNAME='quot' VALUE='34'>" >
     <!ENTITY lt     "<CHARREF CNAME='lt' VALUE='60'>" >
     <!ENTITY ap     "<FONT FACE=Symbol>&#187;</FONT>" >
     <!ENTITY rArr   "<FONT FACE=Symbol>&#222;</FONT>" -- double right arrow -->
     <!ENTITY rarr   "<FONT FACE=Symbol>&#174;</FONT>" -- right arrow -->

     <!ENTITY alpha  "<FONT FACE=Symbol>a</FONT>" >
     <!ENTITY beta   "<FONT FACE=Symbol>b</FONT>" >
     <!ENTITY sigma  "<FONT FACE=Symbol>s</FONT>" >
     <!ENTITY tau    "<FONT FACE=Symbol>t</FONT>" >
     <!ENTITY Delta  "<FONT FACE=Symbol>D</FONT>" >

     <!ENTITY xover  '<IMG SRC="../entities/x_ov.gif" ALT="x overscore">' >
     <!ENTITY yover  '<IMG SRC="../entities/y_ov.gif" ALT="y overscore">' >
     <!ENTITY zover  '<IMG SRC="../entities/z_ov.gif" ALT="z overscore">' >
...
you get the idea...
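
To illustrate the effect, assuming the overrides above are active: the
parser replaces each reference with markup before the stylesheet ever
sees it. The sample instance text is made up for this illustration.

```
<!-- Instance text as authored (made-up sample): -->
&alpha; &rarr; &xover;

<!-- What the parser sees after entity expansion: -->
<FONT FACE=Symbol>a</FONT> <FONT FACE=Symbol>&#174;</FONT> <IMG SRC="../entities/x_ov.gif" ALT="x overscore">
```

Because FONT, CHARREF and IMG are added through %e.sup;, such content
is valid wherever the float elements are allowed.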

3. The DSSSL script:
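;; copy-attributes is not part of the DSSSL standard; it is assumed to
;; be defined elsewhere in the script.
(element FONT
  (make element gi: "FONT"
        attributes: (copy-attributes)
    (process-children)))

(element IMG
  (make empty-element gi: "IMG"
        attributes: (copy-attributes)))

;; Emit &cname; when CNAME is present, otherwise &#value;.
(element CHARREF
  (make entity-ref name:
       (if (attribute-string "CNAME")
         (attribute-string "CNAME")
         (string-append
                 "#"
                 (attribute-string "VALUE")))))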

The construction rules copy FONT and IMG elements to the HTML output
unchanged. A CHARREF element becomes a named entity reference if its
CNAME attribute is present, and a numeric character reference built
from VALUE otherwise.
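
The rules above call copy-attributes, which the post does not show and
which is not part of the DSSSL standard. A definition commonly used
with Jade's SGML transformation back end looks roughly like this; treat
it as a sketch under that assumption, not as the script's actual code:

```
;; Sketch of a helper (assumed, not shown in the original script):
;; returns the specified attributes of a node as a list of
;; (name value) pairs, the form expected by attributes:.
(define (copy-attributes #!optional (nd (current-node)))
  (let loop ((atts (named-node-list-names (attributes nd))))
    (if (null? atts)
        '()
        (let* ((name (car atts))
               (value (attribute-string name nd)))
          (if value
              ;; keep attributes that have a value
              (cons (list name value)
                    (loop (cdr atts)))
              ;; skip implied attributes without a value
              (loop (cdr atts)))))))
```

With such a helper in place, <CHARREF CNAME='aring' VALUE='229'> comes
out as &aring; and <CHARREF VALUE='132'> comes out as &#132;.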

This works very well, and I find it more satisfying than having to
choose between a blanket SDATA mapping that I cannot influence and
blanket SDATA preservation.
Still, I'd love to hear about the workarounds others use.

Regards, Steffen


---------
steffen heinrich, berlin, germany
"When you're chewing on life's gristle 
Don't grumble, give a whistle
And DSSSL helps things turn out for the best..."
(Monty Python overheard)



