Re: About DSSSL 2 Specifications

Subject: Re: About DSSSL 2 Specifications
From: Matthias Clasen <clasen@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 15 Aug 1999 22:53:51 +0200
> If I may try in Matthias's place, the time-honoured example of a
> character in the document character set described with a minimum
> literal is "SGML User's Group Logo".  From the SGML Declaration
> example in Section 15.1.1 of ISO 8879 (page 479 of the SGML Handbook):

Thanks. I was considering a conspicuously similar example...
Here are my notes on characters in SGML/DSSSL (I originally intended
to hand this to Didier as input for a FOOTM column on the character FO):


Background
----------

A character is an abstract atom of information. It may have additional
properties besides its name, e.g. the script it belongs to. A character
repertoire is a set of characters. A character set is a map from
a character repertoire to the integers, i.e. it assigns numerical
representations to characters. These numerical representations are
called coded characters or bit combinations. A coded character is
meaningless without knowledge about the character set it belongs to.
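A small Python illustration of that last point follows. It assumes Python's built-in codecs for ISO 8859-1 and ISO 8859-5 as stand-ins for two single-byte character sets and asks which character the coded character 0xE4 denotes in each. For single-byte sets the numerical and storage representations coincide, so decoding one byte is enough.

```python
import unicodedata


def interpret(code: int, charset: str) -> str:
    """Return the Unicode name of the character that `code` denotes in `charset`.

    Only meaningful for single-byte character sets, where the numerical
    representation and the one-byte storage representation coincide.
    """
    char = bytes([code]).decode(charset)
    return unicodedata.name(char)


# The same coded character, 0xE4, under two different character sets.
print(interpret(0xE4, "latin-1"))    # LATIN SMALL LETTER A WITH DIAERESIS
print(interpret(0xE4, "iso8859_5"))  # CYRILLIC SMALL LETTER EF
```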

Note: `Numerical representation' should not be confused with `storage
representation'; e.g. UTF-8 and UCS-2 are different storage representations
of the same character set, Unicode.
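The distinction can be seen in a few lines of Python. Python has no `ucs-2` codec, so this sketch assumes `utf-16-be` as a stand-in; for characters in the Basic Multilingual Plane the two produce the same bytes.

```python
# One numerical representation (code point) in Unicode/ISO 10646 ...
ch = "\u00e4"                  # LATIN SMALL LETTER A WITH DIAERESIS
code_point = ord(ch)           # 0xE4

# ... but two different storage representations.
utf8 = ch.encode("utf-8")      # two bytes: C3 A4
ucs2 = ch.encode("utf-16-be")  # two bytes: 00 E4 (same as UCS-2 inside the BMP)

print(hex(code_point), utf8, ucs2)

# Decoding either storage form recovers the same coded character.
assert utf8.decode("utf-8") == ucs2.decode("utf-16-be") == ch
```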


Characters in SGML
------------------

The SGML declaration defines how bit combinations in the document are
mapped to characters. This is achieved via a description of the bit
combinations in terms of the coded characters of standard character sets
(called basesets) - under the assumption that the mapping from coded
characters to characters is known for basesets. If this is not possible
because no standard character set contains the needed character, SGML also
allows a literal description.
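The mechanism can be sketched in Python as a lookup over DESCSET-style ranges, each of which describes a run of document bit combinations either by consecutive coded characters of a baseset, by a literal, or as unused. The class and function names, the shortened baseset identifier and the position 128 for the literal are all hypothetical, chosen only for the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class BaseChar:
    baseset: str  # public identifier of a base character set
    code: int     # coded character within that baseset


@dataclass(frozen=True)
class Literal:
    text: str     # literal describing a character no baseset contains


@dataclass(frozen=True)
class DescRange:
    """One DESCSET-style entry covering `count` document codes from `start`."""
    start: int
    count: int
    target: Optional[Union[BaseChar, Literal]]  # None stands for UNUSED


def describe(bit_combination: int,
             descset: List[DescRange]) -> Optional[Union[BaseChar, Literal]]:
    """Map a document bit combination to (baseset, coded character) or a literal."""
    for r in descset:
        if r.start <= bit_combination < r.start + r.count:
            if isinstance(r.target, BaseChar):
                # Consecutive document codes map to consecutive baseset codes.
                offset = bit_combination - r.start
                return BaseChar(r.target.baseset, r.target.code + offset)
            return r.target  # a Literal, or None for UNUSED
    return None  # bit combination not described at all


# Hypothetical declaration: codes 0-127 from ISO 646 IRV, code 128 by a literal.
IRV = "ISO 646 IRV"  # shortened placeholder for the full public identifier
descset = [
    DescRange(0, 128, BaseChar(IRV, 0)),
    DescRange(128, 1, Literal("SGML User's Group Logo")),
]

print(describe(65, descset))   # BaseChar(baseset='ISO 646 IRV', code=65)
print(describe(128, descset))  # Literal(text="SGML User's Group Logo")
```

The return type is exactly the "either a pair or a literal" map described further down in this section.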

Character set handling in the SGML declaration is further complicated by the
fact that syntax characters are described separately from document characters.
This has the advantage that a concrete syntax can be specified independently
of the document character set.
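One way to picture that independence, assuming both the syntax-reference character set and the document character set are described in terms of basesets: a syntax character given by number (e.g. RE as 13) is first turned into a (baseset, coded character) pair and then matched against the document character set. The Python sketch below follows that assumption; the second baseset and the offset 256 are hypothetical.

```python
from typing import Dict, Optional, Tuple

BaseChar = Tuple[str, int]  # (baseset identifier, coded character in that baseset)


def syntax_to_document(syntax_code: int,
                       syntax_set: Dict[int, BaseChar],
                       document_set: Dict[int, BaseChar]) -> Optional[int]:
    """Find the document code that denotes the same character as syntax_code."""
    target = syntax_set.get(syntax_code)
    if target is None:
        return None
    for doc_code, base_char in document_set.items():
        if base_char == target:
            return doc_code
    return None  # character missing from the document character set


IRV = "ISO 646 IRV"        # placeholder for the full public identifier
OTHER = "SOME OTHER SET"   # hypothetical second baseset

# Syntax-reference set: plain IRV. Document set: IRV placed after 256 other codes.
syntax_set = {n: (IRV, n) for n in range(128)}
document_set = {n: (OTHER, n) for n in range(256)}
document_set.update({256 + n: (IRV, n) for n in range(128)})

RE = 13  # record end, numbered in the syntax-reference character set
print(syntax_to_document(RE, syntax_set, document_set))  # 269
```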

In mathematical terminology, the SGML declaration defines a map on
bit combinations whose values are either pairs (baseset, coded character)
or literals.


Characters in DSSSL
-------------------

Characters occur in at least five flavours in DSSSL:

- in the document

- in the style sheet

- as expression language objects

- as nodes in the grove exhibiting a value for the char property

- as character flow objects in the flow object tree

In DSSSL, characters are identified by their name. The characters in
the document and the style sheet are given names by composing the
map defined by the SGML declaration (actually, document and style sheet
may use different SGML declarations) with a map from pairs
(baseset, coded character) and literals to character names. This second map
is defined by the baseset-encoding and literal-described-char forms.
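A Python sketch of the two-step naming, with plain dictionaries standing in for the SGML declarations and for the baseset-encoding/literal-described-char forms. All entries and character names are hypothetical; the style sheet map assumes a declaration that puts IRV `A' at 0xC1, to show that two different bit combinations can end up with the same name.

```python
from typing import Dict, Tuple, Union

IRV = "ISO 646 IRV"  # placeholder for the full public identifier

# A description is either (baseset, coded character) or a literal string.
Description = Union[Tuple[str, int], str]

# Map 1, one per SGML declaration: bit combination -> description.
document_map: Dict[int, Description] = {
    65: (IRV, 65),
    128: "SGML User's Group Logo",  # hypothetical position of the literal
}
stylesheet_map: Dict[int, Description] = {
    0xC1: (IRV, 65),                # hypothetical: style sheet codes `A' as 0xC1
}

# Map 2, standing in for baseset-encoding and literal-described-char:
# description -> DSSSL character name (names are illustrative).
dsssl_map: Dict[Description, str] = {
    (IRV, 65): "latin-capital-letter-a",
    "SGML User's Group Logo": "sgml-users-group-logo",
}


def char_name(bit_combination: int, sgml_map: Dict[int, Description]) -> str:
    """Compose the maps: bit combination -> description -> character name."""
    return dsssl_map[sgml_map[bit_combination]]


print(char_name(65, document_map))      # latin-capital-letter-a
print(char_name(0xC1, stylesheet_map))  # latin-capital-letter-a
print(char_name(128, document_map))     # sgml-users-group-logo
```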

DSSSL uses a mechanism similar to the SGML one for associating
characters and their names, namely by describing them with a coded
character in the `universal character set' ISO10646-1 or by simply
naming them. This is done with the standard-chars and other-chars forms.
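Modelled in Python, standard-chars amounts to a table from names to ISO 10646 codes and other-chars to a set of names with no code. The dictionary and set below are hypothetical stand-ins for those declarations, not their DSSSL syntax.

```python
from typing import Dict, Optional, Set

# Stand-in for standard-chars: character name -> ISO 10646 code.
standard_chars: Dict[str, int] = {
    "latin-capital-letter-a": 0x0041,
    "latin-small-letter-a-with-diaeresis": 0x00E4,
}

# Stand-in for other-chars: characters that are only named.
other_chars: Set[str] = {"sgml-users-group-logo"}


def ucs_code(name: str) -> Optional[int]:
    """Return the ISO 10646 code of a declared character, or None if only named."""
    if name in standard_chars:
        return standard_chars[name]
    if name in other_chars:
        return None
    raise KeyError(f"undeclared character: {name}")


print(hex(ucs_code("latin-small-letter-a-with-diaeresis")))  # 0xe4
print(ucs_code("sgml-users-group-logo"))                     # None
```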

Since unavailable characters are often represented as SDATA entities
in SGML documents, DSSSL provides a facility for mapping SDATA entities
to characters with the map-sdata-entity form.
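A Python sketch of such a mapping, built from hypothetical (entity name, replacement text, character name) declarations. It assumes a declaration may leave the name empty to match on the replacement text alone, and that a match on the entity name takes precedence over a match on the text; both are assumptions made for the illustration.

```python
from typing import Dict, Optional

# Hypothetical declarations: (entity name, replacement text, character name).
# An empty name means "match on replacement text only".
declarations = [
    ("auml", "[auml  ]", "latin-small-letter-a-with-diaeresis"),
    ("", "[ouml  ]", "latin-small-letter-o-with-diaeresis"),
]

by_name: Dict[str, str] = {n: c for n, _, c in declarations if n}
by_text: Dict[str, str] = {t: c for _, t, c in declarations if t}


def resolve_sdata(entity_name: str, text: str) -> Optional[str]:
    """Return the character an SDATA entity maps to, or None if unmapped."""
    # Assumption: a match on the entity name wins over a match on the text.
    if entity_name in by_name:
        return by_name[entity_name]
    return by_text.get(text)


print(resolve_sdata("auml", "[auml  ]"))    # latin-small-letter-a-with-diaeresis
print(resolve_sdata("myouml", "[ouml  ]"))  # latin-small-letter-o-with-diaeresis
```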


-- 
Matthias Clasen, 
Tel. 0761/203-5606
Email: clasen@xxxxxxxxxxxxxxxxxxxxxxxxxx
Mathematisches Institut, Albert-Ludwigs-Universitaet Freiburg


 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist

