Subject: Re: About DSSSL 2 Specifications
From: Matthias Clasen <clasen@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 15 Aug 1999 22:53:51 +0200
> If I may try in Matthias's place, the time-honoured example of a
> character in the document character set described with a minimum
> literal is "SGML User's Group Logo". From the SGML Declaration
> example in Section 15.1.1 of ISO 8879 (page 479 of the SGML Handbook):

Thanks. I was considering a conspicuously similar example...

Here are my notes on characters in SGML/DSSSL (I originally intended to
hand this to Didier as input for a FOOTM column on the character FO).

Background
----------

A character is an abstract atom of information. It may have additional
properties besides its name, e.g. the script it belongs to.

A character repertoire is a set of characters.

A character set is a map from a character repertoire to the integers,
i.e. it assigns numerical representations to characters. These numerical
representations are called coded characters or bit combinations. A coded
character is meaningless without knowledge of the character set it
belongs to.

Note: `Numerical representation' should not be confused with `storage
representation'. For example, UTF-8 and UCS-2 are different storage
representations of the same character set, Unicode.

Characters in SGML
------------------

The SGML declaration defines how bit combinations in the document are
mapped to characters. It does so by describing the bit combinations in
terms of the coded characters of standard character sets (called
basesets), under the assumption that the mapping from coded characters
to characters is known for basesets. If this is not possible because no
standard character set contains the needed character, SGML also allows
a literal description.

Character set handling in the SGML declaration is further complicated by
the fact that syntax characters are described separately from document
characters. This has the advantage that a concrete syntax can be
specified independently of the document character set.
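The description above can be sketched in a few lines of Python. This is
only an illustration, not SGML declaration syntax: the baseset names, the
numbers, and the helper `resolve` are all made up for the example. The
only thing taken from the thread is the literal "SGML User's Group Logo".

```python
from typing import Dict, Tuple, Union

# A description is either a (baseset, coded character) pair or a literal.
Description = Union[Tuple[str, int], str]

# Hypothetical basesets. For these, the mapping from coded characters to
# characters is assumed to be known in advance.
BASESETS: Dict[str, Dict[int, str]] = {
    "base-a": {65: "LATIN CAPITAL LETTER A", 97: "LATIN SMALL LETTER A"},
    "base-b": {105: "LATIN SMALL LETTER E WITH ACUTE"},
}

# Hypothetical declaration: bit combination in the document -> description.
DECLARATION: Dict[int, Description] = {
    65: ("base-a", 65),
    97: ("base-a", 97),
    233: ("base-b", 105),             # document code differs from baseset code
    254: "SGML User's Group Logo",    # no baseset has it, so use a literal
}


def resolve(bit_combination: int) -> str:
    """Return the character a document bit combination stands for."""
    description = DECLARATION[bit_combination]
    if isinstance(description, str):
        # Literal description: the text itself identifies the character.
        return description
    baseset, coded_character = description
    return BASESETS[baseset][coded_character]


if __name__ == "__main__":
    for code in (65, 233, 254):
        print(code, "->", resolve(code))
```

Note that a bit combination missing from the declaration raises a
KeyError here; the sketch does not try to model unused code positions.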
In mathematical terminology, the SGML declaration defines a map on bit
combinations whose values are either pairs (baseset, coded character)
or literals.

Characters in DSSSL
-------------------

Characters occur in at least five flavours in DSSSL:

- in the document
- in the style sheet
- as expression language objects
- as nodes in the grove exhibiting a value for the char property
- as character flow objects in the flow object tree

In DSSSL, characters are identified by their names.

The characters in the document and the style sheet are given names by
composing two maps: the map defined by the SGML declaration (document
and style sheet may actually use different SGML declarations) and a map
that takes pairs (baseset, coded character) and literals to character
names. The second map is defined by the baseset-encoding and
literal-described-char forms.

DSSSL uses a mechanism similar to the SGML one for associating
characters with their names: a character is either described by a coded
character in the `universal character set' ISO/IEC 10646-1 or simply
named. This is done with the standard-chars and other-chars forms.

Since unavailable characters are often represented as sdata entities in
SGML documents, DSSSL provides a facility for mapping sdata entities to
characters: the map-sdata-entity form.

-- 
Matthias Clasen, Tel. 0761/203-5606
Email: clasen@xxxxxxxxxxxxxxxxxxxxxxxxxx
Mathematisches Institut, Albert-Ludwigs-Universitaet Freiburg

DSSSList info and archive: http://www.mulberrytech.com/dsssl/dssslist