RE: About DSSSL 2 Specifications

Subject: RE: About DSSSL 2 Specifications
From: Tony Graham <tgraham@xxxxxxxxxxxxxxxx>
Date: Sun, 15 Aug 1999 13:50:02 -0400 (EST)
At 15 Aug 1999 10:50 -0400, Didier PH Martin wrote:
 > I may have done here two mistakes a) wrong example, b) bad interpretation of
 > the spec. Matthias, could you just give a small example to illustrate what
 > you are saying. We would all gain knowledge on how to use this feature. And
 > in the same vein, to document it without interpretation error. Thanks for
 > fixing the error and thanks in advance for providing a concrete example.

If I may try in Matthias's place, the time-honoured example of a
character in the document character set described with a minimum
literal is "SGML User's Group Logo".  From the SGML Declaration
example in Section 15.1.1 of ISO 8879 (page 479 of the SGML Handbook):

CHARSET
BASESET "ISO 646-1983//CHARSET
         International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET   0   9  UNUSED
          9   2  9
         11   2  UNUSED
         13   1  13
         14  18  UNUSED
         32  95  32
        127   1  UNUSED
BASESET "ISO Registration Number 109//CHARSET
         ECMA-94 Right Part of Latin Alphabet Nr. 3//ESC 2/9 4/3"
DESCSET 128  32  UNUSED
        160   5  32
        165   1  "SGML User's Group Logo"
        166  88  38
        254   1  127
        255   1  UNUSED

This tells us many things (including that the character set references
are out of date).  This is a copy of the part of the SGML Declaration
that describes the character set used in documents conforming to that
SGML Declaration.  For the most part, the character numbers (since
you're only playing with numbers at this point) in the document
character set are described in terms of character numbers in a known
character set referenced in the BASESET portion.  The lefthand column
of numbers indicates character numbers in the document character set,
the middle column of numbers indicates the extent of a range, and the
third column of numbers, when present, indicates character numbers in
the previous BASESET character set.  For example, "9 2 9" indicates
that the two characters in the document character set starting at
character number 9 are the same as the two characters in the previous
BASESET character set starting at character number 9.
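
To make the range arithmetic concrete, here is a small sketch in Python.
Nothing in ISO 8879 or DSSSL defines this code.  It models the first
DESCSET above as hypothetical (document_start, count, base_start)
tuples, with None standing in for UNUSED, and resolves a document
character number by its offset into the matching range.  The names
FIRST_DESCSET and to_base are made up for illustration.

```python
# Hypothetical model of the first DESCSET (ISO 646 IRV base set).
# Each entry is (document_start, count, base_start); None means UNUSED.
FIRST_DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
]


def to_base(descset, doc_char):
    """Return the base-set character number for doc_char, or None if UNUSED.

    Raises KeyError if no entry in this DESCSET describes doc_char.
    """
    for doc_start, count, base_start in descset:
        if doc_start <= doc_char < doc_start + count:
            if base_start is None:
                return None
            # Same offset into the range in both character sets.
            return base_start + (doc_char - doc_start)
    raise KeyError(doc_char)


# "9 2 9": document characters 9 and 10 are base characters 9 and 10.
print(to_base(FIRST_DESCSET, 9), to_base(FIRST_DESCSET, 10))  # 9 10
print(to_base(FIRST_DESCSET, 65))  # 65
print(to_base(FIRST_DESCSET, 12))  # None (UNUSED)
```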

The other two possibilities are that a character number in the
document character set represents a non-SGML character, in which case
it should be declared UNUSED in the description of the document
character set, or a character number represents a character that isn't
part of the character set referenced in the previous BASESET.

When we can't describe a character number in terms of a character
number in a known character set, we can use a "minimum literal"
(i.e., a string with a restricted range of allowed characters) to
describe it.  Hence the example:

        165   1  "SGML User's Group Logo"

where we're saying that character number 165 is something that we're
describing as "SGML User's Group Logo".
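
Extending the same idea to the second DESCSET, a sketch might let the
third field be a base character number, None for UNUSED, or a string
holding the minimum literal.  This is a hypothetical Python model
(SECOND_DESCSET and describe are illustrative names, not any SGML
tool's API).  Note that a literal or UNUSED applies to the whole range
rather than being offset like a base character number.

```python
from typing import Optional, Union

# Hypothetical model of the second DESCSET (ISO Registration Number 109).
# Third field: int = base character number, None = UNUSED,
# str = minimum literal for a character not in the base set.
SECOND_DESCSET = [
    (128, 32, None),
    (160, 5, 32),
    (165, 1, "SGML User's Group Logo"),
    (166, 88, 38),
    (254, 1, 127),
    (255, 1, None),
]


def describe(descset, doc_char: int) -> Optional[Union[int, str]]:
    """Return how this DESCSET describes doc_char; KeyError if it doesn't."""
    for doc_start, count, target in descset:
        if doc_start <= doc_char < doc_start + count:
            if isinstance(target, int):
                # Offset into the matching range of base character numbers.
                return target + (doc_char - doc_start)
            # None (UNUSED) or a minimum literal covers the whole range.
            return target
    raise KeyError(doc_char)


print(describe(SECOND_DESCSET, 160))  # 32
print(describe(SECOND_DESCSET, 165))  # SGML User's Group Logo
print(describe(SECOND_DESCSET, 200))  # 72
print(describe(SECOND_DESCSET, 130))  # None
```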

All of these machinations are for the benefit of the SGML parser
recognising characters that are significant in markup.  If you have
character number 165 in your document, it's still character number 165
when it comes out of the parser, and it's not magically turned into
the string "SGML User's Group Logo".

You could also do:

        165  50  "SGML User's Group Logo"

(and modify the rest of the DESCSET accordingly) to declare 50
characters with this minimum literal.  It doesn't matter what you do
in the DESCSET provided that all of the characters that are
significant in markup are accounted for once and once only.
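
One way to see what "modify the rest of the DESCSET accordingly" has to
preserve is a simple overlap check.  The sketch below checks a simpler
property than the rule stated above: it assumes each document character
number from 128 to 255 should be described by exactly one DESCSET
entry.  The widened version shifts the following range so it starts at
215 and keeps mapping 215 to 253 onto base characters 87 to 125; that
is only one possible adjustment, and it leaves base characters 38 to 86
unreferenced.  All names here are hypothetical.

```python
from collections import Counter

LITERAL = "SGML User's Group Logo"

# Second DESCSET block as quoted above (int = base number, None = UNUSED).
ORIGINAL = [
    (128, 32, None),
    (160, 5, 32),
    (165, 1, LITERAL),
    (166, 88, 38),
    (254, 1, 127),
    (255, 1, None),
]

# "165 50" with the following range shifted to start after it.
WIDENED = [
    (128, 32, None),
    (160, 5, 32),
    (165, 50, LITERAL),
    (215, 39, 87),
    (254, 1, 127),
    (255, 1, None),
]

# Changing only the literal line, without adjusting anything else.
UNADJUSTED = [e if e[2] != LITERAL else (165, 50, LITERAL) for e in ORIGINAL]


def coverage_problems(descset, lo, hi):
    """Return (gaps, overlaps) for document character numbers lo..hi."""
    counts = Counter()
    for doc_start, count, _target in descset:
        counts.update(range(doc_start, doc_start + count))
    gaps = [n for n in range(lo, hi + 1) if counts[n] == 0]
    overlaps = sorted(n for n, c in counts.items() if c > 1)
    return gaps, overlaps


print(coverage_problems(ORIGINAL, 128, 255))  # ([], [])
print(coverage_problems(WIDENED, 128, 255))   # ([], [])
gaps, overlaps = coverage_problems(UNADJUSTED, 128, 255)
print(overlaps[0], overlaps[-1], len(overlaps))  # 166 214 49
```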

The DSSSL engine, since it also reads the SGML Declaration, can make
the connection between character number 165 and the minimum literal
"SGML User's Group Logo".  The <literal-describe-char> mechanism in
the DSSSL engine gets you from the minimum literal to the character
named "logoSGML".  "logoSGML" is also used in the example
<other-chars> declaration in Section 7.1.5 of the DSSSL standard.  I
don't know how you get from "logoSGML" to a character number in a
font, but that's not today's question.
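
The two-step connection can be pictured as a pair of lookups.  The
dictionaries below are hypothetical stand-ins in Python, not DSSSL
syntax: one for what the DESCSET says about character number 165, and
one for what a <literal-describe-char> declaration supplies.  How a
real DSSSL engine stores or compares these is not shown.

```python
# Hypothetical tables for illustration only; neither is DSSSL syntax.
# Step 1: from the SGML Declaration's DESCSET (character number -> literal).
DOC_CHAR_TO_LITERAL = {165: "SGML User's Group Logo"}

# Step 2: from a <literal-describe-char> declaration (literal -> char name).
LITERAL_TO_CHAR_NAME = {"SGML User's Group Logo": "logoSGML"}


def dsssl_char_name(doc_char: int) -> str:
    """Chain the two lookups; raises KeyError if either step is missing."""
    literal = DOC_CHAR_TO_LITERAL[doc_char]
    return LITERAL_TO_CHAR_NAME[literal]


print(dsssl_char_name(165))  # logoSGML
```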

If you need more information, Robin Cover (surprise, surprise) has a
section about the SGML Declaration among his SGML/XML pages, plus
there's a conference paper of mine about the CHARSET portion of the
SGML Declaration at "http://www.mulberrytech.com/papers/docchar.htm".

Regards,


Tony Graham
======================================================================
Tony Graham                            mailto:tgraham@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9632
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================


 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist

