RE: Future of DSSSL: What about PDF?

Subject: RE: Future of DSSSL: What about PDF?
From: "Didier PH Martin" <martind@xxxxxxxxxxxxx>
Date: Sat, 6 Mar 1999 15:15:08 -0500
Hi Avi

<YourComment>
On Saturday, March 06, 1999 06:56, Carlos Villegas [SMTP:cav@xxxxxxxxxxxxxx]
wrote:
>
> As I understand it, the page-sequence FOs are not currently implemented
> by Jade. James has said that implementing the front-end part of it
> should be relatively easy; however, no backend can currently support
> the DSSSL page model.
>
> Where can I get some examples of using the page-sequence FOs? It's
> difficult for me to understand how they should work from the spec,
> and there are no tools to play with...
>

If you have any specifics in mind, post them, though I believe there is only
one person on this mailing list who can give authoritative answers. It would
be interesting to see how others interpret the scriptures.

> I thought about attempting to design such a formatter some time ago.
>
> However, it's been only that, just an attempt!

Was it an actual attempt? or just a thought? :-)

> One of the key points, of course, is to implement the DSSSL
> page model and the synchronization flow objects from the start;
> a simple-page-sequence would then be implemented in terms of the
> more general page-sequence. There's no gain in not implementing
> the page model; we already have that.
> However, after reading the spec several times about the page
> model, I need some practical examples to make sure I have
> the correct interpretation.
>
</YourComment>

<Reply>
In fact, my own opinion on the subject (partly because I have to live with
Jade code maintenance and modifications) is that what is missing is truly
versatile grove objects.

When you look closely at what a DSSSL engine is doing:

a) it parses the SGML/XML document and constructs a grove;
b) from this grove it (theoretically) constructs a new one. This latter is an
FO grove, as opposed to the former, which is a document grove.
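As a rough illustration of steps a) and b), here is a minimal Python sketch, assuming a toy rule table. `RULES`, `build_document_grove` and `build_fo_grove` are hypothetical names, not Jade's API. Note that both groves are built from the same generic node type; only the node kinds and properties differ.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """Generic grove node: a kind, a property set, and child nodes."""
    kind: str
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def build_document_grove(xml_text: str) -> Node:
    """Step a): parse the markup and build the document grove.

    Simplification: text following a child element (ElementTree's "tail") is ignored.
    """
    def convert(elem: ET.Element) -> Node:
        node = Node("element", {"gi": elem.tag, **elem.attrib})
        if elem.text and elem.text.strip():
            node.children.append(Node("data", {"char": elem.text.strip()}))
        node.children.extend(convert(child) for child in elem)
        return node
    return convert(ET.fromstring(xml_text))


# Hypothetical construction rules: element type name -> flow object class.
RULES = {"doc": "simple-page-sequence", "title": "paragraph", "para": "paragraph"}


def build_fo_grove(doc_node: Node) -> Node:
    """Step b): walk the document grove and construct a separate FO grove."""
    if doc_node.kind == "data":
        return Node("literal", {"string": doc_node.props["char"]})
    fo = Node(RULES.get(doc_node.props["gi"], "sequence"))
    fo.children = [build_fo_grove(child) for child in doc_node.children]
    return fo
```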

So, basically, we have to deal with groves. Most grove interfaces I have seen
make the equation "concept object" = "code object" instead of creating a
general mechanism based on good software practice, such as implementing a
grove with the Composite pattern (Gamma et al.). I'll explain. The result is
that we have to multiply entities.

If you look at a document's hierarchy, or at the concept of a grove, it is
simply a hierarchy of objects, and each object is associated with a property
set. In the Composite design pattern, all objects inherit from a basic object
used to manipulate a collection, or implement an interface to manipulate a
collection. For example, imagine an object having methods to manipulate a
collection of objects; you can then say that this object is implicitly a
collection of objects. The usual collection methods could be implemented in
this object, such as:
- add an element (the element is another object, and therefore we can create
a tree)
- remove an element
- delete an element
- update an element
- find or get an element
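A minimal Python sketch of such an object, assuming string keys for children; the class and method names are hypothetical. The split between remove (detach and return) and delete (discard) is one possible reading of the list above, not something the original defines.

```python
class GroveObject:
    """Composite sketch: every grove object is also a keyed collection of grove objects."""

    def __init__(self, name):
        self.name = name
        self._children = {}  # insertion-ordered: key -> GroveObject

    def add(self, key, child):
        # the element is itself a GroveObject, so nested add() calls build a tree
        if key in self._children:
            raise KeyError(f"duplicate key {key!r}")
        self._children[key] = child
        return child

    def remove(self, key):
        # detach the child and hand it back so it can be re-attached elsewhere
        return self._children.pop(key)

    def delete(self, key):
        # detach the child and drop our reference to it (and to its subtree)
        del self._children[key]

    def update(self, key, child):
        # replace an existing child in place
        if key not in self._children:
            raise KeyError(key)
        self._children[key] = child

    def get(self, key):
        # find or get: None when the key is absent
        return self._children.get(key)

    def __len__(self):
        return len(self._children)
```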

Then define this object as an interface (with the equivalent constructs in
Java or C++, i.e. interfaces or virtual members, or with a CORBA, ILU or DCOM
interface definition). For example, if you define it with a CORBA, ILU or
DCOM IDL, you can map this interface to several languages and implement it
in any of them. Any client that has to interface with the grove would do it
through this interface, and with the right object middleware the client
could be implemented in any language. Thus DCOM, ILU and CORBA are all good
candidates for such an interface definition.
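Python has no IDL, but an abstract base class plays a similar role. In this hypothetical sketch, `count_nodes` is written only against `IGroveCollection`, so any implementation (the dictionary-backed one here, or one wrapping a DOM or a directory) could be substituted without changing the client.

```python
from abc import ABC, abstractmethod


class IGroveCollection(ABC):
    """Language-level stand-in for an IDL interface: clients see only these methods."""

    @abstractmethod
    def add(self, key, child): ...

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def keys(self): ...


class DictGrove(IGroveCollection):
    """One possible implementation behind the interface."""

    def __init__(self):
        self._items = {}

    def add(self, key, child):
        self._items[key] = child
        return child

    def get(self, key):
        return self._items[key]

    def keys(self):
        return list(self._items)


def count_nodes(node: IGroveCollection) -> int:
    """A client written purely against the interface, unaware of the implementation."""
    return 1 + sum(count_nodes(node.get(k)) for k in node.keys())
```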

OK, now we have a tree of objects and methods to manipulate the grove. For
the document grove each object corresponds to a markup construct, and for the
formatting object grove each object obviously corresponds to a formatting
object :-)

However, something is missing from both groves: the property set. In the
document grove as well as in the formatting object grove, each object is
associated with a property set. So we need a new kind of object, a property
set, with methods like:

- Add a property
- remove a property
- delete a property
- update a property
- find or get a property

We have the same pattern here: a collection pattern. However, this is not the
Composite pattern, because the items in this collection are not themselves
collections. Property sets are not hierarchies. The main difference is that a
grove object contains other grove objects and is therefore structured as a
hierarchy, whereas a property set contains property values and therefore
does not make a tree. We then have something like this (let's try a
graphical representation here :-)


object........................ property set
  |____ object................ property set
  |____ object................ property set
          |___ object ........ property set
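A small sketch of the figure, assuming a plain dictionary behind each property set (all names hypothetical): the hierarchy lives in `children`, while each node's `PropertySet` is flat and never contains nodes.

```python
class PropertySet:
    """Flat keyed collection of property values: no children, so no tree."""

    def __init__(self):
        self._values = {}

    def put(self, name, value):
        self._values[name] = value

    def get(self, name):
        return self._values[name]

    def names(self):
        return list(self._values)


class GroveNode:
    """Hierarchy on one axis (children), property set on the other."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.properties = PropertySet()  # created together with the node

    def add(self, child):
        self.children.append(child)
        return child


def outline(node, depth=0):
    """Yield one indented line per node with its property names, like the figure."""
    yield "  " * depth + f"{node.name} ... {node.properties.names()}"
    for child in node.children:
        yield from outline(child, depth + 1)
```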

Thus, a grove object has methods to manipulate the object hierarchy, and its
property set has methods to manipulate the property collection and to set/get
property values. Since the previous property set interface did not include
general get/set value methods, let's correct this:

- Add a property
- remove a property
- delete a property
- update a property
- find or get a property
- get a property value
- set a property value

Here is the real advantage of such an interface: with a minimal set of
methods you can get or set any property that is a member of a property set.
As an example, say the grove object is a formatting object and we want to
set its "Font-Size" property; we would make a call like

PropertySet.Put("Font-Size", Font_size_value)

and get it with:
Font_size_value = PropertySet.Get("Font-Size")

Of course, the interface-to-language mapping could differ for each
particular language. The main benefit of this kind of interface is what I
would call the Occam's razor concept (don't multiply entities ad infinitum!).
You don't have to create a new code object for each new kind of conceptual
object; the interface is general enough that any conceptual object can be
mapped to objects implemented in a language.
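To make the Occam's razor point concrete, this hypothetical sketch uses one class for both a formatting object and a directory entry. The kind of object is data rather than a new type, and a generic client works on both.

```python
class GenericObject:
    """One class for every conceptual object: the kind is data, not a new type."""

    def __init__(self, kind):
        self.kind = kind
        self._props = {}

    def put(self, name, value):
        self._props[name] = value

    def get(self, name):
        return self._props[name]


def describe(obj, name):
    """Generic client code: works for any kind without knowing it in advance."""
    return f"{obj.kind}.{name} = {obj.get(name)}"


# A formatting object and a directory entry, with no class per concept.
para = GenericObject("paragraph")
para.put("Font-Size", "12pt")
ou = GenericObject("organizationalUnit")
ou.put("ou", "Research")
```

The trade-off, as the message notes further on, is that the language no longer checks that a given kind actually carries a given property.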

You can also provide collection enumerators to browse a particular
collection.
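A minimal sketch of such enumerators in Python (names hypothetical): `__iter__` enumerates the direct children of one collection, and `walk` is a depth-first enumerator over a whole subtree.

```python
class Node:
    """Grove object whose direct children can be enumerated."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)

    def __iter__(self):
        # enumerator over this collection only (one level deep)
        return iter(self.children)


def walk(node, path=()):
    """Depth-first enumerator over a whole subtree, yielding slash-separated paths."""
    path = path + (node.name,)
    yield "/".join(path)
    for child in node:
        yield from walk(child, path)
```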

In fact, we created this kind of grove object, and I'll introduce these grove
objects, which can be used to manipulate any "grove concept" such as a
directory service or a structured document (both share the same structure
and both can be mapped to the grove concept).

We first mapped the object to a DCOM interface. DCOM, I should recall, is
not solely Microsoft's property; it is also available on Unix platforms from
The Open Group (http://www.opengroup.org). The Open Group is a consortium
with rules similar to the W3C's, and, as with the W3C, membership is based
on yearly fees. Workgroups can be formed and specs published by them. Like
the W3C, it is not as open as the IETF working groups, and spec elaboration
is restricted to members.

We are also working on implementing the "general grove" interface in Java,
ILU and CORBA, and I'll post a document explaining this in the near future.
Contrary to the W3C DOM, it is not restricted solely to documents but can
also be used for other hierarchical constructs such as system directories
(e.g. LDAP, NDS).

So, first we have the "Grove Object" supporting the IObject interface.
Because it is an interface, it can be implemented in any language.
Concretely speaking, with DCOM it could easily be implemented in C++, VB,
Delphi (Pascal) and Java. Mapping it to other languages would require
defining the interface with ILU, which can be mapped to languages like
Scheme. ILU is made by Xerox and is freely available (including source code).

interface IObject : IUnknown
{
	HRESULT ParseDisplayName([in] BSTR bstrPath,
	                         [out] ULONG* pchEaten,
	                         [in] REFIID riid,
	                         [out, iid_is(riid)] void** ppvObj);
	HRESULT AddComponent([in] BSTR bstrKey, [out, retval] IObject** ppObject);
	HRESULT RemoveComponent([in] BSTR bstrKey);
	HRESULT GetComponent([in] BSTR bstrKey, [out, retval] IObject** ppObject);
	HRESULT get_Count([out, retval] ULONG* puCount);
	HRESULT get_Name([out, retval] BSTR* pbstrName);
	HRESULT put_Name([in] BSTR bstrName);
	HRESULT get_Parent([in] REFIID riid, [out, iid_is(riid)] void** ppObject);
	HRESULT put_Parent([in] IObject* pParent);
};

ParseDisplayName takes as input a display name representing a hierarchy
element and returns that element of the hierarchy. Each object is part of a
particular name space. For example, we created a hierarchical name space,
named TNS, based on the URN and URC schemes. So, to get a particular object,
you would make a call like:
	GroveObject = Document.ParseDisplayName("urn:tns:MyDocument/Chapter(1)/Paragraph(2)")
and get the second paragraph of the document's first chapter. You can use
IDs instead of numbers (if the markup includes IDs). Someone could choose to
implement the interface with the XPointer name space instead of this one.
The idea is that requests are made with strings, as we do with URLs. The
main difference is that, in this case, the string is a Uniform Resource Name
(URN, RFC 2141), which is location independent (URLs are not). Associated
with each URN are Uniform Resource Characteristics (URCs), which are
equivalent to the properties contained in a property set; in this case a URL
is just one particular URC, or property. Thus each grove object is uniquely
identified by a URN, and a grove object has a single property set object
associated with it, each property of which is a URC.
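The following Python sketch resolves TNS-style display names. The segment grammar (`Type(n)` for the n-th child of that type, counting from 1, or `Type(id)` to match an ID) is an assumption inferred from the example above, not a published TNS specification; `Node` and `parse_display_name` are hypothetical.

```python
import re


class Node:
    def __init__(self, gi, node_id=None, children=()):
        self.gi = gi            # element type name, e.g. "Chapter"
        self.id = node_id       # optional ID attribute value
        self.children = list(children)


PREFIX = "urn:tns:"
SEGMENT = re.compile(r"(?P<gi>[A-Za-z][\w.-]*)\((?P<sel>[^()]+)\)")


def parse_display_name(roots, name):
    """Resolve 'urn:tns:<document>/<Type>(<n or id>)/...' to a node, or None.

    `roots` maps document names to their root nodes.
    """
    # RFC 2141: the "urn" prefix and the namespace ID are case-insensitive
    if name[:len(PREFIX)].lower() != PREFIX:
        raise ValueError(f"not a TNS name: {name!r}")
    doc_name, *segments = name[len(PREFIX):].split("/")
    node = roots.get(doc_name)
    for seg in segments:
        if node is None:
            return None
        m = SEGMENT.fullmatch(seg)
        if m is None:
            raise ValueError(f"bad segment: {seg!r}")
        same_type = [c for c in node.children if c.gi == m["gi"]]
        sel = m["sel"]
        if sel.isdecimal():
            n = int(sel)  # 1-based position among children of that type
            node = same_type[n - 1] if 1 <= n <= len(same_type) else None
        else:
            node = next((c for c in same_type if c.id == sel), None)
    return node
```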

We have just merged the IETF name space concept with grove object queries.
We can also merge in the directory services name space concept: it is
plausible to envision that a grove could be mapped onto an LDAP or NDS name
space. Thus, a grove object that is part of an LDAP name space would be
defined with a display name such as:
		GroveObject = Document.ParseDisplayName("LDAP://MyDomain.com/paragraph=1,chapter=2,document=MyDocument")
Of course, actual LDAP implementations do not have the concept of non-typed
objects and require a specific schema for each contained object (idem for
NDS). This leads to two choices: a) create a schema for each DTD object, or
b) create an LDAP implementation that supports non-typed objects. Anyway,
that is not the point here. What we should retain is that a grove object is
also included in a name space, more particularly a hierarchical name space,
that each object is thus uniquely identified by a URN, and that each of its
properties is a URC.
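Because an LDAP distinguished name lists the leaf first, resolving such a display name means reversing its components. This simplified Python sketch (hypothetical helper names; it ignores escaped commas and multi-valued RDNs) turns the LDAP form into a root-first path and then into an equivalent TNS-style name, showing the same object addressed in two name spaces. It assumes the `document=` component is the root of the grove.

```python
from urllib.parse import unquote, urlsplit


def ldap_name_to_path(display_name):
    """Split 'LDAP://host/leaf=...,...,root=...' into (host, root-first path).

    Simplified: assumes no escaped commas and no multi-valued RDNs in the DN.
    """
    parts = urlsplit(display_name)
    if parts.scheme != "ldap":          # urlsplit lower-cases the scheme
        raise ValueError("expected an LDAP display name")
    dn = unquote(parts.path.lstrip("/"))
    rdns = [tuple(rdn.strip().split("=", 1)) for rdn in dn.split(",")]
    return parts.netloc, rdns[::-1]     # a DN lists the leaf first


def to_tns(path):
    """Re-express a root-first path as a TNS-style name (hypothetical mapping)."""
    (_, document), *rest = path
    return "urn:tns:" + "/".join([document] + [f"{t}({v})" for t, v in rest])
```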

The other members are self-explanatory:
- AddComponent = add a grove object to the collection (as a sub grove
object; see the previous figure)
- RemoveComponent = remove a grove object from the collection
- GetComponent = get a grove object from the collection by its key
- get_Count = return the number of grove objects contained in the collection
- get_Name = return the grove object's name (or its particular name space
context)
- put_Name = set the grove object's name
- get_Parent = get the grove object's parent, which is also a grove object
- put_Parent = set the grove object's parent, which should also be a grove
object
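A Python mirror of these members might look like the sketch below (hypothetical, not the actual DCOM implementation). The `get_X`/`put_X` pairs become Python properties, and `put_Parent` keeps the parent's collection consistent with the child's back-link; cycle detection is left out of this sketch.

```python
class GroveObject:
    """Python mirror of the IObject members (get_X/put_X become properties).

    Invariant kept by every method: a child is stored in its parent's
    collection under its own name, and child.parent points back to it.
    """

    def __init__(self, name):
        self._name = name
        self._parent = None
        self._components = {}

    def add_component(self, key):                 # AddComponent
        child = GroveObject(key)
        child.parent = self
        return child

    def remove_component(self, key):              # RemoveComponent
        self._components[key].parent = None

    def get_component(self, key):                 # GetComponent
        return self._components[key]

    @property
    def count(self):                              # get_Count
        return len(self._components)

    @property
    def name(self):                               # get_Name
        return self._name

    @name.setter
    def name(self, value):                        # put_Name: re-key in the parent
        p = self._parent
        if p is not None:
            if value in p._components:
                raise KeyError(f"sibling {value!r} already exists")
            p._components[value] = p._components.pop(self._name)
        self._name = value

    @property
    def parent(self):                             # get_Parent
        return self._parent

    @parent.setter
    def parent(self, new_parent):                 # put_Parent
        if new_parent is self._parent:
            return
        if new_parent is not None:
            if not isinstance(new_parent, GroveObject):
                raise TypeError("the parent must itself be a grove object")
            if self._name in new_parent._components:
                raise KeyError(f"{self._name!r} already exists under the new parent")
        if self._parent is not None:
            del self._parent._components[self._name]
        if new_parent is not None:
            new_parent._components[self._name] = self
        self._parent = new_parent
```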

When you create a grove object you also create its associated property set
object. DCOM allows an object to expose multiple interfaces (this is
particular to DCOM today, but CORBA 3 will allow multiple interfaces too, and
the Mozilla group's XPCOM also supports the notion of multiple interfaces).
So, to obtain a property set interface, you call QueryInterface on the grove
object, and the latter returns an IPropertySet interface, defined as:

interface IPropertySet: IDispatch
{
	HRESULT AddProperty([in]BSTR bstrKey, [in]VARIANT Value);
	HRESULT RemoveProperty([in]BSTR bstrKey);
	HRESULT GetProperty([in]BSTR bstrKey, [out, retval]VARIANT* Value);
	HRESULT ModifyProperty([in]BSTR bstrKey, [in]VARIANT Value);

};
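Python has no QueryInterface, but the idea can be sketched with a lookup of facets by interface name. Here string names stand in for COM interface IDs (real COM uses GUIDs), and `NoInterface` plays the role of `E_NOINTERFACE`; all names are hypothetical.

```python
class NoInterface(Exception):
    """Stand-in for COM's E_NOINTERFACE."""


class PropertySetFacet:
    def __init__(self):
        self._values = {}

    def add_property(self, key, value):
        self._values[key] = value

    def get_property(self, key):
        return self._values[key]


class GroveObject:
    def __init__(self, name):
        self.name = name
        # the property set is created together with the grove object
        self._property_set = PropertySetFacet()

    def query_interface(self, iid):
        """Return the facet registered under iid, or raise NoInterface."""
        facets = {"IObject": self, "IPropertySet": self._property_set}
        try:
            return facets[iid]
        except KeyError:
            raise NoInterface(iid) from None
```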

If I map this to ECMAScript, getting and setting a property would be:

	FontSize = groveObject.GetProperty("Font-Size");
	groveObject.ModifyProperty("Font-Size", FontSize);

- AddProperty = add a new property to the property set (ex:
groveObject.AddProperty("Font-Size", FontSize) )
- RemoveProperty = remove a property from the property set (ex:
groveObject.RemoveProperty("Font-Size") )
- GetProperty = get the value of a particular property in the property set
(ex: FontSize = groveObject.GetProperty("Font-Size") )
- ModifyProperty = set the value of a particular property in the property
set (ex: groveObject.ModifyProperty("Font-Size", FontSize) )

To enumerate both collections, two enumeration interfaces are provided:
interface IEnumObject : IUnknown
{
	HRESULT Next([in] ULONG celt,
	             [out, size_is(celt), length_is(*pceltFetched)] IObject** rgelt,
	             [out] ULONG* pceltFetched);
	HRESULT Skip([in] ULONG celt);
	HRESULT Reset(void);
	HRESULT Clone([out, retval] IEnumObject** ppenum);
};

interface IEnumProperty : IUnknown
{
	HRESULT Next([in] ULONG celt,
	             [out, size_is(celt), length_is(*pceltFetched)] BSTR* rgelt,
	             [out] ULONG* pceltFetched);
	HRESULT Skip([in] ULONG celt);
	HRESULT Reset(void);
	HRESULT Clone([out, retval] IEnumProperty** ppenum);
};
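These enumerators follow the usual COM IEnumXXX contract: `Next` returns `S_OK` only when all requested elements were fetched and `S_FALSE` otherwise, `Skip` behaves the same way, and `Clone` copies the current position. A Python sketch of that behavior, assuming the enumerator works on a snapshot of the collection (class name hypothetical):

```python
S_OK, S_FALSE = 0, 1  # COM success codes: S_FALSE means "fewer than requested"


class EnumObject:
    """Cursor over a snapshot of a collection, following the IEnumXXX contract."""

    def __init__(self, items, position=0):
        self._items = list(items)
        self._pos = position

    def next(self, celt):
        """Return (hr, fetched): up to celt items; S_FALSE if fewer were available."""
        fetched = self._items[self._pos:self._pos + celt]
        self._pos += len(fetched)
        return (S_OK if len(fetched) == celt else S_FALSE), fetched

    def skip(self, celt):
        skipped = min(celt, len(self._items) - self._pos)
        self._pos += skipped
        return S_OK if skipped == celt else S_FALSE

    def reset(self):
        self._pos = 0
        return S_OK

    def clone(self):
        """New enumerator over the same items, starting at the same position."""
        return EnumObject(self._items, self._pos)

    def __iter__(self):
        # roughly what a For Each loop does: call Next(1) until S_FALSE
        while True:
            hr, batch = self.next(1)
            yield from batch
            if hr == S_FALSE:
                return
```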

Thus, in VBScript you could enumerate a grove object collection (i.e. the
object's children). Sorry for using VBScript instead of ECMAScript; it is
only that it is easier with the former.
	For Each groveObject In ParentGroveObject
		' do something with groveObject
	Next

Here it is. When I was still in the research center (before having no life
in our company startup :-), I noticed the conceptual similarities between:

a) the composite pattern which is used to manipulate whole-part structures
b) directory services like LDAP or NDS which are hierarchical structures
c) structured documents which are also hierarchical structures

In fact, it is a common reflex for humans to chunk things, probably because
of the constraints of our short-term memory (ref: "The Magical Number Seven,
Plus or Minus Two", also known as Miller's law (Miller, 1956): our short-term
memory can process about 7 plus or minus 2 elements simultaneously). Thus,
the whole-part structure is a basic reflex for us to chunk things into
hierarchies or whole-part constructs.

Up to now, most grove or DOM implementations have reflexively mapped each
conceptual object to a language object and didn't use the generic Composite
design pattern. A design pattern is simply a "best design practice", and
this best practice seems to have been ignored by most grove or whole-part
constructs. It is also because most of these designs are based on a
particular language instead of an interface. Also, because Occam is not
there to remind us not to multiply entities ad infinitum :-). However, the
intent was also to let the language validate the types. The con: you have
to create a class for each conceptual object and thus multiply entities.
Because of this, you cannot use the "processingInstruction" object for a
directory service "organizationalUnit" object. We gain the benefit of type
validation but lose the advantage of simplicity and of a single interface
that could be used to browse diverse objects. For example, our grove object
interface could be used to browse directories from a high-level object such
as Country down to a particular paragraph in a specific document. Doing so
with strongly typed objects requires too much complexity (and remember,
Occam doesn't like us to multiply entities ad infinitum!). If you use a
Composite-pattern type of interface, you soon discover that it is easy to
remember, easy to use and quite powerful.
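A hypothetical Python sketch of that browsing path: a single generic class serves for the directory levels and the document levels alike. The country, organization and unit names are made-up placeholders.

```python
class Obj:
    """The single generic interface used for every level of the hierarchy."""

    def __init__(self, kind, name):
        self.kind, self.name = kind, name
        self.components = {}

    def add(self, kind, name):
        child = Obj(kind, name)
        self.components[name] = child
        return child

    def get(self, name):
        return self.components[name]


def browse(root, *names):
    """Walk down by name, recording the kind of each object crossed."""
    node, kinds = root, [root.kind]
    for n in names:
        node = node.get(n)
        kinds.append(node.kind)
    return node, kinds


# Directory levels and document levels built with the same class.
world = Obj("root", "")
doc = (world.add("country", "XX")
            .add("organization", "ExampleOrg")
            .add("organizationalUnit", "Research")
            .add("document", "MyDocument"))
doc.add("chapter", "Chapter(1)").add("paragraph", "Paragraph(2)")
```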

DCOM and, in the near future, CORBA allow you to create generic interfaces
for the Composite pattern interface definition. In particular,
QueryInterface as used in DCOM and XPCOM removes the constraints of strict
inheritance and allows objects to present or not present certain interfaces
depending on a particular context (no, you cannot do that with strict
inheritance). Thus, this mechanism allows you to create objects with
multiple personalities or facets. Inheritance forces an IS_A relationship
for the inherited characteristics; the multiple interface mechanism allows a
COULD_BE relationship based on the client type. For example, a client may
have the right to access certain interfaces in a particular context, such as
grove object enumeration if there are sub-objects, or none if there are no
sub-objects.
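A sketch of context-dependent facets in Python, under the same assumptions as any QueryInterface stand-in (string interface names; the class, the `IPropertySetWrite` facet and the `write` right are hypothetical): enumeration is offered only when there are sub-objects, and a write facet only to clients holding a `write` right.

```python
class NoInterface(Exception):
    """Stand-in for COM's E_NOINTERFACE."""


class GroveObject:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.properties = {}

    def query_interface(self, iid, rights=frozenset()):
        """Return a facet that depends on the object's state and the client's rights."""
        if iid == "IObject":
            return self
        if iid == "IEnumObject" and self.children:
            # only offered when there is something to enumerate
            return iter(self.children)
        if iid == "IPropertySetWrite" and "write" in rights:
            # hypothetical write facet, only for clients holding the right
            return self.properties
        raise NoInterface(iid)
```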

Thus, the object you are talking about is in fact a formatting object grove.
What is missing is good interfaces for this grove object, preferably
language-independent and platform-independent ones. DCOM, ILU and CORBA make
good candidates for this.

I hope I gave you a good response and spoke with enough authority :-)

Regards
Didier PH Martin
mailto:martind@xxxxxxxxxxxxx
http://www.netfolder.com



 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist

