Yet another Forth objects package

After criticizing the Neon model in the last issue, here I present (and expose to criticism) a model that I find better, and its implementation. Its properties (most are advantages IMO) are: I have used the technique, on which this model is based, for implementing Gray [ertl89][ertl97]; we have also used this technique in Gforth.

This paper assumes (in some places) that you have read the paper on structures.

Why object-oriented programming?

Often we have to deal with several data structures (objects), that have to be treated similarly in some respects, but differ in others. Graphical objects are the textbook example: circles, triangles, dinosaurs, icons, and others, and we may want to add more during program development. We want to apply some operations to any graphical object, e.g., draw for displaying it on the screen. However, draw has to do something different for every kind of object.

We could implement draw as a big CASE control structure that executes the appropriate code depending on the kind of object to be drawn. This would be not be very elegant, and, moreover, we would have to change draw every time we add a new kind of graphical object (say, a spaceship).

What we would rather do is: When defining spaceships, we would tell the system: "Here's how you draw a spaceship; you figure out the rest."

This is the problem that all systems solve that (rightfully) call themselves object-oriented, and the object-oriented package I present here also solves this problem (and not much else).

Terminology

This section is mainly for reference, so you don't have to understand all of it right away. I (mostly) use the same Smalltalk-inspired terminology as [mckewan97]. In short:
class
a data structure definition with some extras.
object
an instance of the data structure described by the class definition.
instance variables
fields of the data structure.
selector
(or method selector) a word (e.g., draw) for performing an operation on a variety of data structures (classes). A selector describes what operation to perform. In C++ terminology: a (pure) virtual function.
method
the concrete definition that performs the operation described by the selector for a specific class. A method specifies how the operation is performed for a specific class.
selector invocation
a call of a selector. One argument of the call (the TOS (top-of-stack)) is used for determining which method is used. In Smalltalk terminology: a message (consisting of the selector and the other arguments) is sent to the object.
receiving object
the object used for determining the method executed by a selector invocation. In our model it is the object that is on the TOS when the selector is invoked. (Receiving comes from Smalltalks message terminology.)
child class
a class that has (inherits) all properties (instance variables, selectors, methods) from a parent class. In Smalltalk terminology: The subclass inherits from the superclass. In C++ terminology: The derived class inherits from the base class.

Basic Usage

You can define a class for graphical objects like this:
object class \ "object" is the parent class
  selector draw ( x y graphical -- )
end-class graphical
This code defines a class graphical with an operation draw. We can perform the operation draw on any graphical object, e.g.:
100 100 t-rex draw
where t-rex is a word (say, a constant) that produces a graphical object.

How do we create a graphical object? With the present definitions, we cannot create a useful graphical object. The class graphical describes graphical objects in general, but not any concrete graphical object type (C++ users would call it an abstract class); e.g., there is no method for the selector draw in the class graphical.

For concrete graphical objects, we define child classes of the class graphical, e.g.:

graphical class \ "graphical" is the parent class
  cell% field circle-radius

:noname ( x y circle -- )
  circle-radius @ draw-circle ;
overrides draw

:noname ( n-radius circle -- )
  circle-radius ! ;
overrides construct

end-class circle
Here we define a class circle as a child of graphical, with a field circle-radius (which behaves just like a field in the structure package); it defines new methods for the selectors draw and construct (construct is defined in object, the parent class of graphical).

Now we can create a circle on the heap (i.e., allocated memory) with

50 circle heap-new constant my-circle
heap-new invokes construct, thus initializing the field circle-radius with 50. We can draw this new circle at (100,100) with
100 100 my-circle draw
Note: You can invoke a selector only if the object on the TOS (the receiving object) belongs to the class where the selector was defined or one of its descendents; e.g., you can invoke draw only for objects belonging to graphical or its descendents (e.g., circle). Immediately before end-class, the search order has to be the same as immediately after class.

The object class

When you define a class, you have to specify a parent class. So how do you start defining classes? There is one class available from the start: object. You can use it as ancestor for all classes. It is the only class that has no parent. It has two selectors: construct and print.

Creating objects

You can create and initialize an object of a class on the heap with heap-new ( ... class -- object ) and in the dictionary (allocation with allot) with dict-new ( ... class -- object ). Both words invoke construct, which consumes the stack items indicated by "..." above.

If you want to allocate memory for an object yourself, you can get its alignment and size with class-inst-size 2@ ( class -- align size ). Once you have memory for an object, you can initialize it with init-object ( ... class object -- ); construct does only a part of the necessary work.

Programming style

This section is not exhaustive.

In general, it is a good idea to ensure that all methods for the same selector have the same stack effect: when you invoke a selector, you often have no idea which method will be invoked, so, unless all methods have the same stack effect, you will not know the stack effect of the selector invocation.

One exception to this rule is methods for the selector construct. We know which method is invoked, because we specify the class to be constructed at the same place. Actually, I defined construct as a selector only to give the users a convenient way to specify initialization. The way it is used, a mechanism different from selector invocation would be more natural (but probably would take more code and more space to explain).

Class Binding

Normal selector invocations determine the method at run-time depending on the class of the receiving object (late binding).

Sometimes we want to invoke a different method. E.g., assume that you want to use the simple method for printing objects instead of the possibly long-winded print method of the receiver class. You can achieve this by replacing the invocation of print with

[bind] object print
in compiled code or
bind object print
in interpreted code. Alternatively, you can define the method with a name (e.g., print-object), and then invoke it through the name. Class binding is just a (often more convenient) way to achieve the same effect; it avoids name clutter and allows you to invoke methods directly without naming them first.

A frequent use of class binding is this: When we define a method for a selector, we often want the method to do what the selector does in the parent class, and a little more. There is a special word for this purpose: [parent]; [parent] selector is equivalent to [bind] parent selector, where parent is the parent class of the current class. E.g., a method definition might look like:

:noname
  dup [parent] foo \ do parent's foo on the receiving object
  ... \ do some more
; overrides foo

[mckewan97] presents class binding as an optimization technique. I recommend not using it for this purpose unless you are in an emergency. Late binding is pretty fast with this model anyway, so the benefit of using class binding is small; the cost of using class binding where it is not appropriate is reduced maintainability.

While we are at programming style questions: You should bind selectors only to ancestor classes of the receiving object. E.g., say, you know that the receiving object is of class foo or its descendents; then you should bind only to foo and its ancestors.

Method conveniences

In a method you usually access the receiving object pretty often. If you define the method as a plain colon definition (e.g., with :noname), you may have to do a lot of stack gymnastics. To avoid this, you can define the method with m: ... ;m. E.g., you could define the method for drawing a circle with
m: ( x y circle -- )
  ( x y ) this circle-radius @ draw-circle ;m
When this method is executed, the receiver object is removed from the stack; you can access it with this (admittedly, in this example the use of m: ... ;m offers no advantage). Note that I specify the stack effect for the whole method (i.e. including the receiver object), not just for the code between m: and ;m. You cannot use exit in m:...;m; instead, use exitm. Moreover, for any word that calls catch and was defined before loading objects.fs, you have to redefine it like I redefined catch:
: catch this >r catch r> to-this ;

You will frequently use sequences of the form this field (in the example above: this circle-radius). If you use the field only in this way, you can define it with inst-var and eliminate the this before the field name. E.g., the circle class above could also be defined with:

graphical class
  cell% inst-var radius

m: ( x y circle -- )
  radius @ draw-circle ;m
overrides draw

m: ( n-radius circle -- )
  radius ! ;m
overrides construct

end-class circle
radius can only be used in circle and its descendent classes and inside m:...;m.

You can also define fields with inst-value, which is to inst-var what value is to variable. You can change the value of such a field with [to-inst]. E.g., we could also define the class circle like this:

graphical class
  inst-value radius

m: ( x y circle -- )
  radius draw-circle ;m
overrides draw

m: ( n-radius circle -- )
  [to-inst] radius ;m
overrides construct

end-class circle

Names and Scoping

Inheritance is frequent, unlike structure extension. This exacerbates the problem with the field name convention: One always has to remember in which class the field was originally defined; changing a part of the class structure would require changes for renaming in otherwise unaffected code.

To solve this problem, I added a scoping mechanism (which was not in my original charter): A field defined with inst-var is visible only in the class where it is defined and in the descendent classes of this class. Using such fields only makes sense in m:-defined methods in these classes anyway.

This scoping mechanism allows us to use the unadorned field name, because name clashes with unrelated words become much less likely.

Once we have this mechanism, we can also use it for controlling the visibility of other words: All words defined after protected are visible only in the current class and its descendents. public restores the compilation (i.e. current) wordlist that was in effect before. If you have several protecteds without an intervening public or set-current, public will restore the compilation wordlist in effect before the first of these protecteds.

Interfaces

In this model you can only call selectors defined in the class of the receiving objects or in one of its ancestors. If you call a selector with a receiving object that is not in one of these classes, the result is undefined; if you are lucky, the program crashes immediately.

Now consider the case when you want to have a selector (or several) available in two classes: You would have to add the selector to a common ancestor class, in the worst case to object. You may not want to do this, e.g., because someone else is responsible for this ancestor class.

The solution for this problem is interfaces. An interface is a collection of selectors. If a class implements an interface, the selectors become available to the class and its descendents. A class can implement an unlimited number of interfaces. For the problem discussed above, we would define an interface for the selector(s), and both classes would implement the interface.

As an example, consider an interface storage for writing objects to disk and getting them back, and a class foo foo that implements it. The code for this would look like this:

interface
  selector write ( file object -- )
  selector read1 ( file object -- )
end-interface storage

bar class
  storage implementation

... overrides write
... overrides read
...
end-class foo
(I would add a word read ( file -- object ) that uses read1 internally, but that's beyond the point illustrated here.)

Note that you cannot use protected in an interface; and of course you cannot define fields.

In the Neon model, all selectors are available for all classes; therefore it does not need interfaces. The price you pay in this model is slower late binding, and therefore, added complexity to avoid late binding.

Implementation

An object is a piece of memory, like one of the data structures described with struct...end-struct. It has a field object-map that points to the method map for the object's class.

The method map (this is Self [chambers&ungar89] terminology; in C++ terminology: virtual function table) is an array that contains the execution tokens (XTs) of the methods for the object's class. Each selector contains an offset into the method maps.

selector is a defining word that uses create and does>. The body of the selector contains the offset; the does> action for a class selector is, basically:

( object addr ) @ over object-map @ + @ execute
Since object-map is the first field of the object, it does not generate any code. As you can see, calling a selector has a small, constant cost.

A class is basically a struct combined with a method map. During the class definition the alignment and size of the class are passed on the stack, just as with structs, so field can also be used for defining class fields. However, passing more items on the stack would be inconvenient, so class builds a data structure in memory, which is accessed through the variable current-interface. After its definition is complete, the class is represented on the stack by a pointer (e.g., as parameter for a child class definition).

At the start, a new class has the alignment and size of its parent, and a copy of the parent's method map. Defining new fields extends the size and alignment; likewise, defining new selectors extends the method map. overrides just stores a new XT in the method map at the offset given by the selector.

Class binding just gets the XT at the offset given by the selector from the class's method map and compile,s (in the case of [bind]) it.

I implemented this as a value. At the start of an m:...;m method the old this is stored to the return stack and restored at the end; and the object on the TOS is stored TO this. This technique has one disadvantage: If the user does not leave the method via ;m, but via throw or exit, this is not restored (and exit may crash). To deal with the throw problem, I have redefined catch to save and restore this; the same should be done with any word that can catch an exception. As for exit, I simply forbid it (as a replacement, there is exitm).

inst-var is just the same as field, with a different does> action:

@ this +
Similar for inst-value.

Each class also has a wordlist that contains the words defined with inst-var and inst-value, and its protected words. It also has a pointer to its parent. class pushes the wordlists of the class an all its ancestors on the search order, and end-class drops them.

An interface is like a class without fields, parent and protected words; i.e., it just has a method map. If a class implements an interface, its method map contains a pointer to the method map of the interface. The positive offsets in the map are reserved for class methods, therefore interface map pointers have negative offsets. Interfaces have offsets that are unique throughout the system, unlike class selectors, whose offsets are only unique for the classes where the selector is available (invokable).

This structure means that interface selectors have to perform one indirection more than class selectors to find their method. Their body contains the interface map pointer offset in the class method map, and the method offset in the interface method map. The does> action for an interface selector is, basically:

( object selector-body )
2dup selector-interface @ ( object selector-body object interface-offset )
swap object-map @ + @ ( object selector-body map )
swap selector-offset @ + @ execute
where object-map and selector-offset are first fields and generate no code.

As a concrete example, consider the following code:

interface
  selector if1sel1
  selector if1sel2
end-interface if1

object class
  if1 implementation
  selector cl1sel1
  cell% inst-var cl1iv1

' m1 overrides construct
' m2 overrides if1sel1
' m3 overrides if1sel2
' m4 overrides cl1sel2
end-class cl1

create obj1 object dict-new drop
create obj2 cl1    dict-new drop
The data structure created by this code (including the data structure for object) is shown in the figure, assuming a cell size of 4.

Related Work

For a comparison with the Neon model you just have to compare the properties of the Neon model presented in the last issue with the properties presented here.

Another well-known publication is [pountain87]. However, it is not really about object-oriented programming, because it hardly deals with late binding. Instead, it focuses on features like information hiding and overloading that are characteristic of modular languages like Ada (83).

There are also many other papers on object-oriented Forth extensions; E.g., [rodriguez&poehlman96] lists 17 and [mckewan97] lists 6. In the rest of this section I will discuss two systems that have the implementation using method maps in common with the package discussed here.

The model of [zsoter96] makes heavy use of an active object (like this in my model): The active object is not only used for accessing all fields, but also specifies the receiving object of every selector invocation; you have to change the active object explicitly with { ... }, whereas in my model it changes more or less implicitly at m: ... ;m. Such a change at the method entry point is unnecessary with the [zsoter96] model, because the receiving object is the active object already; OTOH, the explicit change is absolutely necessary in that model, because otherwise no one could ever change the active object.

The model of [paysan94] combines information hiding and overloading resolution (by keeping names in various wordlists) with object-oriented programming. It sets the active object implicitly on method entry, but also allows explicit changing (with >o...o> or with with...endwith). It uses parsing and state-smart objects and classes for resolving overloading and for early binding: the object or class parses the selector and determines the method from this. If the selector is not parsed by an object or class, it performs a call to the selector for the active object (late binding), like [zsoter96]. Fields are always accessed through the active object. The big disadvantage of this model is the parsing and the state-smartness, which reduces extensibility and increases the opportunities for subtle bugs; essentially, you are only safe if you never tick or postpone an object or class.

Acknowledgments

Marcel Hendrix provided helpful comments on the paper. Andras Zsoter and Bernd Paysan helped me with the related works section.

Glossary

bind ( ... "class" "selector" -- ... )
execute the method for selector in class.
<bind> ( class selector-xt -- xt )
xt is the method for the selector selector-xt in class.
bind' ( "class" "selector" -- xt )
xt is the method for selector in class.
[bind] ( compile-time: "class" "selector" -- ; run-time: ... -- ... )
compile the method for selector in class.
class ( parent-class -- align offset )
start a new class definition as a child of parent-class. align offset are for use by field etc.
class->map ( class -- map )
map is the pointer to class's method map; it points to the place in the map to which the selector offsets refer (i.e., where object-maps point to).
class-inst-size ( class -- addr )
used as class-inst-size 2@ ( class -- align size ), gives the size specification for an instance (i.e. an object) of class.
class-override! ( xt sel-xt class-map -- )
xt is the new method for the selector sel-xt in class-map.
construct ( ... object -- )
initializes the data fields of object. The method for the class object just does nothing ( object -- ).
current' ( "selector" -- xt )
xt is the method for selector in the current class.
[current] ( compile-time: "selector" -- ; run-time: ... -- ... )
compile the method for selector in the current class.
current-interface ( -- addr )
this variable contains the class or interface currently being defined.
dict-new ( ... class -- object )
allot and initialize an object of class class in the dictionary.
drop-order ( class -- )
drops class's wordlists from the search order. No checking is made whether class's wordlists are actually on the search order.
end-class ( align offset "name" -- )
name execution: -- class

ends a class definition. The resulting class is class.

end-class-noname ( align offset -- class )
ends a class definition. The resulting class is class.
end-interface ( "name" -- )
name execution: -- interface

ends an interface definition. The resulting interface is interface.

end-interface-noname ( -- interface )
ends an interface definition. The resulting interface is interface.
exitm ( -- )
exit from a method; restore old this.
heap-new ( ... class -- object )
allocate and initialize an object of class class.
implementation ( interface -- )
the current class implements interface. I.e., you can use all selectors of the interface in the current class and its descendents.
init-object ( ... class object -- )
initializes a chunk of memory (object) to an object of class class; then performs construct.
inst-value ( align1 offset1 "name" -- align2 offset2 )
name execution: -- w

w is the value of the field name in this object.

inst-var ( align1 offset1 align size "name" -- align2 offset2 )
name execution: -- addr

addr is the address of the field name in this object.

interface ( -- )
starts an interface definition.
;m ( colon-sys -- ) ( run-time: -- )
end a method definition; restore old this.
m: ( -- xt colon-sys ) ( run-time: object -- )
start a method definition; object becomes new this.
method ( xt "name" -- )
name execution: ... object -- ...

creates selector name and makes xt its method in the current class.

object ( -- class )
the ancestor of all classes.
overrides ( xt "selector" -- )
replace default method for selector in the current class with xt. overrides must not be used during an interface definition.
[parent] ( compile-time: "selector" -- ; run-time: ... object -- ... )
compile the method for selector in the parent of the current class.
print ( object -- )
prints the object. The method for the class object prints the address of the object and the address of its class.
protected ( -- )
set the compilation wordlist to the current class's wordlist
public ( -- )
restore the compilation wordlist that was in effect before the last protected that actually changed the compilation wordlist.
push-order ( class -- )
add class's wordlists to the search-order (in front)
selector ( "name" -- )
name execution: ... object -- ...

creates selector name for the current class and its descendents; you can set a method for the selector in the current class with overrides.

this ( -- object )
the receiving object of the current method (aka active object).
<to-inst> ( w xt -- )
store w into the field xt in this object.
[to-inst] ( compile-time: "name" -- ; run-time: w -- )
store w into field name in this object.
to-this ( object -- )
sets this (used internally, but useful when debugging).
xt-new ( ... class xt -- object )
makes a new object, using xt ( align size -- addr ) to get memory.

References

[chambers&ungar89] Craig Chambers and David Ungar.
Customization: Optimizing compiler technology for \sc Self, a dynamically-typed object-oriented programming language. In SIGPLAN '89 Conference on Programming Language Design and Implementation, pages 146-160, 1989.
[ertl89] M. Anton Ertl.
GRAY - ein Generator f\"ur rekursiv absteigende Ybersetzer. Praktikum, Institut f\"ur Computersprachen, Technische Universit\"at Wien, 1989. In German.
[ertl97] M. Anton Ertl.
GRAY - ein Generator f\"ur rekursiv absteigende Ybersetzer. In Forth-Tagung, Ludwigshafen, 1997. In German.
[mckewan97] Andrew McKewan.
Object-oriented programming in ANS Forth. Forth Dimensions, March 1997.
[paysan94] Bernd Paysan.
Object oriented bigFORTH. Vierte Dimension, 10 no. 2, 1994. An implementation in ANS Forth is available at http://www.jwdt.com/ paysan/oof.zip.
[pountain87] Dick Pountain.
Object-Oriented Forth. Academic Press, London, 1987.
[rodriguez&poehlman96] Bradford J. Rodriguez and W. F. S. Poehlman.
A survey of object-oriented Forths. SIGPLAN Notices, pages 39-42, April 1996.
[zsoter96] Andr\'as Zs\'oter.
Does late binding have to be slow?. Forth Dimensions, 18 no. 1 pp. 31-35, 1996.