RECORDhave been presented and discussed in the Forth community. One of them was posted on the USENET newsgroup comp.lang.forth in 1989 (unfortunately I don't remember, by whom; possibly John Hayes) and convinced me with its simplicity, elegance, and power (in a word, with its Forth-ness).
I have used this basic approach ever since, e.g, in the parser generator Gray [ertl89][ertl97]. It also inspired my approach for an object-oriented Forth extension. The package I present here adds automatic handling of alignments, a bit of syntactic sugar, and optimization of fields with offset 0.
float+ cell+ faligned
Moreover, it is not very readable, because seeing a
cell+ tells us neither which kind of structure is
accessed nor what field is accessed; we have to somehow infer the kind
of structure, and then look up in the documentation, which field of
that structure corresponds to that offset.
Finally, this kind of address arithmetic also causes maintenance troubles: If you add or delete a field somewhere in the middle of the structure, you have to find and change all computations for the fields afterwards.
So, instead of using
cell+ and friends directly, how
about storing the offsets in constants:
0 constant a-offset 0 float+ constant b-offset 0 float+ cell+ faligned constant c-offsetNow we can get the address of field
x-offset +. This is much better in all respects. Of course, you still have to change all later offset definitions if you add a field. You can fix this by declaring the offsets in the following way:
0 constant a-offset a-offset float+ constant b-offset b-offset cell+ faligned constant c-offsetSince we always use the offsets with
+, using a defining word
cfieldthat includes the
+in the action of the defined word offers itself:
: cfield ( n "name" -- ) create , does> ( name execution: addr1 -- addr2 ) @ + ; 0 cfield a 0 a float+ cfield b 0 b cell+ faligned cfield cInstead of
x-offset +, we now simply write
The structure field words now can be used quite nicely. However, their definition is still a bit cumbersome: We have to repeat the name, the information about size and alignment is distributed before and after the field definitions etc. The structure package presented here addresses these problems.
struct cell% field list-next end-struct list%With the address of the list node on the stack, you can compute the address of the field that contains the address of the next node with
list-next. E.g., you can determine the length of a list with:
: list-length ( list -- n ) \ "list" is a pointer to the first element of a linked list \ "n" is the length of the list 0 begin ( list1 n1 ) over while ( list1 n1 ) 1+ swap list-next @ swap repeat nip ;
You can reserve memory for a list node in the dictionary with
list% %allot, which leaves the address of the list node
on the stack. For the equivalent allocation on the heap you can use
list% %alloc (or, for an
effect (i.e., with ior), use
Note that in ANS Forth the body of a
created word is
aligned but not necessarily
therefore, if you do a
create name foo% %allotthen the memory alloted for
foo%is guaranteed to start at the body of
foo%contains only character, cell and double fields.
You can also include a structure
foo% as field of
another structure, with:
struct ... foo% field ... ... end-struct ...Instead of starting with an empty structure, you can also extend an existing structure. E.g., a plain linked list without data, as defined above, is hardly useful; You can extend it to a linked list of integers, like this:
list% cell% field intlist-int end-struct intlist%
intlist%is a structure with two fields:
This feature is also known as extended records [wirth88]. It is the main innovation in the Oberon language; in other words, adding this feature to Modula-2 led Wirth to create a new language, write a new compiler etc. Adding this feature to Forth just requires a few lines of code.
You can specify an array type containing n elements of
foo% like this:
foo% n *You can use this array type in any place where you can use a normal type, e.g., when defining a
field, or with
The first field is at the base address of a structure and the word
for this field (e.g.,
list-next) actually does not change
the address on the stack. You may be tempted to leave it away in the
interest of run-time and space efficiency. This is not necessary,
because the structure package optimizes this case and compiling such
words does not generate any code. So, in the interest of readability
and maintainability you should include the word for the field when
accessing the field.
counterfield. The structure names that come to (my) mind are often also the logical choice for the names of words that create such a structure.
Therefore, I have adopted the following naming conventions:
structis the basic name of the structure, and
fieldis the basic name of the field. You can think about field words as converting converts the (address of the) structure into the (address of the) field.
structis the basic name of the structure.
intlist-int, but has
The type description on the stack is of the form align size. Keeping the size on the top-of-stack makes dealing with arrays very simple.
field is a defining word that uses
does>. The body of the field contains the offset
of the field, and the normal
does> action is
@ +i.e., add the offset to the address, giving the stack effect
addr1 -- addr2for a field.
This simple structure is slightly complicated by the optimization
for fields with offset 0, which requires a different
does>-part (because we cannot rely on there being
something on the stack if such a field is invoked during
compilation). Therefore, we put the different
in separate words, and decide which one to invoke based on the
offset. For a zero offset, the field is basically a noop; it is
immediate, and therefore no code is generated when it is compiled.
%align ( align size -- )
%alignment ( align size -- align )
%alloc ( align size -- addr )
sizeaddress units with alignment
align, giving a data block at
throws an ior code if not successful.
%allocate ( align size -- addr ior )
sizeaddress units with alignment
align, similar to
%allot ( align size -- addr )
sizeaddress units of data space with alignment
align; the resulting block of data is found at
cell% ( -- align size )
char% ( -- align size )
create-field ( align1 offset1 align size "name" -- align2 offset2 )
field, but without the
dfloat% ( -- align size )
double% ( -- align size )
end-struct ( align size "name" -- )
-- align size2
size aligned with
align; this ensures that all elements of an array of
name elements have alignment
field ( align1 offset1 align size "name" -- align2 offset2 )
addr1 -- addr1+offset1
create a field
name with offset
and the type given by
the offset of the next field, and
align2 is the alignment
of all fields.
float% ( -- align size )
nalign ( addr1 n -- addr2 )
addr2is the aligned version of
addr1wrt the alignment
%size ( align size -- size )
sfloat% ( -- align size )
struct ( -- align size )