version 1.77, 2000/08/21 20:08:02
|
version 1.78, 2000/08/22 18:15:38
|
Line 170 personal machines. This manual correspon
|
Line 170 personal machines. This manual correspon
|
* Name Index:: Forth words, only names listed |
* Name Index:: Forth words, only names listed |
* Concept Index:: A menu covering many topics |
* Concept Index:: A menu covering many topics |
|
|
@detailmenu |
@detailmenu --- The Detailed Node Listing --- |
--- The Detailed Node Listing --- |
|
|
|
Gforth Environment |
Gforth Environment |
|
|
Line 250 Forth Words
|
Line 249 Forth Words
|
* Files:: |
* Files:: |
* Blocks:: |
* Blocks:: |
* Other I/O:: |
* Other I/O:: |
* Programming Tools:: |
|
* Assembler and Code Words:: |
|
* Threading Words:: |
|
* Locals:: |
* Locals:: |
* Structures:: |
* Structures:: |
* Object-oriented Forth:: |
* Object-oriented Forth:: |
|
* Programming Tools:: |
|
* Assembler and Code Words:: |
|
* Threading Words:: |
* Passing Commands to the OS:: |
* Passing Commands to the OS:: |
* Keeping track of Time:: |
* Keeping track of Time:: |
* Miscellaneous Words:: |
* Miscellaneous Words:: |
Line 357 Other I/O
|
Line 356 Other I/O
|
* Displaying characters and strings:: Other stuff |
* Displaying characters and strings:: Other stuff |
* Input:: Input |
* Input:: Input |
|
|
Programming Tools |
|
|
|
* Examining:: |
|
* Forgetting words:: |
|
* Debugging:: Simple and quick. |
|
* Assertions:: Making your programs self-checking. |
|
* Singlestep Debugger:: Executing your program word by word. |
|
|
|
Assembler and Code Words |
|
|
|
* Code and ;code:: |
|
* Common Assembler:: Assembler Syntax |
|
* Common Disassembler:: |
|
* 386 Assembler:: Deviations and special cases |
|
* Alpha Assembler:: Deviations and special cases |
|
* MIPS assembler:: Deviations and special cases |
|
* Other assemblers:: How to write them |
|
|
|
Locals |
Locals |
|
|
* Gforth locals:: |
* Gforth locals:: |
Line 384 Gforth locals
|
Line 365 Gforth locals
|
|
|
* Where are locals visible by name?:: |
* Where are locals visible by name?:: |
* How long do locals live?:: |
* How long do locals live?:: |
* Programming Style:: |
* Locals programming style:: |
* Implementation:: |
* Locals implementation:: |
|
|
Structures |
Structures |
|
|
Line 433 The @file{mini-oof.fs} model
|
Line 414 The @file{mini-oof.fs} model
|
* Mini-OOF Example:: |
* Mini-OOF Example:: |
* Mini-OOF Implementation:: |
* Mini-OOF Implementation:: |
|
|
|
Programming Tools |
|
|
|
* Examining:: |
|
* Forgetting words:: |
|
* Debugging:: Simple and quick. |
|
* Assertions:: Making your programs self-checking. |
|
* Singlestep Debugger:: Executing your program word by word. |
|
|
|
Assembler and Code Words |
|
|
|
* Code and ;code:: |
|
* Common Assembler:: Assembler Syntax |
|
* Common Disassembler:: |
|
* 386 Assembler:: Deviations and special cases |
|
* Alpha Assembler:: Deviations and special cases |
|
* MIPS assembler:: Deviations and special cases |
|
* Other assemblers:: How to write them |
|
|
Tools |
Tools |
|
|
* ANS Report:: Report the words used, sorted by wordset. |
* ANS Report:: Report the words used, sorted by wordset. |
Line 4350 the exercises in a .fs file in the distr
|
Line 4349 the exercises in a .fs file in the distr
|
* Files:: |
* Files:: |
* Blocks:: |
* Blocks:: |
* Other I/O:: |
* Other I/O:: |
* Programming Tools:: |
|
* Assembler and Code Words:: |
|
* Threading Words:: |
|
* Locals:: |
* Locals:: |
* Structures:: |
* Structures:: |
* Object-oriented Forth:: |
* Object-oriented Forth:: |
|
* Programming Tools:: |
|
* Assembler and Code Words:: |
|
* Threading Words:: |
* Passing Commands to the OS:: |
* Passing Commands to the OS:: |
* Keeping track of Time:: |
* Keeping track of Time:: |
* Miscellaneous Words:: |
* Miscellaneous Words:: |
Line 4936 doc-2rdrop
|
Line 4935 doc-2rdrop
|
@node Locals stack, Stack pointer manipulation, Return stack, Stack Manipulation |
@node Locals stack, Stack pointer manipulation, Return stack, Stack Manipulation |
@subsection Locals stack |
@subsection Locals stack |
|
|
Gforth uses an extra locals stack. It is described, along with the |
Gforth uses an extra locals stack. It is described, along with the |
reasons for its existence, in @ref{Implementation,Implementation of locals}. |
reasons for its existence, in @ref{Locals implementation}. |
|
|
@node Stack pointer manipulation, , Locals stack, Stack Manipulation |
@node Stack pointer manipulation, , Locals stack, Stack Manipulation |
@subsection Stack pointer manipulation |
@subsection Stack pointer manipulation |
Line 7714 Forth.
|
Line 7713 Forth.
|
|
|
@comment TODO: locals section refers to here, saying that every word list (aka |
@comment TODO: locals section refers to here, saying that every word list (aka |
@comment vocabulary) has its own methods for searching etc. Need to document that. |
@comment vocabulary) has its own methods for searching etc. Need to document that. |
|
@c anton: but better in a separate subsection on wordlist internals |
|
|
@comment TODO: document markers, reveal, tables, mappedwordlist |
@comment TODO: document markers, reveal, tables, mappedwordlist |
|
|
Line 8377 doc-block-included
|
Line 8377 doc-block-included
|
|
|
|
|
@c ------------------------------------------------------------- |
@c ------------------------------------------------------------- |
@node Other I/O, Programming Tools, Blocks, Words |
@node Other I/O, Locals, Blocks, Words |
@section Other I/O |
@section Other I/O |
@cindex I/O - keyboard and display |
@cindex I/O - keyboard and display |
|
|
Line 8715 doc-expect
|
Line 8715 doc-expect
|
doc-span |
doc-span |
|
|
|
|
|
|
@c ------------------------------------------------------------- |
@c ------------------------------------------------------------- |
@node Programming Tools, Assembler and Code Words, Other I/O, Words |
@node Locals, Structures, Other I/O, Words |
@section Programming Tools |
@section Locals |
@cindex programming tools |
@cindex locals |
|
|
|
Local variables can make Forth programming more enjoyable and Forth |
|
programs easier to read. Unfortunately, the locals of ANS Forth are |
|
laden with restrictions. Therefore, we provide not only the ANS Forth |
|
locals wordset, but also our own, more powerful locals wordset (we |
|
implemented the ANS Forth locals wordset through our locals wordset). |
|
|
|
The ideas in this section have also been published in M. Anton Ertl, |
|
@cite{@uref{http://www.complang.tuwien.ac.at/papers/ertl94l.ps.gz, |
|
Automatic Scoping of Local Variables}}, EuroForth '94. |
|
|
@menu |
@menu |
* Examining:: |
* Gforth locals:: |
* Forgetting words:: |
* ANS Forth locals:: |
* Debugging:: Simple and quick. |
|
* Assertions:: Making your programs self-checking. |
|
* Singlestep Debugger:: Executing your program word by word. |
|
@end menu |
@end menu |
|
|
@node Examining, Forgetting words, Programming Tools, Programming Tools |
@node Gforth locals, ANS Forth locals, Locals, Locals |
@subsection Examining data and code |
@subsection Gforth locals |
@cindex examining data and code |
@cindex Gforth locals |
@cindex data examination |
@cindex locals, Gforth style |
@cindex code examination |
|
|
|
The following words inspect the stack non-destructively: |
Locals can be defined with |
|
|
doc-.s |
@example |
doc-f.s |
@{ local1 local2 ... -- comment @} |
|
@end example |
|
or |
|
@example |
|
@{ local1 local2 ... @} |
|
@end example |
|
|
There is a word @code{.r} but it does @i{not} display the return stack! |
E.g., |
It is used for formatted numeric output (@pxref{Simple numeric output}). |
@example |
|
: max @{ n1 n2 -- n3 @} |
|
n1 n2 > if |
|
n1 |
|
else |
|
n2 |
|
endif ; |
|
@end example |
|
|
doc-depth |
The similarity of locals definitions with stack comments is intended. A |
doc-fdepth |
locals definition often replaces the stack comment of a word. The order |
doc-clearstack |
of the locals corresponds to the order in a stack comment and everything |
|
after the @code{--} is really a comment. |
|
|
The following words inspect memory. |
This similarity has one disadvantage: It is too easy to confuse locals |
|
declarations with stack comments, causing bugs and making them hard to |
|
find. However, this problem can be avoided by appropriate coding |
|
conventions: Do not use both notations in the same program. If you do, |
|
they should be distinguished using additional means, e.g. by position. |
|
|
doc-? |
@cindex types of locals |
doc-dump |
@cindex locals types |
|
The name of the local may be preceded by a type specifier, e.g., |
|
@code{F:} for a floating point value: |
|
|
And finally, @code{see} allows you to inspect code: |
@example |
|
: CX* @{ F: Ar F: Ai F: Br F: Bi -- Cr Ci @} |
|
\ complex multiplication |
|
Ar Br f* Ai Bi f* f- |
|
Ar Bi f* Ai Br f* f+ ; |
|
@end example |
|
|
doc-see |
@cindex flavours of locals |
doc-xt-see |
@cindex locals flavours |
|
@cindex value-flavoured locals |
|
@cindex variable-flavoured locals |
|
Gforth currently supports cells (@code{W:}, @code{W^}), doubles |
|
(@code{D:}, @code{D^}), floats (@code{F:}, @code{F^}) and characters |
|
(@code{C:}, @code{C^}) in two flavours: a value-flavoured local (defined |
|
with @code{W:}, @code{D:} etc.) produces its value and can be changed |
|
with @code{TO}. A variable-flavoured local (defined with @code{W^} etc.) |
|
produces its address (which becomes invalid when the variable's scope is |
|
left). E.g., the standard word @code{emit} can be defined in terms of |
|
@code{type} like this: |
|
|
@node Forgetting words, Debugging, Examining, Programming Tools |
@example |
@subsection Forgetting words |
: emit @{ C^ char* -- @} |
@cindex words, forgetting |
char* 1 type ; |
@cindex forgetting words |
@end example |
|
|
@c anton: other, maybe better places for this subsection: Defining Words; |
@cindex default type of locals |
@c Dictionary allocation. At least a reference should be there. |
@cindex locals, default type |
|
A local without type specifier is a @code{W:} local. Both flavours of |
|
locals are initialized with values from the data or FP stack. |
|
|
Forth allows you to forget words (and everything that was allotted in the |
Currently there is no way to define locals with user-defined data |
dictionary after them) in a LIFO manner. |
structures, but we are working on it. |
|
|
doc-marker |
Gforth allows defining locals everywhere in a colon definition. This |
|
poses the following questions: |
|
|
The most common use of this feature is during program development: when |
@menu |
you change a source file, forget all the words it defined and load it |
* Where are locals visible by name?:: |
again (since you also forget everything defined after the source file |
* How long do locals live?:: |
was loaded, you have to reload that, too). Note that effects like |
* Locals programming style:: |
storing to variables and destroyed system words are not undone when you |
* Locals implementation:: |
forget words. With a system like Gforth, that is fast enough at |
@end menu |
starting up and compiling, I find it more convenient to exit and restart |
|
Gforth, as this gives me a clean slate. |
|
|
|
Here's an example of using @code{marker} at the start of a source file |
@node Where are locals visible by name?, How long do locals live?, Gforth locals, Gforth locals |
that you are debugging; it ensures that you only ever have one copy of |
@subsubsection Where are locals visible by name? |
the file's definitions compiled at any time: |
@cindex locals visibility |
|
@cindex visibility of locals |
|
@cindex scope of locals |
|
|
@example |
Basically, the answer is that locals are visible where you would expect |
[IFDEF] my-code |
it in block-structured languages, and sometimes a little longer. If you |
my-code |
want to restrict the scope of a local, enclose its definition in |
[ENDIF] |
@code{SCOPE}...@code{ENDSCOPE}. |
|
|
marker my-code |
|
init-included-files |
|
|
|
\ .. definitions start here |
doc-scope |
\ . |
doc-endscope |
\ . |
|
\ end |
|
@end example |
|
|
|
|
|
@node Debugging, Assertions, Forgetting words, Programming Tools |
These words behave like control structure words, so you can use them |
@subsection Debugging |
with @code{CS-PICK} and @code{CS-ROLL} to restrict the scope in |
@cindex debugging |
arbitrary ways. |
|
|
Languages with a slow edit/compile/link/test development loop tend to |
If you want a more exact answer to the visibility question, here's the |
require sophisticated tracing/stepping debuggers to facilitate debugging. |
basic principle: A local is visible in all places that can only be |
|
reached through the definition of the local@footnote{In compiler |
|
construction terminology, all places dominated by the definition of the |
|
local.}. In other words, it is not visible in places that can be reached |
|
without going through the definition of the local. E.g., locals defined |
|
in @code{IF}...@code{ENDIF} are visible until the @code{ENDIF}, locals |
|
defined in @code{BEGIN}...@code{UNTIL} are visible after the |
|
@code{UNTIL} (until, e.g., a subsequent @code{ENDSCOPE}). |
|
|
A much better (faster) way in fast-compiling languages is to add |
The reasoning behind this solution is: We want to have the locals |
printing code at well-selected places, let the program run, look at |
visible as long as it is meaningful. The user can always make the |
the output, see where things went wrong, add more printing code, etc., |
visibility shorter by using explicit scoping. In a place that can |
until the bug is found. |
only be reached through the definition of a local, the meaning of a |
|
local name is clear. In other places it is not: How is the local |
|
initialized at the control flow path that does not contain the |
|
definition? Which local is meant, if the same name is defined twice in |
|
two independent control flow paths? |
|
|
The simple debugging aids provided in @file{debugs.fs} |
This should be enough detail for nearly all users, so you can skip the |
are meant to support this style of debugging. |
rest of this section. If you really must know all the gory details and |
|
options, read on. |
|
|
The word @code{~~} prints debugging information (by default the source |
In order to implement this rule, the compiler has to know which places |
location and the stack contents). It is easy to insert. If you use Emacs |
are unreachable. It knows this automatically after @code{AHEAD}, |
it is also easy to remove (@kbd{C-x ~} in the Emacs Forth mode to |
@code{AGAIN}, @code{EXIT} and @code{LEAVE}; in other cases (e.g., after |
query-replace them with nothing). The deferred words |
most @code{THROW}s), you can use the word @code{UNREACHABLE} to tell the |
@code{printdebugdata} and @code{printdebugline} control the output of |
compiler that the control flow never reaches that place. If |
@code{~~}. The default source location output format works well with |
@code{UNREACHABLE} is not used where it could be, the only consequence is |
Emacs' compilation mode, so you can step through the program at the |
that the visibility of some locals is more limited than the rule above |
source level using @kbd{C-x `} (the advantage over a stepping debugger |
says. If @code{UNREACHABLE} is used where it should not be (i.e., if you |
is that you can step in any direction and you know where the crash has |
lie to the compiler), buggy code will be produced. |
happened or where the strange data has occurred). |
|
|
|
doc-~~ |
|
doc-printdebugdata |
|
doc-printdebugline |
|
|
|
@node Assertions, Singlestep Debugger, Debugging, Programming Tools |
doc-unreachable |
@subsection Assertions |
|
@cindex assertions |
|
|
|
It is a good idea to make your programs self-checking, especially if you |
|
make an assumption that may become invalid during maintenance (for |
|
example, that a certain field of a data structure is never zero). Gforth |
|
supports @dfn{assertions} for this purpose. They are used like this: |
|
|
|
|
Another problem with this rule is that at @code{BEGIN}, the compiler |
|
does not know which locals will be visible on the incoming |
|
back-edge. All problems discussed in the following are due to this |
|
ignorance of the compiler (we discuss the problems using @code{BEGIN} |
|
loops as examples; the discussion also applies to @code{?DO} and other |
|
loops). Perhaps the most insidious example is: |
@example |
@example |
assert( @i{flag} ) |
AHEAD |
|
BEGIN |
|
x |
|
[ 1 CS-ROLL ] THEN |
|
@{ x @} |
|
... |
|
UNTIL |
@end example |
@end example |
|
|
The code between @code{assert(} and @code{)} should compute a flag that |
This should be legal according to the visibility rule. The use of |
should be true if everything is alright and false otherwise. It should |
@code{x} can only be reached through the definition; but that appears |
not change anything else on the stack. The overall stack effect of the |
textually below the use. |
assertion is @code{( -- )}. E.g. |
|
|
From this example it is clear that the visibility rules cannot be fully |
|
implemented without major headaches. Our implementation treats common |
|
cases as advertised and the exceptions are treated in a safe way: The |
|
compiler makes a reasonable guess about the locals visible after a |
|
@code{BEGIN}; if it is too pessimistic, the |
|
user will get a spurious error about the local not being defined; if the |
|
compiler is too optimistic, it will notice this later and issue a |
|
warning. In the case above the compiler would complain about @code{x} |
|
being undefined at its use. You can see from the obscure examples in |
|
this section that it takes quite unusual control structures to get the |
|
compiler into trouble, and even then it will often do fine. |
|
|
|
If the @code{BEGIN} is reachable from above, the most optimistic guess |
|
is that all locals visible before the @code{BEGIN} will also be |
|
visible after the @code{BEGIN}. This guess is valid for all loops that |
|
are entered only through the @code{BEGIN}, in particular, for normal |
|
@code{BEGIN}...@code{WHILE}...@code{REPEAT} and |
|
@code{BEGIN}...@code{UNTIL} loops and it is implemented in our |
|
compiler. When the branch to the @code{BEGIN} is finally generated by |
|
@code{AGAIN} or @code{UNTIL}, the compiler checks the guess and |
|
warns the user if it was too optimistic: |
@example |
@example |
assert( 1 1 + 2 = ) \ what we learn in school |
IF |
assert( dup 0<> ) \ assert that the top of stack is not zero |
@{ x @} |
assert( false ) \ this code should not be reached |
BEGIN |
|
\ x ? |
|
[ 1 cs-roll ] THEN |
|
... |
|
UNTIL |
@end example |
@end example |
|
|
The need for assertions is different at different times. During |
Here, @code{x} lives only until the @code{BEGIN}, but the compiler |
debugging, we want more checking, in production we sometimes care more |
optimistically assumes that it lives until the @code{THEN}. It notices |
about speed. Therefore, assertions can be turned off, i.e., the assertion |
this difference when it compiles the @code{UNTIL} and issues a |
becomes a comment. Depending on the importance of an assertion and the |
warning. The user can avoid the warning, and make sure that @code{x} |
time it takes to check it, you may want to turn off some assertions and |
is not used in the wrong area by using explicit scoping: |
keep others turned on. Gforth provides several levels of assertions for |
@example |
this purpose: |
IF |
|
SCOPE |
|
@{ x @} |
|
ENDSCOPE |
|
BEGIN |
|
[ 1 cs-roll ] THEN |
|
... |
|
UNTIL |
|
@end example |
|
|
|
Since the guess is optimistic, there will be no spurious error messages |
|
about undefined locals. |
|
|
doc-assert0( |
If the @code{BEGIN} is not reachable from above (e.g., after |
doc-assert1( |
@code{AHEAD} or @code{EXIT}), the compiler cannot even make an |
doc-assert2( |
optimistic guess, as the locals visible after the @code{BEGIN} may be |
doc-assert3( |
defined later. Therefore, the compiler assumes that no locals are |
doc-assert( |
visible after the @code{BEGIN}. However, the user can use |
doc-) |
@code{ASSUME-LIVE} to make the compiler assume that the same locals are |
|
visible at the BEGIN as at the point where the top control-flow stack |
|
item was created. |
|
|
|
|
The variable @code{assert-level} specifies the highest assertions that |
doc-assume-live |
are turned on. I.e., at the default @code{assert-level} of one, |
|
@code{assert0(} and @code{assert1(} assertions perform checking, while |
|
@code{assert2(} and @code{assert3(} assertions are treated as comments. |
|
|
|
The value of @code{assert-level} is evaluated at compile-time, not at |
|
run-time. Therefore you cannot turn assertions on or off at run-time; |
|
you have to set the @code{assert-level} appropriately before compiling a |
|
piece of code. You can compile different pieces of code at different |
|
@code{assert-level}s (e.g., a trusted library at level 1 and |
|
newly-written code at level 3). |
|
|
|
|
@noindent |
|
E.g., |
|
@example |
|
@{ x @} |
|
AHEAD |
|
ASSUME-LIVE |
|
BEGIN |
|
x |
|
[ 1 CS-ROLL ] THEN |
|
... |
|
UNTIL |
|
@end example |
|
|
doc-assert-level |
Other cases where the locals are defined before the @code{BEGIN} can be |
|
handled by inserting an appropriate @code{CS-ROLL} before the |
|
@code{ASSUME-LIVE} (and changing the control-flow stack manipulation |
|
behind the @code{ASSUME-LIVE}). |
|
|
|
Cases where locals are defined after the @code{BEGIN} (but should be |
|
visible immediately after the @code{BEGIN}) can only be handled by |
|
rearranging the loop. E.g., the ``most insidious'' example above can be |
|
arranged into: |
|
@example |
|
BEGIN |
|
@{ x @} |
|
... 0= |
|
WHILE |
|
x |
|
REPEAT |
|
@end example |
|
|
If an assertion fails, a message compatible with Emacs' compilation mode |
@node How long do locals live?, Locals programming style, Where are locals visible by name?, Gforth locals |
is produced and the execution is aborted (currently with @code{ABORT"}. |
@subsubsection How long do locals live? |
If there is interest, we will introduce a special throw code. But if you |
@cindex locals lifetime |
intend to @code{catch} a specific condition, using @code{throw} is |
@cindex lifetime of locals |
probably more appropriate than an assertion). |
|
|
|
Definitions in ANS Forth for these assertion words are provided |
The right answer for the lifetime question would be: A local lives at |
in @file{compat/assert.fs}. |
least as long as it can be accessed. For a value-flavoured local this |
|
means: until the end of its visibility. However, a variable-flavoured |
|
local could be accessed through its address far beyond its visibility |
|
scope. Ultimately, this would mean that such locals would have to be |
|
garbage collected. Since this entails un-Forth-like implementation |
|
complexities, I adopted the same cowardly solution as some other |
|
languages (e.g., C): The local lives only as long as it is visible; |
|
afterwards its address is invalid (and programs that access it |
|
afterwards are erroneous). |
|
|
|
@node Locals programming style, Locals implementation, How long do locals live?, Gforth locals |
|
@subsubsection Locals programming style |
|
@cindex locals programming style |
|
@cindex programming style, locals |
|
|
@node Singlestep Debugger, , Assertions, Programming Tools |
The freedom to define locals anywhere has the potential to change |
@subsection Singlestep Debugger |
programming styles dramatically. In particular, the need to use the |
@cindex singlestep Debugger |
return stack for intermediate storage vanishes. Moreover, all stack |
@cindex debugging Singlestep |
manipulations (except @code{PICK}s and @code{ROLL}s with run-time |
|
determined arguments) can be eliminated: If the stack items are in the |
|
wrong order, just write a locals definition for all of them; then |
|
write the items in the order you want. |
|
|
When you create a new word there's often the need to check whether it |
This seems a little far-fetched and eliminating stack manipulations is |
behaves correctly or not. You can do this by typing @code{dbg |
unlikely to become a conscious programming objective. Still, the number |
badword}. A debug session might look like this: |
of stack manipulations will be reduced dramatically if local variables |
|
are used liberally (e.g., compare @code{max} (@pxref{Gforth locals}) with |
|
a traditional implementation of @code{max}). |
|
|
@example |
This shows one potential benefit of locals: making Forth programs more |
: badword 0 DO i . LOOP ; ok |
readable. Of course, this benefit will only be realized if the |
2 dbg badword |
programmers continue to honour the principle of factoring instead of |
: badword |
using the added latitude to make the words longer. |
Scanning code... |
|
|
|
Nesting debugger ready! |
@cindex single-assignment style for locals |
|
Using @code{TO} can and should be avoided. Without @code{TO}, |
|
every value-flavoured local has only a single assignment and many |
|
advantages of functional languages apply to Forth. I.e., programs are |
|
easier to analyse, to optimize and to read: It is clear from the |
|
definition what the local stands for, it does not turn into something |
|
different later. |
|
|
400D4738 8049BC4 0 -> [ 2 ] 00002 00000 |
E.g., a definition using @code{TO} might look like this: |
400D4740 8049F68 DO -> [ 0 ] |
@example |
400D4744 804A0C8 i -> [ 1 ] 00000 |
: strcmp @{ addr1 u1 addr2 u2 -- n @} |
400D4748 400C5E60 . -> 0 [ 0 ] |
u1 u2 min 0 |
400D474C 8049D0C LOOP -> [ 0 ] |
?do |
400D4744 804A0C8 i -> [ 1 ] 00001 |
addr1 c@@ addr2 c@@ - |
400D4748 400C5E60 . -> 1 [ 0 ] |
?dup-if |
400D474C 8049D0C LOOP -> [ 0 ] |
unloop exit |
400D4758 804B384 ; -> ok |
then |
|
addr1 char+ TO addr1 |
|
addr2 char+ TO addr2 |
|
loop |
|
u1 u2 - ; |
@end example |
@end example |
|
Here, @code{TO} is used to update @code{addr1} and @code{addr2} at |
|
every loop iteration. @code{strcmp} is a typical example of the |
|
readability problems of using @code{TO}. When you start reading |
|
@code{strcmp}, you think that @code{addr1} refers to the start of the |
|
string. Only near the end of the loop do you realize that it is something |
|
else. |
|
|
Each line displayed is one step. You always have to hit return to |
This can be avoided by defining two locals at the start of the loop that |
execute the next word that is displayed. If you don't want to execute |
are initialized with the right value for the current iteration. |
the next word as a whole, you have to type @kbd{n} for @code{nest}. Here is |
@example |
an overview of the available keys: |
: strcmp @{ addr1 u1 addr2 u2 -- n @} |
|
addr1 addr2 |
|
u1 u2 min 0 |
|
?do @{ s1 s2 @} |
|
s1 c@@ s2 c@@ - |
|
?dup-if |
|
unloop exit |
|
then |
|
s1 char+ s2 char+ |
|
loop |
|
2drop |
|
u1 u2 - ; |
|
@end example |
|
Here it is clear from the start that @code{s1} has a different value |
|
in every loop iteration. |
|
|
@table @i |
@node Locals implementation, , Locals programming style, Gforth locals |
|
@subsubsection Locals implementation |
|
@cindex locals implementation |
|
@cindex implementation of locals |
|
|
@item @key{RET} |
@cindex locals stack |
Next; Execute the next word. |
Gforth uses an extra locals stack. The most compelling reason for |
|
this is that the return stack is not float-aligned; using an extra stack |
|
also eliminates the problems and restrictions of using the return stack |
|
as locals stack. Like the other stacks, the locals stack grows toward |
|
lower addresses. A few primitives allow an efficient implementation: |
|
|
@item n |
|
Nest; Single step through next word. |
|
|
|
@item u |
doc-@local# |
Unnest; Stop debugging and execute rest of word. If we got to this word |
doc-f@local# |
with nest, continue debugging with the calling word. |
doc-laddr# |
|
doc-lp+!# |
|
doc-lp! |
|
doc->l |
|
doc-f>l |
|
|
@item d |
|
Done; Stop debugging and execute rest. |
|
|
|
@item s |
In addition to these primitives, some specializations of these |
Stop; Abort immediately. |
primitives for commonly occurring inline arguments are provided for |
|
efficiency reasons, e.g., @code{@@local0} as specialization of |
|
@code{@@local#} for the inline argument 0. The following compiling words |
|
compile the right specialized version, or the general version, as |
|
appropriate: |
|
|
@end table |
|
|
|
Debugging a large application with this mechanism is very difficult, because |
doc-compile-@local |
you have to nest very deeply into the program before the interesting part |
doc-compile-f@local |
begins. This takes a lot of time. |
doc-compile-lp+! |
|
|
To do this more directly, put a @code{BREAK:} command into your source code. |
|
When program execution reaches @code{BREAK:} the single step debugger is |
|
invoked and you have all the features described above. |
|
|
|
If you have more than one part to debug it is useful to know where the |
Combinations of conditional branches and @code{lp+!#} like |
program has stopped at the moment. You can do this by the |
@code{?branch-lp+!#} (the locals pointer is only changed if the branch |
@code{BREAK" string"} command. This behaves like @code{BREAK:} except that |
is taken) are provided for efficiency and correctness in loops. |
string is typed out when the ``breakpoint'' is reached. |
|
|
|
|
A special area in the dictionary space is reserved for keeping the |
|
local variable names. @code{@{} switches the dictionary pointer to this |
|
area and @code{@}} switches it back and generates the locals |
|
initializing code. @code{W:} etc.@ are normal defining words. This |
|
special area is cleared at the start of every colon definition. |
|
|
doc-dbg |
@cindex word list for defining locals |
doc-break: |
A special feature of Gforth's dictionary is used to implement the |
doc-break" |
definition of locals without type specifiers: every word list (aka |
|
vocabulary) has its own methods for searching |
|
etc. (@pxref{Word Lists}). For the present purpose we defined a word list |
|
with a special search method: When it is searched for a word, it |
|
actually creates that word using @code{W:}. @code{@{} changes the search |
|
order to first search the word list containing @code{@}}, @code{W:} etc., |
|
and then the word list for defining locals without type specifiers. |
|
|
|
The lifetime rules support a stack discipline within a colon |
|
definition: The lifetime of a local is either nested with other locals' |
|
lifetimes or it does not overlap them. |
|
|
|
At @code{BEGIN}, @code{IF}, and @code{AHEAD} no code for locals stack |
|
pointer manipulation is generated. Between control structure words |
|
locals definitions can push locals onto the locals stack. @code{AGAIN} |
|
is the simplest of the other three control flow words. It has to |
|
restore the locals stack depth of the corresponding @code{BEGIN} |
|
before branching. The code looks like this: |
|
@format |
|
@code{lp+!#} current-locals-size @minus{} dest-locals-size |
|
@code{branch} <begin> |
|
@end format |
|
|
@c ------------------------------------------------------------- |
@code{UNTIL} is a little more complicated: If it branches back, it |
@node Assembler and Code Words, Threading Words, Programming Tools, Words |
must adjust the stack just like @code{AGAIN}. But if it falls through, |
@section Assembler and Code Words |
the locals stack must not be changed. The compiler generates the |
@cindex assembler |
following code: |
@cindex code words |
@format |
|
@code{?branch-lp+!#} <begin> current-locals-size @minus{} dest-locals-size |
|
@end format |
|
The locals stack pointer is only adjusted if the branch is taken. |
|
|
@menu |
@code{THEN} can produce somewhat inefficient code: |
* Code and ;code:: |
@format |
* Common Assembler:: Assembler Syntax |
@code{lp+!#} current-locals-size @minus{} orig-locals-size |
* Common Disassembler:: |
<orig target>: |
* 386 Assembler:: Deviations and special cases |
@code{lp+!#} orig-locals-size @minus{} new-locals-size |
* Alpha Assembler:: Deviations and special cases |
@end format |
* MIPS assembler:: Deviations and special cases |
The second @code{lp+!#} adjusts the locals stack pointer from the |
* Other assemblers:: How to write them |
level at the @i{orig} point to the level after the @code{THEN}. The |
@end menu |
first @code{lp+!#} adjusts the locals stack pointer from the current |
|
level to the level at the orig point, so the complete effect is an |
|
adjustment from the current level to the right level after the |
|
@code{THEN}. |
|
|
@node Code and ;code, Common Assembler, Assembler and Code Words, Assembler and Code Words |
@cindex locals information on the control-flow stack |
@subsection @code{Code} and @code{;code} |
@cindex control-flow stack items, locals information |
|
In a conventional Forth implementation a dest control-flow stack entry |
|
is just the target address and an orig entry is just the address to be |
|
patched. Our locals implementation adds a word list to every orig or dest |
|
item. It is the list of locals visible (or assumed visible) at the point |
|
described by the entry. Our implementation also adds a tag to identify |
|
the kind of entry, in particular to differentiate between live and dead |
|
(reachable and unreachable) orig entries. |
|
|
Gforth provides some words for defining primitives (words written in |
A few unusual operations have to be performed on locals word lists: |
machine code), and for defining the machine-code equivalent of |
|
@code{DOES>}-based defining words. However, the machine-independent |
|
nature of Gforth poses a few problems: First of all, Gforth runs on |
|
several architectures, so it can provide no standard assembler. What's |
|
worse is that the register allocation not only depends on the processor, |
|
but also on the @code{gcc} version and options used. |
|
|
|
The words that Gforth offers encapsulate some system dependences (e.g., |
|
the header structure), so a system-independent assembler may be used in |
|
Gforth. If you do not have an assembler, you can compile machine code |
|
directly with @code{,} and @code{c,}@footnote{This isn't portable, |
|
because these words emit stuff in @i{data} space; it works because |
|
Gforth has unified code/data spaces. Assembler isn't likely to be |
|
portable anyway.}. |
|
|
|
|
doc-common-list |
|
doc-sub-list? |
|
doc-list-size |
|
|
doc-assembler |
|
doc-init-asm |
|
doc-code |
|
doc-end-code |
|
doc-;code |
|
doc-flush-icache |
|
|
|
|
Several features of our locals word list implementation make these |
|
operations easy to implement: The locals word lists are organised as |
|
linked lists; the tails of these lists are shared, if the lists |
|
contain some of the same locals; and the address of a name is greater |
|
than the address of the names behind it in the list. |
|
|
If @code{flush-icache} does not work correctly, @code{code} words |
Another important implementation detail is the variable |
etc. will not work (reliably), either. |
@code{dead-code}. It is used by @code{BEGIN} and @code{THEN} to |
|
determine if they can be reached directly or only through the branch |
|
that they resolve. @code{dead-code} is set by @code{UNREACHABLE}, |
|
@code{AHEAD}, @code{EXIT} etc., and cleared at the start of a colon |
|
definition, by @code{BEGIN} and usually by @code{THEN}. |
|
|
The typical usage of these @code{code} words can be shown most easily by |
Counted loops are similar to other loops in most respects, but |
analogy to the equivalent high-level defining words: |
@code{LEAVE} requires special attention: It performs basically the same |
|
service as @code{AHEAD}, but it does not create a control-flow stack |
|
entry. Therefore the information has to be stored elsewhere; |
|
traditionally, the information was stored in the target fields of the |
|
branches created by the @code{LEAVE}s, by organizing these fields into a |
|
linked list. Unfortunately, this clever trick does not provide enough |
|
space for storing our extended control flow information. Therefore, we |
|
introduce another stack, the leave stack. It contains the control-flow |
|
stack entries for all unresolved @code{LEAVE}s. |
|
|
|
Local names are kept until the end of the colon definition, even if |
|
they are no longer visible in any control-flow path. In a few cases |
|
this may lead to increased space needs for the locals name area, but |
|
usually less than reclaiming this space would cost in code size. |
|
|
|
|
|
@node ANS Forth locals, , Gforth locals, Locals |
|
@subsection ANS Forth locals |
|
@cindex locals, ANS Forth style |
|
|
|
The ANS Forth locals wordset does not define a syntax for locals, but |
|
words that make it possible to define various syntaxes. One of the |
|
possible syntaxes is a subset of the syntax we used in the Gforth locals |
|
wordset, i.e.: |
|
|
@example |
@example |
: foo code foo |
@{ local1 local2 ... -- comment @} |
<high-level Forth words> <assembler> |
@end example |
; end-code |
@noindent |
|
or |
: bar : bar |
@example |
<high-level Forth words> <high-level Forth words> |
@{ local1 local2 ... @} |
CREATE CREATE |
|
<high-level Forth words> <high-level Forth words> |
|
DOES> ;code |
|
<high-level Forth words> <assembler> |
|
; end-code |
|
@end example |
@end example |
|
|
@c anton: the following stuff is also in "Common Assembler", in less detail. |
The order of the locals corresponds to the order in a stack comment. The |
|
restrictions are: |
|
|
@cindex registers of the inner interpreter |
@itemize @bullet |
In the assembly code you will want to refer to the inner interpreter's |
@item |
registers (e.g., the data stack pointer) and you may want to use other |
Locals can only be cell-sized values (no type specifiers are allowed). |
registers for temporary storage. Unfortunately, the register allocation |
@item |
is installation-dependent. |
Locals can be defined only outside control structures. |
|
@item |
|
Locals can interfere with explicit usage of the return stack. For the |
|
exact (and long) rules, see the standard. If you don't use return stack |
|
accessing words in a definition using locals, you will be all right. The |
|
purpose of this rule is to make locals implementation on the return |
|
stack easier. |
|
@item |
|
The whole definition must be in one line. |
|
@end itemize |
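
As a minimal sketch of the syntax above (the word name @code{diff} and
the local names are made up for illustration), the following definition
takes its locals in stack-comment order, so @code{a} receives the deeper
stack item and @code{b} the top of stack:

@example
: diff @{ a b -- n @}
  \ a and b are cell-sized locals, defined outside any control structure
  a b - ;

5 3 diff .  \ a=5, b=3; prints 2
@end example
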
|
|
In particular, @code{ip} (Forth instruction pointer) and @code{rp} |
Locals defined in ANS Forth behave like @code{VALUE}s |
(return stack pointer) are in different places in @code{gforth} and |
(@pxref{Values}). I.e., they are initialized from the stack. Using their |
@code{gforth-fast}. This means that you cannot write a @code{NEXT} |
name produces their value. Their value can be changed using @code{TO}. |
routine that works on both versions; so for doing @code{NEXT}, I |
|
recommend jumping to @code{' noop >code-address}, which contains nothing |
|
but a @code{NEXT}. |
|
|
|
For general accesses to the inner interpreter's registers, the easiest |
Since the syntax above is supported by Gforth directly, you need not do |
solution is to use explicit register declarations (@pxref{Explicit Reg |
anything to use it. If you want to port a program using this syntax to |
Vars, , Variables in Specified Registers, gcc.info, GNU C Manual}) for |
another ANS Forth system, use @file{compat/anslocal.fs} to implement the |
all of the inner interpreter's registers: You have to compile Gforth |
syntax on the other system. |
with @code{-DFORCE_REG} (configure option @code{--enable-force-reg}) and |
|
the appropriate declarations must be present in the @code{machine.h} |
|
file (see @code{mips.h} for an example; you can find a full list of all |
|
declarable register symbols with @code{grep register engine.c}). If you |
|
give explicit registers to all variables that are declared at the |
|
beginning of @code{engine()}, you should be able to use the other |
|
caller-saved registers for temporary storage. Alternatively, you can use |
|
the @code{gcc} option @code{-ffixed-REG} (@pxref{Code Gen Options, , |
|
Options for Code Generation Conventions, gcc.info, GNU C Manual}) to |
|
reserve a register (however, this restriction on register allocation may |
|
slow Gforth significantly). |
|
|
|
If this solution is not viable (e.g., because @code{gcc} does not allow |
Note that a syntax shown in the standard, section A.13, looks |
you to explicitly declare all the registers you need), you have to find |
similar, but is quite different in having the order of locals |
out by looking at the code where the inner interpreter's registers |
reversed. Beware! |
reside and which registers can be used for temporary storage. You can |
|
get an assembly listing of the engine's code with @code{make engine.s}. |
|
|
|
In any case, it is good practice to abstract your assembly code from the |
The ANS Forth locals wordset itself consists of one word: |
actual register allocation. E.g., if the data stack pointer resides in |
|
register @code{$17}, create an alias for this register called @code{sp}, |
|
and use that in your assembly code. |
|
|
|
@cindex code words, portable |
doc-(local) |
Another option for implementing normal and defining words efficiently |
|
is to add the desired functionality to the source of Gforth. For normal |
|
words you just have to edit @file{primitives} (@pxref{Automatic |
|
Generation}). Defining words (equivalent to @code{;CODE} words, for fast |
|
defined words) may require changes in @file{engine.c}, @file{kernel.fs}, |
|
@file{prims2x.fs}, and possibly @file{cross.fs}. |
|
|
|
@node Common Assembler, Common Disassembler, Code and ;code, Assembler and Code Words |
The ANS Forth locals extension wordset defines a syntax using |
@subsection Common Assembler |
@code{locals|}, but it is so awful that we strongly recommend not using |
|
it. We have implemented this syntax to make porting to Gforth easy, but |
|
do not document it here. The problem with this syntax is that the locals |
|
are defined in an order reversed with respect to the standard stack |
|
comment notation, making programs harder to read, and easier to misread |
|
and miswrite. The only merit of this syntax is that it is easy to |
|
implement using the ANS Forth locals wordset. |
|
|
The assemblers in Gforth generally use a postfix syntax, i.e., the |
|
instruction name follows the operands. |
|
|
|
The operands are passed in the usual order (the same that is used in the |
@c ---------------------------------------------------------- |
manual of the architecture). Since they all are Forth words, they have |
@node Structures, Object-oriented Forth, Locals, Words |
to be separated by spaces; you can also use Forth words to compute the |
@section Structures |
operands. |
@cindex structures |
|
@cindex records |
|
|
The instruction names usually end with a @code{,}. This makes it easier |
This section presents the structure package that comes with Gforth. A |
to visually separate instructions if you put several of them on one |
version of the package implemented in ANS Forth is available in |
line; it also avoids shadowing other Forth words (e.g., @code{and}). |
@file{compat/struct.fs}. This package was inspired by a posting on |
|
comp.lang.forth in 1989 (unfortunately I don't remember by whom; |
|
possibly John Hayes). A version of this section has been published in |
|
M. Anton Ertl, |
|
@uref{http://www.complang.tuwien.ac.at/forth/objects/structs.html, Yet |
|
Another Forth Structures Package}, Forth Dimensions 19(3), pages |
|
13--16. Marcel Hendrix provided helpful comments. |
|
|
Registers are usually specified by number; e.g., (decimal) @code{11} |
@menu |
specifies registers R11 and F11 on the Alpha architecture (which one, |
* Why explicit structure support?:: |
depends on the instruction). The usual names are also available, e.g., |
* Structure Usage:: |
@code{s2} for R11 on Alpha. |
* Structure Naming Convention:: |
|
* Structure Implementation:: |
|
* Structure Glossary:: |
|
@end menu |
|
|
Control flow is specified similarly to normal Forth code (@pxref{Arbitrary |
@node Why explicit structure support?, Structure Usage, Structures, Structures |
control structures}), with @code{if,}, @code{ahead,}, @code{then,}, |
@subsection Why explicit structure support? |
@code{begin,}, @code{until,}, @code{again,}, @code{cs-roll}, |
|
@code{cs-pick}, @code{else,}, @code{while,}, and @code{repeat,}. The |
|
conditions are specified in a way specific to each assembler. |
|
|
|
Note that the register assignments of the Gforth engine can change |
@cindex address arithmetic for structures |
between Gforth versions, or even between different compilations of the |
@cindex structures using address arithmetic |
same Gforth version (e.g., if you use a different GCC version). So if |
If we want to use a structure containing several fields, we could simply |
you want to refer to Gforth's registers (e.g., the stack pointer or |
reserve memory for it, and access the fields using address arithmetic |
TOS), I recommend defining your own words for referring to these |
(@pxref{Address arithmetic}). As an example, consider a structure with |
registers, and using them later on; then you can easily adapt to a |
the following fields |
changed register assignment. The stability of the register assignment |
|
is usually better if you build Gforth with @code{--enable-force-reg}. |
|
|
|
In particular, the return stack pointer and the instruction pointer are |
@table @code |
in memory in @code{gforth}, and usually in registers in |
@item a |
@code{gforth-fast}. The most common use of these registers is to |
is a float |
dispatch to the next word (the @code{next} routine). A portable way to |
@item b |
do this is to jump to @code{' noop >code-address} (of course, this is |
is a cell |
less efficient than integrating the @code{next} code and scheduling it |
@item c |
well). |
is a float |
|
@end table |
|
|
@node Common Disassembler, 386 Assembler, Common Assembler, Assembler and Code Words |
Given the (float-aligned) base address of the structure we get the |
@subsection Common Disassembler |
address of the field |
|
|
You can disassemble a @code{code} word with @code{see} |
@table @code |
(@pxref{Debugging}). You can disassemble a section of memory with |
@item a |
|
without doing anything further. |
|
@item b |
|
with @code{float+} |
|
@item c |
|
with @code{float+ cell+ faligned} |
|
@end table |
|
|
doc-disasm |
It is easy to see that this can become quite tiring. |
|
|
The disassembler generally produces output that can be fed into the |
Moreover, it is not very readable, because seeing a |
assembler (i.e., same syntax, etc.). It also includes additional |
@code{cell+} tells us neither which kind of structure is |
information in comments. In particular, the address of the instruction |
accessed nor what field is accessed; we have to somehow infer the kind |
is given in a comment before the instruction. |
of structure, and then look up in the documentation, which field of |
|
that structure corresponds to that offset. |
|
|
@code{See} may display more or less than the actual code of the word, |
Finally, this kind of address arithmetic also causes maintenance |
because the recognition of the end of the code is unreliable. You can |
troubles: If you add or delete a field somewhere in the middle of the |
use @code{disasm} if it did not display enough. It may display more, if |
structure, you have to find and change all computations for the fields |
the code word is not immediately followed by a named word. If you have |
afterwards. |
something else there, you can follow the word with @code{align last @ ,} |
|
to ensure that the end is recognized. |
|
|
|
@node 386 Assembler, Alpha Assembler, Common Disassembler, Assembler and Code Words |
So, instead of using @code{cell+} and friends directly, how |
@subsection 386 Assembler |
about storing the offsets in constants: |
|
|
The 386 assembler included in Gforth was written by Bernd Paysan; it is |
@example |
available under the GPL and was originally part of bigFORTH. |
0 constant a-offset |
|
0 float+ constant b-offset |
|
0 float+ cell+ faligned constant c-offset |
|
@end example |
|
|
The 386 disassembler included in Gforth was written by Andrew McKewan |
Now we can get the address of field @code{x} with @code{x-offset |
and is in the public domain. |
+}. This is much better in all respects. Of course, you still |
|
have to change all later offset definitions if you add a field. You can |
|
fix this by declaring the offsets in the following way: |
|
|
The disassembler displays code in prefix Intel syntax. |
@example |
|
0 constant a-offset |
|
a-offset float+ constant b-offset |
|
b-offset cell+ faligned constant c-offset |
|
@end example |
|
|
The assembler uses a postfix syntax with reversed parameters. |
Since we always use the offsets with @code{+}, we could use a defining |
|
word @code{cfield} that includes the @code{+} in the action of the |
|
defined word: |
|
|
The assembler includes all instructions of the Athlon, i.e., 486 core |
@example |
instructions, Pentium and PPro extensions, floating point, MMX, 3Dnow!, |
: cfield ( n "name" -- ) |
but not ISSE. It's an integrated 16- and 32-bit assembler. Default is 32 |
create , |
bit; you can switch to 16 bit with @code{.86} and back to 32 bit with @code{.386}. |
does> ( name execution: addr1 -- addr2 ) |
|
@@ + ; |
|
|
There are several prefixes to switch between different operation sizes, |
0 cfield a |
@code{.b} for byte accesses, @code{.w} for word accesses, @code{.d} for |
0 a float+ cfield b |
double-word accesses. Addressing modes can be switched with @code{.wa} |
0 b cell+ faligned cfield c |
for 16 bit addresses, and @code{.da} for 32 bit addresses. You don't |
@end example |
need a prefix for byte register names (@code{AL} et al). |
|
|
|
For floating point operations, the prefixes are @code{.fs} (IEEE |
Instead of @code{x-offset +}, we now simply write @code{x}. |
single), @code{.fl} (IEEE double), @code{.fx} (extended), @code{.fw} |
|
(word), @code{.fd} (double-word), and @code{.fq} (quad-word). |
|
|
|
The MMX opcodes don't have size prefixes; they are spelled out as in |
The structure field words now can be used quite nicely. However, |
the Intel assembler. Instead of move from and to memory, there are |
their definition is still a bit cumbersome: We have to repeat the |
PLDQ/PLDD and PSTQ/PSTD. |
name, the information about size and alignment is distributed before |
|
and after the field definitions etc. The structure package presented |
|
here addresses these problems. |
|
|
The registers lack the 'e' prefix; even in 32 bit mode, eax is called |
@node Structure Usage, Structure Naming Convention, Why explicit structure support?, Structures |
ax. Immediate values are indicated by postfixing them with @code{#}, |
@subsection Structure Usage |
e.g., @code{3 #}. Here are some examples of addressing modes: |
@cindex structure usage |
|
|
|
@cindex @code{field} usage |
|
@cindex @code{struct} usage |
|
@cindex @code{end-struct} usage |
|
You can define a structure for a (data-less) linked list with: |
@example |
@example |
3 # \ immediate |
struct |
ax \ register |
cell% field list-next |
100 di d) \ 100[edi] |
end-struct list% |
4 bx cx di) \ 4[ebx][ecx] |
|
di ax *4 i) \ [edi][eax*4] |
|
20 ax *4 i#) \ 20[eax*4] |
|
@end example |
@end example |
|
|
Some examples of instructions are: |
With the address of the list node on the stack, you can compute the |
|
address of the field that contains the address of the next node with |
|
@code{list-next}. E.g., you can determine the length of a list |
|
with: |
|
|
@example |
@example |
ax bx mov \ mov ebx,eax |
: list-length ( list -- n ) |
3 # ax mov \ mov eax,3 |
\ "list" is a pointer to the first element of a linked list |
100 di d) ax mov \ mov eax,100[edi] |
\ "n" is the length of the list |
4 bx cx di) ax mov \ mov eax,4[ebx][ecx] |
0 BEGIN ( list1 n1 ) |
.w ax bx mov \ mov bx,ax |
over |
|
WHILE ( list1 n1 ) |
|
1+ swap list-next @@ swap |
|
REPEAT |
|
nip ; |
@end example |
@end example |
|
|
The following forms are supported for binary instructions: |
You can reserve memory for a list node in the dictionary with |
|
@code{list% %allot}, which leaves the address of the list node on the |
|
stack. For the equivalent allocation on the heap you can use @code{list% |
|
%alloc} (or, for an @code{allocate}-like stack effect (i.e., with ior), |
|
use @code{list% %allocate}). You can get the size of a list |
|
node with @code{list% %size} and its alignment with @code{list% |
|
%alignment}. |
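
The following sketch puts these words together. It repeats the
definitions of @code{list%} and @code{list-length} from above so that it
is self-contained; the names @code{node1} and @code{node2} are
hypothetical:

@example
struct
  cell% field list-next
end-struct list%

: list-length ( list -- n )
  0 BEGIN over WHILE 1+ swap list-next @@ swap REPEAT nip ;

list% %allot constant node1   \ node in the dictionary
list% %alloc constant node2   \ node on the heap
0 node2 list-next !           \ node2 terminates the list
node2 node1 list-next !       \ node1 points to node2
node1 list-length .           \ walks node1, then node2
list% %size .                 \ size of one node in address units
@end example
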
|
|
|
Note that in ANS Forth the body of a @code{create}d word is |
|
@code{aligned} but not necessarily @code{faligned}; |
|
therefore, if you do a: |
|
|
@example |
@example |
<reg> <reg> <inst> |
create @emph{name} foo% %allot drop |
<n> # <reg> <inst> |
|
<mem> <reg> <inst> |
|
<reg> <mem> <inst> |
|
@end example |
@end example |
|
|
Immediate to memory is not supported. The shift/rotate syntax is: |
@noindent |
|
then the memory allotted for @code{foo%} is guaranteed to start at the |
|
body of @code{@emph{name}} only if @code{foo%} contains only character, |
|
cell and double fields. Therefore, if your structure contains floats, |
|
better use |
|
|
@example |
@example |
<reg/mem> 1 # shl \ shortens to shift without immediate |
foo% %allot constant @emph{name} |
<reg/mem> 4 # shl |
|
<reg/mem> cl shl |
|
@end example |
@end example |
|
|
Precede string instructions (@code{movs} etc.) with @code{.b} to get |
@cindex structures containing structures |
the byte version. |
You can include a structure @code{foo%} as a field of |
|
another structure, like this: |
The control structure words @code{IF} @code{UNTIL} etc. must be preceded |
|
by one of these conditions: @code{vs vc u< u>= 0= 0<> u<= u> 0< 0>= ps |
|
pc < >= <= >}. (Note that most of these words shadow some Forth words |
|
when @code{assembler} is in front of @code{forth} in the search path, |
|
e.g., in @code{code} words). Currently the control structure words use |
|
one stack item, so you have to use @code{roll} instead of @code{cs-roll} |
|
to shuffle them (you can also use @code{swap} etc.). |
|
|
|
Here is an example of a @code{code} word (assumes that the stack pointer |
|
is in esi and the TOS is in ebx): |
|
|
|
@example |
@example |
code my+ ( n1 n2 -- n ) |
struct |
4 si D) bx add |
... |
4 # si add |
foo% field ... |
Next |
... |
end-code |
end-struct ... |
@end example |
@end example |
|
|
@node Alpha Assembler, MIPS assembler, 386 Assembler, Assembler and Code Words |
@cindex structure extension |
@subsection Alpha Assembler |
@cindex extended records |
|
Instead of starting with an empty structure, you can extend an |
The Alpha assembler and disassembler were originally written by Bernd |
existing structure. E.g., a plain linked list without data, as defined |
Thallner. |
above, is hardly useful; you can extend it to a linked list of integers, |
|
like this:@footnote{This feature is also known as @emph{extended |
The register names @code{a0}--@code{a5} are not available to avoid |
records}. It is the main innovation in the Oberon language; in other |
shadowing hex numbers. |
words, adding this feature to Modula-2 led Wirth to create a new |
|
language, write a new compiler etc. Adding this feature to Forth just |
|
required a few lines of code.} |
|
|
Immediate forms of arithmetic instructions are distinguished by a |
@example |
@code{#} just before the @code{,}, e.g., @code{and#,} (note: @code{lda,} |
list% |
does not count as arithmetic instruction). |
cell% field intlist-int |
|
end-struct intlist% |
|
@end example |
|
|
You have to specify all operands to an instruction, even those that |
@code{intlist%} is a structure with two fields: |
other assemblers consider optional, e.g., the destination register for |
@code{list-next} and @code{intlist-int}. |
@code{br,}, or the destination register and hint for @code{jmp,}. |
|
|
|
You can specify conditions for @code{if,} by removing the first @code{b} |
@cindex structures containing arrays |
and the trailing @code{,} from a branch with a corresponding name; e.g., |
You can specify an array type containing @emph{n} elements of |
|
type @code{foo%} like this: |
|
|
@example |
@example |
11 fgt if, \ if F11>0e |
foo% @emph{n} * |
... |
|
endif, |
|
@end example |
@end example |
|
|
@code{fbgt,} gives @code{fgt}. |
You can use this array type in any place where you can use a normal |
|
type, e.g., when defining a @code{field}, or with |
|
@code{%allot}. |
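
As an illustration, here is a small sketch of an array of structures.
The structure @code{point%}, its fields, and the words @code{points} and
@code{point[]} are made-up names, not part of the structure package:

@example
struct
  cell% field point-x
  cell% field point-y
end-struct point%

point% 10 * %allot constant points   \ array of 10 points in the dictionary

: point[] ( n -- addr )
  \ address of the n-th element (counting from 0)
  point% %size * points + ;

7 3 point[] point-y !    \ store 7 in the y field of element 3
3 point[] point-y @@ .   \ fetch it again
@end example

Because the size is the top item of the type description, @code{10 *}
scales only the size and leaves the alignment untouched.
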
|
|
@node MIPS assembler, Other assemblers, Alpha Assembler, Assembler and Code Words |
@cindex first field optimization |
@subsection MIPS assembler |
The first field is at the base address of a structure and the word for |
|
this field (e.g., @code{list-next}) actually does not change the address |
|
on the stack. You may be tempted to leave it out in the interest of |
|
run-time and space efficiency. This is not necessary, because the |
|
structure package optimizes this case: If you compile a first-field |
|
word, no code is generated. So, in the interest of readability and |
|
maintainability you should include the word for the field when accessing |
|
the field. |
|
|
The MIPS assembler was originally written by Christian Pirker. |
|
|
|
Currently the assembler and disassembler only cover the MIPS-I |
@node Structure Naming Convention, Structure Implementation, Structure Usage, Structures |
architecture (R3000), and don't support FP instructions. |
@subsection Structure Naming Convention |
|
@cindex structure naming convention |
|
|
The register names @code{$a0}--@code{$a3} are not available to avoid |
The field names that come to (my) mind are often quite generic, and, |
shadowing hex numbers. |
if used, would cause frequent name clashes. E.g., many structures |
|
probably contain a @code{counter} field. The structure names |
|
that come to (my) mind are often also the logical choice for the names |
|
of words that create such a structure. |
|
|
Because there is no way to distinguish registers from immediate values, |
Therefore, I have adopted the following naming conventions: |
you have to explicitly use the immediate forms of instructions, i.e., |
|
@code{addiu,}, not just @code{addu,} (@command{as} does this |
|
implicitly). |
|
|
|
If the architecture manual specifies several formats for the instruction |
@itemize @bullet |
(e.g., for @code{jalr,}), you usually have to use the one with more |
@cindex field naming convention |
arguments (i.e., two for @code{jalr,}). When in doubt, see |
@item |
@code{arch/mips/testasm.fs} for an example of correct use. |
The names of fields are of the form |
|
@code{@emph{struct}-@emph{field}}, where |
|
@code{@emph{struct}} is the basic name of the structure, and |
|
@code{@emph{field}} is the basic name of the field. You can |
|
think of field words as converting the (address of the) |
|
structure into the (address of the) field. |
|
|
Branches and jumps in the MIPS architecture have a delay slot. You have |
@cindex structure naming convention |
to fill it yourself (the simplest way is to use @code{nop,}); the |
@item |
assembler does not do it for you (unlike @command{as}). Even |
The names of structures are of the form |
@code{if,}, @code{ahead,}, @code{until,}, @code{again,}, @code{while,}, |
@code{@emph{struct}%}, where |
@code{else,} and @code{repeat,} need a delay slot. Since @code{begin,} |
@code{@emph{struct}} is the basic name of the structure. |
and @code{then,} just specify branch targets, they are not affected. |
@end itemize |
|
|
Note that you must not put branches, jumps, or @code{li,} into the delay |
This naming convention does not work that well for fields of extended |
slot: @code{li,} may expand to several instructions, and control flow |
structures; e.g., the integer list structure has a field |
instructions may not be put into the branch delay slot in any case. |
@code{intlist-int}, but has @code{list-next}, not |
|
@code{intlist-next}. |
|
|
For branches the argument specifying the target is a relative address; |
@node Structure Implementation, Structure Glossary, Structure Naming Convention, Structures |
you have to add the address of the delay slot to get the absolute |
@subsection Structure Implementation |
address. |
@cindex structure implementation |
|
@cindex implementation of structures |
|
|
The MIPS architecture also has load delay slots and restrictions on |
The central idea in the implementation is to pass the data about the |
using @code{mfhi,} and @code{mflo,}; you have to order the instructions |
structure being built on the stack, not in some global |
yourself to satisfy these restrictions; the assembler does not do it for |
variable. Everything else falls into place naturally once this design |
you. |
decision is made. |
|
|
You can specify the conditions for @code{if,} etc. by taking a |
The type description on the stack is of the form @emph{align |
conditional branch and leaving out the @code{b} at the start and the |
size}. Keeping the size on the top-of-stack makes dealing with arrays |
@code{,} at the end. E.g., |
very simple. |
|
|
|
@code{field} is a defining word that uses @code{Create} |
|
and @code{DOES>}. The body of the field contains the offset |
|
of the field, and the normal @code{DOES>} action is simply: |
|
|
@example |
@example |
4 5 eq if, |
@@ + |
... \ do something if $4 equals $5 |
|
then, |
|
@end example |
@end example |
|
|
@node Other assemblers, , MIPS assembler, Assembler and Code Words |
@noindent |
@subsection Other assemblers |
i.e., add the offset to the address, giving the stack effect |
|
@i{addr1 -- addr2} for a field. |
|
|
If you want to contribute another assembler/disassembler, please contact |
@cindex first field optimization, implementation |
us (@email{bug-gforth@@gnu.org}) to check if we have such an assembler |
This simple structure is slightly complicated by the optimization |
already. If you are writing them from scratch, please use a |
for fields with offset 0, which requires a different |
syntax style similar to the one we use (i.e., postfix, commas at the end of the |
@code{DOES>}-part (because we cannot rely on there being |
instruction names, @pxref{Common Assembler}); make the output of the |
something on the stack if such a field is invoked during |
disassembler be valid input for the assembler, and keep the style |
compilation). Therefore, we put the different @code{DOES>}-parts |
similar to the style we used. |
in separate words, and decide which one to invoke based on the |
|
offset. For a zero offset, the field is basically a noop; it is |
|
immediate, and therefore no code is generated when it is compiled. |
|
|
Hints on implementation: The most important part is to have a good test |
@node Structure Glossary, , Structure Implementation, Structures |
suite that contains all instructions. Once you have that, the rest is |
@subsection Structure Glossary |
easy. For actual coding you can take a look at |
@cindex structure glossary |
@file{arch/mips/disasm.fs} to get some ideas on how to use data for both |
|
the assembler and disassembler, avoiding redundancy and some potential |
|
bugs. You can also look at that file (and @pxref{Advanced does> usage |
|
example}) to get ideas how to factor a disassembler. |
|
|
|
Start with the disassembler, because it's easier to reuse data from the |
|
disassembler for the assembler than the other way round. |
|
|
|
For the assembler, take a look at @file{arch/alpha/asm.fs}, which shows |
doc-%align |
how simple it can be. |
doc-%alignment |
|
doc-%alloc |
|
doc-%allocate |
|
doc-%allot |
|
doc-cell% |
|
doc-char% |
|
doc-dfloat% |
|
doc-double% |
|
doc-end-struct |
|
doc-field |
|
doc-float% |
|
doc-naligned |
|
doc-sfloat% |
|
doc-%size |
|
doc-struct |
|
|
@c ------------------------------------------------------------- |
|
@node Threading Words, Locals, Assembler and Code Words, Words |
|
@section Threading Words |
|
@cindex threading words |
|
|
|
@cindex code address |
@c ------------------------------------------------------------- |
These words provide access to code addresses and other threading stuff |
@node Object-oriented Forth, Programming Tools, Structures, Words |
in Gforth (and, possibly, other interpretive Forths). It more or less |
@section Object-oriented Forth |
abstracts away the differences between direct and indirect threading |
|
(and, for direct threading, the machine dependences). However, at |
|
present this wordset is still incomplete. It is also pretty low-level; |
|
some day it will hopefully be made unnecessary by an internals wordset |
|
that abstracts implementation details away completely. |
|
|
|
|
Gforth comes with three packages for object-oriented programming: |
|
@file{objects.fs}, @file{oof.fs}, and @file{mini-oof.fs}; none of them |
|
is preloaded, so you have to @code{include} them before use. The most |
|
important differences between these packages (and others) are discussed |
|
in @ref{Comparison with other object models}. All packages are written |
|
in ANS Forth and can be used with any other ANS Forth. |
|
|
doc-threading-method |
@menu |
doc->code-address |
* Why object-oriented programming?:: |
doc->does-code |
* Object-Oriented Terminology:: |
doc-code-address! |
* Objects:: |
doc-does-code! |
* OOF:: |
doc-does-handler! |
* Mini-OOF:: |
doc-/does-handler |
* Comparison with other object models:: |
|
@end menu |
|
|
|
@c ---------------------------------------------------------------- |
|
@node Why object-oriented programming?, Object-Oriented Terminology, Object-oriented Forth, Object-oriented Forth |
|
@subsection Why object-oriented programming? |
|
@cindex object-oriented programming motivation |
|
@cindex motivation for object-oriented programming |
|
|
The code addresses produced by various defining words are produced by |
Often we have to deal with several data structures (@emph{objects}), |
the following words: |
that have to be treated similarly in some respects, but differently in |
|
others. Graphical objects are the textbook example: circles, triangles, |
|
dinosaurs, icons, and others, and we may want to add more during program |
|
development. We want to apply some operations to any graphical object, |
|
e.g., @code{draw} for displaying it on the screen. However, @code{draw} |
|
has to do something different for every kind of object. |
|
@comment TODO add some other operations eg perimeter, area |
|
@comment and tie in to concrete examples later.. |
|
|
|
We could implement @code{draw} as a big @code{CASE} |
|
control structure that executes the appropriate code depending on the |
|
kind of object to be drawn.  This would not be very elegant, and, |
|
moreover, we would have to change @code{draw} every time we add |
|
a new kind of graphical object (say, a spaceship). |
|
|
doc-docol: |
What we would rather do is: When defining spaceships, we would tell |
doc-docon: |
the system: ``Here's how you @code{draw} a spaceship; you figure |
doc-dovar: |
out the rest''. |
doc-douser: |
|
doc-dodefer: |
|
doc-dofield: |
|
|
|
|
This is the problem that all systems solve that (rightfully) call |
|
themselves object-oriented; the object-oriented packages presented here |
|
solve this problem (and not much else). |
|
@comment TODO ?list properties of oo systems.. oo vs o-based? |
|
|
You can recognize words defined by a @code{CREATE}...@code{DOES>} word |
@c ------------------------------------------------------------------------ |
with @code{>does-code}. If the word was defined in that way, the value |
@node Object-Oriented Terminology, Objects, Why object-oriented programming?, Object-oriented Forth |
returned is non-zero and identifies the @code{DOES>} used by the |
@subsection Object-Oriented Terminology |
defining word. |
@cindex object-oriented terminology |
@comment TODO should that be ``identifies the xt of the DOES> ??'' |
@cindex terminology for object-oriented programming |
|
|
@c ------------------------------------------------------------- |
This section is mainly for reference, so you don't have to understand |
@node Locals, Structures, Threading Words, Words |
all of it right away. The terminology is mainly Smalltalk-inspired. In |
@section Locals |
short: |
@cindex locals |
|
|
|
Local variables can make Forth programming more enjoyable and Forth |
@table @emph |
programs easier to read. Unfortunately, the locals of ANS Forth are |
@cindex class |
laden with restrictions. Therefore, we provide not only the ANS Forth |
@item class |
locals wordset, but also our own, more powerful locals wordset (we |
a data structure definition with some extras. |
implemented the ANS Forth locals wordset through our locals wordset). |
|
|
|
The ideas in this section have also been published in M. Anton Ertl, |
@cindex object |
@cite{@uref{http://www.complang.tuwien.ac.at/papers/ertl94l.ps.gz, |
@item object |
Automatic Scoping of Local Variables}}, EuroForth '94. |
an instance of the data structure described by the class definition. |
|
|
@menu |
@cindex instance variables |
* Gforth locals:: |
@item instance variables |
* ANS Forth locals:: |
fields of the data structure. |
@end menu |
|
|
|
@node Gforth locals, ANS Forth locals, Locals, Locals |
@cindex selector |
@subsection Gforth locals |
@cindex method selector |
@cindex Gforth locals |
@cindex virtual function |
@cindex locals, Gforth style |
@item selector |
|
(or @emph{method selector}) a word (e.g., |
|
@code{draw}) that performs an operation on a variety of data |
|
structures (classes). A selector describes @emph{what} operation to |
|
perform. In C++ terminology: a (pure) virtual function. |
|
|
Locals can be defined with |
@cindex method |
|
@item method |
|
the concrete definition that performs the operation |
|
described by the selector for a specific class. A method specifies |
|
@emph{how} the operation is performed for a specific class. |
|
|
@example |
@cindex selector invocation |
@{ local1 local2 ... -- comment @} |
@cindex message send |
@end example |
@cindex invoking a selector |
or |
@item selector invocation |
@example |
a call of a selector. One argument of the call (the TOS (top-of-stack)) |
@{ local1 local2 ... @} |
is used for determining which method is used. In Smalltalk terminology: |
@end example |
a message (consisting of the selector and the other arguments) is sent |
|
to the object. |
|
|
E.g., |
@cindex receiving object |
@example |
@item receiving object |
: max @{ n1 n2 -- n3 @} |
the object used for determining the method executed by a selector |
n1 n2 > if |
invocation. In the @file{objects.fs} model, it is the object that is on |
n1 |
the TOS when the selector is invoked. (@emph{Receiving} comes from |
else |
the Smalltalk @emph{message} terminology.) |
n2 |
|
endif ; |
|
@end example |
|
|
|
The similarity of locals definitions with stack comments is intended. A |
@cindex child class |
locals definition often replaces the stack comment of a word. The order |
@cindex parent class |
of the locals corresponds to the order in a stack comment and everything |
@cindex inheritance |
after the @code{--} is really a comment. |
@item child class |
|
a class that has (@emph{inherits}) all properties (instance variables, |
|
selectors, methods) from a @emph{parent class}. In Smalltalk |
|
terminology: The subclass inherits from the superclass. In C++ |
|
terminology: The derived class inherits from the base class. |
|
|
This similarity has one disadvantage: It is too easy to confuse locals |
@end table |
declarations with stack comments, causing bugs and making them hard to |
|
find. However, this problem can be avoided by appropriate coding |
|
conventions: Do not use both notations in the same program. If you do, |
|
they should be distinguished using additional means, e.g. by position. |
|
|
|
@cindex types of locals |
@c If you wonder about the message sending terminology, it comes from |
@cindex locals types |
@c a time when each object had its own task and objects communicated via |
The name of the local may be preceded by a type specifier, e.g., |
@c message passing; eventually the Smalltalk developers realized that |
@code{F:} for a floating point value: |
@c they can do most things through simple (indirect) calls. They kept the |
|
@c terminology. |
|
|
@example |
@c -------------------------------------------------------------- |
: CX* @{ F: Ar F: Ai F: Br F: Bi -- Cr Ci @} |
@node Objects, OOF, Object-Oriented Terminology, Object-oriented Forth |
\ complex multiplication |
@subsection The @file{objects.fs} model |
Ar Br f* Ai Bi f* f- |
@cindex objects |
Ar Bi f* Ai Br f* f+ ; |
@cindex object-oriented programming |
@end example |
|
|
|
@cindex flavours of locals |
@cindex @file{objects.fs} |
@cindex locals flavours |
@cindex @file{oof.fs} |
@cindex value-flavoured locals |
|
@cindex variable-flavoured locals |
|
Gforth currently supports cells (@code{W:}, @code{W^}), doubles |
|
(@code{D:}, @code{D^}), floats (@code{F:}, @code{F^}) and characters |
|
(@code{C:}, @code{C^}) in two flavours: a value-flavoured local (defined |
|
with @code{W:}, @code{D:} etc.) produces its value and can be changed |
|
with @code{TO}. A variable-flavoured local (defined with @code{W^} etc.) |
|
produces its address (which becomes invalid when the variable's scope is |
|
left). E.g., the standard word @code{emit} can be defined in terms of |
|
@code{type} like this: |
|
|
|
@example |
This section describes the @file{objects.fs} package. This material also |
: emit @{ C^ char* -- @} |
has been published in M. Anton Ertl, |
char* 1 type ; |
@cite{@uref{http://www.complang.tuwien.ac.at/forth/objects/objects.html, |
@end example |
Yet Another Forth Objects Package}}, Forth Dimensions 19(2), pages |
|
37--43. |
|
@c McKewan's and Zsoter's packages |
|
|
@cindex default type of locals |
This section assumes that you have read @ref{Structures}. |
@cindex locals, default type |
|
A local without type specifier is a @code{W:} local. Both flavours of |
|
locals are initialized with values from the data or FP stack. |
|
|
|
Currently there is no way to define locals with user-defined data |
The techniques on which this model is based have been used to implement |
structures, but we are working on it. |
the parser generator, Gray, and have also been used in Gforth for |
|
implementing the various flavours of word lists (hashed or not, |
|
case-sensitive or not, special-purpose word lists for locals etc.). |
|
|
Gforth allows defining locals everywhere in a colon definition. This |
|
poses the following questions: |
|
|
|
@menu |
@menu |
* Where are locals visible by name?:: |
* Properties of the Objects model:: |
* How long do locals live?:: |
* Basic Objects Usage:: |
* Programming Style:: |
* The Objects base class:: |
* Implementation:: |
* Creating objects:: |
|
* Object-Oriented Programming Style:: |
|
* Class Binding:: |
|
* Method conveniences:: |
|
* Classes and Scoping:: |
|
* Dividing classes:: |
|
* Object Interfaces:: |
|
* Objects Implementation:: |
|
* Objects Glossary:: |
@end menu |
@end menu |
|
|
@node Where are locals visible by name?, How long do locals live?, Gforth locals, Gforth locals |
Marcel Hendrix provided helpful comments on this section. |
@subsubsection Where are locals visible by name? |
|
@cindex locals visibility |
|
@cindex visibility of locals |
|
@cindex scope of locals |
|
|
|
Basically, the answer is that locals are visible where you would expect |
@node Properties of the Objects model, Basic Objects Usage, Objects, Objects |
them in block-structured languages, and sometimes a little longer. If you |
@subsubsection Properties of the @file{objects.fs} model |
want to restrict the scope of a local, enclose its definition in |
@cindex @file{objects.fs} properties |
@code{SCOPE}...@code{ENDSCOPE}. |
|
|
|
|
@itemize @bullet |
|
@item |
|
It is straightforward to pass objects on the stack. Passing |
|
selectors on the stack is a little less convenient, but possible. |
|
|
doc-scope |
@item |
doc-endscope |
Objects are just data structures in memory, and are referenced by their |
|
address. You can create words for objects with normal defining words |
|
like @code{constant}. Likewise, there is no difference between instance |
|
variables that contain objects and those that contain other data. |
|
|
|
@item |
|
Late binding is efficient and easy to use. |
|
|
These words behave like control structure words, so you can use them |
@item |
with @code{CS-PICK} and @code{CS-ROLL} to restrict the scope in |
It avoids parsing, and thus avoids problems with state-smartness |
arbitrary ways. |
and reduced extensibility; for convenience there are a few parsing |
|
words, but they have non-parsing counterparts. There are also a few |
|
defining words that parse. This is hard to avoid, because all standard |
|
defining words parse (except @code{:noname}); however, such |
|
words are not as bad as many other parsing words, because they are not |
|
state-smart. |
|
|
If you want a more exact answer to the visibility question, here's the |
@item |
basic principle: A local is visible in all places that can only be |
It does not try to incorporate everything. It does a few things and does |
reached through the definition of the local@footnote{In compiler |
them well (IMO). In particular, this model was not designed to support |
construction terminology, all places dominated by the definition of the |
information hiding (although it has features that may help); you can use |
local.}. In other words, it is not visible in places that can be reached |
a separate package for achieving this. |
without going through the definition of the local. E.g., locals defined |
|
in @code{IF}...@code{ENDIF} are visible until the @code{ENDIF}, locals |
|
defined in @code{BEGIN}...@code{UNTIL} are visible after the |
|
@code{UNTIL} (until, e.g., a subsequent @code{ENDSCOPE}). |
|
|
|
The reasoning behind this solution is: We want to have the locals |
@item |
visible as long as it is meaningful. The user can always make the |
It is layered; you don't have to learn and use all features to use this |
visibility shorter by using explicit scoping. In a place that can |
model. Only a few features are necessary (@pxref{Basic Objects Usage}, |
only be reached through the definition of a local, the meaning of a |
@pxref{The Objects base class}, @pxref{Creating objects}), the others |
local name is clear. In other places it is not: How is the local |
are optional and independent of each other. |
initialized at the control flow path that does not contain the |
|
definition? Which local is meant, if the same name is defined twice in |
|
two independent control flow paths? |
|
|
|
This should be enough detail for nearly all users, so you can skip the |
@item |
rest of this section. If you really must know all the gory details and |
An implementation in ANS Forth is available. |
options, read on. |
|
|
|
In order to implement this rule, the compiler has to know which places |
@end itemize |
are unreachable. It knows this automatically after @code{AHEAD}, |
|
@code{AGAIN}, @code{EXIT} and @code{LEAVE}; in other cases (e.g., after |
|
most @code{THROW}s), you can use the word @code{UNREACHABLE} to tell the |
|
compiler that the control flow never reaches that place. If |
|
@code{UNREACHABLE} is not used where it could, the only consequence is |
|
that the visibility of some locals is more limited than the rule above |
|
says. If @code{UNREACHABLE} is used where it should not (i.e., if you |
|
lie to the compiler), buggy code will be produced. |
|
|
|
|
|
doc-unreachable |
@node Basic Objects Usage, The Objects base class, Properties of the Objects model, Objects |
|
@subsubsection Basic @file{objects.fs} Usage |
|
@cindex basic objects usage |
|
@cindex objects, basic usage |
|
|
|
You can define a class for graphical objects like this: |
|
|
Another problem with this rule is that at @code{BEGIN}, the compiler |
@cindex @code{class} usage |
does not know which locals will be visible on the incoming |
@cindex @code{end-class} usage |
back-edge. All problems discussed in the following are due to this |
@cindex @code{selector} usage |
ignorance of the compiler (we discuss the problems using @code{BEGIN} |
|
loops as examples; the discussion also applies to @code{?DO} and other |
|
loops). Perhaps the most insidious example is: |
|
@example |
@example |
AHEAD |
object class \ "object" is the parent class |
BEGIN |
selector draw ( x y graphical -- ) |
x |
end-class graphical |
[ 1 CS-ROLL ] THEN |
|
@{ x @} |
|
... |
|
UNTIL |
|
@end example |
@end example |
|
|
This should be legal according to the visibility rule. The use of |
This code defines a class @code{graphical} with an |
@code{x} can only be reached through the definition; but that appears |
operation @code{draw}. We can perform the operation |
textually below the use. |
@code{draw} on any @code{graphical} object, e.g.: |
|
|
From this example it is clear that the visibility rules cannot be fully |
|
implemented without major headaches. Our implementation treats common |
|
cases as advertised and the exceptions are treated in a safe way: The |
|
compiler makes a reasonable guess about the locals visible after a |
|
@code{BEGIN}; if it is too pessimistic, the |
|
user will get a spurious error about the local not being defined; if the |
|
compiler is too optimistic, it will notice this later and issue a |
|
warning. In the case above the compiler would complain about @code{x} |
|
being undefined at its use. You can see from the obscure examples in |
|
this section that it takes quite unusual control structures to get the |
|
compiler into trouble, and even then it will often do fine. |
|
|
|
If the @code{BEGIN} is reachable from above, the most optimistic guess |
|
is that all locals visible before the @code{BEGIN} will also be |
|
visible after the @code{BEGIN}. This guess is valid for all loops that |
|
are entered only through the @code{BEGIN}, in particular, for normal |
|
@code{BEGIN}...@code{WHILE}...@code{REPEAT} and |
|
@code{BEGIN}...@code{UNTIL} loops and it is implemented in our |
|
compiler. When the branch to the @code{BEGIN} is finally generated by |
|
@code{AGAIN} or @code{UNTIL}, the compiler checks the guess and |
|
warns the user if it was too optimistic: |
|
@example |
@example |
IF |
100 100 t-rex draw |
@{ x @} |
|
BEGIN |
|
\ x ? |
|
[ 1 cs-roll ] THEN |
|
... |
|
UNTIL |
|
@end example |
@end example |
|
|
Here, @code{x} lives only until the @code{BEGIN}, but the compiler |
@noindent |
optimistically assumes that it lives until the @code{THEN}. It notices |
where @code{t-rex} is a word (say, a constant) that produces a |
this difference when it compiles the @code{UNTIL} and issues a |
graphical object. |
warning. The user can avoid the warning, and make sure that @code{x} |
|
is not used in the wrong area by using explicit scoping: |
|
@example |
|
IF |
|
SCOPE |
|
@{ x @} |
|
ENDSCOPE |
|
BEGIN |
|
[ 1 cs-roll ] THEN |
|
... |
|
UNTIL |
|
@end example |
|
|
|
Since the guess is optimistic, there will be no spurious error messages |
@comment TODO add a 2nd operation eg perimeter.. and use for |
about undefined locals. |
@comment a concrete example |
|
|
If the @code{BEGIN} is not reachable from above (e.g., after |
@cindex abstract class |
@code{AHEAD} or @code{EXIT}), the compiler cannot even make an |
How do we create a graphical object? With the present definitions, |
optimistic guess, as the locals visible after the @code{BEGIN} may be |
we cannot create a useful graphical object. The class |
defined later. Therefore, the compiler assumes that no locals are |
@code{graphical} describes graphical objects in general, but not |
visible after the @code{BEGIN}. However, the user can use |
any concrete graphical object type (C++ users would call it an |
@code{ASSUME-LIVE} to make the compiler assume that the same locals are |
@emph{abstract class}); e.g., there is no method for the selector |
visible at the @code{BEGIN} as at the point where the top control-flow stack |
@code{draw} in the class @code{graphical}. |
item was created. |
|
|
|
|
For concrete graphical objects, we define child classes of the |
|
class @code{graphical}, e.g.: |
|
|
doc-assume-live |
@cindex @code{overrides} usage |
|
@cindex @code{field} usage in class definition |
|
@example |
|
graphical class \ "graphical" is the parent class |
|
cell% field circle-radius |
|
|
|
:noname ( x y circle -- ) |
|
circle-radius @@ draw-circle ; |
|
overrides draw |
|
|
@noindent |
:noname ( n-radius circle -- ) |
E.g., |
circle-radius ! ; |
@example |
overrides construct |
@{ x @} |
|
AHEAD |
end-class circle |
ASSUME-LIVE |
|
BEGIN |
|
x |
|
[ 1 CS-ROLL ] THEN |
|
... |
|
UNTIL |
|
@end example |
@end example |
|
|
Other cases where the locals are defined before the @code{BEGIN} can be |
Here we define a class @code{circle} as a child of @code{graphical}, |
handled by inserting an appropriate @code{CS-ROLL} before the |
with field @code{circle-radius} (which behaves just like a field |
@code{ASSUME-LIVE} (and changing the control-flow stack manipulation |
(@pxref{Structures})); it defines (using @code{overrides}) new methods |
behind the @code{ASSUME-LIVE}). |
for the selectors @code{draw} and @code{construct} (@code{construct} is |
|
defined in @code{object}, the parent class of @code{graphical}). |
|
|
Cases where locals are defined after the @code{BEGIN} (but should be |
Now we can create a circle on the heap (i.e., |
visible immediately after the @code{BEGIN}) can only be handled by |
@code{allocate}d memory) with: |
rearranging the loop. E.g., the ``most insidious'' example above can be |
|
arranged into: |
@cindex @code{heap-new} usage |
@example |
@example |
BEGIN |
50 circle heap-new constant my-circle |
@{ x @} |
|
... 0= |
|
WHILE |
|
x |
|
REPEAT |
|
@end example |
@end example |
|
|
@node How long do locals live?, Programming Style, Where are locals visible by name?, Gforth locals |
@noindent |
@subsubsection How long do locals live? |
@code{heap-new} invokes @code{construct}, thus |
@cindex locals lifetime |
initializing the field @code{circle-radius} with 50. We can draw |
@cindex lifetime of locals |
this new circle at (100,100) with: |
|
|
The right answer for the lifetime question would be: A local lives at |
@example |
least as long as it can be accessed. For a value-flavoured local this |
100 100 my-circle draw |
means: until the end of its visibility. However, a variable-flavoured |
@end example |
local could be accessed through its address far beyond its visibility |
|
scope. Ultimately, this would mean that such locals would have to be |
|
garbage collected. Since this entails un-Forth-like implementation |
|
complexities, I adopted the same cowardly solution as some other |
|
languages (e.g., C): The local lives only as long as it is visible; |
|
afterwards its address is invalid (and programs that access it |
|
afterwards are erroneous). |
|
|
|
@node Programming Style, Implementation, How long do locals live?, Gforth locals |
@cindex selector invocation, restrictions |
@subsubsection Programming Style |
@cindex class definition, restrictions |
@cindex locals programming style |
Note: You can only invoke a selector if the object on the TOS |
@cindex programming style, locals |
(the receiving object) belongs to the class where the selector was |
|
defined or one of its descendants; e.g., you can invoke |
|
@code{draw} only for objects belonging to @code{graphical} |
|
or its descendants (e.g., @code{circle}).  Immediately before |
|
@code{end-class}, the search order has to be the same as |
|
immediately after @code{class}. |
|
|
The freedom to define locals anywhere has the potential to change |
@node The Objects base class, Creating objects, Basic Objects Usage, Objects |
programming styles dramatically. In particular, the need to use the |
@subsubsection The @file{objects.fs} base class |
return stack for intermediate storage vanishes. Moreover, all stack |
@cindex @code{object} class |
manipulations (except @code{PICK}s and @code{ROLL}s with run-time |
|
determined arguments) can be eliminated: If the stack items are in the |
|
wrong order, just write a locals definition for all of them; then |
|
write the items in the order you want. |
|
|
|
This seems a little far-fetched and eliminating stack manipulations is |
When you define a class, you have to specify a parent class. So how do |
unlikely to become a conscious programming objective. Still, the number |
you start defining classes? There is one class available from the start: |
of stack manipulations will be reduced dramatically if local variables |
@code{object}. It is the ancestor of all classes and so is the |
are used liberally (e.g., compare @code{max} (@pxref{Gforth locals}) with |
only class that has no parent. It has two selectors: @code{construct} |
a traditional implementation of @code{max}). |
and @code{print}. |
|
|
This shows one potential benefit of locals: making Forth programs more |
@node Creating objects, Object-Oriented Programming Style, The Objects base class, Objects |
readable. Of course, this benefit will only be realized if the |
@subsubsection Creating objects |
programmers continue to honour the principle of factoring instead of |
@cindex creating objects |
using the added latitude to make the words longer. |
@cindex object creation |
|
@cindex object allocation options |
|
|
@cindex single-assignment style for locals |
@cindex @code{heap-new} discussion |
Using @code{TO} can and should be avoided. Without @code{TO}, |
@cindex @code{dict-new} discussion |
every value-flavoured local has only a single assignment and many |
@cindex @code{construct} discussion |
advantages of functional languages apply to Forth. I.e., programs are |
You can create and initialize an object of a class on the heap with |
easier to analyse, to optimize and to read: It is clear from the |
@code{heap-new} ( ... class -- object ) and in the dictionary |
definition what the local stands for, it does not turn into something |
(allocation with @code{allot}) with @code{dict-new} ( |
different later. |
... class -- object ). Both words invoke @code{construct}, which |
|
consumes the stack items indicated by "..." above. |
|
|
E.g., a definition using @code{TO} might look like this: |
@cindex @code{init-object} discussion |
@example |
@cindex @code{class-inst-size} discussion |
: strcmp @{ addr1 u1 addr2 u2 -- n @} |
If you want to allocate memory for an object yourself, you can get its |
u1 u2 min 0 |
alignment and size with @code{class-inst-size 2@@} ( class -- |
?do |
align size ). Once you have memory for an object, you can initialize |
addr1 c@@ addr2 c@@ - |
it with @code{init-object} ( ... class object -- ); |
?dup-if |
@code{construct} does only a part of the necessary work. |
unloop exit |
|
then |
|
addr1 char+ TO addr1 |
|
addr2 char+ TO addr2 |
|
loop |
|
u1 u2 - ; |
|
@end example |
|
Here, @code{TO} is used to update @code{addr1} and @code{addr2} at |
|
every loop iteration. @code{strcmp} is a typical example of the |
|
readability problems of using @code{TO}. When you start reading |
|
@code{strcmp}, you think that @code{addr1} refers to the start of the |
|
string. Only near the end of the loop do you realize that it is something |
|
else. |
|
|
|
This can be avoided by defining two locals at the start of the loop that |
@node Object-Oriented Programming Style, Class Binding, Creating objects, Objects |
are initialized with the right value for the current iteration. |
@subsubsection Object-Oriented Programming Style |
@example |
@cindex object-oriented programming style |
: strcmp @{ addr1 u1 addr2 u2 -- n @} |
@cindex programming style, object-oriented |
addr1 addr2 |
|
u1 u2 min 0 |
|
?do @{ s1 s2 @} |
|
s1 c@@ s2 c@@ - |
|
?dup-if |
|
unloop exit |
|
then |
|
s1 char+ s2 char+ |
|
loop |
|
2drop |
|
u1 u2 - ; |
|
@end example |
|
Here it is clear from the start that @code{s1} has a different value |
|
in every loop iteration. |
|
|
|
@node Implementation, , Programming Style, Gforth locals |
This section is not exhaustive. |
@subsubsection Implementation |
|
@cindex locals implementation |
|
@cindex implementation of locals |
|
|
|
@cindex locals stack |
@cindex stack effects of selectors |
Gforth uses an extra locals stack. The most compelling reason for |
@cindex selectors and stack effects |
this is that the return stack is not float-aligned; using an extra stack |
In general, it is a good idea to ensure that all methods for the |
also eliminates the problems and restrictions of using the return stack |
same selector have the same stack effect: when you invoke a selector, |
as locals stack. Like the other stacks, the locals stack grows toward |
you often have no idea which method will be invoked, so, unless all |
lower addresses. A few primitives allow an efficient implementation: |
methods have the same stack effect, you will not know the stack effect |
|
of the selector invocation. |
|
|
|
One exception to this rule is methods for the selector |
|
@code{construct}. We know which method is invoked, because we |
|
specify the class to be constructed at the same place. Actually, I |
|
defined @code{construct} as a selector only to give the users a |
|
convenient way to specify initialization. The way it is used, a |
|
mechanism different from selector invocation would be more natural |
|
(but probably would take more code and more space to explain). |
|
|
doc-@local# |
@node Class Binding, Method conveniences, Object-Oriented Programming Style, Objects |
doc-f@local# |
@subsubsection Class Binding |
doc-laddr# |
@cindex class binding |
doc-lp+!# |
@cindex early binding |
doc-lp! |
|
doc->l |
|
doc-f>l |
|
|
|
|
@cindex late binding |
|
Normal selector invocations determine the method at run-time depending |
|
on the class of the receiving object. This run-time selection is called |
|
@i{late binding}. |
|
|
In addition to these primitives, some specializations of these |
Sometimes it's preferable to invoke a different method. For example, |
primitives for commonly occurring inline arguments are provided for |
you might want to use the simple method for @code{print}ing |
efficiency reasons, e.g., @code{@@local0} as specialization of |
@code{object}s instead of the possibly long-winded @code{print} method |
@code{@@local#} for the inline argument 0. The following compiling words |
of the receiver class. You can achieve this by replacing the invocation |
compile the right specialized version, or the general version, as |
of @code{print} with: |
appropriate: |
|
|
|
|
@cindex @code{[bind]} usage |
|
@example |
|
[bind] object print |
|
@end example |
|
|
doc-compile-@local |
@noindent |
doc-compile-f@local |
in compiled code or: |
doc-compile-lp+! |
|
|
|
|
@cindex @code{bind} usage |
|
@example |
|
bind object print |
|
@end example |
|
|
Combinations of conditional branches and @code{lp+!#} like |
@cindex class binding, alternative to |
@code{?branch-lp+!#} (the locals pointer is only changed if the branch |
@noindent |
is taken) are provided for efficiency and correctness in loops. |
in interpreted code. Alternatively, you can define the method with a |
|
name (e.g., @code{print-object}), and then invoke it through the |
|
name. Class binding is just a (often more convenient) way to achieve |
|
the same effect; it avoids name clutter and allows you to invoke |
|
methods directly without naming them first. |
|
|
A special area in the dictionary space is reserved for keeping the |
@cindex superclass binding |
local variable names. @code{@{} switches the dictionary pointer to this |
@cindex parent class binding |
area and @code{@}} switches it back and generates the locals |
A frequent use of class binding is this: When we define a method |
initializing code. @code{W:} etc.@ are normal defining words. This |
for a selector, we often want the method to do what the selector does |
special area is cleared at the start of every colon definition. |
in the parent class, and a little more. There is a special word for |
|
this purpose: @code{[parent]}; @code{[parent] |
|
@emph{selector}} is equivalent to @code{[bind] @emph{parent |
|
selector}}, where @code{@emph{parent}} is the parent |
|
class of the current class. E.g., a method definition might look like: |
|
|
@cindex word list for defining locals |
@cindex @code{[parent]} usage |
A special feature of Gforth's dictionary is used to implement the |
@example |
definition of locals without type specifiers: every word list (aka |
:noname |
vocabulary) has its own methods for searching |
dup [parent] foo \ do parent's foo on the receiving object |
etc. (@pxref{Word Lists}). For the present purpose we defined a word list |
... \ do some more |
with a special search method: When it is searched for a word, it |
; overrides foo |
actually creates that word using @code{W:}. @code{@{} changes the search |
@end example |
order to first search the word list containing @code{@}}, @code{W:} etc., |
|
and then the word list for defining locals without type specifiers. |
|
|
|
The lifetime rules support a stack discipline within a colon |
@cindex class binding as optimization |
definition: The lifetime of a local is either nested with other locals |
In @cite{Object-oriented programming in ANS Forth} (Forth Dimensions, |
lifetimes or it does not overlap them. |
March 1997), Andrew McKewan presents class binding as an optimization |
|
technique. I recommend not using it for this purpose unless you are in |
|
an emergency. Late binding is pretty fast with this model anyway, so the |
|
benefit of using class binding is small; the cost of using class binding |
|
where it is not appropriate is reduced maintainability. |
|
|
At @code{BEGIN}, @code{IF}, and @code{AHEAD} no code for locals stack |
While we are at programming style questions: You should bind |
pointer manipulation is generated. Between control structure words |
selectors only to ancestor classes of the receiving object. E.g., say, |
locals definitions can push locals onto the locals stack. @code{AGAIN} |
you know that the receiving object is of class @code{foo} or its |
is the simplest of the other three control flow words. It has to |
descendents; then you should bind only to @code{foo} and its |
restore the locals stack depth of the corresponding @code{BEGIN} |
ancestors. |
before branching. The code looks like this: |
|
@format |
|
@code{lp+!#} current-locals-size @minus{} dest-locals-size |
|
@code{branch} <begin> |
|
@end format |
|
|
|
@code{UNTIL} is a little more complicated: If it branches back, it |
@node Method conveniences, Classes and Scoping, Class Binding, Objects |
must adjust the stack just like @code{AGAIN}. But if it falls through, |
@subsubsection Method conveniences |
the locals stack must not be changed. The compiler generates the |
@cindex method conveniences |
following code: |
|
@format |
|
@code{?branch-lp+!#} <begin> current-locals-size @minus{} dest-locals-size |
|
@end format |
|
The locals stack pointer is only adjusted if the branch is taken. |
|
|
|
@code{THEN} can produce somewhat inefficient code: |
In a method you usually access the receiving object pretty often. If |
@format |
you define the method as a plain colon definition (e.g., with |
@code{lp+!#} current-locals-size @minus{} orig-locals-size |
@code{:noname}), you may have to do a lot of stack |
<orig target>: |
gymnastics. To avoid this, you can define the method with @code{m: |
@code{lp+!#} orig-locals-size @minus{} new-locals-size |
... ;m}. E.g., you could define the method for |
@end format |
@code{draw}ing a @code{circle} with |
The second @code{lp+!#} adjusts the locals stack pointer from the |
|
level at the @i{orig} point to the level after the @code{THEN}. The |
|
first @code{lp+!#} adjusts the locals stack pointer from the current |
|
level to the level at the orig point, so the complete effect is an |
|
adjustment from the current level to the right level after the |
|
@code{THEN}. |
|
|
|
@cindex locals information on the control-flow stack |
@cindex @code{this} usage |
@cindex control-flow stack items, locals information |
@cindex @code{m:} usage |
In a conventional Forth implementation a dest control-flow stack entry |
@cindex @code{;m} usage |
is just the target address and an orig entry is just the address to be |
@example |
patched. Our locals implementation adds a word list to every orig or dest |
m: ( x y circle -- ) |
item. It is the list of locals visible (or assumed visible) at the point |
( x y ) this circle-radius @@ draw-circle ;m |
described by the entry. Our implementation also adds a tag to identify |
@end example |
the kind of entry, in particular to differentiate between live and dead |
|
(reachable and unreachable) orig entries. |
|
|
|
A few unusual operations have to be performed on locals word lists: |
@cindex @code{exit} in @code{m: ... ;m} |
|
@cindex @code{exitm} discussion |
|
@cindex @code{catch} in @code{m: ... ;m} |
|
When this method is executed, the receiver object is removed from the |
|
stack; you can access it with @code{this} (admittedly, in this |
|
example the use of @code{m: ... ;m} offers no advantage). Note |
|
that I specify the stack effect for the whole method (i.e. including |
|
the receiver object), not just for the code between @code{m:} |
|
and @code{;m}. You cannot use @code{exit} in |
|
@code{m:...;m}; instead, use |
|
@code{exitm}.@footnote{Moreover, for any word that calls |
|
@code{catch} and was defined before loading |
|
@code{objects.fs}, you have to redefine it like I redefined |
|
@code{catch}: @code{: catch this >r catch r> to-this ;}} |
|
|
|
@cindex @code{inst-var} usage |
|
You will frequently use sequences of the form @code{this |
|
@emph{field}} (in the example above: @code{this |
|
circle-radius}). If you use the field only in this way, you can |
|
define it with @code{inst-var} and eliminate the |
|
@code{this} before the field name. E.g., the @code{circle} |
|
class above could also be defined with: |
|
|
doc-common-list |
@example |
doc-sub-list? |
graphical class |
doc-list-size |
cell% inst-var radius |
|
|
|
m: ( x y circle -- ) |
|
radius @@ draw-circle ;m |
|
overrides draw |
|
|
Several features of our locals word list implementation make these |
m: ( n-radius circle -- ) |
operations easy to implement: The locals word lists are organised as |
radius ! ;m |
linked lists; the tails of these lists are shared, if the lists |
overrides construct |
contain some of the same locals; and the address of a name is greater |
|
than the address of the names behind it in the list. |
|
|
|
Another important implementation detail is the variable |
end-class circle |
@code{dead-code}. It is used by @code{BEGIN} and @code{THEN} to |
@end example |
determine if they can be reached directly or only through the branch |
|
that they resolve. @code{dead-code} is set by @code{UNREACHABLE}, |
|
@code{AHEAD}, @code{EXIT} etc., and cleared at the start of a colon |
|
definition, by @code{BEGIN} and usually by @code{THEN}. |
|
|
|
Counted loops are similar to other loops in most respects, but |
@code{radius} can only be used in @code{circle} and its |
@code{LEAVE} requires special attention: It performs basically the same |
descendent classes and inside @code{m:...;m}. |
service as @code{AHEAD}, but it does not create a control-flow stack |
|
entry. Therefore the information has to be stored elsewhere; |
|
traditionally, the information was stored in the target fields of the |
|
branches created by the @code{LEAVE}s, by organizing these fields into a |
|
linked list. Unfortunately, this clever trick does not provide enough |
|
space for storing our extended control flow information. Therefore, we |
|
introduce another stack, the leave stack. It contains the control-flow |
|
stack entries for all unresolved @code{LEAVE}s. |
|
|
|
Local names are kept until the end of the colon definition, even if |
@cindex @code{inst-value} usage |
they are no longer visible in any control-flow path. In a few cases |
You can also define fields with @code{inst-value}, which is |
this may lead to increased space needs for the locals name area, but |
to @code{inst-var} what @code{value} is to |
usually less than reclaiming this space would cost in code size. |
@code{variable}. You can change the value of such a field with |
|
@code{[to-inst]}. E.g., we could also define the class |
|
@code{circle} like this: |
|
|
|
@example |
|
graphical class |
|
inst-value radius |
|
|
@node ANS Forth locals, , Gforth locals, Locals |
m: ( x y circle -- ) |
@subsection ANS Forth locals |
radius draw-circle ;m |
@cindex locals, ANS Forth style |
overrides draw |
|
|
The ANS Forth locals wordset does not define a syntax for locals, but |
m: ( n-radius circle -- ) |
words that make it possible to define various syntaxes. One of the |
[to-inst] radius ;m |
possible syntaxes is a subset of the syntax we used in the Gforth locals |
overrides construct |
wordset, i.e.: |
|
|
|
@example |
end-class circle |
@{ local1 local2 ... -- comment @} |
|
@end example |
|
@noindent |
|
or |
|
@example |
|
@{ local1 local2 ... @} |
|
@end example |
@end example |
|
|
The order of the locals corresponds to the order in a stack comment. The |
@c !! :m is easy to confuse with m:. Another name would be better. |
restrictions are: |
|
|
|
@itemize @bullet |
@c Finally, you can define named methods with @code{:m}. One use of this |
@item |
@c feature is the definition of words that occur only in one class and are |
Locals can only be cell-sized values (no type specifiers are allowed). |
@c not intended to be overridden, but which still need method context |
@item |
@c (e.g., for accessing @code{inst-var}s). Another use is for methods that |
Locals can be defined only outside control structures. |
@c would be bound frequently, if defined anonymously. |
@item |
|
Locals can interfere with explicit usage of the return stack. For the |
|
exact (and long) rules, see the standard. If you don't use return stack |
|
accessing words in a definition using locals, you will be all right. The |
|
purpose of this rule is to make locals implementation on the return |
|
stack easier. |
|
@item |
|
The whole definition must be in one line. |
|
@end itemize |
|
|
|
Locals defined in this way behave like @code{VALUE}s |
|
(@pxref{Values}). I.e., they are initialized from the stack. Using their |
|
name produces their value. Their value can be changed using @code{TO}. |
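
The following lines are a small sketch of this behaviour, not part of
the standard's text; the word name @code{add-twice} is made up for the
illustration:

@example
: add-twice @{ x y -- n @}  \ x and y are initialized from the stack
  x y + TO x               \ change the value of x with TO
  x y + ;                  \ using a name produces its value
1 2 add-twice .            \ prints 5
@end example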
|
|
|
Since this syntax is supported by Gforth directly, you need not do |
@node Classes and Scoping, Dividing classes, Method conveniences, Objects |
anything to use it. If you want to port a program using this syntax to |
@subsubsection Classes and Scoping |
another ANS Forth system, use @file{compat/anslocal.fs} to implement the |
@cindex classes and scoping |
syntax on the other system. |
@cindex scoping and classes |
|
|
Note that a syntax shown in the standard, section A.13 looks |
Inheritance is frequent, unlike structure extension. This exacerbates |
similar, but is quite different in having the order of locals |
the problem with the field name convention (@pxref{Structure Naming |
reversed. Beware! |
Convention}): One always has to remember in which class the field was |
|
originally defined; changing a part of the class structure would require |
|
changes for renaming in otherwise unaffected code. |
|
|
The ANS Forth locals wordset itself consists of a word: |
@cindex @code{inst-var} visibility |
|
@cindex @code{inst-value} visibility |
|
To solve this problem, I added a scoping mechanism (which was not in my |
|
original charter): A field defined with @code{inst-var} (or |
|
@code{inst-value}) is visible only in the class where it is defined and in |
|
the descendent classes of this class. Using such fields only makes |
|
sense in @code{m:}-defined methods in these classes anyway. |
|
|
|
This scoping mechanism allows us to use the unadorned field name, |
|
because name clashes with unrelated words become much less likely. |
|
|
doc-(local) |
@cindex @code{protected} discussion |
|
@cindex @code{private} discussion |
|
Once we have this mechanism, we can also use it for controlling the |
|
visibility of other words: All words defined after |
|
@code{protected} are visible only in the current class and its |
|
descendents. @code{public} restores the compilation |
|
(i.e. @code{current}) word list that was in effect before. If you |
|
have several @code{protected}s without an intervening |
|
@code{public} or @code{set-current}, @code{public} |
|
will restore the compilation word list in effect before the first of |
|
these @code{protected}s. |
|
|
|
@node Dividing classes, Object Interfaces, Classes and Scoping, Objects |
|
@subsubsection Dividing classes |
|
@cindex Dividing classes |
|
@cindex @code{methods}...@code{end-methods} |
|
|
The ANS Forth locals extension wordset defines a syntax using @code{locals|}, but it is so |
You may want to do the definition of methods separate from the |
awful that we strongly recommend against using it. We have implemented this |
definition of the class, its selectors, fields, and instance variables, |
syntax to make porting to Gforth easy, but do not document it here. The |
i.e., separate the implementation from the definition. You can do this |
problem with this syntax is that the locals are defined in an order |
in the following way: |
reversed with respect to the standard stack comment notation, making |
|
programs harder to read, and easier to misread and miswrite. The only |
|
merit of this syntax is that it is easy to implement using the ANS Forth |
|
locals wordset. |
|
|
|
|
@example |
|
graphical class |
|
inst-value radius |
|
end-class circle |
|
|
@c ---------------------------------------------------------- |
... \ do some other stuff |
@node Structures, Object-oriented Forth, Locals, Words |
|
@section Structures |
|
@cindex structures |
|
@cindex records |
|
|
|
This section presents the structure package that comes with Gforth. A |
circle methods \ now we are ready |
version of the package implemented in ANS Forth is available in |
|
@file{compat/struct.fs}. This package was inspired by a posting on |
|
comp.lang.forth in 1989 (unfortunately I don't remember by whom; |
|
possibly John Hayes). A version of this section has been published in |
|
???. Marcel Hendrix provided helpful comments. |
|
|
|
@menu |
m: ( x y circle -- ) |
* Why explicit structure support?:: |
radius draw-circle ;m |
* Structure Usage:: |
overrides draw |
* Structure Naming Convention:: |
|
* Structure Implementation:: |
|
* Structure Glossary:: |
|
@end menu |
|
|
|
@node Why explicit structure support?, Structure Usage, Structures, Structures |
m: ( n-radius circle -- ) |
@subsection Why explicit structure support? |
[to-inst] radius ;m |
|
overrides construct |
|
|
@cindex address arithmetic for structures |
end-methods |
@cindex structures using address arithmetic |
@end example |
If we want to use a structure containing several fields, we could simply |
|
reserve memory for it, and access the fields using address arithmetic |
|
(@pxref{Address arithmetic}). As an example, consider a structure with |
|
the following fields |
|
|
|
@table @code |
You can use several @code{methods}...@code{end-methods} sections. The |
@item a |
only things you can do to the class in these sections are: defining |
is a float |
methods, and overriding the class's selectors. You must not define new |
@item b |
selectors or fields. |
is a cell |
|
@item c |
|
is a float |
|
@end table |
|
|
|
Given the (float-aligned) base address of the structure we get the |
Note that you often have to override a selector before using it. In |
address of the field |
particular, you usually have to override @code{construct} with a new |
|
method before you can invoke @code{heap-new} and friends. E.g., you |
|
must not create a circle before the @code{overrides construct} sequence |
|
in the example above. |
|
|
@table @code |
@node Object Interfaces, Objects Implementation, Dividing classes, Objects |
@item a |
@subsubsection Object Interfaces |
without doing anything further. |
@cindex object interfaces |
@item b |
@cindex interfaces for objects |
with @code{float+} |
|
@item c |
|
with @code{float+ cell+ faligned} |
|
@end table |
|
|
|
It is easy to see that this can become quite tiring. |
In this model you can only call selectors defined in the class of the |
|
receiving objects or in one of its ancestors. If you call a selector |
|
with a receiving object that is not in one of these classes, the |
|
result is undefined; if you are lucky, the program crashes |
|
immediately. |
|
|
Moreover, it is not very readable, because seeing a |
@cindex selectors common to hardly-related classes |
@code{cell+} tells us neither which kind of structure is |
Now consider the case when you want to have a selector (or several) |
accessed nor what field is accessed; we have to somehow infer the kind |
available in two classes: You would have to add the selector to a |
of structure, and then look up in the documentation which field of |
common ancestor class, in the worst case to @code{object}. You |
that structure corresponds to that offset. |
may not want to do this, e.g., because someone else is responsible for |
|
this ancestor class. |
|
|
Finally, this kind of address arithmetic also causes maintenance |
The solution for this problem is interfaces. An interface is a |
troubles: If you add or delete a field somewhere in the middle of the |
collection of selectors. If a class implements an interface, the |
structure, you have to find and change all computations for the fields |
selectors become available to the class and its descendents. A class |
afterwards. |
can implement an unlimited number of interfaces. For the problem |
|
discussed above, we would define an interface for the selector(s), and |
|
both classes would implement the interface. |
|
|
So, instead of using @code{cell+} and friends directly, how |
As an example, consider an interface @code{storage} for |
about storing the offsets in constants: |
writing objects to disk and getting them back, and a class |
|
@code{foo} that implements it. The code would look like this: |
|
|
|
@cindex @code{interface} usage |
|
@cindex @code{end-interface} usage |
|
@cindex @code{implementation} usage |
@example |
@example |
0 constant a-offset |
interface |
0 float+ constant b-offset |
selector write ( file object -- ) |
0 float+ cell+ faligned constant c-offset |
selector read1 ( file object -- ) |
@end example |
end-interface storage |
|
|
Now we can get the address of field @code{x} with @code{x-offset |
bar class |
+}. This is much better in all respects. Of course, you still |
storage implementation |
have to change all later offset definitions if you add a field. You can |
|
fix this by declaring the offsets in the following way: |
|
|
|
@example |
... overrides write |
0 constant a-offset |
... overrides read1 |
a-offset float+ constant b-offset |
... |
b-offset cell+ faligned constant c-offset |
end-class foo |
@end example |
@end example |
|
|
Since we always use the offsets with @code{+}, we could use a defining |
@noindent |
word @code{cfield} that includes the @code{+} in the action of the |
(I would add a word @code{read} @i{( file -- object )} that uses |
defined word: |
@code{read1} internally, but that's beyond the point illustrated |
|
here.) |
|
|
@example |
Note that you cannot use @code{protected} in an interface; and |
: cfield ( n "name" -- ) |
of course you cannot define fields. |
create , |
|
does> ( name execution: addr1 -- addr2 ) |
|
@@ + ; |
|
|
|
0 cfield a |
In the Neon model, all selectors are available for all classes; |
0 a float+ cfield b |
therefore it does not need interfaces. The price you pay in this model |
0 b cell+ faligned cfield c |
is slower late binding, and therefore, added complexity to avoid late |
@end example |
binding. |
|
|
Instead of @code{x-offset +}, we now simply write @code{x}. |
@node Objects Implementation, Objects Glossary, Object Interfaces, Objects |
|
@subsubsection @file{objects.fs} Implementation |
|
@cindex @file{objects.fs} implementation |
|
|
The structure field words now can be used quite nicely. However, |
@cindex @code{object-map} discussion |
their definition is still a bit cumbersome: We have to repeat the |
An object is a piece of memory, like one of the data structures |
name, the information about size and alignment is distributed before |
described with @code{struct...end-struct}. It has a field |
and after the field definitions etc. The structure package presented |
@code{object-map} that points to the method map for the object's |
here addresses these problems. |
class. |
|
|
@node Structure Usage, Structure Naming Convention, Why explicit structure support?, Structures |
@cindex method map |
@subsection Structure Usage |
@cindex virtual function table |
@cindex structure usage |
The @emph{method map}@footnote{This is Self terminology; in C++ |
|
terminology: virtual function table.} is an array that contains the |
|
execution tokens (@i{xt}s) of the methods for the object's class. Each |
|
selector contains an offset into a method map. |
|
|
|
@cindex @code{selector} implementation, class |
|
@code{selector} is a defining word that uses |
|
@code{CREATE} and @code{DOES>}. The body of the |
|
selector contains the offset; the @code{DOES>} action for a |
|
class selector is, basically: |
|
|
@cindex @code{field} usage |
|
@cindex @code{struct} usage |
|
@cindex @code{end-struct} usage |
|
You can define a structure for a (data-less) linked list with: |
|
@example |
@example |
struct |
( object addr ) @@ over object-map @@ + @@ execute |
cell% field list-next |
|
end-struct list% |
|
@end example |
@end example |
|
|
With the address of the list node on the stack, you can compute the |
Since @code{object-map} is the first field of the object, it |
address of the field that contains the address of the next node with |
does not generate any code. As you can see, calling a selector has a |
@code{list-next}. E.g., you can determine the length of a list |
small, constant cost. |
with: |
|
|
|
@example |
|
: list-length ( list -- n ) |
|
\ "list" is a pointer to the first element of a linked list |
|
\ "n" is the length of the list |
|
0 BEGIN ( list1 n1 ) |
|
over |
|
WHILE ( list1 n1 ) |
|
1+ swap list-next @@ swap |
|
REPEAT |
|
nip ; |
|
@end example |
|
|
|
You can reserve memory for a list node in the dictionary with |
|
@code{list% %allot}, which leaves the address of the list node on the |
|
stack. For the equivalent allocation on the heap you can use @code{list% |
|
%alloc} (or, for an @code{allocate}-like stack effect (i.e., with ior), |
|
use @code{list% %allocate}). You can get the size of a list |
|
node with @code{list% %size} and its alignment with @code{list% |
|
%alignment}. |
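
As a sketch, assuming the definitions of @code{list%} and
@code{list-length} from above (the names @code{node1} and @code{node2}
are made up for this illustration):

@example
list% %allot constant node1  \ node in the dictionary
0 node1 list-next !          \ node1 is the end of the list
list% %alloc constant node2  \ node on the heap
node1 node2 list-next !      \ node2 points to node1
node2 list-length .          \ prints 2
list% %size .                \ size of a node in address units
@end example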
|
|
|
Note that in ANS Forth the body of a @code{create}d word is |
|
@code{aligned} but not necessarily @code{faligned}; |
|
therefore, if you do a: |
|
@example |
|
create @emph{name} foo% %allot |
|
@end example |
|
|
|
@noindent |
|
then the memory allotted for @code{foo%} is |
|
guaranteed to start at the body of @code{@emph{name}} only if |
|
@code{foo%} contains only character, cell and double fields. |
|
|
|
@cindex structures containing structures |
|
You can include a structure @code{foo%} as a field of |
|
another structure, like this: |
|
@example |
|
struct |
|
... |
|
foo% field ... |
|
... |
|
end-struct ... |
|
@end example |
|
|
|
@cindex structure extension |
|
@cindex extended records |
|
Instead of starting with an empty structure, you can extend an |
|
existing structure. E.g., a plain linked list without data, as defined |
|
above, is hardly useful; you can extend it to a linked list of integers, |
|
like this:@footnote{This feature is also known as @emph{extended |
|
records}. It is the main innovation in the Oberon language; in other |
|
words, adding this feature to Modula-2 led Wirth to create a new |
|
language, write a new compiler etc. Adding this feature to Forth just |
|
required a few lines of code.} |
|
|
|
@example |
|
list% |
|
cell% field intlist-int |
|
end-struct intlist% |
|
@end example |
|
|
|
@code{intlist%} is a structure with two fields: |
|
@code{list-next} and @code{intlist-int}. |
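
For illustration, and assuming the definitions above, both field words
can be applied to the same node (@code{inode} is a made-up name):

@example
intlist% %allot constant inode
0  inode list-next !       \ field inherited from list%
42 inode intlist-int !     \ field added by intlist%
inode intlist-int @@ .     \ prints 42
inode list-length .        \ prints 1; words for list% still work
@end example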
|
|
|
@cindex structures containing arrays |
|
You can specify an array type containing @emph{n} elements of |
|
type @code{foo%} like this: |
|
|
|
@example |
|
foo% @emph{n} * |
|
@end example |
|
|
|
You can use this array type in any place where you can use a normal |
|
type, e.g., when defining a @code{field}, or with |
|
@code{%allot}. |
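
A possible use, sketched with a made-up structure @code{point%} and
made-up words @code{points} and @code{nth-point}:

@example
struct
  cell% field point-x
  cell% field point-y
end-struct point%

point% 10 * %allot constant points  \ array of 10 points
: nth-point ( n -- addr )  \ address of element n, counting from 0
  point% %size * points + ;
7 3 nth-point point-y !
3 nth-point point-y @@ .            \ prints 7
@end example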
|
|
|
@cindex first field optimization |
|
The first field is at the base address of a structure and the word |
|
for this field (e.g., @code{list-next}) actually does not change |
|
the address on the stack. You may be tempted to leave it out in the |
|
interest of run-time and space efficiency. This is not necessary, |
|
because the structure package optimizes this case and compiling such |
|
words does not generate any code. So, in the interest of readability |
|
and maintainability you should include the word for the field when |
|
accessing the field. |
|
|
|
@node Structure Naming Convention, Structure Implementation, Structure Usage, Structures |
|
@subsection Structure Naming Convention |
|
@cindex structure naming convention |
|
|
|
The field names that come to (my) mind are often quite generic, and, |
|
if used, would cause frequent name clashes. E.g., many structures |
|
probably contain a @code{counter} field. The structure names |
|
that come to (my) mind are often also the logical choice for the names |
|
of words that create such a structure. |
|
|
|
Therefore, I have adopted the following naming conventions: |
|
|
|
@itemize @bullet |
|
@cindex field naming convention |
|
@item |
|
The names of fields are of the form |
|
@code{@emph{struct}-@emph{field}}, where |
|
@code{@emph{struct}} is the basic name of the structure, and |
|
@code{@emph{field}} is the basic name of the field. You can |
|
think of field words as converting the (address of the) |
|
structure into the (address of the) field. |
|
|
|
@cindex structure naming convention |
|
@item |
|
The names of structures are of the form |
|
@code{@emph{struct}%}, where |
|
@code{@emph{struct}} is the basic name of the structure. |
|
@end itemize |
|
|
|
This naming convention does not work that well for fields of extended |
|
structures; e.g., the integer list structure has a field |
|
@code{intlist-int}, but has @code{list-next}, not |
|
@code{intlist-next}. |
|
|
|
@node Structure Implementation, Structure Glossary, Structure Naming Convention, Structures |
@cindex @code{current-interface} discussion |
@subsection Structure Implementation |
@cindex class implementation and representation |
@cindex structure implementation |
A class is basically a @code{struct} combined with a method |
@cindex implementation of structures |
map. During the class definition the alignment and size of the class |
|
are passed on the stack, just as with @code{struct}s, so |
|
@code{field} can also be used for defining class |
|
fields. However, passing more items on the stack would be |
|
inconvenient, so @code{class} builds a data structure in memory, |
|
which is accessed through the variable |
|
@code{current-interface}. After its definition is complete, the |
|
class is represented on the stack by a pointer (e.g., as parameter for |
|
a child class definition). |
|
|
The central idea in the implementation is to pass the data about the |
A new class starts off with the alignment and size of its parent, |
structure being built on the stack, not in some global |
and a copy of the parent's method map. Defining new fields extends the |
variable. Everything else falls into place naturally once this design |
size and alignment; likewise, defining new selectors extends the |
decision is made. |
method map. @code{overrides} just stores a new @i{xt} in the method |
|
map at the offset given by the selector. |
|
|
The type description on the stack is of the form @emph{align |
@cindex class binding, implementation |
size}. Keeping the size on the top-of-stack makes dealing with arrays |
Class binding just gets the @i{xt} at the offset given by the selector |
very simple. |
from the class's method map and @code{compile,}s (in the case of |
|
@code{[bind]}) it. |
|
|
@code{field} is a defining word that uses @code{Create} |
@cindex @code{this} implementation |
and @code{DOES>}. The body of the field contains the offset |
@cindex @code{catch} and @code{this} |
of the field, and the normal @code{DOES>} action is simply: |
@cindex @code{this} and @code{catch} |
|
I implemented @code{this} as a @code{value}. At the |
|
start of an @code{m:...;m} method the old @code{this} is |
|
stored to the return stack and restored at the end; and the object on |
|
the TOS is stored @code{TO this}. This technique has one |
|
disadvantage: If the user does not leave the method via |
|
@code{;m}, but via @code{throw} or @code{exit}, |
|
@code{this} is not restored (and @code{exit} may |
|
crash). To deal with the @code{throw} problem, I have redefined |
|
@code{catch} to save and restore @code{this}; the same |
|
should be done with any word that can catch an exception. As for |
|
@code{exit}, I simply forbid it (as a replacement, there is |
|
@code{exitm}). |
|
|
|
@cindex @code{inst-var} implementation |
|
@code{inst-var} is just the same as @code{field}, with |
|
a different @code{DOES>} action: |
@example |
@example |
@@ + |
@@ this + |
@end example |
@end example |
|
The same applies to @code{inst-value}. |
|
|
@noindent |
@cindex class scoping implementation |
i.e., add the offset to the address, giving the stack effect |
Each class also has a word list that contains the words defined with |
@i{addr1 -- addr2} for a field. |
@code{inst-var} and @code{inst-value}, and its protected |
|
words. It also has a pointer to its parent. @code{class} pushes |
@cindex first field optimization, implementation |
the word lists of the class and all its ancestors onto the search order stack, |
This simple structure is slightly complicated by the optimization |
and @code{end-class} drops them. |
for fields with offset 0, which requires a different |
|
@code{DOES>}-part (because we cannot rely on there being |
|
something on the stack if such a field is invoked during |
|
compilation). Therefore, we put the different @code{DOES>}-parts |
|
in separate words, and decide which one to invoke based on the |
|
offset. For a zero offset, the field is basically a noop; it is |
|
immediate, and therefore no code is generated when it is compiled. |
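
The following is a simplified sketch of this scheme in ANS Forth, not
the actual source of the package; all names in it (@code{my-nalign},
@code{nonzero-field}, @code{zero-field}, @code{sketch-field}) are made
up for the illustration:

@example
: my-nalign ( addr1 n -- addr2 )  \ round up; n is a power of 2
  1- tuck + swap invert and ;
: nonzero-field ( -- )            \ give the latest word the normal action
  DOES> ( addr1 -- addr2 ) @@ + ;
: zero-field ( -- )               \ immediate noop for offset 0
  immediate DOES> ( -- ) drop ;
: sketch-field ( align1 offset1 align size "name" -- align2 offset2 )
  >r >r  r@ my-nalign             \ align the offset of the new field
  create dup ,                    \ the body contains the offset
  dup IF nonzero-field ELSE zero-field THEN
  swap r> max swap  r> + ;        \ new alignment and size
@end example

The two @code{DOES>}-parts live in separate words so that
@code{sketch-field} can choose between them after it has computed the
offset.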
|
|
|
@node Structure Glossary, , Structure Implementation, Structures |
|
@subsection Structure Glossary |
|
@cindex structure glossary |
|
|
|
|
|
doc-%align |
|
doc-%alignment |
|
doc-%alloc |
|
doc-%allocate |
|
doc-%allot |
|
doc-cell% |
|
doc-char% |
|
doc-dfloat% |
|
doc-double% |
|
doc-end-struct |
|
doc-field |
|
doc-float% |
|
doc-naligned |
|
doc-sfloat% |
|
doc-%size |
|
doc-struct |
|
|
|
|
|
@c ------------------------------------------------------------- |
|
@node Object-oriented Forth, Passing Commands to the OS, Structures, Words |
|
@section Object-oriented Forth |
|
|
|
Gforth comes with three packages for object-oriented programming: |
|
@file{objects.fs}, @file{oof.fs}, and @file{mini-oof.fs}; none of them |
|
is preloaded, so you have to @code{include} them before use. The most |
|
important differences between these packages (and others) are discussed |
|
in @ref{Comparison with other object models}. All packages are written |
|
in ANS Forth and can be used with any other ANS Forth. |
|
|
|
@menu |
|
* Why object-oriented programming?:: |
|
* Object-Oriented Terminology:: |
|
* Objects:: |
|
* OOF:: |
|
* Mini-OOF:: |
|
* Comparison with other object models:: |
|
@end menu |
|
|
|
@c ---------------------------------------------------------------- |
|
@node Why object-oriented programming?, Object-Oriented Terminology, Object-oriented Forth, Object-oriented Forth |
|
@subsection Why object-oriented programming? |
|
@cindex object-oriented programming motivation |
|
@cindex motivation for object-oriented programming |
|
|
|
Often we have to deal with several data structures (@emph{objects}) |
|
that have to be treated similarly in some respects, but differently in |
|
others. Graphical objects are the textbook example: circles, triangles, |
|
dinosaurs, icons, and others, and we may want to add more during program |
|
development. We want to apply some operations to any graphical object, |
|
e.g., @code{draw} for displaying it on the screen. However, @code{draw} |
|
has to do something different for every kind of object. |
|
@comment TODO add some other operations eg perimeter, area |
|
@comment and tie in to concrete examples later.. |
|
|
|
We could implement @code{draw} as a big @code{CASE} |
|
control structure that executes the appropriate code depending on the |
|
kind of object to be drawn. This would not be very elegant, and, |
|
moreover, we would have to change @code{draw} every time we add |
|
a new kind of graphical object (say, a spaceship). |
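
As a rough sketch of this @code{CASE}-based approach, such a
@code{draw} might look like the code below. The kind tags, the object
layout (tag in the first cell) and the per-kind drawing words are
hypothetical, invented only for this illustration:

@example
\ Hypothetical kind tags, stored in the first cell of each object
0 constant circle-kind
1 constant triangle-kind

\ Hypothetical per-kind drawing words; here they only print a name
: draw-a-circle   ( x y obj -- ) drop 2drop ." circle " ;
: draw-a-triangle ( x y obj -- ) drop 2drop ." triangle " ;

: draw ( x y obj -- )
    dup @@ case
        circle-kind   of draw-a-circle   endof
        triangle-kind of draw-a-triangle endof
        \ a spaceship would need yet another branch here
        ." unknown kind " nip nip nip   \ discard x y obj, keep the tag
    endcase ;

create my-circle circle-kind , 50 ,   \ tag, then radius
100 100 my-circle draw
@end example

Every new kind of object forces an edit to the one central definition
of @code{draw}, which is exactly the maintenance burden described above.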
|
|
|
What we would rather do is: When defining spaceships, we would tell |
|
the system: ``Here's how you @code{draw} a spaceship; you figure |
|
out the rest''. |
|
|
|
This is the problem that all systems solve that (rightfully) call |
@cindex interface implementation |
themselves object-oriented; the object-oriented packages presented here |
An interface is like a class without fields, parent and protected |
solve this problem (and not much else). |
words; i.e., it just has a method map. If a class implements an |
@comment TODO ?list properties of oo systems.. oo vs o-based? |
interface, its method map contains a pointer to the method map of the |
|
interface. The positive offsets in the map are reserved for class |
|
methods, therefore interface map pointers have negative |
|
offsets. Interfaces have offsets that are unique throughout the |
|
system, unlike class selectors, whose offsets are only unique for the |
|
classes where the selector is available (invokable). |
|
|
@c ------------------------------------------------------------------------ |
This structure means that interface selectors have to perform one |
@node Object-Oriented Terminology, Objects, Why object-oriented programming?, Object-oriented Forth |
indirection more than class selectors to find their method. Their body |
@subsection Object-Oriented Terminology |
contains the interface map pointer offset in the class method map, and |
@cindex object-oriented terminology |
the method offset in the interface method map. The |
@cindex terminology for object-oriented programming |
@code{does>} action for an interface selector is, basically: |
|
|
This section is mainly for reference, so you don't have to understand |
@example |
all of it right away. The terminology is mainly Smalltalk-inspired. In |
( object selector-body ) |
short: |
2dup selector-interface @@ ( object selector-body object interface-offset ) |
|
swap object-map @@ + @@ ( object selector-body map ) |
|
swap selector-offset @@ + @@ execute |
|
@end example |
|
|
@table @emph |
where @code{object-map} and @code{selector-offset} are |
@cindex class |
first fields and generate no code. |
@item class |
|
a data structure definition with some extras. |
|
|
|
@cindex object |
As a concrete example, consider the following code: |
@item object |
|
an instance of the data structure described by the class definition. |
|
|
|
@cindex instance variables |
@example |
@item instance variables |
interface |
fields of the data structure. |
selector if1sel1 |
|
selector if1sel2 |
|
end-interface if1 |
|
|
@cindex selector |
object class |
@cindex method selector |
if1 implementation |
@cindex virtual function |
selector cl1sel1 |
@item selector |
cell% inst-var cl1iv1 |
(or @emph{method selector}) a word (e.g., |
|
@code{draw}) that performs an operation on a variety of data |
|
structures (classes). A selector describes @emph{what} operation to |
|
perform. In C++ terminology: a (pure) virtual function. |
|
|
|
@cindex method |
' m1 overrides construct |
@item method |
' m2 overrides if1sel1 |
the concrete definition that performs the operation |
' m3 overrides if1sel2 |
described by the selector for a specific class. A method specifies |
' m4 overrides cl1sel2 |
@emph{how} the operation is performed for a specific class. |
end-class cl1 |
|
|
@cindex selector invocation |
create obj1 object dict-new drop |
@cindex message send |
create obj2 cl1 dict-new drop |
@cindex invoking a selector |
@end example |
@item selector invocation |
|
a call of a selector. One argument of the call (the TOS (top-of-stack)) |
|
is used for determining which method is used. In Smalltalk terminology: |
|
a message (consisting of the selector and the other arguments) is sent |
|
to the object. |
|
|
|
@cindex receiving object |
The data structure created by this code (including the data structure |
@item receiving object |
for @code{object}) is shown in the |
the object used for determining the method executed by a selector |
@uref{objects-implementation.eps,figure}, assuming a cell size of 4. |
invocation. In the @file{objects.fs} model, it is the object that is on |
@comment TODO add this diagram.. |
the TOS when the selector is invoked. (@emph{Receiving} comes from |
|
the Smalltalk @emph{message} terminology.) |
|
|
|
@cindex child class |
@node Objects Glossary, , Objects Implementation, Objects |
@cindex parent class |
@subsubsection @file{objects.fs} Glossary |
@cindex inheritance |
@cindex @file{objects.fs} Glossary |
@item child class |
|
a class that has (@emph{inherits}) all properties (instance variables, |
|
selectors, methods) from a @emph{parent class}. In Smalltalk |
|
terminology: The subclass inherits from the superclass. In C++ |
|
terminology: The derived class inherits from the base class. |
|
|
|
@end table |
|
|
|
@c If you wonder about the message sending terminology, it comes from |
doc---objects-bind |
@c a time when each object had its own task and objects communicated via |
doc---objects-<bind> |
@c message passing; eventually the Smalltalk developers realized that |
doc---objects-bind' |
@c they can do most things through simple (indirect) calls. They kept the |
doc---objects-[bind] |
@c terminology. |
doc---objects-class |
|
doc---objects-class->map |
|
doc---objects-class-inst-size |
|
doc---objects-class-override! |
|
doc---objects-construct |
|
doc---objects-current' |
|
doc---objects-[current] |
|
doc---objects-current-interface |
|
doc---objects-dict-new |
|
doc---objects-drop-order |
|
doc---objects-end-class |
|
doc---objects-end-class-noname |
|
doc---objects-end-interface |
|
doc---objects-end-interface-noname |
|
doc---objects-end-methods |
|
doc---objects-exitm |
|
doc---objects-heap-new |
|
doc---objects-implementation |
|
doc---objects-init-object |
|
doc---objects-inst-value |
|
doc---objects-inst-var |
|
doc---objects-interface |
|
doc---objects-m: |
|
doc---objects-:m |
|
doc---objects-;m |
|
doc---objects-method |
|
doc---objects-methods |
|
doc---objects-object |
|
doc---objects-overrides |
|
doc---objects-[parent] |
|
doc---objects-print |
|
doc---objects-protected |
|
doc---objects-public |
|
@c !! push-order conflicts |
|
doc---objects-push-order |
|
doc---objects-selector |
|
doc---objects-this |
|
doc---objects-<to-inst> |
|
doc---objects-[to-inst] |
|
doc---objects-to-this |
|
doc---objects-xt-new |
|
|
@c -------------------------------------------------------------- |
|
@node Objects, OOF, Object-Oriented Terminology, Object-oriented Forth |
@c ------------------------------------------------------------- |
@subsection The @file{objects.fs} model |
@node OOF, Mini-OOF, Objects, Object-oriented Forth |
@cindex objects |
@subsection The @file{oof.fs} model |
|
@cindex oof |
@cindex object-oriented programming |
@cindex object-oriented programming |
|
|
@cindex @file{objects.fs} |
@cindex @file{objects.fs} |
@cindex @file{oof.fs} |
@cindex @file{oof.fs} |
|
|
This section describes the @file{objects.fs} package. This material also |
This section describes the @file{oof.fs} package. |
has been published in M. Anton Ertl, |
|
@cite{@uref{http://www.complang.tuwien.ac.at/forth/objects/objects.html, |
|
Yet Another Forth Objects Package}}, Forth Dimensions 19(2), pages |
|
37--43. |
|
@c McKewan's and Zsoter's packages |
|
|
|
This section assumes that you have read @ref{Structures}. |
|
|
|
The techniques on which this model is based have been used to implement |
The package described in this section has been used in bigFORTH since 1991, and |
the parser generator, Gray, and have also been used in Gforth for |
used for two large applications: a chromatographic system used to |
implementing the various flavours of word lists (hashed or not, |
create new medicaments, and a graphic user interface library (MINOS). |
case-sensitive or not, special-purpose word lists for locals etc.). |
|
|
|
|
You can find a description (in German) of @file{oof.fs} in @cite{Object |
|
oriented bigFORTH} by Bernd Paysan, published in @cite{Vierte Dimension} |
|
10(2), 1994. |
|
|
@menu |
@menu |
* Properties of the Objects model:: |
* Properties of the OOF model:: |
* Basic Objects Usage:: |
* Basic OOF Usage:: |
* The Objects base class:: |
* The OOF base class:: |
* Creating objects:: |
* Class Declaration:: |
* Object-Oriented Programming Style:: |
* Class Implementation:: |
* Class Binding:: |
|
* Method conveniences:: |
|
* Classes and Scoping:: |
|
* Dividing classes:: |
|
* Object Interfaces:: |
|
* Objects Implementation:: |
|
* Objects Glossary:: |
|
@end menu |
@end menu |
|
|
Marcel Hendrix provided helpful comments on this section. Andras Zsoter |
@node Properties of the OOF model, Basic OOF Usage, OOF, OOF |
and Bernd Paysan helped me with the related works section. |
@subsubsection Properties of the @file{oof.fs} model |
|
@cindex @file{oof.fs} properties |
@node Properties of the Objects model, Basic Objects Usage, Objects, Objects |
|
@subsubsection Properties of the @file{objects.fs} model |
|
@cindex @file{objects.fs} properties |
|
|
|
@itemize @bullet |
@itemize @bullet |
@item |
@item |
It is straightforward to pass objects on the stack. Passing |
This model combines object oriented programming with information |
selectors on the stack is a little less convenient, but possible. |
hiding. It helps you write large applications, where scoping is |
|
necessary, because it provides class-oriented scoping. |
@item |
|
Objects are just data structures in memory, and are referenced by their |
|
address. You can create words for objects with normal defining words |
|
like @code{constant}. Likewise, there is no difference between instance |
|
variables that contain objects and those that contain other data. |
|
|
|
@item |
@item |
Late binding is efficient and easy to use. |
Named objects, object pointers, and object arrays can be created; |
|
selector invocation uses the ``object selector'' syntax. Selector invocation |
|
to objects and/or selectors on the stack is a bit less convenient, but |
|
possible. |
|
|
@item |
@item |
It avoids parsing, and thus avoids problems with state-smartness |
Selector invocation and instance variable usage of the active object are |
and reduced extensibility; for convenience there are a few parsing |
straightforward, since both make use of the active object. |
words, but they have non-parsing counterparts. There are also a few |
|
defining words that parse. This is hard to avoid, because all standard |
|
defining words parse (except @code{:noname}); however, such |
|
words are not as bad as many other parsing words, because they are not |
|
state-smart. |
|
|
|
@item |
@item |
It does not try to incorporate everything. It does a few things and does |
Late binding is efficient and easy to use. |
them well (IMO). In particular, this model was not designed to support |
|
information hiding (although it has features that may help); you can use |
|
a separate package for achieving this. |
|
|
|
@item |
@item |
It is layered; you don't have to learn and use all features to use this |
State-smart objects parse selectors. However, extensibility is provided |
model. Only a few features are necessary (@pxref{Basic Objects Usage}, |
using a (parsing) selector @code{postpone} and a selector @code{'}. |
@pxref{The Objects base class}, @pxref{Creating objects}.), the others |
|
are optional and independent of each other. |
|
|
|
@item |
@item |
An implementation in ANS Forth is available. |
An implementation in ANS Forth is available. |
Line 10467 An implementation in ANS Forth is availa
|
Line 10499 An implementation in ANS Forth is availa
|
@end itemize |
@end itemize |
|
|
|
|
@node Basic Objects Usage, The Objects base class, Properties of the Objects model, Objects |
@node Basic OOF Usage, The OOF base class, Properties of the OOF model, OOF |
@subsubsection Basic @file{objects.fs} Usage |
@subsubsection Basic @file{oof.fs} Usage |
@cindex basic objects usage |
@cindex @file{oof.fs} usage |
@cindex objects, basic usage |
|
|
This section uses the same example as for @code{objects} (@pxref{Basic Objects Usage}). |
|
|
You can define a class for graphical objects like this: |
You can define a class for graphical objects like this: |
|
|
@cindex @code{class} usage |
@cindex @code{class} usage |
@cindex @code{end-class} usage |
@cindex @code{class;} usage |
@cindex @code{selector} usage |
@cindex @code{method} usage |
@example |
@example |
object class \ "object" is the parent class |
object class graphical \ "object" is the parent class |
selector draw ( x y graphical -- ) |
method draw ( x y graphical -- ) |
end-class graphical |
class; |
@end example |
@end example |
|
|
This code defines a class @code{graphical} with an |
This code defines a class @code{graphical} with an |
Line 10492 operation @code{draw}. We can perform t
|
Line 10525 operation @code{draw}. We can perform t
|
@end example |
@end example |
|
|
@noindent |
@noindent |
where @code{t-rex} is a word (say, a constant) that produces a |
where @code{t-rex} is an object or object pointer, created with e.g. |
graphical object. |
@code{graphical : t-rex}. |
|
|
@comment TODO add a 2nd operation eg perimeter.. and use for |
|
@comment a concrete example |
|
|
|
@cindex abstract class |
@cindex abstract class |
How do we create a graphical object? With the present definitions, |
How do we create a graphical object? With the present definitions, |
Line 10509 any concrete graphical object type (C++
|
Line 10539 any concrete graphical object type (C++
|
For concrete graphical objects, we define child classes of the |
For concrete graphical objects, we define child classes of the |
class @code{graphical}, e.g.: |
class @code{graphical}, e.g.: |
|
|
@cindex @code{overrides} usage |
|
@cindex @code{field} usage in class definition |
|
@example |
@example |
graphical class \ "graphical" is the parent class |
graphical class circle \ "graphical" is the parent class |
cell% field circle-radius |
cell var circle-radius |
|
how: |
:noname ( x y circle -- ) |
: draw ( x y -- ) |
circle-radius @@ draw-circle ; |
circle-radius @@ draw-circle ; |
overrides draw |
|
|
|
:noname ( n-radius circle -- ) |
|
circle-radius ! ; |
|
overrides construct |
|
|
|
end-class circle |
: init ( n-radius -- ) |
|
circle-radius ! ; |
|
class; |
@end example |
@end example |
|
|
Here we define a class @code{circle} as a child of @code{graphical}, |
Here we define a class @code{circle} as a child of @code{graphical}, |
with field @code{circle-radius} (which behaves just like a field |
with a field @code{circle-radius}; it defines new methods for the |
(@pxref{Structures}); it defines (using @code{overrides}) new methods |
selectors @code{draw} and @code{init} (@code{init} is defined in |
for the selectors @code{draw} and @code{construct} (@code{construct} is |
@code{object}, the parent class of @code{graphical}). |
defined in @code{object}, the parent class of @code{graphical}). |
|
|
|
Now we can create a circle on the heap (i.e., |
Now we can create a circle in the dictionary with: |
@code{allocate}d memory) with: |
|
|
|
@cindex @code{heap-new} usage |
|
@example |
@example |
50 circle heap-new constant my-circle |
50 circle : my-circle |
@end example |
@end example |
|
|
@noindent |
@noindent |
@code{heap-new} invokes @code{construct}, thus |
@code{:} invokes @code{init}, thus initializing the field |
initializing the field @code{circle-radius} with 50. We can draw |
@code{circle-radius} with 50. We can draw this new circle at (100,100) |
this new circle at (100,100) with: |
with: |
|
|
@example |
@example |
100 100 my-circle draw |
100 100 my-circle draw |
Line 10551 this new circle at (100,100) with:
|
Line 10573 this new circle at (100,100) with:
|
|
|
@cindex selector invocation, restrictions |
@cindex selector invocation, restrictions |
@cindex class definition, restrictions |
@cindex class definition, restrictions |
Note: You can only invoke a selector if the object on the TOS |
Note: You can only invoke a selector if the receiving object belongs to |
(the receiving object) belongs to the class where the selector was |
the class where the selector was defined or one of its descendents; |
defined or one of its descendents; e.g., you can invoke |
e.g., you can invoke @code{draw} only for objects belonging to |
@code{draw} only for objects belonging to @code{graphical} |
@code{graphical} or its descendents (e.g., @code{circle}). The scoping |
or its descendents (e.g., @code{circle}). Immediately before |
mechanism will check if you try to invoke a selector that is not |
@code{end-class}, the search order has to be the same as |
defined in this class hierarchy, so you'll get an error at compilation |
immediately after @code{class}. |
time. |
|
|
@node The Objects base class, Creating objects, Basic Objects Usage, Objects |
|
@subsubsection The @file{objects.fs} base class |
@node The OOF base class, Class Declaration, Basic OOF Usage, OOF |
@cindex @code{object} class |
@subsubsection The @file{oof.fs} base class |
|
@cindex @file{oof.fs} base class |
|
|
When you define a class, you have to specify a parent class. So how do |
When you define a class, you have to specify a parent class. So how do |
you start defining classes? There is one class available from the start: |
you start defining classes? There is one class available from the start: |
@code{object}. It is the ancestor of all classes and so is the |
@code{object}. You have to use it as the ancestor of all classes. It is the |
only class that has no parent. It has two selectors: @code{construct} |
only class that has no parent. Classes are also objects, except that |
and @code{print}. |
they don't have instance variables; class manipulation such as |
|
inheritance or changing definitions of a class is handled through |
|
selectors of the class @code{object}. |
|
|
@node Creating objects, Object-Oriented Programming Style, The Objects base class, Objects |
@code{object} provides a number of selectors: |
@subsubsection Creating objects |
|
@cindex creating objects |
|
@cindex object creation |
|
@cindex object allocation options |
|
|
|
@cindex @code{heap-new} discussion |
@itemize @bullet |
@cindex @code{dict-new} discussion |
@item |
@cindex @code{construct} discussion |
@code{class} for subclassing, @code{definitions} to add definitions |
You can create and initialize an object of a class on the heap with |
later on, and @code{class?} to get type information (is the class a |
@code{heap-new} ( ... class -- object ) and in the dictionary |
subclass of the class passed on the stack?). |
(allocation with @code{allot}) with @code{dict-new} ( |
|
... class -- object ). Both words invoke @code{construct}, which |
|
consumes the stack items indicated by "..." above. |
|
|
|
@cindex @code{init-object} discussion |
doc---object-class |
@cindex @code{class-inst-size} discussion |
doc---object-definitions |
If you want to allocate memory for an object yourself, you can get its |
doc---object-class? |
alignment and size with @code{class-inst-size 2@@} ( class -- |
|
align size ). Once you have memory for an object, you can initialize |
|
it with @code{init-object} ( ... class object -- ); |
|
@code{construct} does only a part of the necessary work. |
|
|
|
@node Object-Oriented Programming Style, Class Binding, Creating objects, Objects |
|
@subsubsection Object-Oriented Programming Style |
|
@cindex object-oriented programming style |
|
@cindex programming style, object-oriented |
|
|
|
This section is not exhaustive. |
@item |
|
@code{init} and @code{dispose} as constructor and destructor of the |
|
object. @code{init} is invoked after the object's memory is allocated, |
|
while @code{dispose} also handles deallocation. Thus if you redefine |
|
@code{dispose}, you have to call the parent's dispose with @code{super |
|
dispose}, too. |
|
|
@cindex stack effects of selectors |
doc---object-init |
@cindex selectors and stack effects |
doc---object-dispose |
In general, it is a good idea to ensure that all methods for the |
|
same selector have the same stack effect: when you invoke a selector, |
|
you often have no idea which method will be invoked, so, unless all |
|
methods have the same stack effect, you will not know the stack effect |
|
of the selector invocation. |
|
|
|
One exception to this rule is methods for the selector |
|
@code{construct}. We know which method is invoked, because we |
|
specify the class to be constructed at the same place. Actually, I |
|
defined @code{construct} as a selector only to give the users a |
|
convenient way to specify initialization. The way it is used, a |
|
mechanism different from selector invocation would be more natural |
|
(but probably would take more code and more space to explain). |
|
|
|
@node Class Binding, Method conveniences, Object-Oriented Programming Style, Objects |
@item |
@subsubsection Class Binding |
@code{new}, @code{new[]}, @code{:}, @code{ptr}, @code{asptr}, and |
@cindex class binding |
@code{[]} to create named and unnamed objects and object arrays or |
@cindex early binding |
object pointers. |
|
|
@cindex late binding |
doc---object-new |
Normal selector invocations determine the method at run-time depending |
doc---object-new[] |
on the class of the receiving object. This run-time selection is called |
doc---object-: |
@i{late binding}. |
doc---object-ptr |
|
doc---object-asptr |
|
doc---object-[] |
|
|
Sometimes it's preferable to invoke a different method. For example, |
|
you might want to use the simple method for @code{print}ing |
|
@code{object}s instead of the possibly long-winded @code{print} method |
|
of the receiver class. You can achieve this by replacing the invocation |
|
of @code{print} with: |
|
|
|
@cindex @code{[bind]} usage |
@item |
@example |
@code{::} and @code{super} for explicit scoping. You should use explicit |
[bind] object print |
scoping only for super classes or classes with the same set of instance |
@end example |
variables. Explicitly-scoped selectors use early binding. |
|
|
@noindent |
doc---object-:: |
in compiled code or: |
doc---object-super |
|
|
@cindex @code{bind} usage |
|
@example |
|
bind object print |
|
@end example |
|
|
|
@cindex class binding, alternative to |
@item |
@noindent |
@code{self} to get the address of the object |
in interpreted code. Alternatively, you can define the method with a |
|
name (e.g., @code{print-object}), and then invoke it through the |
|
name. Class binding is just an (often more convenient) way to achieve |
|
the same effect; it avoids name clutter and allows you to invoke |
|
methods directly without naming them first. |
|
|
|
@cindex superclass binding |
doc---object-self |
@cindex parent class binding |
|
A frequent use of class binding is this: When we define a method |
|
for a selector, we often want the method to do what the selector does |
|
in the parent class, and a little more. There is a special word for |
|
this purpose: @code{[parent]}; @code{[parent] |
|
@emph{selector}} is equivalent to @code{[bind] @emph{parent |
|
selector}}, where @code{@emph{parent}} is the parent |
|
class of the current class. E.g., a method definition might look like: |
|
|
|
@cindex @code{[parent]} usage |
|
@example |
|
:noname |
|
dup [parent] foo \ do parent's foo on the receiving object |
|
... \ do some more |
|
; overrides foo |
|
@end example |
|
|
|
@cindex class binding as optimization |
@item |
In @cite{Object-oriented programming in ANS Forth} (Forth Dimensions, |
@code{bind}, @code{bound}, @code{link}, and @code{is} to assign object |
March 1997), Andrew McKewan presents class binding as an optimization |
pointers and instance defers. |
technique. I recommend not using it for this purpose unless you are in |
|
an emergency. Late binding is pretty fast with this model anyway, so the |
|
benefit of using class binding is small; the cost of using class binding |
|
where it is not appropriate is reduced maintainability. |
|
|
|
While we are at programming style questions: You should bind |
doc---object-bind |
selectors only to ancestor classes of the receiving object. E.g., say, |
doc---object-bound |
you know that the receiving object is of class @code{foo} or its |
doc---object-link |
descendents; then you should bind only to @code{foo} and its |
doc---object-is |
ancestors. |
|
|
|
@node Method conveniences, Classes and Scoping, Class Binding, Objects |
|
@subsubsection Method conveniences |
|
@cindex method conveniences |
|
|
|
In a method you usually access the receiving object pretty often. If |
@item |
you define the method as a plain colon definition (e.g., with |
@code{'} to obtain selector tokens, @code{send} to invoke selectors |
@code{:noname}), you may have to do a lot of stack |
from the stack, and @code{postpone} to generate selector invocation code. |
gymnastics. To avoid this, you can define the method with @code{m: |
|
... ;m}. E.g., you could define the method for |
doc---object-' |
@code{draw}ing a @code{circle} with |
doc---object-postpone |
|
|
|
|
|
@item |
|
@code{with} and @code{endwith} to select the active object from the |
|
stack, and enable its scope. Using @code{with} and @code{endwith} |
|
also allows you to create code using selector @code{postpone} without being |
|
trapped by the state-smart objects. |
|
|
|
doc---object-with |
|
doc---object-endwith |
|
|
|
|
|
@end itemize |
|
|
|
@node Class Declaration, Class Implementation, The OOF base class, OOF |
|
@subsubsection Class Declaration |
|
@cindex class declaration |
|
|
|
@itemize @bullet |
|
@item |
|
Instance variables |
|
|
|
doc---oof-var |
|
|
|
|
|
@item |
|
Object pointers |
|
|
|
doc---oof-ptr |
|
doc---oof-asptr |
|
|
|
|
|
@item |
|
Instance defers |
|
|
|
doc---oof-defer |
|
|
|
|
|
@item |
|
Method selectors |
|
|
|
doc---oof-early |
|
doc---oof-method |
|
|
|
|
@cindex @code{this} usage |
@item |
@cindex @code{m:} usage |
Class-wide variables |
@cindex @code{;m} usage |
|
@example |
|
m: ( x y circle -- ) |
|
( x y ) this circle-radius @@ draw-circle ;m |
|
@end example |
|
|
|
@cindex @code{exit} in @code{m: ... ;m} |
doc---oof-static |
@cindex @code{exitm} discussion |
|
@cindex @code{catch} in @code{m: ... ;m} |
|
When this method is executed, the receiver object is removed from the |
|
stack; you can access it with @code{this} (admittedly, in this |
|
example the use of @code{m: ... ;m} offers no advantage). Note |
|
that I specify the stack effect for the whole method (i.e. including |
|
the receiver object), not just for the code between @code{m:} |
|
and @code{;m}. You cannot use @code{exit} in |
|
@code{m:...;m}; instead, use |
|
@code{exitm}.@footnote{Moreover, for any word that calls |
|
@code{catch} and was defined before loading |
|
@code{objects.fs}, you have to redefine it like I redefined |
|
@code{catch}: @code{: catch this >r catch r> to-this ;}} |
|
|
|
@cindex @code{inst-var} usage |
|
You will frequently use sequences of the form @code{this |
|
@emph{field}} (in the example above: @code{this |
|
circle-radius}). If you use the field only in this way, you can |
|
define it with @code{inst-var} and eliminate the |
|
@code{this} before the field name. E.g., the @code{circle} |
|
class above could also be defined with: |
|
|
|
@example |
@item |
graphical class |
End declaration |
cell% inst-var radius |
|
|
|
m: ( x y circle -- ) |
doc---oof-how: |
radius @@ draw-circle ;m |
doc---oof-class; |
overrides draw |
|
|
|
m: ( n-radius circle -- ) |
|
radius ! ;m |
|
overrides construct |
|
|
|
end-class circle |
@end itemize |
@end example |
|
|
|
@code{radius} can only be used in @code{circle} and its |
@c ------------------------------------------------------------- |
descendent classes and inside @code{m:...;m}. |
@node Class Implementation, , Class Declaration, OOF |
|
@subsubsection Class Implementation |
|
@cindex class implementation |
|
|
@cindex @code{inst-value} usage |
@c ------------------------------------------------------------- |
You can also define fields with @code{inst-value}, which is |
@node Mini-OOF, Comparison with other object models, OOF, Object-oriented Forth |
to @code{inst-var} what @code{value} is to |
@subsection The @file{mini-oof.fs} model |
@code{variable}. You can change the value of such a field with |
@cindex mini-oof |
@code{[to-inst]}. E.g., we could also define the class |
|
@code{circle} like this: |
|
|
|
@example |
Gforth's third object oriented Forth package is a 12-liner. It uses a |
graphical class |
mixture of the @file{objects.fs} and the @file{oof.fs} syntax, |
inst-value radius |
and reduces to the bare minimum of features. This is based on a posting |
|
of Bernd Paysan in comp.lang.forth. |
|
|
m: ( x y circle -- ) |
@menu |
radius draw-circle ;m |
* Basic Mini-OOF Usage:: |
overrides draw |
* Mini-OOF Example:: |
|
* Mini-OOF Implementation:: |
|
@end menu |
|
|
m: ( n-radius circle -- ) |
@c ------------------------------------------------------------- |
[to-inst] radius ;m |
@node Basic Mini-OOF Usage, Mini-OOF Example, Mini-OOF, Mini-OOF |
overrides construct |
@subsubsection Basic @file{mini-oof.fs} Usage |
|
@cindex mini-oof usage |
|
|
end-class circle |
There is a base class (@code{class}, which allocates one cell for the |
@end example |
object pointer) plus seven other words: to define a method, a variable, |
|
a class; to end a class, to resolve binding, to allocate an object and |
|
to compile a class method. |
|
@comment TODO better description of the last one |
|
|
Finally, you can define named methods with @code{:m}. One use of this |
|
feature is the definition of words that occur only in one class and are |
|
not intended to be overridden, but which still need method context |
|
(e.g., for accessing @code{inst-var}s). Another use is for methods that |
|
would be bound frequently, if defined anonymously. |
|
|
|
|
doc-object |
|
doc-method |
|
doc-var |
|
doc-class |
|
doc-end-class |
|
doc-defines |
|
doc-new |
|
doc-:: |
|
|
@node Classes and Scoping, Dividing classes, Method conveniences, Objects |
|
@subsubsection Classes and Scoping |
|
@cindex classes and scoping |
|
@cindex scoping and classes |
|
|
|
Inheritance is frequent, unlike structure extension. This exacerbates |
|
the problem with the field name convention (@pxref{Structure Naming |
|
Convention}): One always has to remember in which class the field was |
|
originally defined; changing a part of the class structure would require |
|
changes for renaming in otherwise unaffected code. |
|
|
|
@cindex @code{inst-var} visibility |
@c ------------------------------------------------------------- |
@cindex @code{inst-value} visibility |
@node Mini-OOF Example, Mini-OOF Implementation, Basic Mini-OOF Usage, Mini-OOF |
To solve this problem, I added a scoping mechanism (which was not in my |
@subsubsection Mini-OOF Example |
original charter): A field defined with @code{inst-var} (or |
@cindex mini-oof example |
@code{inst-value}) is visible only in the class where it is defined and in |
|
the descendent classes of this class. Using such fields only makes |
|
sense in @code{m:}-defined methods in these classes anyway. |
|
|
|
This scoping mechanism allows us to use the unadorned field name, |
A short example shows how to use this package. This example, in slightly |
because name clashes with unrelated words become much less likely. |
extended form, is supplied as @file{moof-exm.fs} |
|
@comment TODO could flesh this out with some comments from the Forthwrite article |
|
|
@cindex @code{protected} discussion |
@example |
@cindex @code{private} discussion |
object class |
Once we have this mechanism, we can also use it for controlling the |
method init |
visibility of other words: All words defined after |
method draw |
@code{protected} are visible only in the current class and its |
end-class graphical |
descendents. @code{public} restores the compilation |
@end example |
(i.e. @code{current}) word list that was in effect before. If you |
|
have several @code{protected}s without an intervening |
|
@code{public} or @code{set-current}, @code{public} |
|
will restore the compilation word list in effect before the first of |
|
these @code{protected}s. |
|
|
|
@node Dividing classes, Object Interfaces, Classes and Scoping, Objects |
This code defines a class @code{graphical} with an |
@subsubsection Dividing classes |
operation @code{draw}. We can perform the operation |
@cindex Dividing classes |
@code{draw} on any @code{graphical} object, e.g.: |
@cindex @code{methods}...@code{end-methods} |
|
|
|
You may want to do the definition of methods separate from the |
@example |
definition of the class, its selectors, fields, and instance variables, |
100 100 t-rex draw |
i.e., separate the implementation from the definition. You can do this |
@end example |
in the following way: |
|
|
where @code{t-rex} is an object or object pointer, created with e.g. |
|
@code{graphical new Constant t-rex}. |
|
|
|
For concrete graphical objects, we define child classes of the |
|
class @code{graphical}, e.g.: |
|
|
@example |
@example |
graphical class |
graphical class |
inst-value radius |
cell var circle-radius |
end-class circle |
end-class circle \ "graphical" is the parent class |
|
|
... \ do some other stuff |
:noname ( x y -- ) |
|
circle-radius @@ draw-circle ; circle defines draw |
|
:noname ( r -- ) |
|
circle-radius ! ; circle defines init |
|
@end example |
|
|
circle methods \ now we are ready |
There is no implicit init method, so we have to define one. The creation |
|
code of the object now has to call @code{init} explicitly. |
|
|
m: ( x y circle -- ) |
@example |
radius draw-circle ;m |
circle new Constant my-circle |
overrides draw |
50 my-circle init |
|
@end example |
|
|
m: ( n-radius circle -- ) |
It is also possible to add a function to create named objects with |
[to-inst] radius ;m |
automatic call of @code{init}, given that all objects have @code{init} |
overrides construct |
in the same place: |
|
|
end-methods |
@example |
|
: new: ( .. o "name" -- ) |
|
new dup Constant init ; |
|
80 circle new: large-circle |
@end example |
@end example |
|
|
You can use several @code{methods}...@code{end-methods} sections. The |
We can draw this new circle at (100,100) with: |
only things you can do to the class in these sections are: defining |
|
methods, and overriding the class's selectors. You must not define new |
|
selectors or fields. |
|
|
|
Note that you often have to override a selector before using it. In |
@example |
particular, you usually have to override @code{construct} with a new |
100 100 my-circle draw |
method before you can invoke @code{heap-new} and friends. E.g., you |
@end example |
must not create a circle before the @code{overrides construct} sequence |
|
in the example above. |
|
|
|
@node Object Interfaces, Objects Implementation, Dividing classes, Objects |
@node Mini-OOF Implementation, , Mini-OOF Example, Mini-OOF |
@subsubsection Object Interfaces |
@subsubsection @file{mini-oof.fs} Implementation |
@cindex object interfaces |
|
@cindex interfaces for objects |
|
|
|
In this model you can only call selectors defined in the class of the |
Object-oriented systems with late binding typically use a |
receiving object or in one of its ancestors. If you call a selector |
``vtable''-approach: the first variable in each object is a pointer to a |
with a receiving object that is not in one of these classes, the |
table, which contains the methods as function pointers. The vtable |
result is undefined; if you are lucky, the program crashes |
may also contain other information. |
immediately. |
|
|
|
@cindex selectors common to hardly-related classes |
So first, let's declare methods: |
Now consider the case when you want to have a selector (or several) |
|
available in two classes: You would have to add the selector to a |
|
common ancestor class, in the worst case to @code{object}. You |
|
may not want to do this, e.g., because someone else is responsible for |
|
this ancestor class. |
|
|
|
The solution for this problem is interfaces. An interface is a |
@example |
collection of selectors. If a class implements an interface, the |
: method ( m v -- m' v ) Create over , swap cell+ swap |
selectors become available to the class and its descendents. A class |
DOES> ( ... o -- ... ) @@ over @@ + @@ execute ; |
can implement an unlimited number of interfaces. For the problem |
@end example |
discussed above, we would define an interface for the selector(s), and |
|
both classes would implement the interface. |
During method declaration, the number of methods and instance |
|
variables is on the stack (in address units). @code{method} creates |
|
one method and increments the method number. To execute a method, it |
|
takes the object, fetches the vtable pointer, adds the offset, and |
|
executes the @i{xt} stored there. Each method takes the object it is |
|
invoked from as top of stack parameter. The method itself should |
|
consume that object. |
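
As a rough sketch of what this dispatch amounts to, the following lines
spell out by hand what the phrase @code{100 100 my-circle draw} does.
They assume the class @code{circle}, the object @code{my-circle} and the
method @code{draw} from the example section above, and are meant as an
illustration rather than as code you would normally write:

@example
100 100 my-circle        \ arguments, with the object on top of the stack
dup @@                    \ fetch the vtable pointer from the object
' draw >body @@ +         \ add the offset stored in the body of draw
@@ execute                \ fetch the xt from the vtable and run it
@end example

The object stays on the stack below the vtable address, so the method
finds it as its top-of-stack parameter.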
|
|
|
Now, we also have to declare instance variables: |
|
|
|
@example |
|
: var ( m v size -- m v' ) Create over , + |
|
DOES> ( o -- addr ) @@ + ; |
|
@end example |
|
|
|
As before, a word is created with the current offset. Instance |
|
variables can have different sizes (cells, floats, doubles, chars), so |
|
all we do is take the size and add it to the offset. If your machine |
|
has alignment restrictions, put the proper @code{aligned} or |
|
@code{faligned} before the variable, to adjust the variable |
|
offset. That's why it is on the top of stack. |
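
A minimal sketch of this, assuming a hypothetical class @code{sample}
with one char-sized and one cell-sized instance variable (all three
names are invented for illustration):

@example
object class
  1 chars var tag         \ char-sized variable; the offset may now be unaligned
  aligned cell var total  \ align the offset, then add a cell-sized variable
end-class sample
@end example

Because the variable offset is on top of the stack between declarations,
@code{aligned} can adjust it directly; for a float-sized variable you
would use @code{faligned} in the same position.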
|
|
As an example, consider an interface @code{storage} for |
We need a starting point (the base object) and some syntactic sugar: |
writing objects to disk and getting them back, and a class |
|
@code{foo} that implements it. The code would look like this: |
|
|
|
@cindex @code{interface} usage |
|
@cindex @code{end-interface} usage |
|
@cindex @code{implementation} usage |
|
@example |
@example |
interface |
Create object 1 cells , 2 cells , |
selector write ( file object -- ) |
: class ( class -- class methods vars ) dup 2@@ ; |
selector read1 ( file object -- ) |
@end example |
end-interface storage |
|
|
|
bar class |
For inheritance, the vtable of the parent object has to be |
storage implementation |
copied when a new, derived class is declared. This gives all the |
|
methods of the parent class, which can be overridden, though. |
|
|
... overrides write |
@example |
... overrides read1 |
: end-class ( class methods vars -- ) |
... |
Create here >r , dup , 2 cells ?DO ['] noop , 1 cells +LOOP |
end-class foo |
cell+ dup cell+ r> rot @@ 2 cells /string move ; |
@end example |
@end example |
|
|
@noindent |
The first line creates the vtable, initialized with |
(I would add a word @code{read} @i{( file -- object )} that uses |
@code{noop}s. The second line is the inheritance mechanism; it |
@code{read1} internally, but that's beyond the point illustrated |
copies the xts from the parent vtable. |
here.) |
|
|
|
Note that you cannot use @code{protected} in an interface; and |
|
of course you cannot define fields. |
|
|
|
In the Neon model, all selectors are available for all classes; |
We still have no way to define new methods, so let's do that now: |
therefore it does not need interfaces. The price you pay in this model |
|
is slower late binding, and therefore, added complexity to avoid late |
|
binding. |
|
|
|
@node Objects Implementation, Objects Glossary, Object Interfaces, Objects |
@example |
@subsubsection @file{objects.fs} Implementation |
: defines ( xt class -- ) ' >body @@ + ! ; |
@cindex @file{objects.fs} implementation |
@end example |
|
|
@cindex @code{object-map} discussion |
To allocate a new object, we need a word, too: |
An object is a piece of memory, like one of the data structures |
|
described with @code{struct...end-struct}. It has a field |
|
@code{object-map} that points to the method map for the object's |
|
class. |
|
|
|
@cindex method map |
@example |
@cindex virtual function table |
: new ( class -- o ) here over @@ allot swap over ! ; |
The @emph{method map}@footnote{This is Self terminology; in C++ |
@end example |
terminology: virtual function table.} is an array that contains the |
|
execution tokens (@i{xt}s) of the methods for the object's class. Each |
|
selector contains an offset into a method map. |
|
|
|
@cindex @code{selector} implementation, class |
Sometimes derived classes want to access the method of the |
@code{selector} is a defining word that uses |
parent object. There are two ways to achieve this with Mini-OOF: |
@code{CREATE} and @code{DOES>}. The body of the |
first, you could use named words, and second, you could look up the |
selector contains the offset; the @code{DOES>} action for a |
vtable of the parent object. |
class selector is, basically: |
|
|
|
@example |
@example |
( object addr ) @@ over object-map @@ + @@ execute |
: :: ( class "name" -- ) ' >body @@ + @@ compile, ; |
@end example |
@end example |
|
|
Since @code{object-map} is the first field of the object, it |
|
does not generate any code. As you can see, calling a selector has a |
|
small, constant cost. |
|
|
|
@cindex @code{current-interface} discussion |
Nothing can be more confusing than a good example, so here is |
@cindex class implementation and representation |
one. First let's declare a text object (called |
A class is basically a @code{struct} combined with a method |
@code{button}), that stores text and position: |
map. During the class definition the alignment and size of the class |
|
are passed on the stack, just as with @code{struct}s, so |
|
@code{field} can also be used for defining class |
|
fields. However, passing more items on the stack would be |
|
inconvenient, so @code{class} builds a data structure in memory, |
|
which is accessed through the variable |
|
@code{current-interface}. After its definition is complete, the |
|
class is represented on the stack by a pointer (e.g., as parameter for |
|
a child class definition). |
|
|
|
A new class starts off with the alignment and size of its parent, |
@example |
and a copy of the parent's method map. Defining new fields extends the |
object class |
size and alignment; likewise, defining new selectors extends the |
cell var text |
method map. @code{overrides} just stores a new @i{xt} in the method |
cell var len |
map at the offset given by the selector. |
cell var x |
|
cell var y |
|
method init |
|
method draw |
|
end-class button |
|
@end example |
|
|
@cindex class binding, implementation |
@noindent |
Class binding just gets the @i{xt} at the offset given by the selector |
Now, implement the two methods, @code{draw} and @code{init}: |
from the class's method map and @code{compile,}s (in the case of |
|
@code{[bind]}) it. |
|
|
|
@cindex @code{this} implementation |
@example |
@cindex @code{catch} and @code{this} |
:noname ( o -- ) |
@cindex @code{this} and @code{catch} |
>r r@@ x @@ r@@ y @@ at-xy r@@ text @@ r> len @@ type ; |
I implemented @code{this} as a @code{value}. At the |
button defines draw |
start of an @code{m:...;m} method the old @code{this} is |
:noname ( addr u o -- ) |
stored to the return stack and restored at the end; and the object on |
>r 0 r@@ x ! 0 r@@ y ! r@@ len ! r> text ! ; |
the TOS is stored @code{TO this}. This technique has one |
button defines init |
disadvantage: If the user does not leave the method via |
@end example |
@code{;m}, but via @code{throw} or @code{exit}, |
|
@code{this} is not restored (and @code{exit} may |
@noindent |
crash). To deal with the @code{throw} problem, I have redefined |
To demonstrate inheritance, we define a class @code{bold-button}, with no |
@code{catch} to save and restore @code{this}; the same |
new data and no new methods: |
should be done with any word that can catch an exception. As for |
|
@code{exit}, I simply forbid it (as a replacement, there is |
|
@code{exitm}). |
|
|
|
@cindex @code{inst-var} implementation |
|
@code{inst-var} is just the same as @code{field}, with |
|
a different @code{DOES>} action: |
|
@example |
@example |
@@ this + |
button class |
|
end-class bold-button |
|
|
|
: bold 27 emit ." [1m" ; |
|
: normal 27 emit ." [0m" ; |
@end example |
@end example |
Similar for @code{inst-value}. |
|
|
|
@cindex class scoping implementation |
@noindent |
Each class also has a word list that contains the words defined with |
The class @code{bold-button} has a different draw method to |
@code{inst-var} and @code{inst-value}, and its protected |
@code{button}, but the new method is defined in terms of the draw method |
words. It also has a pointer to its parent. @code{class} pushes |
for @code{button}: |
the word lists of the class and all its ancestors onto the search order stack, |
|
and @code{end-class} drops them. |
|
|
|
@cindex interface implementation |
@example |
An interface is like a class without fields, parent and protected |
:noname bold [ button :: draw ] normal ; bold-button defines draw |
words; i.e., it just has a method map. If a class implements an |
@end example |
interface, its method map contains a pointer to the method map of the |
|
interface. The positive offsets in the map are reserved for class |
|
methods, therefore interface map pointers have negative |
|
offsets. Interfaces have offsets that are unique throughout the |
|
system, unlike class selectors, whose offsets are only unique for the |
|
classes where the selector is available (invokable). |
|
|
|
This structure means that interface selectors have to perform one |
@noindent |
indirection more than class selectors to find their method. Their body |
Finally, create two objects and apply methods: |
contains the interface map pointer offset in the class method map, and |
|
the method offset in the interface method map. The |
|
@code{does>} action for an interface selector is, basically: |
|
|
|
@example |
@example |
( object selector-body ) |
button new Constant foo |
2dup selector-interface @@ ( object selector-body object interface-offset ) |
s" thin foo" foo init |
swap object-map @@ + @@ ( object selector-body map ) |
page |
swap selector-offset @@ + @@ execute |
foo draw |
|
bold-button new Constant bar |
|
s" fat bar" bar init |
|
1 bar y ! |
|
bar draw |
@end example |
@end example |
|
|
where @code{object-map} and @code{selector-offset} are |
|
first fields and generate no code. |
|
|
|
As a concrete example, consider the following code: |
@node Comparison with other object models, , Mini-OOF, Object-oriented Forth |
|
@subsection Comparison with other object models |
|
@cindex comparison of object models |
|
@cindex object models, comparison |
|
|
@example |
Many object-oriented Forth extensions have been proposed (@cite{A survey |
interface |
of object-oriented Forths} (SIGPLAN Notices, April 1996) by Bradford |
selector if1sel1 |
J. Rodriguez and W. F. S. Poehlman lists 17). This section discusses the |
selector if1sel2 |
relation of the object models described here to two well-known and two |
end-interface if1 |
closely-related (by the use of method maps) models. Andras Zsoter |
|
helped us with this section. |
|
|
object class |
@cindex Neon model |
if1 implementation |
The most popular model currently seems to be the Neon model (see |
selector cl1sel1 |
@cite{Object-oriented programming in ANS Forth} (Forth Dimensions, March |
cell% inst-var cl1iv1 |
1997) by Andrew McKewan) but this model has a number of limitations |
|
@footnote{A longer version of this critique can be |
|
found in @cite{On Standardizing Object-Oriented Forth Extensions} (Forth |
|
Dimensions, May 1997) by Anton Ertl.}: |
|
|
' m1 overrides construct |
@itemize @bullet |
' m2 overrides if1sel1 |
@item |
' m3 overrides if1sel2 |
It uses a @code{@emph{selector object}} syntax, which makes it unnatural |
' m4 overrides cl1sel2 |
to pass objects on the stack. |
end-class cl1 |
|
|
|
create obj1 object dict-new drop |
@item |
create obj2 cl1 dict-new drop |
It requires that the selector parses the input stream (at |
@end example |
compile time); this leads to reduced extensibility and to bugs that are |
|
hard to find. |
|
|
The data structure created by this code (including the data structure |
@item |
for @code{object}) is shown in the <a |
It allows using every selector on every object; |
href="objects-implementation.eps">figure</a>, assuming a cell size of 4. |
this eliminates the need for classes, but makes it harder to create |
@comment TODO add this diagram.. |
efficient implementations. |
|
@end itemize |
|
|
@node Objects Glossary, , Objects Implementation, Objects |
@cindex Pountain's object-oriented model |
@subsubsection @file{objects.fs} Glossary |
Another well-known publication is @cite{Object-Oriented Forth} (Academic |
@cindex @file{objects.fs} Glossary |
Press, London, 1987) by Dick Pountain. However, it is not really about |
|
object-oriented programming, because it hardly deals with late |
|
binding. Instead, it focuses on features like information hiding and |
|
overloading that are characteristic of modular languages like Ada (83). |
|
|
|
@cindex Zsoter's object-oriented model |
|
In @cite{Does late binding have to be slow?} (Forth Dimensions 18(1) |
|
1996, pages 31-35) Andras Zsoter describes a model that makes heavy use |
|
of an active object (like @code{this} in @file{objects.fs}): The active |
|
object is not only used for accessing all fields, but also specifies the |
|
receiving object of every selector invocation; you have to change the |
|
active object explicitly with @code{@{ ... @}}, whereas in |
|
@file{objects.fs} it changes more or less implicitly at @code{m: |
|
... ;m}. Such a change at the method entry point is unnecessary with |
|
Zsoter's model, because the receiving object is the active object |
|
already. On the other hand, the explicit change is absolutely necessary |
|
in that model, because otherwise no one could ever change the active |
|
object. An ANS Forth implementation of this model is available at |
|
@uref{http://www.forth.org/fig/oopf.html}. |
|
|
|
@cindex @file{oof.fs}, differences to other models |
|
The @file{oof.fs} model combines information hiding and overloading |
|
resolution (by keeping names in various word lists) with object-oriented |
|
programming. It sets the active object implicitly on method entry, but |
|
also allows explicit changing (with @code{>o...o>} or with |
|
@code{with...endwith}). It uses parsing and state-smart objects and |
|
classes for resolving overloading and for early binding: the object or |
|
class parses the selector and determines the method from this. If the |
|
selector is not parsed by an object or class, it performs a call to the |
|
selector for the active object (late binding), like Zsoter's model. |
|
Fields are always accessed through the active object. The big |
|
disadvantage of this model is the parsing and the state-smartness, which |
|
reduce extensibility and increase the opportunities for subtle bugs; |
|
essentially, you are only safe if you never tick or @code{postpone} an |
|
object or class (Bernd disagrees, but I (Anton) am not convinced). |
|
|
doc---objects-bind |
@cindex @file{mini-oof.fs}, differences to other models |
doc---objects-<bind> |
The @file{mini-oof.fs} model is quite similar to a very stripped-down |
doc---objects-bind' |
version of the @file{objects.fs} model, but syntactically it is a |
doc---objects-[bind] |
mixture of the @file{objects.fs} and @file{oof.fs} models. |
doc---objects-class |
|
doc---objects-class->map |
|
doc---objects-class-inst-size |
|
doc---objects-class-override! |
|
doc---objects-construct |
|
doc---objects-current' |
|
doc---objects-[current] |
|
doc---objects-current-interface |
|
doc---objects-dict-new |
|
doc---objects-drop-order |
|
doc---objects-end-class |
|
doc---objects-end-class-noname |
|
doc---objects-end-interface |
|
doc---objects-end-interface-noname |
|
doc---objects-end-methods |
|
doc---objects-exitm |
|
doc---objects-heap-new |
|
doc---objects-implementation |
|
doc---objects-init-object |
|
doc---objects-inst-value |
|
doc---objects-inst-var |
|
doc---objects-interface |
|
doc---objects-m: |
|
doc---objects-:m |
|
doc---objects-;m |
|
doc---objects-method |
|
doc---objects-methods |
|
doc---objects-object |
|
doc---objects-overrides |
|
doc---objects-[parent] |
|
doc---objects-print |
|
doc---objects-protected |
|
doc---objects-public |
|
doc---objects-push-order |
|
doc---objects-selector |
|
doc---objects-this |
|
doc---objects-<to-inst> |
|
doc---objects-[to-inst] |
|
doc---objects-to-this |
|
doc---objects-xt-new |
|
|
|
|
|
@c ------------------------------------------------------------- |
@c ------------------------------------------------------------- |
@node OOF, Mini-OOF, Objects, Object-oriented Forth |
@node Programming Tools, Assembler and Code Words, Object-oriented Forth, Words |
@subsection The @file{oof.fs} model |
@section Programming Tools |
@cindex oof |
@cindex programming tools |
@cindex object-oriented programming |
|
|
|
@cindex @file{objects.fs} |
@c !! move this and assembler down below OO stuff. |
@cindex @file{oof.fs} |
|
|
|
This section describes the @file{oof.fs} package. |
@menu |
|
* Examining:: |
|
* Forgetting words:: |
|
* Debugging:: Simple and quick. |
|
* Assertions:: Making your programs self-checking. |
|
* Singlestep Debugger:: Executing your program word by word. |
|
@end menu |
|
|
The package described in this section has been used in bigFORTH since 1991, and |
@node Examining, Forgetting words, Programming Tools, Programming Tools |
used for two large applications: a chromatographic system used to |
@subsection Examining data and code |
create new medicaments, and a graphic user interface library (MINOS). |
@cindex examining data and code |
|
@cindex data examination |
|
@cindex code examination |
|
|
You can find a description (in German) of @file{oof.fs} in @cite{Object |
The following words inspect the stack non-destructively: |
oriented bigFORTH} by Bernd Paysan, published in @cite{Vierte Dimension} |
|
10(2), 1994. |
|
|
|
@menu |
doc-.s |
* Properties of the OOF model:: |
doc-f.s |
* Basic OOF Usage:: |
|
* The OOF base class:: |
|
* Class Declaration:: |
|
* Class Implementation:: |
|
@end menu |
|
|
|
@node Properties of the OOF model, Basic OOF Usage, OOF, OOF |
There is a word @code{.r} but it does @i{not} display the return stack! |
@subsubsection Properties of the @file{oof.fs} model |
It is used for formatted numeric output (@pxref{Simple numeric output}). |
@cindex @file{oof.fs} properties |
|
|
|
@itemize @bullet |
doc-depth |
@item |
doc-fdepth |
This model combines object oriented programming with information |
doc-clearstack |
hiding. It helps you write large applications, where scoping is |
|
necessary, because it provides class-oriented scoping. |
|
|
|
@item |
The following words inspect memory. |
Named objects, object pointers, and object arrays can be created, |
|
selector invocation uses the ``object selector'' syntax. Selector invocation |
|
to objects and/or selectors on the stack is a bit less convenient, but |
|
possible. |
|
|
|
@item |
doc-? |
Selector invocation and instance variable usage of the active object is |
doc-dump |
straightforward, since both make use of the active object. |
|
|
|
@item |
And finally, @code{see} allows you to inspect code: |
Late binding is efficient and easy to use. |
|
|
|
@item |
doc-see |
State-smart objects parse selectors. However, extensibility is provided |
doc-xt-see |
using a (parsing) selector @code{postpone} and a selector @code{'}. |
|
|
|
@item |
@node Forgetting words, Debugging, Examining, Programming Tools |
An implementation in ANS Forth is available. |
@subsection Forgetting words |
|
@cindex words, forgetting |
|
@cindex forgetting words |
|
|
@end itemize |
@c anton: other, maybe better places for this subsection: Defining Words; |
|
@c Dictionary allocation. At least a reference should be there. |
|
|
|
Forth allows you to forget words (and everything that was allotted in the |
|
dictionary after them) in a LIFO manner. |
|
|
@node Basic OOF Usage, The OOF base class, Properties of the OOF model, OOF |
doc-marker |
@subsubsection Basic @file{oof.fs} Usage |
|
@cindex @file{oof.fs} usage |
|
|
|
This section uses the same example as for @code{objects} (@pxref{Basic Objects Usage}). |
The most common use of this feature is during program development: when |
|
you change a source file, forget all the words it defined and load it |
|
again (since you also forget everything defined after the source file |
|
was loaded, you have to reload that, too). Note that effects like |
|
storing to variables and destroyed system words are not undone when you |
|
forget words. With a system like Gforth, that is fast enough at |
|
starting up and compiling, I find it more convenient to exit and restart |
|
Gforth, as this gives me a clean slate. |
|
|
You can define a class for graphical objects like this: |
Here's an example of using @code{marker} at the start of a source file |
|
that you are debugging; it ensures that you only ever have one copy of |
|
the file's definitions compiled at any time: |
|
|
@cindex @code{class} usage |
|
@cindex @code{class;} usage |
|
@cindex @code{method} usage |
|
@example |
@example |
object class graphical \ "object" is the parent class |
[IFDEF] my-code |
method draw ( x y graphical -- ) |
my-code |
class; |
[ENDIF] |
@end example |
|
|
|
This code defines a class @code{graphical} with an |
marker my-code |
operation @code{draw}. We can perform the operation |
init-included-files |
@code{draw} on any @code{graphical} object, e.g.: |
|
|
|
@example |
\ .. definitions start here |
100 100 t-rex draw |
\ . |
|
\ . |
|
\ end |
@end example |
@end example |
|
|
@noindent |
|
where @code{t-rex} is an object or object pointer, created with e.g. |
|
@code{graphical : t-rex}. |
|
|
|
@cindex abstract class |
@node Debugging, Assertions, Forgetting words, Programming Tools |
How do we create a graphical object? With the present definitions, |
@subsection Debugging |
we cannot create a useful graphical object. The class |
@cindex debugging |
@code{graphical} describes graphical objects in general, but not |
|
any concrete graphical object type (C++ users would call it an |
|
@emph{abstract class}); e.g., there is no method for the selector |
|
@code{draw} in the class @code{graphical}. |
|
|
|
For concrete graphical objects, we define child classes of the |
Languages with a slow edit/compile/link/test development loop tend to |
class @code{graphical}, e.g.: |
require sophisticated tracing/stepping debuggers to facilitate debugging. |
|
|
@example |
A much better (faster) way in fast-compiling languages is to add |
graphical class circle \ "graphical" is the parent class |
printing code at well-selected places, let the program run, look at |
cell var circle-radius |
the output, see where things went wrong, add more printing code, etc., |
how: |
until the bug is found. |
: draw ( x y -- ) |
|
circle-radius @@ draw-circle ; |
|
|
|
: init ( n-radius -- ) |
The simple debugging aids provided in @file{debugs.fs} |
circle-radius ! ; |
are meant to support this style of debugging. |
class; |
|
@end example |
|
|
|
Here we define a class @code{circle} as a child of @code{graphical}, |
The word @code{~~} prints debugging information (by default the source |
with a field @code{circle-radius}; it defines new methods for the |
location and the stack contents). It is easy to insert. If you use Emacs |
selectors @code{draw} and @code{init} (@code{init} is defined in |
it is also easy to remove (@kbd{C-x ~} in the Emacs Forth mode to |
@code{object}, the parent class of @code{graphical}). |
query-replace them with nothing). The deferred words |
|
@code{printdebugdata} and @code{printdebugline} control the output of |
|
@code{~~}. The default source location output format works well with |
|
Emacs' compilation mode, so you can step through the program at the |
|
source level using @kbd{C-x `} (the advantage over a stepping debugger |
|
is that you can step in any direction and you know where the crash has |
|
happened or where the strange data has occurred). |
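
For example, a hypothetical word @code{average} could be instrumented
like this while you hunt for a bug (a sketch; the definition is invented
for illustration):

@example
: average ( n1 n2 -- n3 )
  ~~ + ~~      \ report the stack before and after the addition
  2/ ;
3 5 average .
@end example

With the default settings, each @code{~~} reports its source location
together with the stack contents at that point.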
|
|
Now we can create a circle in the dictionary with: |
doc-~~ |
|
doc-printdebugdata |
|
doc-printdebugline |
|
|
|
@node Assertions, Singlestep Debugger, Debugging, Programming Tools |
|
@subsection Assertions |
|
@cindex assertions |
|
|
|
It is a good idea to make your programs self-checking, especially if you |
|
make an assumption that may become invalid during maintenance (for |
|
example, that a certain field of a data structure is never zero). Gforth |
|
supports @dfn{assertions} for this purpose. They are used like this: |
|
|
@example |
@example |
50 circle : my-circle |
assert( @i{flag} ) |
@end example |
@end example |
|
|
@noindent |
The code between @code{assert(} and @code{)} should compute a flag that |
@code{:} invokes @code{init}, thus initializing the field |
should be true if everything is all right and false otherwise. It should |
@code{circle-radius} with 50. We can draw this new circle at (100,100) |
not change anything else on the stack. The overall stack effect of the |
with: |
assertion is @code{( -- )}. E.g. |
|
|
@example |
@example |
100 100 my-circle draw |
assert( 1 1 + 2 = ) \ what we learn in school |
|
assert( dup 0<> ) \ assert that the top of stack is not zero |
|
assert( false ) \ this code should not be reached |
@end example |
@end example |
|
|
@cindex selector invocation, restrictions |
The need for assertions is different at different times. During |
@cindex class definition, restrictions |
debugging, we want more checking; in production we sometimes care more |
Note: You can only invoke a selector if the receiving object belongs to |
for speed. Therefore, assertions can be turned off, i.e., the assertion |
the class where the selector was defined or one of its descendents; |
becomes a comment. Depending on the importance of an assertion and the |
e.g., you can invoke @code{draw} only for objects belonging to |
time it takes to check it, you may want to turn off some assertions and |
@code{graphical} or its descendents (e.g., @code{circle}). The scoping |
keep others turned on. Gforth provides several levels of assertions for |
mechanism will check if you try to invoke a selector that is not |
this purpose: |
defined in this class hierarchy, so you'll get an error at compilation |
|
time. |
|
|
|
|
|
@node The OOF base class, Class Declaration, Basic OOF Usage, OOF |
doc-assert0( |
@subsubsection The @file{oof.fs} base class |
doc-assert1( |
@cindex @file{oof.fs} base class |
doc-assert2( |
|
doc-assert3( |
|
doc-assert( |
|
doc-) |
|
|
When you define a class, you have to specify a parent class. So how do |
|
you start defining classes? There is one class available from the start: |
|
@code{object}. You have to use it as ancestor for all classes. It is the |
|
only class that has no parent. Classes are also objects, except that |
|
they don't have instance variables; class manipulation such as |
|
inheritance or changing definitions of a class is handled through |
|
selectors of the class @code{object}. |
|
|
|
@code{object} provides a number of selectors: |
The variable @code{assert-level} specifies the highest assertions that |
|
are turned on. I.e., at the default @code{assert-level} of one, |
|
@code{assert0(} and @code{assert1(} assertions perform checking, while |
|
@code{assert2(} and @code{assert3(} assertions are treated as comments. |
|
|
@itemize @bullet |
The value of @code{assert-level} is evaluated at compile-time, not at |
@item |
run-time. Therefore you cannot turn assertions on or off at run-time; |
@code{class} for subclassing, @code{definitions} to add definitions |
you have to set the @code{assert-level} appropriately before compiling a |
later on, and @code{class?} to get type information (is the class a
piece of code. You can compile different pieces of code at different |
subclass of the class passed on the stack?). |
@code{assert-level}s (e.g., a trusted library at level 1 and |
|
newly-written code at level 3). |
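
As a sketch of how this plays out (the word names @code{checked/} and
@code{unchecked/} are made up for this illustration), the same assertion
can be compiled in or left out, depending on the value of
@code{assert-level} at the time the definition is compiled:

@example
3 assert-level !           \ compile what follows with all levels enabled
: checked/ ( n1 n2 -- n )
    assert3( dup 0<> )     \ compiled in, because assert-level is 3
    / ;

1 assert-level !           \ back to the default level
: unchecked/ ( n1 n2 -- n )
    assert3( dup 0<> )     \ skipped like a comment at level 1
    / ;
@end example

Changing @code{assert-level} afterwards does not affect either word; the
decision was made when each definition was compiled.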
|
|
doc---object-class |
|
doc---object-definitions |
doc-assert-level |
doc---object-class? |
|
|
|
|
|
@item |
If an assertion fails, a message compatible with Emacs' compilation mode |
@code{init} and @code{dispose} as constructor and destructor of the |
is produced and the execution is aborted (currently with @code{ABORT"}. |
object. @code{init} is invoked after the object's memory is allocated,
If there is interest, we will introduce a special throw code. But if you |
while @code{dispose} also handles deallocation. Thus if you redefine |
intend to @code{catch} a specific condition, using @code{throw} is |
@code{dispose}, you have to call the parent's dispose with @code{super |
probably more appropriate than an assertion). |
dispose}, too. |
|
|
|
doc---object-init |
Definitions in ANS Forth for these assertion words are provided |
doc---object-dispose |
in @file{compat/assert.fs}. |
|
|
|
|
@item |
@node Singlestep Debugger, , Assertions, Programming Tools |
@code{new}, @code{new[]}, @code{:}, @code{ptr}, @code{asptr}, and |
@subsection Singlestep Debugger |
@code{[]} to create named and unnamed objects and object arrays or |
@cindex singlestep Debugger |
object pointers. |
@cindex debugging Singlestep |
|
|
doc---object-new |
When you create a new word there's often the need to check whether it |
doc---object-new[] |
behaves correctly or not. You can do this by typing @code{dbg |
doc---object-: |
badword}. A debug session might look like this: |
doc---object-ptr |
|
doc---object-asptr |
|
doc---object-[] |
|
|
|
|
@example |
|
: badword 0 DO i . LOOP ; ok |
|
2 dbg badword |
|
: badword |
|
Scanning code... |
|
|
@item |
Nesting debugger ready! |
@code{::} and @code{super} for explicit scoping. You should use explicit |
|
scoping only for super classes or classes with the same set of instance |
|
variables. Explicitly-scoped selectors use early binding. |
|
|
|
doc---object-:: |
400D4738 8049BC4 0 -> [ 2 ] 00002 00000 |
doc---object-super |
400D4740 8049F68 DO -> [ 0 ] |
|
400D4744 804A0C8 i -> [ 1 ] 00000 |
|
400D4748 400C5E60 . -> 0 [ 0 ] |
|
400D474C 8049D0C LOOP -> [ 0 ] |
|
400D4744 804A0C8 i -> [ 1 ] 00001 |
|
400D4748 400C5E60 . -> 1 [ 0 ] |
|
400D474C 8049D0C LOOP -> [ 0 ] |
|
400D4758 804B384 ; -> ok |
|
@end example |
|
|
|
Each line displayed is one step. You always have to hit return to |
|
execute the next word that is displayed. If you don't want to execute |
|
the next word as a whole, you have to type @kbd{n} for @code{nest}. Here is
|
an overview of the available keys:
|
|
@item |
@table @i |
@code{self} to get the address of the object |
|
|
|
doc---object-self |
@item @key{RET} |
|
Next; Execute the next word. |
|
|
|
@item n |
|
Nest; Single step through next word. |
|
|
@item |
@item u |
@code{bind}, @code{bound}, @code{link}, and @code{is} to assign object |
Unnest; Stop debugging and execute rest of word. If we got to this word |
pointers and instance defers. |
with nest, continue debugging with the calling word. |
|
|
doc---object-bind |
@item d |
doc---object-bound |
Done; Stop debugging and execute rest. |
doc---object-link |
|
doc---object-is |
|
|
|
|
@item s |
|
Stop; Abort immediately. |
|
|
@item |
@end table |
@code{'} to obtain selector tokens, @code{send} to invoke selectors
|
from the stack, and @code{postpone} to generate selector invocation code.
|
|
|
doc---object-' |
Debugging large applications with this mechanism is very difficult, because
doc---object-postpone |
you have to nest very deeply into the program before the interesting part |
|
begins. This takes a lot of time. |
|
|
|
To do it more directly put a @code{BREAK:} command into your source code. |
|
When program execution reaches @code{BREAK:} the single step debugger is |
|
invoked and you have all the features described above. |
|
|
@item |
If you have more than one part to debug it is useful to know where the |
@code{with} and @code{endwith} to select the active object from the |
program has stopped at the moment. You can do this by the |
stack, and enable its scope. Using @code{with} and @code{endwith} |
@code{BREAK" string"} command. This behaves like @code{BREAK:} except that |
also allows you to create code using selector @code{postpone} without being |
the string is typed out when the ``breakpoint'' is reached.
trapped by the state-smart objects. |
|
|
|
doc---object-with |
|
doc---object-endwith |
|
|
|
|
doc-dbg |
|
doc-break: |
|
doc-break" |
|
|
@end itemize |
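
As an illustration of the breakpoint words described above (the word
@code{countdown} and the message text are invented for this sketch), a
breakpoint can be placed just before the part of a definition you want
to step through:

@example
: countdown ( n -- )
    break" entering countdown loop"   \ message typed when reached
    BEGIN  dup .  1-  dup 0=  UNTIL
    drop ;

3 countdown     \ runs up to the breakpoint, then single-steps
@end example

From the breakpoint onwards, the keys of the single step debugger apply.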
|
|
|
@node Class Declaration, Class Implementation, The OOF base class, OOF |
|
@subsubsection Class Declaration |
|
@cindex class declaration |
|
|
|
@itemize @bullet |
@c ------------------------------------------------------------- |
@item |
@node Assembler and Code Words, Threading Words, Programming Tools, Words |
Instance variables |
@section Assembler and Code Words |
|
@cindex assembler |
|
@cindex code words |
|
|
doc---oof-var |
@menu |
|
* Code and ;code:: |
|
* Common Assembler:: Assembler Syntax |
|
* Common Disassembler:: |
|
* 386 Assembler:: Deviations and special cases |
|
* Alpha Assembler:: Deviations and special cases |
|
* MIPS assembler:: Deviations and special cases |
|
* Other assemblers:: How to write them |
|
@end menu |
|
|
|
@node Code and ;code, Common Assembler, Assembler and Code Words, Assembler and Code Words |
|
@subsection @code{Code} and @code{;code} |
|
|
@item |
Gforth provides some words for defining primitives (words written in |
Object pointers |
machine code), and for defining the machine-code equivalent of |
|
@code{DOES>}-based defining words. However, the machine-independent |
|
nature of Gforth poses a few problems: First of all, Gforth runs on |
|
several architectures, so it can provide no standard assembler. What's |
|
worse is that the register allocation not only depends on the processor, |
|
but also on the @code{gcc} version and options used. |
|
|
doc---oof-ptr |
The words that Gforth offers encapsulate some system dependences (e.g., |
doc---oof-asptr |
the header structure), so a system-independent assembler may be used in |
|
Gforth. If you do not have an assembler, you can compile machine code |
|
directly with @code{,} and @code{c,}@footnote{This isn't portable, |
|
because these words emit stuff in @i{data} space; it works because |
|
Gforth has unified code/data spaces. Assembler isn't likely to be |
|
portable anyway.}. |
|
|
|
|
@item |
doc-assembler |
Instance defers |
doc-init-asm |
|
doc-code |
|
doc-end-code |
|
doc-;code |
|
doc-flush-icache |
|
|
doc---oof-defer |
|
|
|
|
If @code{flush-icache} does not work correctly, @code{code} words |
|
etc. will not work (reliably), either. |
|
|
|
The typical usage of these @code{code} words can be shown most easily by |
|
analogy to the equivalent high-level defining words: |
|
|
@item |
@example |
Method selectors |
: foo code foo |
|
<high-level Forth words> <assembler> |
|
; end-code |
|
|
|
: bar : bar |
|
<high-level Forth words> <high-level Forth words> |
|
CREATE CREATE |
|
<high-level Forth words> <high-level Forth words> |
|
DOES> ;code |
|
<high-level Forth words> <assembler> |
|
; end-code |
|
@end example |
|
|
doc---oof-early |
@c anton: the following stuff is also in "Common Assembler", in less detail. |
doc---oof-method |
|
|
@cindex registers of the inner interpreter |
|
In the assembly code you will want to refer to the inner interpreter's |
|
registers (e.g., the data stack pointer) and you may want to use other |
|
registers for temporary storage. Unfortunately, the register allocation |
|
is installation-dependent. |
|
|
|
In particular, @code{ip} (Forth instruction pointer) and @code{rp} |
|
(return stack pointer) are in different places in @code{gforth} and |
|
@code{gforth-fast}. This means that you cannot write a @code{NEXT} |
|
routine that works on both versions; so for doing @code{NEXT}, I |
|
recommend jumping to @code{' noop >code-address}, which contains nothing
|
but a @code{NEXT}. |
|
|
@item |
For general accesses to the inner interpreter's registers, the easiest |
Class-wide variables |
solution is to use explicit register declarations (@pxref{Explicit Reg |
|
Vars, , Variables in Specified Registers, gcc.info, GNU C Manual}) for |
|
all of the inner interpreter's registers: You have to compile Gforth |
|
with @code{-DFORCE_REG} (configure option @code{--enable-force-reg}) and |
|
the appropriate declarations must be present in the @code{machine.h} |
|
file (see @code{mips.h} for an example; you can find a full list of all |
|
declarable register symbols with @code{grep register engine.c}). If you |
|
give explicit registers to all variables that are declared at the |
|
beginning of @code{engine()}, you should be able to use the other |
|
caller-saved registers for temporary storage. Alternatively, you can use |
|
the @code{gcc} option @code{-ffixed-REG} (@pxref{Code Gen Options, , |
|
Options for Code Generation Conventions, gcc.info, GNU C Manual}) to |
|
reserve a register (however, this restriction on register allocation may |
|
slow Gforth significantly). |
|
|
doc---oof-static |
If this solution is not viable (e.g., because @code{gcc} does not allow |
|
you to explicitly declare all the registers you need), you have to find |
|
out by looking at the code where the inner interpreter's registers |
|
reside and which registers can be used for temporary storage. You can |
|
get an assembly listing of the engine's code with @code{make engine.s}. |
|
|
|
In any case, it is good practice to abstract your assembly code from the |
|
actual register allocation. E.g., if the data stack pointer resides in |
|
register @code{$17}, create an alias for this register called @code{sp}, |
|
and use that in your assembly code. |
|
|
@item |
@cindex code words, portable |
End declaration |
Another option for implementing normal and defining words efficiently |
|
is to add the desired functionality to the source of Gforth. For normal |
|
words you just have to edit @file{primitives} (@pxref{Automatic |
|
Generation}). Defining words (equivalent to @code{;CODE} words, for fast |
|
defined words) may require changes in @file{engine.c}, @file{kernel.fs}, |
|
@file{prims2x.fs}, and possibly @file{cross.fs}. |
|
|
doc---oof-how: |
@node Common Assembler, Common Disassembler, Code and ;code, Assembler and Code Words |
doc---oof-class; |
@subsection Common Assembler |
|
|
|
The assemblers in Gforth generally use a postfix syntax, i.e., the |
|
instruction name follows the operands. |
|
|
@end itemize |
The operands are passed in the usual order (the same that is used in the |
|
manual of the architecture). Since they all are Forth words, they have |
|
to be separated by spaces; you can also use Forth words to compute the |
|
operands. |
|
|
@c ------------------------------------------------------------- |
The instruction names usually end with a @code{,}. This makes it easier |
@node Class Implementation, , Class Declaration, OOF |
to visually separate instructions if you put several of them on one |
@subsubsection Class Implementation |
line; it also avoids shadowing other Forth words (e.g., @code{and}). |
@cindex class implementation |
|
|
|
@c ------------------------------------------------------------- |
Registers are usually specified by number; e.g., (decimal) @code{11} |
@node Mini-OOF, Comparison with other object models, OOF, Object-oriented Forth |
specifies registers R11 and F11 on the Alpha architecture (which one, |
@subsection The @file{mini-oof.fs} model |
depends on the instruction). The usual names are also available, e.g., |
@cindex mini-oof |
@code{s2} for R11 on Alpha. |
|
|
Gforth's third object-oriented Forth package is a 12-liner. It uses a
Control flow is specified similarly to normal Forth code (@pxref{Arbitrary
mixture of the @file{objects.fs} and the @file{oof.fs} syntax,
control structures}), with @code{if,}, @code{ahead,}, @code{then,}, |
and reduces to the bare minimum of features. This is based on a posting |
@code{begin,}, @code{until,}, @code{again,}, @code{cs-roll}, |
of Bernd Paysan in comp.lang.forth. |
@code{cs-pick}, @code{else,}, @code{while,}, and @code{repeat,}. The |
|
conditions are specified in a way specific to each assembler. |
|
|
@menu |
Note that the register assignments of the Gforth engine can change |
* Basic Mini-OOF Usage:: |
between Gforth versions, or even between different compilations of the |
* Mini-OOF Example:: |
same Gforth version (e.g., if you use a different GCC version). So if |
* Mini-OOF Implementation:: |
you want to refer to Gforth's registers (e.g., the stack pointer or |
@end menu |
TOS), I recommend defining your own words for referring to these
|
registers, and using them later on; then you can easily adapt to a |
|
changed register assignment. The stability of the register assignment |
|
is usually better if you build Gforth with @code{--enable-force-reg}. |
|
|
@c ------------------------------------------------------------- |
In particular, the return stack pointer and the instruction pointer are |
@node Basic Mini-OOF Usage, Mini-OOF Example, Mini-OOF, Mini-OOF |
in memory in @code{gforth}, and usually in registers in |
@subsubsection Basic @file{mini-oof.fs} Usage |
@code{gforth-fast}. The most common use of these registers is to |
@cindex mini-oof usage |
dispatch to the next word (the @code{next} routine). A portable way to |
|
do this is to jump to @code{' noop >code-address} (of course, this is |
|
less efficient than integrating the @code{next} code and scheduling it |
|
well). |
|
|
There is a base class (@code{class}, which allocates one cell for the |
@node Common Disassembler, 386 Assembler, Common Assembler, Assembler and Code Words |
object pointer) plus seven other words: to define a method, a variable, |
@subsection Common Disassembler |
a class; to end a class, to resolve binding, to allocate an object and |
|
to compile a class method. |
|
@comment TODO better description of the last one |
|
|
|
|
You can disassemble a @code{code} word with @code{see} |
|
(@pxref{Debugging}). You can disassemble a section of memory with |
|
|
doc-object |
doc-disasm |
doc-method |
|
doc-var |
|
doc-class |
|
doc-end-class |
|
doc-defines |
|
doc-new |
|
doc-:: |
|
|
|
|
The disassembler generally produces output that can be fed into the |
|
assembler (i.e., same syntax, etc.). It also includes additional |
|
information in comments. In particular, the address of the instruction |
|
is given in a comment before the instruction. |
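
For example, assuming @code{disasm} takes a start address and a byte
count (check the word's glossary entry; the count of 32 is arbitrary),
the machine code of a primitive could be inspected like this:

@example
' dup >code-address 32 disasm   \ disassemble 32 bytes of dup's code
@end example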
|
|
|
@code{See} may display more or less than the actual code of the word, |
|
because the recognition of the end of the code is unreliable. You can |
|
use @code{disasm} if it did not display enough. It may display more, if |
|
the code word is not immediately followed by a named word. If you have |
|
something else there, you can follow the word with @code{align last @ ,} |
|
to ensure that the end is recognized. |
|
|
@c ------------------------------------------------------------- |
@node 386 Assembler, Alpha Assembler, Common Disassembler, Assembler and Code Words |
@node Mini-OOF Example, Mini-OOF Implementation, Basic Mini-OOF Usage, Mini-OOF |
@subsection 386 Assembler |
@subsubsection Mini-OOF Example |
|
@cindex mini-oof example |
|
|
|
A short example shows how to use this package. This example, in slightly |
The 386 assembler included in Gforth was written by Bernd Paysan; it is
extended form, is supplied as @file{moof-exm.fs} |
available under GPL, and originally part of bigFORTH. |
@comment TODO could flesh this out with some comments from the Forthwrite article |
|
|
|
@example |
The 386 disassembler included in Gforth was written by Andrew McKewan |
object class |
and is in the public domain. |
method init |
|
method draw |
|
end-class graphical |
|
@end example |
|
|
|
This code defines a class @code{graphical} with an |
The disassembler displays code in prefix Intel syntax. |
operation @code{draw}. We can perform the operation |
|
@code{draw} on any @code{graphical} object, e.g.: |
|
|
|
@example |
The assembler uses a postfix syntax with reversed parameters. |
100 100 t-rex draw |
|
@end example |
|
|
|
where @code{t-rex} is an object or object pointer, created with e.g. |
The assembler includes all instructions of the Athlon, i.e. 486 core
@code{graphical new Constant t-rex}. |
instructions, Pentium and PPro extensions, floating point, MMX, 3Dnow!, |
|
but not ISSE. It's an integrated 16- and 32-bit assembler. Default is 32 |
|
bit; you can switch to 16 bit with @code{.86} and back to 32 bit with @code{.386}.
|
|
For concrete graphical objects, we define child classes of the |
There are several prefixes to switch between different operation sizes, |
class @code{graphical}, e.g.: |
@code{.b} for byte accesses, @code{.w} for word accesses, @code{.d} for |
|
double-word accesses. Addressing modes can be switched with @code{.wa} |
|
for 16 bit addresses, and @code{.da} for 32 bit addresses. You don't |
|
need a prefix for byte register names (@code{AL} et al). |
|
|
@example |
For floating point operations, the prefixes are @code{.fs} (IEEE |
graphical class |
single), @code{.fl} (IEEE double), @code{.fx} (extended), @code{.fw} |
cell var circle-radius |
(word), @code{.fd} (double-word), and @code{.fq} (quad-word). |
end-class circle \ "graphical" is the parent class |
|
|
|
:noname ( x y -- ) |
The MMX opcodes don't have size prefixes, they are spelled out like in |
circle-radius @@ draw-circle ; circle defines draw |
the Intel assembler. Instead of move from and to memory, there are |
:noname ( r -- ) |
PLDQ/PLDD and PSTQ/PSTD. |
circle-radius ! ; circle defines init |
|
@end example |
|
|
|
There is no implicit init method, so we have to define one. The creation |
The registers lack the 'e' prefix; even in 32 bit mode, eax is called |
code of the object now has to call init explicitly.
ax. Immediate values are indicated by postfixing them with @code{#}, |
|
e.g., @code{3 #}. Here are some examples of addressing modes: |
|
|
@example |
@example |
circle new Constant my-circle |
3 # \ immediate |
50 my-circle init |
ax \ register |
|
100 di d) \ 100[edi] |
|
4 bx cx di) \ 4[ebx][ecx] |
|
di ax *4 i) \ [edi][eax*4] |
|
20 ax *4 i#) \ 20[eax*4] |
@end example |
@end example |
|
|
It is also possible to add a function to create named objects with |
Some examples of instructions are:
automatic call of @code{init}, given that all objects have @code{init} |
|
in the same place:
|
|
|
@example |
@example |
: new: ( .. o "name" -- ) |
ax bx mov \ mov ebx,eax
new dup Constant init ; |
3 # ax mov \ mov eax,3 |
80 circle new: large-circle |
100 di d) ax mov \ mov eax,100[edi]
|
4 bx cx di) ax mov \ mov eax,4[ebx][ecx] |
|
.w ax bx mov \ mov bx,ax |
@end example |
@end example |
|
|
We can draw this new circle at (100,100) with: |
The following forms are supported for binary instructions: |
|
|
@example |
@example |
100 100 my-circle draw |
<reg> <reg> <inst> |
|
<n> # <reg> <inst> |
|
<mem> <reg> <inst> |
|
<reg> <mem> <inst> |
@end example |
@end example |
|
|
@node Mini-OOF Implementation, , Mini-OOF Example, Mini-OOF |
Immediate to memory is not supported. The shift/rotate syntax is: |
@subsubsection @file{mini-oof.fs} Implementation |
|
|
|
Object-oriented systems with late binding typically use a |
|
``vtable''-approach: the first variable in each object is a pointer to a |
|
table, which contains the methods as function pointers. The vtable |
|
may also contain other information. |
|
|
|
So first, let's declare methods: |
|
|
|
@example |
@example |
: method ( m v -- m' v ) Create over , swap cell+ swap |
<reg/mem> 1 # shl \ shortens to shift without immediate |
DOES> ( ... o -- ... ) @@ over @@ + @@ execute ; |
<reg/mem> 4 # shl |
|
<reg/mem> cl shl |
@end example |
@end example |
|
|
During method declaration, the number of methods and instance |
Precede string instructions (@code{movs} etc.) with @code{.b} to get |
variables is on the stack (in address units). @code{method} creates |
the byte version. |
one method and increments the method number. To execute a method, it |
|
takes the object, fetches the vtable pointer, adds the offset, and |
|
executes the @i{xt} stored there. Each method takes the object it is |
|
invoked from as top of stack parameter. The method itself should |
|
consume that object. |
|
|
|
Now, we also have to declare instance variables:
|
|
|
@example |
|
: var ( m v size -- m v' ) Create over , + |
|
DOES> ( o -- addr ) @@ + ; |
|
@end example |
|
|
|
As before, a word is created with the current offset. Instance |
The control structure words @code{IF} @code{UNTIL} etc. must be preceded |
variables can have different sizes (cells, floats, doubles, chars), so |
by one of these conditions: @code{vs vc u< u>= 0= 0<> u<= u> 0< 0>= ps |
all we do is take the size and add it to the offset. If your machine |
pc < >= <= >}. (Note that most of these words shadow some Forth words |
has alignment restrictions, put the proper @code{aligned} or |
when @code{assembler} is in front of @code{forth} in the search path, |
@code{faligned} before the variable, to adjust the variable |
e.g., in @code{code} words). Currently the control structure words use |
offset. That's why it is on the top of stack. |
one stack item, so you have to use @code{roll} instead of @code{cs-roll} |
|
to shuffle them (you can also use @code{swap} etc.). |
|
|
We need a starting point (the base object) and some syntactic sugar: |
Here is an example of a @code{code} word (assumes that the stack pointer |
|
is in esi and the TOS is in ebx): |
|
|
@example |
@example |
Create object 1 cells , 2 cells , |
code my+ ( n1 n2 -- n ) |
: class ( class -- class methods vars ) dup 2@@ ; |
4 si D) bx add |
|
4 # si add |
|
Next |
|
end-code |
@end example |
@end example |
|
|
For inheritance, the vtable of the parent object has to be |
@node Alpha Assembler, MIPS assembler, 386 Assembler, Assembler and Code Words |
copied when a new, derived class is declared. This gives all the |
@subsection Alpha Assembler |
methods of the parent class, which can be overridden, though. |
|
|
|
@example |
The Alpha assembler and disassembler were originally written by Bernd |
: end-class ( class methods vars -- ) |
Thallner. |
Create here >r , dup , 2 cells ?DO ['] noop , 1 cells +LOOP |
|
cell+ dup cell+ r> rot @@ 2 cells /string move ; |
|
@end example |
|
|
|
The first line creates the vtable, initialized with |
The register names @code{a0}--@code{a5} are not available to avoid |
@code{noop}s. The second line is the inheritance mechanism; it
shadowing hex numbers. |
copies the xts from the parent vtable. |
|
|
|
We still have no way to define new methods, let's do that now: |
Immediate forms of arithmetic instructions are distinguished by a |
|
@code{#} just before the @code{,}, e.g., @code{and#,} (note: @code{lda,} |
|
does not count as arithmetic instruction). |
|
|
@example |
You have to specify all operands to an instruction, even those that |
: defines ( xt class -- ) ' >body @@ + ! ; |
other assemblers consider optional, e.g., the destination register for |
@end example |
@code{br,}, or the destination register and hint for @code{jmp,}. |
|
|
To allocate a new object, we need a word, too: |
You can specify conditions for @code{if,} by removing the first @code{b} |
|
and the trailing @code{,} from a branch with a corresponding name; e.g., |
|
|
@example |
@example |
: new ( class -- o ) here over @@ allot swap over ! ; |
11 fgt if, \ if F11>0e |
|
... |
|
endif, |
@end example |
@end example |
|
|
Sometimes derived classes want to access the method of the |
@code{fbgt,} gives @code{fgt}. |
parent object. There are two ways to achieve this with Mini-OOF: |
|
first, you could use named words, and second, you could look up the |
|
vtable of the parent object. |
|
|
|
@example |
@node MIPS assembler, Other assemblers, Alpha Assembler, Assembler and Code Words |
: :: ( class "name" -- ) ' >body @@ + @@ compile, ; |
@subsection MIPS assembler |
@end example |
|
|
|
|
The MIPS assembler was originally written by Christian Pirker. |
|
|
Nothing can be more confusing than a good example, so here is |
Currently the assembler and disassembler only cover the MIPS-I |
one. First let's declare a text object (called |
architecture (R3000), and don't support FP instructions. |
@code{button}), that stores text and position: |
|
|
|
@example |
The register names @code{$a0}--@code{$a3} are not available to avoid |
object class |
shadowing hex numbers. |
cell var text |
|
cell var len |
|
cell var x |
|
cell var y |
|
method init |
|
method draw |
|
end-class button |
|
@end example |
|
|
|
@noindent |
Because there is no way to distinguish registers from immediate values, |
Now, implement the two methods, @code{draw} and @code{init}: |
you have to explicitly use the immediate forms of instructions, i.e., |
|
@code{addiu,}, not just @code{addu,} (@command{as} does this |
|
implicitly). |
|
|
@example |
If the architecture manual specifies several formats for the instruction |
:noname ( o -- ) |
(e.g., for @code{jalr,}), you usually have to use the one with more |
>r r@@ x @@ r@@ y @@ at-xy r@@ text @@ r> len @@ type ; |
arguments (i.e., two for @code{jalr,}). When in doubt, see |
button defines draw |
@code{arch/mips/testasm.fs} for an example of correct use. |
:noname ( addr u o -- ) |
|
>r 0 r@@ x ! 0 r@@ y ! r@@ len ! r> text ! ; |
|
button defines init |
|
@end example |
|
|
|
@noindent |
Branches and jumps in the MIPS architecture have a delay slot. You have |
To demonstrate inheritance, we define a class @code{bold-button}, with no |
to fill it yourself (the simplest way is to use @code{nop,}), the |
new data and no new methods: |
assembler does not do it for you (unlike @command{as}). Even |
|
@code{if,}, @code{ahead,}, @code{until,}, @code{again,}, @code{while,}, |
|
@code{else,} and @code{repeat,} need a delay slot. Since @code{begin,} |
|
and @code{then,} just specify branch targets, they are not affected. |
|
|
@example |
Note that you must not put branches, jumps, or @code{li,} into the delay |
button class |
slot: @code{li,} may expand to several instructions, and control flow |
end-class bold-button |
instructions may not be put into the branch delay slot in any case. |
|
|
: bold 27 emit ." [1m" ; |
For branches the argument specifying the target is a relative address; |
: normal 27 emit ." [0m" ; |
you have to add the address of the delay slot to get the absolute
@end example |
address. |
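
The delay-slot rule can be sketched as follows; this fragment is meant
to sit inside a @code{code} definition on a MIPS build, and it reuses
only words named in this section:

@example
4 5 eq if,
    nop,       \ fills the delay slot of the branch compiled by if,
    \ instructions to run only if $4 equals $5 go here
then,
@end example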
|
|
@noindent |
The MIPS architecture also has load delay slots and restrictions on |
The class @code{bold-button} has a different draw method to |
using @code{mfhi,} and @code{mflo,}; you have to order the instructions |
@code{button}, but the new method is defined in terms of the draw method |
yourself to satisfy these restrictions, the assembler does not do it for |
for @code{button}: |
you. |
|
|
|
You can specify the conditions for @code{if,} etc. by taking a |
|
conditional branch and leaving off the @code{b} at the start and the
|
@code{,} at the end. E.g., |
|
|
@example |
@example |
:noname bold [ button :: draw ] normal ; bold-button defines draw |
4 5 eq if, |
|
... \ do something if $4 equals $5 |
|
then, |
@end example |
@end example |
|
|
@noindent |
@node Other assemblers, , MIPS assembler, Assembler and Code Words |
Finally, create two objects and apply methods: |
@subsection Other assemblers |
|
|
@example |
If you want to contribute another assembler/disassembler, please contact |
button new Constant foo |
us (@email{bug-gforth@@gnu.org}) to check if we have such an assembler |
s" thin foo" foo init |
already. If you are writing them from scratch, please use a similar |
page |
syntax style as the one we use (i.e., postfix, commas at the end of the |
foo draw |
instruction names, @pxref{Common Assembler}); make the output of the |
bold-button new Constant bar |
disassembler be valid input for the assembler, and keep the style |
s" fat bar" bar init |
similar to the style we used. |
1 bar y ! |
|
bar draw |
|
@end example |
|
|
|
|
Hints on implementation: The most important part is to have a good test |
|
suite that contains all instructions. Once you have that, the rest is |
|
easy. For actual coding you can take a look at |
|
@file{arch/mips/disasm.fs} to get some ideas on how to use data for both |
|
the assembler and disassembler, avoiding redundancy and some potential |
|
bugs. You can also look at that file (and @pxref{Advanced does> usage |
|
example}) to get ideas on how to factor a disassembler.
|
|
@node Comparison with other object models, , Mini-OOF, Object-oriented Forth |
Start with the disassembler, because it's easier to reuse data from the |
@subsection Comparison with other object models |
disassembler for the assembler than the other way round. |
@cindex comparison of object models |
|
@cindex object models, comparison |
|
|
|
Many object-oriented Forth extensions have been proposed (@cite{A survey |
For the assembler, take a look at @file{arch/alpha/asm.fs}, which shows |
of object-oriented Forths} (SIGPLAN Notices, April 1996) by Bradford |
how simple it can be. |
J. Rodriguez and W. F. S. Poehlman lists 17). This section discusses the |
|
relation of the object models described here to two well-known and two |
|
closely-related (by the use of method maps) models. |
|
|
|
@cindex Neon model |
@c ------------------------------------------------------------- |
The most popular model currently seems to be the Neon model (see |
@node Threading Words, Passing Commands to the OS, Assembler and Code Words, Words |
@cite{Object-oriented programming in ANS Forth} (Forth Dimensions, March |
@section Threading Words |
1997) by Andrew McKewan) but this model has a number of limitations |
@cindex threading words |
@footnote{A longer version of this critique can be |
|
found in @cite{On Standardizing Object-Oriented Forth Extensions} (Forth |
|
Dimensions, May 1997) by Anton Ertl.}: |
|
|
|
@itemize @bullet |
@cindex code address |
@item |
These words provide access to code addresses and other threading stuff |
It uses a @code{@emph{selector object}} syntax, which makes it unnatural |
in Gforth (and, possibly, other interpretive Forths). It more or less |
to pass objects on the stack. |
abstracts away the differences between direct and indirect threading |
|
(and, for direct threading, the machine dependences). However, at |
|
present this wordset is still incomplete. It is also pretty low-level; |
|
some day it will hopefully be made unnecessary by an internals wordset |
|
that abstracts implementation details away completely. |
|
|
@item |
The terminology used here stems from indirect threaded Forth systems; in |
It requires that the selector parses the input stream (at |
such a system, the XT of a word is represented by the CFA (code field |
compile time); this leads to reduced extensibility and to bugs that are |
address) of a word; the CFA points to a cell that contains the code |
hard to find. |
address. The code address is the address of some machine code that |
|
performs the run-time action of invoking the word (e.g., the |
|
@code{dovar:} routine pushes the address of the body of the word (a |
|
variable) on the stack |
|
). |
|
|
@item |
@cindex code address |
It allows using every selector on every object; |
@cindex code field address |
this eliminates the need for classes, but makes it harder to create |
In an indirect threaded Forth, you can get the code address of @i{name} |
efficient implementations. |
with @code{' @i{name} @@}; in Gforth you can get it with @code{' @i{name} |
@end itemize |
>code-address}, independent of the threading method. |
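
As a minimal sketch of the above (assuming a Gforth version that provides
this wordset; @code{v1} and @code{w1} are hypothetical names chosen for
illustration), the code address of a word can be compared with the code
addresses pushed by @code{dovar:} and @code{docol:}:

@example
\ v1 and w1 are hypothetical test words; only >code-address,
\ dovar: and docol: come from the wordset described here.
variable v1
: w1 ;
' v1 >code-address dovar: = .  \ flag: does v1 use the dovar: routine?
' w1 >code-address docol: = .  \ flag: does w1 use the docol: routine?
@end example

Because @code{>code-address} hides the threading method, the same
comparison should work unchanged on direct and indirect threaded builds.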
|
|
@cindex Pountain's object-oriented model |
doc-threading-method |
Another well-known publication is @cite{Object-Oriented Forth} (Academic |
doc->code-address |
Press, London, 1987) by Dick Pountain. However, it is not really about |
doc-code-address! |
object-oriented programming, because it hardly deals with late |
|
binding. Instead, it focuses on features like information hiding and |
|
overloading that are characteristic of modular languages like Ada (83). |
|
|
|
@cindex Zsoter's object-oriented model |
@cindex @code{does>}-handler |
In @cite{Does late binding have to be slow?} (Forth Dimensions 18(1) |
@cindex @code{does>}-code |
1996, pages 31-35) Andras Zsoter describes a model that makes heavy use |
For a word defined with @code{DOES>}, the code address usually points to |
of an active object (like @code{this} in @file{objects.fs}): The active |
a jump instruction (the @dfn{does-handler}) that jumps to the dodoes |
object is not only used for accessing all fields, but also specifies the |
routine (in Gforth on some platforms, it can also point to the dodoes |
receiving object of every selector invocation; you have to change the |
routine itself). What you are typically interested in, though, is |
active object explicitly with @code{@{ ... @}}, whereas in |
whether a word is a @code{DOES>}-defined word, and what Forth code it |
@file{objects.fs} it changes more or less implicitly at @code{m: |
executes; @code{>does-code} tells you that. |
... ;m}. Such a change at the method entry point is unnecessary with |
|
Zsoter's model, because the receiving object is the active object |
|
already. On the other hand, the explicit change is absolutely necessary |
|
in that model, because otherwise no one could ever change the active |
|
object. An ANS Forth implementation of this model is available at |
|
@uref{http://www.forth.org/fig/oopf.html}. |
|
|
|
@cindex @file{oof.fs}, differences to other models |
doc->does-code |
The @file{oof.fs} model combines information hiding and overloading |
|
resolution (by keeping names in various word lists) with object-oriented |
|
programming. It sets the active object implicitly on method entry, but |
|
also allows explicit changing (with @code{>o...o>} or with |
|
@code{with...endwith}). It uses parsing and state-smart objects and |
|
classes for resolving overloading and for early binding: the object or |
|
class parses the selector and determines the method from this. If the |
|
selector is not parsed by an object or class, it performs a call to the |
|
selector for the active object (late binding), like Zsoter's model. |
|
Fields are always accessed through the active object. The big |
|
disadvantage of this model is the parsing and the state-smartness, which |
|
reduces extensibility and increases the opportunities for subtle bugs; |
|
essentially, you are only safe if you never tick or @code{postpone} an |
|
object or class (Bernd disagrees, but I (Anton) am not convinced). |
|
|
|
@cindex @file{mini-oof.fs}, differences to other models |
To create a @code{DOES>}-defined word with the following basic words, |
The @file{mini-oof.fs} model is quite similar to a very stripped-down |
you have to set up a @code{DOES>}-handler with @code{does-handler!}; |
version of the @file{objects.fs} model, but syntactically it is a |
@code{/does-handler} address units behind it, you have to place your executable Forth |
mixture of the @file{objects.fs} and @file{oof.fs} models. |
code. Finally you have to create a word and modify its behaviour with |
|
@code{does-code!}. |
|
|
|
doc-does-code! |
|
doc-does-handler! |
|
doc-/does-handler |
|
|
|
The code addresses used by various defining words are produced by |
|
the following words: |
|
|
|
doc-docol: |
|
doc-docon: |
|
doc-dovar: |
|
doc-douser: |
|
doc-dodefer: |
|
doc-dofield: |
|
|
@c ------------------------------------------------------------- |
@c ------------------------------------------------------------- |
@node Passing Commands to the OS, Keeping track of Time, Object-oriented Forth, Words |
@node Passing Commands to the OS, Keeping track of Time, Threading Words, Words |
@section Passing Commands to the Operating System |
@section Passing Commands to the Operating System |
@cindex operating system - passing commands |
@cindex operating system - passing commands |
@cindex shell commands |
@cindex shell commands |
Line 11827 from primitives (e.g., invalid memory ad
|
Line 11862 from primitives (e.g., invalid memory ad
|
@code{gforth-fast} is only able to do a return stack dump from a |
@code{gforth-fast} is only able to do a return stack dump from a |
directly called @code{throw} (including @code{abort} etc.). This is the |
directly called @code{throw} (including @code{abort} etc.). This is the |
only difference (apart from a speed factor of between 1.15 (K6-2) and |
only difference (apart from a speed factor of between 1.15 (K6-2) and |
1.6 (21164A)) between @code{gforth} and @code{gforth-fast}. Given an |
2 (21264)) between @code{gforth} and @code{gforth-fast}. Given an |
exception caused by a primitive in @code{gforth-fast}, you will |
exception caused by a primitive in @code{gforth-fast}, you will |
typically see no return stack dump at all; however, if the exception is |
typically see no return stack dump at all; however, if the exception is |
caught by @code{catch} (e.g., for restoring some state), and then |
caught by @code{catch} (e.g., for restoring some state), and then |