version 1.2, 1994/11/14 19:01:16
|
version 1.3, 1994/11/23 16:54:39
|
Line 746 entry represents a backward branch targe
|
Line 746 entry represents a backward branch targe
|
building any control structure possible (except control structures that |
building any control structure possible (except control structures that |
need storage, like calls, coroutines, and backtracking). |
need storage, like calls, coroutines, and backtracking). |
|
|
if |
doc-if |
ahead |
doc-ahead |
then |
doc-then |
begin |
doc-begin |
until |
doc-until |
again |
doc-again |
cs-pick |
doc-cs-pick |
cs-roll |
doc-cs-roll |
|
|
On many systems control-flow stack items take one word, in gforth they |
On many systems control-flow stack items take one word, in gforth they |
currently take three (this may change in the future). Therefore it is a |
currently take three (this may change in the future). Therefore it is a |
Line 763 words.
|
Line 763 words.
|
|
|
Some standard control structure words are built from these words: |
Some standard control structure words are built from these words: |
|
|
else |
doc-else |
while |
doc-while |
repeat |
doc-repeat |
|
|
Counted loop words constitute a separate group of words: |
Counted loop words constitute a separate group of words: |
|
|
?do |
doc-?do |
do |
doc-do |
for |
doc-for |
loop |
doc-loop |
s+loop |
doc-s+loop |
+loop |
doc-+loop |
next |
doc-next |
leave |
doc-leave |
?leave |
doc-?leave |
unloop |
doc-unloop |
undo |
doc-undo |
|
|
The standard does not allow using @code{cs-pick} and @code{cs-roll} on |
The standard does not allow using @code{cs-pick} and @code{cs-roll} on |
@i{do-sys}. Our system allows it, but it's your job to ensure that for |
@i{do-sys}. Our system allows it, but it's your job to ensure that for |
every @code{?DO} etc. there is exactly one @code{UNLOOP} on any path |
every @code{?DO} etc. there is exactly one @code{UNLOOP} on any path |
through the program (@code{LOOP} etc. compile an @code{UNLOOP}). Also, |
through the definition (@code{LOOP} etc. compile an @code{UNLOOP} on the |
you have to ensure that all @code{LEAVE}s are resolved (by using one of |
fall-through path). Also, you have to ensure that all @code{LEAVE}s are |
the loop-ending words or @code{UNDO}). |
resolved (by using one of the loop-ending words or @code{UNDO}). |
|
|
Another group of control structure words are |
Another group of control structure words are |
|
|
case |
doc-case |
endcase |
doc-endcase |
of |
doc-of |
endof |
doc-endof |
|
|
@i{case-sys} and @i{of-sys} cannot be processed using @code{cs-pick} and |
@i{case-sys} and @i{of-sys} cannot be processed using @code{cs-pick} and |
@code{cs-roll}. |
@code{cs-roll}. |
|
|
|
@subsubsection Programming Style |
|
|
|
In order to ensure readability we recommend that you do not create |
|
arbitrary control structures directly, but define new control structure |
|
words for the control structure you want and use these words in your |
|
program. |
|
|
|
E.g., instead of writing |
|
|
|
@example |
|
begin |
|
... |
|
if [ 1 cs-roll ] |
|
... |
|
again then |
|
@end example |
|
|
|
we recommend defining control structure words, e.g., |
|
|
|
@example |
|
: while ( dest -- orig dest ) |
|
POSTPONE if |
|
1 cs-roll ; immediate |
|
|
|
: repeat ( orig dest -- ) |
|
POSTPONE again |
|
POSTPONE then ; immediate |
|
@end example |
|
|
|
and then using these to create the control structure: |
|
|
|
@example |
|
begin |
|
... |
|
while |
|
... |
|
repeat |
|
@end example |
|
|
|
That's much easier to read, isn't it? Of course, @code{BEGIN} and |
|
@code{WHILE} are predefined, so in this example it would not be |
|
necessary to define them. |
|
|
|
@subsection Calls and returns |
|
|
|
A definition can be called simply be writing the name of the |
|
definition. When the end of the definition is reached, it returns. An earlier return can be forced using |
|
|
|
doc-exit |
|
|
|
Don't forget to clean up the return stack and @code{UNLOOP} any |
|
outstanding @code{?DO}...@code{LOOP}s before @code{EXIT}ing. The |
|
primitive compiled by @code{EXIT} is |
|
|
|
doc-;s |
|
|
|
@subsection Exception Handling |
|
|
|
doc-catch |
|
doc-throw |
|
|
@node Locals |
@node Locals |
@section Locals |
@section Locals |
|
|
Line 923 says. If @code{UNREACHABLE} is used wher
|
Line 984 says. If @code{UNREACHABLE} is used wher
|
lie to the compiler), buggy code will be produced. |
lie to the compiler), buggy code will be produced. |
|
|
Another problem with this rule is that at @code{BEGIN}, the compiler |
Another problem with this rule is that at @code{BEGIN}, the compiler |
does not know which locals will be visible on the incoming back-edge |
does not know which locals will be visible on the incoming |
. All problems discussed in the following are due to this ignorance of |
back-edge. All problems discussed in the following are due to this |
the compiler (we discuss the problems using @code{BEGIN} loops as |
ignorance of the compiler (we discuss the problems using @code{BEGIN} |
examples; the discussion also applies to @code{?DO} and other |
loops as examples; the discussion also applies to @code{?DO} and other |
loops). Perhaps the most insidious example is: |
loops). Perhaps the most insidious example is: |
@example |
@example |
AHEAD |
AHEAD |
Line 1299 programs harder to read, and easier to m
|
Line 1360 programs harder to read, and easier to m
|
merit of this syntax is that it is easy to implement using the ANS Forth |
merit of this syntax is that it is easy to implement using the ANS Forth |
locals wordset. |
locals wordset. |
|
|
|
@node Internals |
|
@chapter Internals |
|
|
|
Reading this section is not necessary for programming with gforth. It |
|
should be helpful for finding your way in the gforth sources. |
|
|
|
@section Portability |
|
|
|
One of the main goals of the effort is availability across a wide range |
|
of personal machines. fig-Forth, and, to a lesser extent, F83, achieved |
|
this goal by manually coding the engine in assembly language for several |
|
then-popular processors. This approach is very labor-intensive and the |
|
results are short-lived due to progress in computer architecture. |
|
|
|
Others have avoided this problem by coding in C, e.g., Mitch Bradley |
|
(cforth), Mikael Patel (TILE) and Dirk Zoller (pfe). This approach is |
|
particularly popular for UNIX-based Forths due to the large variety of |
|
architectures of UNIX machines. Unfortunately an implementation in C |
|
does not mix well with the goals of efficiency and with using |
|
traditional techniques: Indirect or direct threading cannot be expressed |
|
in C, and switch threading, the fastest technique available in C, is |
|
significantly slower. Another problem with C is that it's very |
|
cumbersome to express double integer arithmetic. |
|
|
|
Fortunately, there is a portable language that does not have these |
|
limitations: GNU C, the version of C processed by the GNU C compiler |
|
(@pxref{C Extensions, , Extensions to the C Language Family, gcc.info, |
|
GNU C Manual}). Its labels as values feature (@pxref{Labels as Values, , |
|
Labels as Values, gcc.info, GNU C Manual}) makes direct and indirect |
|
threading possible, its @code{long long} type (@pxref{Long Long, , |
|
Double-Word Integers, gcc.info, GNU C Manual}) corresponds to Forths |
|
double numbers. GNU C is available for free on all important (and many |
|
unimportant) UNIX machines, VMS, 80386s running MS-DOS, the Amiga, and |
|
the Atari ST, so a Forth written in GNU C can run on all these |
|
machines@footnote{Due to Apple's look-and-feel lawsuit it is not |
|
available on the Mac (@pxref{Boycott, , Protect Your Freedom--Fight |
|
``Look And Feel'', gcc.info, GNU C Manual}).}. |
|
|
|
Writing in a portable language has the reputation of producing code that |
|
is slower than assembly. For our Forth engine we repeatedly looked at |
|
the code produced by the compiler and eliminated most compiler-induced |
|
inefficiencies by appropriate changes in the source-code. |
|
|
|
However, register allocation cannot be portably influenced by the |
|
programmer, leading to some inefficiencies on register-starved |
|
machines. We use explicit register declarations (@pxref{Explicit Reg |
|
Vars, , Variables in Specified Registers, gcc.info, GNU C Manual}) to |
|
improve the speed on some machines. They are turned on by using the |
|
@code{gcc} switch @code{-DFORCE_REG}. Unfortunately, this feature not |
|
only depends on the machine, but also on the compiler version: On some |
|
machines some compiler versions produce incorrect code when certain |
|
explicit register declarations are used. So by default |
|
@code{-DFORCE_REG} is not used. |
|
|
|
@section Threading |
|
|
|
GNU C's labels as values extension (available since @code{gcc-2.0}, |
|
@pxref{Labels as Values, , Labels as Values, gcc.info, GNU C Manual}) |
|
makes it possible to take the address of @var{label} by writing |
|
@code{&&@var{label}}. This address can then be used in a statement like |
|
@code{goto *@var{address}}. I.e., @code{goto *&&x} is the same as |
|
@code{goto x}. |
|
|
|
With this feature an indirect threaded NEXT looks like: |
|
@example |
|
cfa = *ip++; |
|
ca = *cfa; |
|
goto *ca; |
|
@end example |
|
For those unfamiliar with the names: @code{ip} is the Forth instruction |
|
pointer; the @code{cfa} (code-field address) corresponds to ANS Forths |
|
execution token and points to the code field of the next word to be |
|
executed; The @code{ca} (code address) fetched from there points to some |
|
executable code, e.g., a primitive or the colon definition handler |
|
@code{docol}. |
|
|
|
Direct threading is even simpler: |
|
@example |
|
ca = *ip++; |
|
goto *ca; |
|
@end example |
|
|
|
Of course we have packaged the whole thing neatly in macros called |
|
@code{NEXT} and @code{NEXT1} (the part of NEXT after fetching the cfa). |
|
|
|
@subsection Scheduling |
|
|
|
There is a little complication: Pipelined and superscalar processors, |
|
i.e., RISC and some modern CISC machines can process independent |
|
instructions while waiting for the results of an instruction. The |
|
compiler usually reorders (schedules) the instructions in a way that |
|
achieves good usage of these delay slots. However, on our first tries |
|
the compiler did not do well on scheduling primitives. E.g., for |
|
@code{+} implemented as |
|
@example |
|
n=sp[0]+sp[1]; |
|
sp++; |
|
sp[0]=n; |
|
NEXT; |
|
@end example |
|
the NEXT comes strictly after the other code, i.e., there is nearly no |
|
scheduling. After a little thought the problem becomes clear: The |
|
compiler cannot know that sp and ip point to different addresses (and |
|
the version of @code{gcc} we used would not know it even if it could), |
|
so it could not move the load of the cfa above the store to the |
|
TOS. Indeed the pointers could be the same, if code on or very near the |
|
top of stack were executed. In the interest of speed we chose to forbid |
|
this probably unused ``feature'' and helped the compiler in scheduling: |
|
NEXT is divided into the loading part (@code{NEXT_P1}) and the goto part |
|
(@code{NEXT_P2}). @code{+} now looks like: |
|
@example |
|
n=sp[0]+sp[1]; |
|
sp++; |
|
NEXT_P1; |
|
sp[0]=n; |
|
NEXT_P2; |
|
@end example |
|
This can be scheduled optimally by the compiler (see \sect{TOS}). |
|
|
|
This division can be turned off with the switch @code{-DCISC_NEXT}. This |
|
switch is on by default on machines that do not profit from scheduling |
|
(e.g., the 80386), in order to preserve registers. |
|
|
|
@subsection Direct or Indirect Threaded? |
|
|
|
Both! After packaging the nasty details in macro definitions we |
|
realized that we could switch between direct and indirect threading by |
|
simply setting a compilation flag (@code{-DDIRECT_THREADED}) and |
|
defining a few machine-specific macros for the direct-threading case. |
|
On the Forth level we also offer access words that hide the |
|
differences between the threading methods (@pxref{Threading Words}). |
|
|
|
Indirect threading is implemented completely |
|
machine-independently. Direct threading needs routines for creating |
|
jumps to the executable code (e.g. to docol or dodoes). These routines |
|
are inherently machine-dependent, but they do not amount to many source |
|
lines. I.e., even porting direct threading to a new machine is a small |
|
effort. |
|
|
|
@subsection DOES> |
|
One of the most complex parts of a Forth engine is @code{dodoes}, i.e., |
|
the chunk of code executed by every word defined by a |
|
@code{CREATE}...@code{DOES>} pair. The main problem here is: How to find |
|
the Forth code to be executed, i.e. the code after the @code{DOES>} (the |
|
DOES-code)? There are two solutions: |
|
|
|
In fig-Forth the code field points directly to the dodoes and the |
|
DOES-code address is stored in the cell after the code address |
|
(i.e. at cfa cell+). It may seem that this solution is illegal in the |
|
Forth-79 and all later standards, because in fig-Forth this address |
|
lies in the body (which is illegal in these standards). However, by |
|
making the code field larger for all words this solution becomes legal |
|
again. We use this approach for the indirect threaded version. Leaving |
|
a cell unused in most words is a bit wasteful, but on the machines we |
|
are targetting this is hardly a problem. The other reason for having a |
|
code field size of two cells is to avoid having different image files |
|
for direct and indirect threaded systems (@pxref{image-format}). |
|
|
|
The other approach is that the code field points or jumps to the cell |
|
after @code{DOES}. In this variant there is a jump to @code{dodoes} at |
|
this address. @code{dodoes} can then get the DOES-code address by |
|
computing the code address, i.e., the address of the jump to dodoes, |
|
and add the length of that jump field. A variant of this is to have a |
|
call to @code{dodoes} after the @code{DOES>}; then the return address |
|
(which can be found in the return register on RISCs) is the DOES-code |
|
address. Since the two cells available in the code field are usually |
|
used up by the jump to the code address in direct threading, we use |
|
this approach for direct threading. We did not want to add another |
|
cell to the code field. |
|
|
|
@section Primitives |
|
|
|
@subsection Automatic Generation |
|
|
|
Since the primitives are implemented in a portable language, there is no |
|
longer any need to minimize the number of primitives. On the contrary, |
|
having many primitives is an advantage: speed. In order to reduce the |
|
number of errors in primitives and to make programming them easier, we |
|
provide a tool, the primitive generator (@file{prims2x.fs}), that |
|
automatically generates most (and sometimes all) of the C code for a |
|
primitive from the stack effect notation. The source for a primitive |
|
has the following form: |
|
|
|
@format |
|
@var{Forth-name} @var{stack-effect} @var{category} [@var{pronounc.}] |
|
[@code{""}@var{glossary entry}@code{""}] |
|
@var{C code} |
|
[@code{:} |
|
@var{Forth code}] |
|
@end format |
|
|
|
The items in brackets are optional. The category and glossary fields |
|
are there for generating the documentation, the Forth code is there |
|
for manual implementations on machines without GNU C. E.g., the source |
|
for the primitive @code{+} is: |
|
@example |
|
+ n1 n2 -- n core plus |
|
n = n1+n2; |
|
@end example |
|
|
|
This looks like a specification, but in fact @code{n = n1+n2} is C |
|
code. Our primitive generation tool extracts a lot of information from |
|
the stack effect notations@footnote{We use a one-stack notation, even |
|
though we have separate data and floating-point stacks; The separate |
|
notation can be generated easily from the unified notation.}: The number |
|
of items popped from and pushed on the stack, their type, and by what |
|
name they are referred to in the C code. It then generates a C code |
|
prelude and postlude for each primitive. The final C code for @code{+} |
|
looks like this: |
|
|
|
@example |
|
I_plus: /* + ( n1 n2 -- n ) */ /* label, stack effect */ |
|
/* */ /* documentation */ |
|
{ |
|
DEF_CA /* definition of variable ca (indirect threading) */ |
|
Cell n1; /* definitions of variables */ |
|
Cell n2; |
|
Cell n; |
|
n1 = (Cell) sp[1]; /* input */ |
|
n2 = (Cell) TOS; |
|
sp += 1; /* stack adjustment */ |
|
NAME("+") /* debugging output (with -DDEBUG) */ |
|
{ |
|
n = n1+n2; /* C code taken from the source */ |
|
} |
|
NEXT_P1; /* NEXT part 1 */ |
|
TOS = (Cell)n; /* output */ |
|
NEXT_P2; /* NEXT part 2 */ |
|
} |
|
@end example |
|
|
|
This looks long and inefficient, but the GNU C compiler optimizes quite |
|
well and produces optimal code for @code{+} on, e.g., the R3000 and the |
|
HP RISC machines: Defining the @code{n}s does not produce any code, and |
|
using them as intermediate storage also adds no cost. |
|
|
|
There are also other optimizations, that are not illustrated by this |
|
example: Assignments between simple variables are usually for free (copy |
|
propagation). If one of the stack items is not used by the primitive |
|
(e.g. in @code{drop}), the compiler eliminates the load from the stack |
|
(dead code elimination). On the other hand, there are some things that |
|
the compiler does not do, therefore they are performed by |
|
@file{prims2x.fs}: The compiler does not optimize code away that stores |
|
a stack item to the place where it just came from (e.g., @code{over}). |
|
|
|
While programming a primitive is usually easy, there are a few cases |
|
where the programmer has to take the actions of the generator into |
|
account, most notably @code{?dup}, but also words that do not (always) |
|
fall through to NEXT. |
|
|
|
@subsection TOS Optimization |
|
|
|
An important optimization for stack machine emulators, e.g., Forth |
|
engines, is keeping one or more of the top stack items in |
|
registers. If a word has the stack effect {@var{in1}...@var{inx} @code{--} |
|
@var{out1}...@var{outy}}, keeping the top @var{n} items in registers |
|
@itemize |
|
@item |
|
is better than keeping @var{n-1} items, if @var{x>=n} and @var{y>=n}, |
|
due to fewer loads from and stores to the stack. |
|
@item is slower than keeping @var{n-1} items, if @var{x<>y} and @var{x<n} and |
|
@var{y<n}, due to additional moves between registers. |
|
@end itemize |
|
|
|
In particular, keeping one item in a register is never a disadvantage, |
|
if there are enough registers. Keeping two items in registers is a |
|
disadvantage for frequent words like @code{?branch}, constants, |
|
variables, literals and @code{i}. Therefore our generator only produces |
|
code that keeps zero or one items in registers. The generated C code |
|
covers both cases; the selection between these alternatives is made at |
|
C-compile time using the switch @code{-DUSE_TOS}. @code{TOS} in the C |
|
code for @code{+} is just a simple variable name in the one-item case, |
|
otherwise it is a macro that expands into @code{sp[0]}. Note that the |
|
GNU C compiler tries to keep simple variables like @code{TOS} in |
|
registers, and it usually succeeds, if there are enough registers. |
|
|
|
The primitive generator performs the TOS optimization for the |
|
floating-point stack, too (@code{-DUSE_FTOS}). For floating-point |
|
operations the benefit of this optimization is even larger: |
|
floating-point operations take quite long on most processors, but can be |
|
performed in parallel with other operations as long as their results are |
|
not used. If the FP-TOS is kept in a register, this works. If |
|
it is kept on the stack, i.e., in memory, the store into memory has to |
|
wait for the result of the floating-point operation, lengthening the |
|
execution time of the primitive considerably. |
|
|
|
The TOS optimization makes the automatic generation of primitives a |
|
bit more complicated. Just replacing all occurrences of @code{sp[0]} by |
|
@code{TOS} is not sufficient. There are some special cases to |
|
consider: |
|
@itemize |
|
@item In the case of @code{dup ( w -- w w )} the generator must not |
|
eliminate the store to the original location of the item on the stack, |
|
if the TOS optimization is turned on. |
|
@item Primitives with stack effects of the form {@code{--} |
|
@var{out1}...@var{outy}} must store the TOS to the stack at the start. |
|
Likewise, primitives with the stack effect {@var{in1}...@var{inx} @code{--}} |
|
must load the TOS from the stack at the end. But for the null stack |
|
effect @code{--} no stores or loads should be generated. |
|
@end itemize |
|
|
|
@subsection Produced code |
|
|
|
To see what assembly code is produced for the primitives on your machine |
|
with your compiler and your flag settings, type @code{make engine.s} and |
|
look at the resulting file @file{engine.c}. |
|
|
|
@section System Architecture |
|
|
|
Our Forth system consists not only of primitives, but also of |
|
definitions written in Forth. Since the Forth compiler itself belongs |
|
to those definitions, it is not possible to start the system with the |
|
primitives and the Forth source alone. Therefore we provide the Forth |
|
code as an image file in nearly executable form. At the start of the |
|
system a C routine loads the image file into memory, sets up the |
|
memory (stacks etc.) according to information in the image file, and |
|
starts executing Forth code. |
|
|
|
The image file format is a compromise between the goals of making it |
|
easy to generate image files and making them portable. The easiest way |
|
to generate an image file is to just generate a memory dump. However, |
|
this kind of image file cannot be used on a different machine, or on |
|
the next version of the engine on the same machine, it even might not |
|
work with the same engine compiled by a different version of the C |
|
compiler. We would like to have as few versions of the image file as |
|
possible, because we do not want to distribute many versions of the |
|
same image file, and to make it easy for the users to use their image |
|
files on many machines. We currently need to create a different image |
|
file for machines with different cell sizes and different byte order |
|
(little- or big-endian)@footnote{We consider adding information to the |
|
image file that enables the loader to change the byte order.}. |
|
|
|
Forth code that is going to end up in a portable image file has to |
|
comply to some restrictions: addresses have to be stored in memory |
|
with special words (@code{A!}, @code{A,}, etc.) in order to make the |
|
code relocatable. Cells, floats, etc., have to be stored at the |
|
natural alignment boundaries@footnote{E.g., store floats (8 bytes) at |
|
an address dividable by~8. This happens automatically in our system |
|
when you use the ANSI alignment words.}, in order to avoid alignment |
|
faults on machines with stricter alignment. The image file is produced |
|
by a metacompiler (@file{cross.fs}). |
|
|
|
So, unlike the image file of Mitch Bradleys @code{cforth}, our image |
|
file is not directly executable, but has to undergo some manipulations |
|
during loading. Address relocation is performed at image load-time, not |
|
at run-time. The loader also has to replace tokens standing for |
|
primitive calls with the appropriate code-field addresses (or code |
|
addresses in the case of direct threading). |
|
|
@contents |
@contents |
@bye |
@bye |
|
|