TOS Optimization
An important optimization for stack machine emulators, e.g., Forth
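engines, is keeping one or more of the top stack items in
registers. If a word has the stack effect in1...inx --
out1...outy, keeping the top n items in registers is better than
keeping n-1 items if x>=n and y>=n, due to fewer loads from and
stores to the stack, and is slower than keeping n-1 items if x<>y,
x<n and y<n, due to additional moves between registers.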
In particular, keeping one item in a register is never a disadvantage,
if there are enough registers. Keeping two items in registers is a
disadvantage for frequent words like ?branch, constants,
variables, literals and i. Therefore our generator only produces
code that keeps zero or one items in registers. The generated C code
covers both cases; the selection between these alternatives is made at
C-compile time using the switch -DUSE_TOS. In the C code for +,
TOS is just a simple variable name in the one-item case; otherwise
it is a macro that expands into sp[0]. Note that the
GNU C compiler tries to keep simple variables like TOS in
registers, and it usually succeeds, if there are enough registers.
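As a rough illustration of the two alternatives, the following sketch
shows how code for + might compile either way. It is not the
generator's actual output: the IF_TOS helper macro, the Cell typedef,
the function prim_plus and the downward-growing stack layout are
assumptions made for this example. In a real engine TOS would stay
live across primitives; here it is loaded and written back at the
function boundary only so that the example is self-contained.

```c
#include <assert.h>

typedef long Cell;                 /* assumed cell type for this sketch */

#ifdef USE_TOS
#define IF_TOS(x) x                /* one-item case: keep the TOS-only code */
#else
#define IF_TOS(x)                  /* zero-item case: drop the TOS-only code */
#define TOS (sp[0])                /* TOS is just the top stack slot */
#endif

/* Hypothetical stand-in for the code of "+" ( n1 n2 -- n ).
   The stack grows downward, so sp[0] is the top item and sp[1] the
   second item.  Returns the new stack pointer. */
Cell *prim_plus(Cell *sp)
{
    IF_TOS(Cell TOS = sp[0];)      /* with USE_TOS, TOS is a plain variable */
    Cell n1 = sp[1];
    Cell n2 = TOS;
    sp += 1;
    TOS = n1 + n2;
    IF_TOS(sp[0] = TOS;)           /* write the cached item back on exit */
    assert(sp[0] == n1 + n2);      /* both builds leave the sum on top */
    return sp;
}
```

Compiling the same file with and without -DUSE_TOS selects between
the two cases; the body of the primitive is textually identical in
both.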
The primitive generator performs the TOS optimization for the
floating-point stack, too (-DUSE_FTOS). For floating-point
operations the benefit of this optimization is even larger:
floating-point operations take quite long on most processors, but can be
performed in parallel with other operations as long as their results are
not used. If the FP-TOS is kept in a register, this works. If
it is kept on the stack, i.e., in memory, the store into memory has to
wait for the result of the floating-point operation, lengthening the
execution time of the primitive considerably.
The TOS optimization makes the automatic generation of primitives a
bit more complicated. Just replacing all occurrences of sp[0] by
TOS is not sufficient. There are some special cases to
consider:
* In the case of dup ( w -- w w ) the generator must not
  eliminate the store to the original location of the item on the
  stack, if the TOS optimization is turned on.
* Primitives with stack effects of the form -- out1...outy must
  store the TOS to the stack at the start. Likewise, primitives with
  the stack effect in1...inx -- must load the TOS from the stack at
  the end. But for the null stack effect -- no stores or loads
  should be generated.
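
The special cases can be sketched with a tiny hand-written
interpreter. This is only an illustration under assumptions, not
generator output: the opcode names, the IF_TOS macro, the function
run and the primitive OP_ONE (a made-up word with stack effect -- n
that pushes 1) are all hypothetical. The sketch also assumes that
sp[0] is always a readable slot, even for an empty stack.

```c
#include <assert.h>

typedef long Cell;                 /* assumed cell type for this sketch */

#ifdef USE_TOS
#define IF_TOS(x) x
#else
#define IF_TOS(x)
#define TOS (sp[0])
#endif

enum { OP_END, OP_NOOP, OP_ONE, OP_DUP, OP_DROP };

/* Runs code on a downward-growing stack (sp[0] is the top item) and
   returns the final stack pointer. */
Cell *run(const int *ip, Cell *sp)
{
    IF_TOS(Cell TOS = sp[0];)      /* cache the top item on entry */
    for (;;) {
        switch (*ip++) {
        case OP_NOOP:              /* ( -- ): no loads, no stores */
            break;
        case OP_ONE:               /* ( -- n ): store TOS at the start */
            IF_TOS(sp[0] = TOS;)
            sp -= 1;
            TOS = 1;
            break;
        case OP_DUP: {             /* ( w -- w w ) */
            Cell w = TOS;
            sp -= 1;
            IF_TOS(sp[1] = w;)     /* store to the original slot is kept */
            TOS = w;
            break;
        }
        case OP_DROP:              /* ( w -- ): load TOS at the end */
            sp += 1;
            IF_TOS(TOS = sp[0];)
            break;
        default:                   /* OP_END: flush the cached item */
            IF_TOS(sp[0] = TOS;)
            return sp;
        }
    }
}
```

Without -DUSE_TOS the store in OP_DUP disappears, because the
original slot already holds w; with it, the slot is stale and has to
be written. Running the sequence OP_DUP, OP_ONE, OP_NOOP, OP_DROP,
OP_END on a stack holding a single item should leave two copies of
that item on the stack in either build.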