gforth/gforth.ds - diff

Diff for /gforth/Attic/gforth.ds between versions 1.2 and 1.3

-version 1.2, 1994/11/14 19:01:16
+version 1.3, 1994/11/23 16:54:39
  Line 746  entry represents a backward branch targe
  building any control structure possible (except control structures that
  need storage, like calls, coroutines, and backtracking).
- if
+ doc-if
- ahead
+ doc-ahead
- then
+ doc-then
- begin
+ doc-begin
- until
+ doc-until
- again
+ doc-again
- cs-pick
+ doc-cs-pick
- cs-roll
+ doc-cs-roll
  On many systems control-flow stack items take one word; in gforth they
  currently take three (this may change in the future). Therefore it is a
  Line 763  words.
  Some standard control structure words are built from these words:
- else
+ doc-else
- while
+ doc-while
- repeat
+ doc-repeat
  Counted loop words constitute a separate group of words:
- ?do
+ doc-?do
- do
+ doc-do
- for
+ doc-for
- loop
+ doc-loop
- s+loop
+ doc-s+loop
- +loop
+ doc-+loop
- next
+ doc-next
- leave
+ doc-leave
- ?leave
+ doc-?leave
- unloop
+ doc-unloop
- undo
+ doc-undo
  The standard does not allow using @code{cs-pick} and @code{cs-roll} on
  @i{do-sys}. Our system allows it, but it's your job to ensure that for
  every @code{?DO} etc. there is exactly one @code{UNLOOP} on any path
- through the program (@code{LOOP} etc. compile an @code{UNLOOP}). Also,
+ through the definition (@code{LOOP} etc. compile an @code{UNLOOP} on the
- you have to ensure that all @code{LEAVE}s are resolved (by using one of
+ fall-through path). Also, you have to ensure that all @code{LEAVE}s are
- the loop-ending words or @code{UNDO}).
+ resolved (by using one of the loop-ending words or @code{UNDO}).
  Another group of control structure words are
- case
+ doc-case
- endcase
+ doc-endcase
- of
+ doc-of
- endof
+ doc-endof
  @i{case-sys} and @i{of-sys} cannot be processed using @code{cs-pick} and
  @code{cs-roll}.
+ @subsubsection Programming Style
+ In order to ensure readability we recommend that you do not create
+ arbitrary control structures directly, but define new control structure
+ words for the control structure you want and use these words in your
+ program.
+ E.g., instead of writing
+ @example
+ begin
+   ...
+ if [ 1 cs-roll ]
+   ...
+ again then
+ @end example
+ we recommend defining control structure words, e.g.,
+ @example
+ : while ( dest -- orig dest )
+  POSTPONE if
+  1 cs-roll ; immediate
+ : repeat ( orig dest -- )
+  POSTPONE again
+  POSTPONE then ; immediate
+ @end example
+ and then using these to create the control structure:
+ @example
+ begin
+   ...
+ while
+   ...
+ repeat
+ @end example
+ That's much easier to read, isn't it? Of course, @code{WHILE} and
+ @code{REPEAT} are predefined, so in this example it would not be
+ necessary to define them.
+ @subsection Calls and returns
+ A definition can be called simply by writing the name of the
+ definition. When the end of the definition is reached, it returns. An
+ earlier return can be forced using
+ doc-exit
+ Don't forget to clean up the return stack and @code{UNLOOP} any
+ outstanding @code{?DO}...@code{LOOP}s before @code{EXIT}ing. The
+ primitive compiled by @code{EXIT} is
+ doc-;s
+ @subsection Exception Handling
+ doc-catch
+ doc-throw
  @node Locals
  @section Locals
- Line 923  says. If @code{UNREACHABLE} is used wher
+ Line 984  says. If @code{UNREACHABLE} is used wher
  lie to the compiler), buggy code will be produced.
  Another problem with this rule is that at @code{BEGIN}, the compiler
- does not know which locals will be visible on the incoming back-edge
+ does not know which locals will be visible on the incoming
- . All problems discussed in the following are due to this ignorance of
+ back-edge. All problems discussed in the following are due to this
- the compiler (we discuss the problems using @code{BEGIN} loops as
+ ignorance of the compiler (we discuss the problems using @code{BEGIN}
- examples; the discussion also applies to @code{?DO} and other
+ loops as examples; the discussion also applies to @code{?DO} and other
  loops). Perhaps the most insidious example is:
  @example
  AHEAD
- Line 1299  programs harder to read, and easier to m
+ Line 1360  programs harder to read, and easier to m
  merit of this syntax is that it is easy to implement using the ANS Forth
  locals wordset.
+ @node Internals
+ @chapter Internals
+ Reading this section is not necessary for programming with gforth. It
+ should be helpful for finding your way in the gforth sources.
+ @section Portability
+ One of the main goals of the effort is availability across a wide range
+ of personal machines. fig-Forth, and, to a lesser extent, F83, achieved
+ this goal by manually coding the engine in assembly language for several
+ then-popular processors. This approach is very labor-intensive and the
+ results are short-lived due to progress in computer architecture.
+ Others have avoided this problem by coding in C, e.g., Mitch Bradley
+ (cforth), Mikael Patel (TILE) and Dirk Zoller (pfe). This approach is
+ particularly popular for UNIX-based Forths due to the large variety of
+ architectures of UNIX machines. Unfortunately an implementation in C
+ does not mix well with the goals of efficiency and with using
+ traditional techniques: Indirect or direct threading cannot be expressed
+ in C, and switch threading, the fastest technique available in C, is
+ significantly slower. Another problem with C is that it's very
+ cumbersome to express double integer arithmetic.
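For comparison, switch threading in standard C can be sketched as follows. This is a minimal illustration, not code from the gforth sources: the opcodes @code{OP_LIT}, @code{OP_ADD} and @code{OP_HALT} and the function @code{run_switch} are hypothetical names. Every primitive has to return to the top of the loop, where the @code{switch} dispatches again; this extra jump is the overhead that direct and indirect threading avoid.

```c
/* Hypothetical opcodes for a toy stack machine (not gforth's). */
enum { OP_LIT, OP_ADD, OP_HALT };

/* Runs a program with switch threading: each primitive ends by going
   back to the top of the loop, where the switch selects the next one. */
long run_switch(const long *ip)
{
    long stack[16];
    long *sp = stack + 16;          /* the stack grows downwards */

    for (;;) {
        switch (*ip++) {
        case OP_LIT:                /* ( -- n ) push the inline literal */
            *--sp = *ip++;
            break;
        case OP_ADD:                /* ( n1 n2 -- n ) */
            sp[1] = sp[1] + sp[0];
            sp++;
            break;
        case OP_HALT:               /* stop and return the top of stack */
            return sp[0];
        }
    }
}
```

A program for this toy machine is an array such as @code{OP_LIT, 2, OP_LIT, 3, OP_ADD, OP_HALT}.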
+ Fortunately, there is a portable language that does not have these
+ limitations: GNU C, the version of C processed by the GNU C compiler
+ (@pxref{C Extensions, , Extensions to the C Language Family, gcc.info,
+ GNU C Manual}). Its labels as values feature (@pxref{Labels as Values, ,
+ Labels as Values, gcc.info, GNU C Manual}) makes direct and indirect
+ threading possible; its @code{long long} type (@pxref{Long Long, ,
+ Double-Word Integers, gcc.info, GNU C Manual}) corresponds to Forth's
+ double numbers. GNU C is available for free on all important (and many
+ unimportant) UNIX machines, VMS, 80386s running MS-DOS, the Amiga, and
+ the Atari ST, so a Forth written in GNU C can run on all these
+ machines@footnote{Due to Apple's look-and-feel lawsuit it is not
+ available on the Mac (@pxref{Boycott, , Protect Your Freedom--Fight
+ ``Look And Feel'', gcc.info, GNU C Manual}).}.
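As an illustration of how @code{long long} maps to double numbers, a mixed-precision multiply in the style of @code{m*} might look like the sketch below. It assumes 32-bit cells and a 64-bit @code{long long}; the type names and the functions @code{m_star} and @code{d_split} are hypothetical and not taken from the gforth sources. The right shift of a negative value relies on the arithmetic shift that GNU C provides.

```c
#include <stdint.h>

typedef int32_t   Cell;     /* assumption: a cell is 32 bits wide */
typedef uint32_t  UCell;
typedef long long DCell;    /* double-cell number, assumed 64 bits */

/* m* ( n1 n2 -- d ): full-width signed product of two cells. */
DCell m_star(Cell n1, Cell n2)
{
    return (DCell)n1 * (DCell)n2;
}

/* Splits a double number into its low and high cells, as they would
   be stored in two stack items. */
void d_split(DCell d, UCell *lo, Cell *hi)
{
    *lo = (UCell)d;         /* low 32 bits (conversion is modulo 2^32) */
    *hi = (Cell)(d >> 32);  /* high 32 bits, sign included (GNU C shift) */
}
```

Without a double-width integer type, the same product has to be assembled from half-cell partial products by hand.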
+ Writing in a portable language has the reputation of producing code that
+ is slower than assembly. For our Forth engine we repeatedly looked at
+ the code produced by the compiler and eliminated most compiler-induced
+ inefficiencies by appropriate changes in the source-code.
+ However, register allocation cannot be portably influenced by the
+ programmer, leading to some inefficiencies on register-starved
+ machines. We use explicit register declarations (@pxref{Explicit Reg
+ Vars, , Variables in Specified Registers, gcc.info, GNU C Manual}) to
+ improve the speed on some machines. They are turned on by using the
+ @code{gcc} switch @code{-DFORCE_REG}. Unfortunately, this feature not
+ only depends on the machine, but also on the compiler version: On some
+ machines some compiler versions produce incorrect code when certain
+ explicit register declarations are used. So by default
+ @code{-DFORCE_REG} is not used.
+ @section Threading
+ GNU C's labels as values extension (available since @code{gcc-2.0},
+ @pxref{Labels as Values, , Labels as Values, gcc.info, GNU C Manual})
+ makes it possible to take the address of @var{label} by writing
+ @code{&&@var{label}}.  This address can then be used in a statement like
+ @code{goto *@var{address}}. I.e., @code{goto *&&x} is the same as
+ @code{goto x}.
+ With this feature an indirect threaded NEXT looks like:
+ @example
+ cfa = *ip++;
+ ca = *cfa;
+ goto *ca;
+ @end example
+ For those unfamiliar with the names: @code{ip} is the Forth instruction
+ pointer; the @code{cfa} (code-field address) corresponds to ANS Forth's
+ execution token and points to the code field of the next word to be
+ executed; the @code{ca} (code address) fetched from there points to some
+ executable code, e.g., a primitive or the colon definition handler
+ @code{docol}.
+ Direct threading is even simpler:
+ @example
+ ca = *ip++;
+ goto *ca;
+ @end example
+ Of course we have packaged the whole thing neatly in macros called
+ @code{NEXT} and @code{NEXT1} (the part of NEXT after fetching the cfa).
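To see the pieces working together, here is a self-contained toy engine using indirect threading. It is only a sketch under stated assumptions: the three words (a literal pusher, an adder and a halt word), the function name @code{run_indirect_demo} and the layout of the thread are made up for illustration, and the macros are simplified versions of what the text calls @code{NEXT} and @code{NEXT1}. It needs GNU C for the labels as values extension.

```c
#include <stdint.h>

typedef intptr_t Cell;

/* Computes a+b by executing the threaded code "lit a lit b + halt". */
Cell run_indirect_demo(Cell a, Cell b)
{
    /* One code field per word; each holds a code address (ca). */
    void *cf_lit  = &&do_lit;
    void *cf_plus = &&do_plus;
    void *cf_halt = &&do_halt;

    /* Threaded code: each cell is a cfa or an inline literal. */
    Cell thread[] = {
        (Cell)&cf_lit, a,
        (Cell)&cf_lit, b,
        (Cell)&cf_plus,
        (Cell)&cf_halt
    };
    Cell stack[8];
    Cell *sp = stack + 8;       /* the stack grows downwards */
    Cell *ip = thread;          /* Forth instruction pointer */
    void **cfa;                 /* code-field address */
    void *ca;                   /* code address */

#define NEXT1 do { ca = *cfa; goto *ca; } while (0)
#define NEXT  do { cfa = (void **)*ip++; NEXT1; } while (0)

    NEXT;                       /* start executing the thread */

do_lit:                         /* ( -- n ) push the inline literal */
    *--sp = *ip++;
    NEXT;
do_plus:                        /* ( n1 n2 -- n ) */
    sp[1] += sp[0];
    sp++;
    NEXT;
do_halt:                        /* leave the engine with the TOS */
    return sp[0];

#undef NEXT
#undef NEXT1
}
```

For direct threading the thread would hold the label addresses themselves, and the fetch through @code{cfa} would disappear.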
+ @subsection Scheduling
+ There is a little complication: Pipelined and superscalar processors,
+ i.e., RISC and some modern CISC machines, can process independent
+ instructions while waiting for the results of an instruction. The
+ compiler usually reorders (schedules) the instructions in a way that
+ achieves good usage of these delay slots. However, on our first tries
+ the compiler did not do well on scheduling primitives. E.g., for
+ @code{+} implemented as
+ @example
+ n=sp[0]+sp[1];
+ sp++;
+ sp[0]=n;
+ NEXT;
+ @end example
+ the NEXT comes strictly after the other code, i.e., there is nearly no
+ scheduling. After a little thought the problem becomes clear: The
+ compiler cannot know that sp and ip point to different addresses (and
+ the version of @code{gcc} we used would not know it even if it could),
+ so it could not move the load of the cfa above the store to the
+ TOS. Indeed the pointers could be the same, if code on or very near the
+ top of stack were executed. In the interest of speed we chose to forbid
+ this probably unused ``feature'' and helped the compiler in scheduling:
+ NEXT is divided into the loading part (@code{NEXT_P1}) and the goto part
+ (@code{NEXT_P2}). @code{+} now looks like:
+ @example
+ n=sp[0]+sp[1];
+ sp++;
+ NEXT_P1;
+ sp[0]=n;
+ NEXT_P2;
+ @end example
+ This can be scheduled optimally by the compiler (see the subsection on
+ TOS optimization).
+ This division can be turned off with the switch @code{-DCISC_NEXT}. This
+ switch is on by default on machines that do not profit from scheduling
+ (e.g., the 80386), in order to preserve registers.
+ @subsection Direct or Indirect Threaded?
+ Both! After packaging the nasty details in macro definitions we
+ realized that we could switch between direct and indirect threading by
+ simply setting a compilation flag (@code{-DDIRECT_THREADED}) and
+ defining a few machine-specific macros for the direct-threading case.
+ On the Forth level we also offer access words that hide the
+ differences between the threading methods (@pxref{Threading Words}).
+ Indirect threading is implemented completely
+ machine-independently. Direct threading needs routines for creating
+ jumps to the executable code (e.g. to docol or dodoes). These routines
+ are inherently machine-dependent, but they do not amount to many source
+ lines. I.e., even porting direct threading to a new machine is a small
+ effort.
+ @subsection DOES>
+ One of the most complex parts of a Forth engine is @code{dodoes}, i.e.,
+ the chunk of code executed by every word defined by a
+ @code{CREATE}...@code{DOES>} pair. The main problem here is: How to find
+ the Forth code to be executed, i.e. the code after the @code{DOES>} (the
+ DOES-code)? There are two solutions:
+ In fig-Forth the code field points directly to the dodoes and the
+ DOES-code address is stored in the cell after the code address
+ (i.e. at cfa cell+). It may seem that this solution is illegal in the
+ Forth-79 and all later standards, because in fig-Forth this address
+ lies in the body (which is illegal in these standards). However, by
+ making the code field larger for all words this solution becomes legal
+ again. We use this approach for the indirect threaded version. Leaving
+ a cell unused in most words is a bit wasteful, but on the machines we
+ are targeting this is hardly a problem. The other reason for having a
+ code field size of two cells is to avoid having different image files
+ for direct and indirect threaded systems (@pxref{image-format}).
+ The other approach is that the code field points or jumps to the cell
+ after @code{DOES>}. In this variant there is a jump to @code{dodoes} at
+ this address. @code{dodoes} can then get the DOES-code address by
+ computing the code address, i.e., the address of the jump to dodoes,
+ and adding the length of that jump field. A variant of this is to have a
+ call to @code{dodoes} after the @code{DOES>}; then the return address
+ (which can be found in the return register on RISCs) is the DOES-code
+ address. Since the two cells available in the code field are usually
+ used up by the jump to the code address in direct threading, we use
+ this approach for direct threading. We did not want to add another
+ cell to the code field.
+ @section Primitives
+ @subsection Automatic Generation
+ Since the primitives are implemented in a portable language, there is no
+ longer any need to minimize the number of primitives. On the contrary,
+ having many primitives is an advantage: speed. In order to reduce the
+ number of errors in primitives and to make programming them easier, we
+ provide a tool, the primitive generator (@file{prims2x.fs}), that
+ automatically generates most (and sometimes all) of the C code for a
+ primitive from the stack effect notation.  The source for a primitive
+ has the following form:
+ @format
+ @var{Forth-name}        @var{stack-effect}      @var{category}  [@var{pronounc.}]
+ [@code{""}@var{glossary entry}@code{""}]
+ @var{C code}
+ [@code{:}
+ @var{Forth code}]
+ @end format
+ The items in brackets are optional. The category and glossary fields
+ are there for generating the documentation, the Forth code is there
+ for manual implementations on machines without GNU C. E.g., the source
+ for the primitive @code{+} is:
+ @example
+ +    n1 n2 -- n    core    plus
+ n = n1+n2;
+ @end example
+ This looks like a specification, but in fact @code{n = n1+n2} is C
+ code. Our primitive generation tool extracts a lot of information from
+ the stack effect notations@footnote{We use a one-stack notation, even
+ though we have separate data and floating-point stacks; the separate
+ notation can be generated easily from the unified notation.}: The number
+ of items popped from and pushed on the stack, their type, and by what
+ name they are referred to in the C code. It then generates a C code
+ prelude and postlude for each primitive. The final C code for @code{+}
+ looks like this:
+ @example
+ I_plus: /* + ( n1 n2 -- n ) */  /* label, stack effect */
+ /*  */                          /* documentation */
+ {
+ DEF_CA                          /* definition of variable ca (indirect threading) */
+ Cell n1;                        /* definitions of variables */
+ Cell n2;
+ Cell n;
+ n1 = (Cell) sp[1];              /* input */
+ n2 = (Cell) TOS;
+ sp += 1;                        /* stack adjustment */
+ NAME("+")                       /* debugging output (with -DDEBUG) */
+ {
+ n = n1+n2;                      /* C code taken from the source */
+ }
+ NEXT_P1;                        /* NEXT part 1 */
+ TOS = (Cell)n;                  /* output */
+ NEXT_P2;                        /* NEXT part 2 */
+ }
+ @end example
+ This looks long and inefficient, but the GNU C compiler optimizes quite
+ well and produces optimal code for @code{+} on, e.g., the R3000 and the
+ HP RISC machines: Defining the @code{n}s does not produce any code, and
+ using them as intermediate storage also adds no cost.
+ There are also other optimizations that are not illustrated by this
+ example: Assignments between simple variables are usually for free (copy
+ propagation). If one of the stack items is not used by the primitive
+ (e.g.  in @code{drop}), the compiler eliminates the load from the stack
+ (dead code elimination). On the other hand, there are some things that
+ the compiler does not do, therefore they are performed by
+ @file{prims2x.fs}: The compiler does not optimize code away that stores
+ a stack item to the place where it just came from (e.g., @code{over}).
+ While programming a primitive is usually easy, there are a few cases
+ where the programmer has to take the actions of the generator into
+ account, most notably @code{?dup}, but also words that do not (always)
+ fall through to NEXT.
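The first step of such a generator, counting the items on each side of the stack effect, can be sketched in C as below. This is not how @file{prims2x.fs} is written (it is a Forth program); the type @code{Effect} and the functions @code{parse_effect} and @code{sp_adjust} are hypothetical names chosen for this illustration.

```c
#include <string.h>

/* Number of items a primitive pops and pushes. */
typedef struct { int inputs; int outputs; } Effect;

/* Counts the blank-separated items before and after "--" in a stack
   effect such as "n1 n2 -- n". */
Effect parse_effect(const char *effect)
{
    Effect e = { 0, 0 };
    int after_dashes = 0;
    const char *p = effect;

    while (*p) {
        size_t len;
        p += strspn(p, " \t");      /* skip blanks */
        len = strcspn(p, " \t");    /* length of the next item */
        if (len == 0)
            break;                  /* only trailing blanks were left */
        if (len == 2 && strncmp(p, "--", 2) == 0)
            after_dashes = 1;
        else if (after_dashes)
            e.outputs++;
        else
            e.inputs++;
        p += len;
    }
    return e;
}

/* Stack pointer adjustment in cells, assuming a stack that grows
   downwards: positive values shrink the stack. */
int sp_adjust(Effect e)
{
    return e.inputs - e.outputs;
}
```

For @code{n1 n2 -- n} this yields an adjustment of 1, matching the @code{sp += 1} in the generated code for @code{+}.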
+ @subsection TOS Optimization
+ An important optimization for stack machine emulators, e.g., Forth
+ engines, is keeping  one or more of the top stack items in
+ registers.  If a word has the stack effect {@var{in1}...@var{inx} @code{--}
+ @var{out1}...@var{outy}}, keeping the top @var{n} items in registers
+ @itemize
+ @item
+ is better than keeping @var{n-1} items, if @var{x>=n} and @var{y>=n},
+ due to fewer loads from and stores to the stack.
+ @item is slower than keeping @var{n-1} items, if @var{x<>y} and @var{x<n} and
+ @var{y<n}, due to additional moves between registers.
+ @end itemize
+ In particular, keeping one item in a register is never a disadvantage,
+ if there are enough registers. Keeping two items in registers is a
+ disadvantage for frequent words like @code{?branch}, constants,
+ variables, literals and @code{i}. Therefore our generator only produces
+ code that keeps zero or one items in registers. The generated C code
+ covers both cases; the selection between these alternatives is made at
+ C-compile time using the switch @code{-DUSE_TOS}. @code{TOS} in the C
+ code for @code{+} is just a simple variable name in the one-item case,
+ otherwise it is a macro that expands into @code{sp[0]}. Note that the
+ GNU C compiler tries to keep simple variables like @code{TOS} in
+ registers, and it usually succeeds, if there are enough registers.
+ The primitive generator performs the TOS optimization for the
+ floating-point stack, too (@code{-DUSE_FTOS}). For floating-point
+ operations the benefit of this optimization is even larger:
+ floating-point operations take quite long on most processors, but can be
+ performed in parallel with other operations as long as their results are
+ not used. If the FP-TOS is kept in a register, this works. If
+ it is kept on the stack, i.e., in memory, the store into memory has to
+ wait for the result of the floating-point operation, lengthening the
+ execution time of the primitive considerably.
+ The TOS optimization makes the automatic generation of primitives a
+ bit more complicated. Just replacing all occurrences of @code{sp[0]} by
+ @code{TOS} is not sufficient. There are some special cases to
+ consider:
+ @itemize
+ @item In the case of @code{dup ( w -- w w )} the generator must not
+ eliminate the store to the original location of the item on the stack,
+ if the TOS optimization is turned on.
+ @item Primitives with stack effects of the form {@code{--}
+ @var{out1}...@var{outy}} must store the TOS to the stack at the start.
+ Likewise, primitives with the stack effect {@var{in1}...@var{inx} @code{--}}
+ must load the TOS from the stack at the end. But for the null stack
+ effect @code{--} no stores or loads should be generated.
+ @end itemize
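The following toy engine sketches how one source text can cover both cases. It is an illustration under assumptions, not generated gforth code: the words, the function @code{run_tos_demo} and the helper macro @code{IF_TOS} are hypothetical, and the engine uses direct threading with a simplified @code{NEXT_P1}/@code{NEXT_P2} pair. With @code{USE_TOS} defined, @code{TOS} is a plain variable and the literal pusher spills it to the stack at the start; without it, @code{TOS} is simply @code{sp[0]}.

```c
#include <stdint.h>

typedef intptr_t Cell;

#define USE_TOS   /* comment this line out to keep the TOS in memory */

/* Computes a+b by executing the threaded code "lit a lit b + halt". */
Cell run_tos_demo(Cell a, Cell b)
{
    /* Direct threading: the cells hold code addresses or literals. */
    Cell thread[] = {
        (Cell)&&do_lit, a,
        (Cell)&&do_lit, b,
        (Cell)&&do_plus,
        (Cell)&&do_halt
    };
    Cell stack[8];
    Cell *sp = stack + 7;       /* sp[0] is the slot belonging to the TOS */
    Cell *ip = thread;
    void *ca;
#ifdef USE_TOS
    Cell tos = 0;               /* top of stack, ideally in a register */
#define TOS tos
#define IF_TOS(stmt) stmt
#else
#define TOS (sp[0])
#define IF_TOS(stmt)
#endif
#define NEXT_P1 (ca = (void *)*ip++)    /* load part of NEXT */
#define NEXT_P2 goto *ca                /* goto part of NEXT */

    NEXT_P1;
    NEXT_P2;

do_lit: {                       /* ( -- n ) */
        Cell n = *ip++;         /* fetch the inline literal */
        IF_TOS(sp[0] = TOS);    /* store the old TOS at the start */
        sp -= 1;
        NEXT_P1;
        TOS = n;
        NEXT_P2;
    }
do_plus: {                      /* ( n1 n2 -- n ) */
        Cell n1 = sp[1];
        Cell n2 = TOS;
        Cell n;
        sp += 1;
        n = n1 + n2;
        NEXT_P1;
        TOS = n;
        NEXT_P2;
    }
do_halt:
    return TOS;

#undef TOS
#undef IF_TOS
#undef NEXT_P1
#undef NEXT_P2
}
```

In the register case @code{+} performs one load and no store, while the memory case needs two loads and a store.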
+ @subsection Produced code
+ To see what assembly code is produced for the primitives on your machine
+ with your compiler and your flag settings, type @code{make engine.s} and
+ look at the resulting file @file{engine.s}.
+ @section System Architecture
+ Our Forth system consists not only of primitives, but also of
+ definitions written in Forth. Since the Forth compiler itself belongs
+ to those definitions, it is not possible to start the system with the
+ primitives and the Forth source alone. Therefore we provide the Forth
+ code as an image file in nearly executable form. At the start of the
+ system a C routine loads the image file into memory, sets up the
+ memory (stacks etc.) according to information in the image file, and
+ starts executing Forth code.
+ The image file format is a compromise between the goals of making it
+ easy to generate image files and making them portable. The easiest way
+ to generate an image file is to just generate a memory dump. However,
+ this kind of image file cannot be used on a different machine, or on
+ the next version of the engine on the same machine; it might not even
+ work with the same engine compiled by a different version of the C
+ compiler. We would like to have as few versions of the image file as
+ possible, both because we do not want to distribute many versions of
+ the same image file and because users should be able to use their image
+ files on many machines. We currently need to create a different image
+ file for machines with different cell sizes and different byte order
+ (little- or big-endian)@footnote{We consider adding information to the
+ image file that enables the loader to change the byte order.}.
+ Forth code that is going to end up in a portable image file has to
+ comply with some restrictions: addresses have to be stored in memory
+ with special words (@code{A!}, @code{A,}, etc.) in order to make the
+ code relocatable. Cells, floats, etc., have to be stored at the
+ natural alignment boundaries@footnote{E.g., store floats (8 bytes) at
+ an address divisible by~8. This happens automatically in our system
+ when you use the ANSI alignment words.}, in order to avoid alignment
+ faults on machines with stricter alignment. The image file is produced
+ by a metacompiler (@file{cross.fs}).
+ So, unlike the image file of Mitch Bradley's @code{cforth}, our image
+ file is not directly executable, but has to undergo some manipulations
+ during loading. Address relocation is performed at image load-time, not
+ at run-time. The loader also has to replace tokens standing for
+ primitive calls with the appropriate code-field addresses (or code
+ addresses in the case of direct threading).
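The load-time pass described here could look roughly like the sketch below. The image layout is an assumption made for this illustration only: a separate array @code{kind} marks each cell as plain data, as an address stored as a byte offset from the image start, or as a primitive token stored as an index into a table of code addresses. The names @code{relocate_image}, @code{CELL_DATA}, @code{CELL_ADDR} and @code{CELL_PRIM} are hypothetical; the text does not specify how gforth encodes this information.

```c
#include <stddef.h>
#include <stdint.h>

typedef intptr_t Cell;

/* Hypothetical cell kinds recorded alongside the image. */
enum { CELL_DATA, CELL_ADDR, CELL_PRIM };

/* Turns a freshly loaded image into executable form, in place:
   offsets become absolute addresses and primitive tokens become the
   code addresses found in prims. */
void relocate_image(Cell *image, const unsigned char *kind,
                    size_t ncells, void *const *prims)
{
    Cell base = (Cell)image;
    size_t i;

    for (i = 0; i < ncells; i++) {
        switch (kind[i]) {
        case CELL_ADDR:         /* relocate relative to the load address */
            image[i] += base;
            break;
        case CELL_PRIM:         /* replace the token by a code address */
            image[i] = (Cell)prims[image[i]];
            break;
        default:                /* plain data stays untouched */
            break;
        }
    }
}
```

Because this work is done once while loading, the running engine sees only absolute addresses and pays no relocation cost afterwards.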
  @contents
  @bye
