--- gforth/Attic/gforth.ds 1995/11/15 17:29:07 1.24
+++ gforth/Attic/gforth.ds 1995/11/30 18:04:27 1.26
@@ -44,6 +44,7 @@ Copyright @copyright{} 1995 Free Softwar
 @center for version 0.1
 @sp 2
 @center Anton Ertl
+@center Bernd Paysan
 @sp 3
 @center This manual is under construction
 
@@ -2088,9 +2089,9 @@ machine code), and for defining the
 the nature of Gforth poses a few problems: First of all. Gforth runs on
 several architectures, so it can provide no standard assembler. What's
 worse is that the register allocation not only depends on the processor,
-but also on the gcc version and options used.
+but also on the @code{gcc} version and options used.
 
-The words Gforth offers encapsulate some system dependences (e.g., the
+The words that Gforth offers encapsulate some system dependences (e.g., the
 header structure), so a system-independent assembler may be used in
 Gforth. If you do not have an assembler, you can compile machine code
 directly with @code{,} and @code{c,}.
@@ -2108,10 +2109,42 @@ These words are rarely used. Therefore t
 which is usually not loaded (except @code{flush-icache}, which is always
 present). You can load them with @code{require code.fs}.
 
+In the assembly code you will want to refer to the inner interpreter's
+registers (e.g., the data stack pointer) and you may want to use other
+registers for temporary storage. Unfortunately, the register allocation
+is installation-dependent.
+
+The easiest solution is to use explicit register declarations
+(@pxref{Explicit Reg Vars, , Variables in Specified Registers, gcc.info,
+GNU C Manual}) for all of the inner interpreter's registers: You have to
+compile Gforth with @code{-DFORCE_REG} (configure option
+@code{--enable-force-reg}) and the appropriate declarations must be
+present in the @code{machine.h} file (see @code{mips.h} for an example;
+you can find a full list of all declarable register symbols with
+@code{grep register engine.c}). If you give explicit registers to all
+variables that are declared at the beginning of @code{engine()}, you
+should be able to use the other caller-saved registers for temporary
+storage. Alternatively, you can use the @code{gcc} option
+@code{-ffixed-REG} (@pxref{Code Gen Options, , Options for Code
+Generation Conventions, gcc.info, GNU C Manual}) to reserve a register
+(however, this restriction on register allocation may slow Gforth
+significantly).
+
+If this solution is not viable (e.g., because @code{gcc} does not allow
+you to explicitly declare all the registers you need), you have to find
+out by looking at the code where the inner interpreter's registers
+reside and which registers can be used for temporary storage. You can
+get an assembly listing of the engine's code with @code{make engine.s}.
+
+In any case, it is good practice to abstract your assembly code from the
+actual register allocation. E.g., if the data stack pointer resides in
+register @code{$17}, create an alias for this register called @code{sp},
+and use that in your assembly code.
+
 Another option for implementing normal and defining words efficiently
 is: adding the wanted functionality to the source of Gforth. For normal
 words you just have to edit @file{primitives}, defining words (for fast
-defined words) probably require changes in @file{engine.c},
+defined words) may require changes in @file{engine.c},
 @file{kernal.fs}, @file{prims2x.fs}, and possibly @file{cross.fs}.
 
 
@@ -2384,7 +2417,7 @@ characters is determined by the locale y
 
 @item division rounding:
 installation dependent. @code{s" floored" environment? drop .}. We leave
-the choice to gcc (what to use for @code{/}) and to you (whether to use
+the choice to @code{gcc} (what to use for @code{/}) and to you (whether to use
 @code{fm/mod}, @code{sm/rem} or simply @code{/}).
 
 @item values of @code{STATE} when true:
@@ -2495,9 +2528,9 @@ The next invocation of a parsing word re
 Compiles a recursive call to the defining word not to the defined word.
 
 @item argument input source different than current input source for @code{RESTORE-INPUT}:
-!!???If the argument input source is a valid input source then it gets
-restored. Otherwise causes @code{-12 THROW}, which, unless caught, issues
-the message "argument type mismatch" and aborts.
+If the argument input source is a valid input source then it gets
+restored. Otherwise the result is undefined.
+@comment causes @code{-12 THROW}, which, unless caught, issues the message "argument type mismatch" and aborts. !! not all of the state is restored (e.g., sourcefilename).
 
 @item data space containing definitions gets de-allocated:
 Deallocation with @code{allot} is not checked. This typically resuls in
@@ -2569,14 +2602,17 @@ Not checked. As usual, you can expect me
 None.
 
 @item operator's terminal facilities available:
-!!??
+After processing the command line, Gforth goes into interactive mode,
+and you can give commands to Gforth interactively. The actual facilities
+available depend on how you invoke Gforth.
 
 @item program data space available:
 @code{sp@ here - .} gives the space remaining for dictionary and data
 stack together.
 
 @item return stack space available:
-!!??
+By default 16 KBytes. The default can be overridden with the @code{-r}
+switch (@pxref{Invocation}) when Gforth starts up.
 
 @item stack space available:
 @code{sp@ here - .} gives the space remaining for dictionary and data
@@ -2897,7 +2933,10 @@ System dependent; @code{REPRESENT} is im
 function @code{ecvt()} and inherits its behaviour in this respect.
 
 @item rounding or truncation of floating-point numbers:
-What's the question?!!
+System dependent; the rounding behaviour is inherited from the hosting C
+compiler. IEEE-FP-based (i.e., most) systems by default round to
+nearest, and break ties by rounding to even (i.e., such that the last
+bit of the mantissa is 0).
 
 @item size of floating-point stack:
 @code{s" FLOATING-STACK" environment? drop .}. Can be changed at startup
@@ -3608,15 +3647,21 @@ Sieve benchmark on a 486DX2/66 than Gfor
 
 However, this potential advantage of assembly language implementations
 is not necessarily realized in complete Forth systems: We compared
-Gforth (compiled with @code{gcc-2.6.3} and @code{-DFORCE_REG}) with
-Win32Forth 1.2093 and LMI's NT Forth (Beta, May 1994), two systems
-written in assembly, and with two systems written in C: PFE-0.9.11
-(compiled with @code{gcc-2.6.3} with the default configuration for
-Linux: @code{-O2 -fomit-frame-pointer -DUSE_REGS}) and ThisForth Beta
-(compiled with gcc-2.6.3 -O3 -fomit-frame-pointer). We benchmarked
-Gforth, PFE and ThisForth on a 486DX2/66 under Linux. Kenneth O'Heskin
-kindly provided the results for Win32Forth and NT Forth on a 486DX2/66
-with similar memory performance under Windows NT.
+Gforth (direct threaded, compiled with @code{gcc-2.6.3} and
+@code{-DFORCE_REG}) with Win32Forth 1.2093, LMI's NT Forth (Beta, May
+1994) and Eforth (with and without peephole (aka pinhole) optimization
+of the threaded code); all these systems were written in assembly
+language. We also compared Gforth with two systems written in C:
+PFE-0.9.11 (compiled with @code{gcc-2.6.3} with the default
+configuration for Linux: @code{-O2 -fomit-frame-pointer -DUSE_REGS}) and
+ThisForth Beta (compiled with gcc-2.6.3 -O3 -fomit-frame-pointer;
+ThisForth employs peephole optimization of the threaded code). We
+benchmarked Gforth, PFE and ThisForth on a 486DX2/66 under
+Linux. Kenneth O'Heskin kindly provided the results for Win32Forth and
+NT Forth on a 486DX2/66 with similar memory performance under Windows
+NT. Marcel Hendrix ported Eforth to Linux, then extended it to run the
+benchmarks, added the peephole optimizer, ran the benchmarks and
+reported the results.
 
 We used four small benchmarks: the ubiquitous Sieve; bubble-sorting and
 matrix multiplication come from the Stanford integer benchmarks and have
@@ -3628,12 +3673,12 @@ other words, it shows the speedup factor
 other systems).
 
 @example
-relative        Win32-     NT         This-
-  time   Gforth  Forth  Forth    PFE  Forth
-sieve      1.00   1.30   1.07   1.67   2.98
-bubble     1.00   1.30   1.40   1.66
-matmul     1.00   1.40   1.29   2.24
-fib        1.00   1.44   1.26   1.82   2.82
+relative        Win32-     NT        eforth         This-
+  time   Gforth  Forth  Forth eforth   +opt    PFE  Forth
+sieve      1.00   1.39   1.14   1.39   0.85   1.78   3.18
+bubble     1.00   1.33   1.43   1.51   0.89   1.70
+matmul     1.00   1.43   1.31   1.42   1.12   2.28
+fib        1.00   1.55   1.36   1.24   1.15   1.97   3.04
 @end example
 
 You may find the good performance of Gforth compared with the systems
@@ -3645,6 +3690,11 @@ method for relocating the Forth image: l
 the actual addresses at run time, resulting in two address computations
 per NEXT (@pxref{System Architecture}).
 
+Only Eforth with the peephole optimizer performs comparable to
+Gforth. The speedups achieved with peephole optimization of threaded
+code are quite remarkable. Adding a peephole optimizer to Gforth should
+cause similar speedups.
+
 The speedup of Gforth over PFE and ThisForth can be easily explained
 with the self-imposed restriction to standard C (although the measured
 implementation of PFE uses a GNU C extension: global register
@@ -3659,9 +3709,11 @@ machine registers by itself and would no
 register declarations, giving a 1.3 times slower engine (on a 486DX2/66
 running the Sieve) than the one measured above.
 
-The numbers in this section have also been published in the paper
-@cite{Translating Forth to Efficient C} by M. Anton Ertl and Martin
-Maierhofer, presented at EuroForth '95. It is available at
+In @cite{Translating Forth to Efficient C} by M. Anton Ertl and Martin
+Maierhofer (presented at EuroForth '95), an indirect threaded version of
+Gforth is compared with Win32Forth, NT Forth, PFE, and ThisForth; that
+version of Gforth is 2\%@minus{}8\% slower on a 486 than the version
+used here. The paper available at
 @*@file{http://www.complang.tuwien.ac.at/papers/ertl&maierhofer95.ps.gz};
 it also contains numbers for some native code systems. You can find
 numbers for Gforth on various machines in @file{Benchres}.
@@ -3701,7 +3753,7 @@ VolksForth descends from F83. It was wri
 Pennemann, Georg Rehfeld and Dietrich Weineck for the C64 (called
 UltraForth there) in the mid-80s and ported to the Atari ST in 1986.
 
-Laxen and Perry wrote F83 as a model implementation of the
+Hennry Laxen and Mike Perry wrote F83 as a model implementation of the
 Forth-83 standard. !! Pedigree? When?
 
 A team led by Bill Ragsdale implemented fig-Forth on many processors in