Annotation of gforth/gforth.ds, revision 1.19
1.1 anton 1: \input texinfo @c -*-texinfo-*-
2: @comment The source is gforth.ds, from which gforth.texi is generated
3: @comment %**start of header (This is for running Texinfo on a region.)
1.4 anton 4: @setfilename gforth.info
1.17 anton 5: @settitle Gforth Manual
1.4 anton 6: @comment @setchapternewpage odd
1.1 anton 7: @comment %**end of header (This is for running Texinfo on a region.)
8:
9: @ifinfo
1.17 anton 10: This file documents Gforth 0.1
1.1 anton 11:
1.17 anton 12: Copyright @copyright{} 1994 Gforth Development Group
1.1 anton 13:
14: Permission is granted to make and distribute verbatim copies of
15: this manual provided the copyright notice and this permission notice
16: are preserved on all copies.
17:
1.4 anton 18: @ignore
1.1 anton 19: Permission is granted to process this file through TeX and print the
20: results, provided the printed document carries a copying permission
21: notice identical to this one except for the removal of this paragraph
22: (this paragraph not being relevant to the printed manual).
23:
1.4 anton 24: @end ignore
1.1 anton 25: Permission is granted to copy and distribute modified versions of this
26: manual under the conditions for verbatim copying, provided also that the
27: sections entitled "Distribution" and "General Public License" are
28: included exactly as in the original, and provided that the entire
29: resulting derived work is distributed under the terms of a permission
30: notice identical to this one.
31:
32: Permission is granted to copy and distribute translations of this manual
33: into another language, under the above conditions for modified versions,
34: except that the sections entitled "Distribution" and "General Public
35: License" may be included in a translation approved by the author instead
36: of in the original English.
37: @end ifinfo
38:
39: @titlepage
40: @sp 10
1.17 anton 41: @center @titlefont{Gforth Manual}
1.1 anton 42: @sp 2
1.17 anton 43: @center for version 0.1
1.1 anton 44: @sp 2
45: @center Anton Ertl
1.17 anton 46: @sp 3
47: @center This manual is under construction
1.1 anton 48:
49: @comment The following two commands start the copyright page.
50: @page
51: @vskip 0pt plus 1filll
1.17 anton 52: Copyright @copyright{} 1994 Gforth Development Group
1.1 anton 53:
54: @comment !! Published by ... or You can get a copy of this manual ...
55:
56: Permission is granted to make and distribute verbatim copies of
57: this manual provided the copyright notice and this permission notice
58: are preserved on all copies.
59:
60: Permission is granted to copy and distribute modified versions of this
61: manual under the conditions for verbatim copying, provided also that the
62: sections entitled "Distribution" and "General Public License" are
63: included exactly as in the original, and provided that the entire
64: resulting derived work is distributed under the terms of a permission
65: notice identical to this one.
66:
67: Permission is granted to copy and distribute translations of this manual
68: into another language, under the above conditions for modified versions,
69: except that the sections entitled "Distribution" and "General Public
70: License" may be included in a translation approved by the author instead
71: of in the original English.
72: @end titlepage
73:
74:
75: @node Top, License, (dir), (dir)
76: @ifinfo
1.17 anton 77: Gforth is a free implementation of ANS Forth available on many
1.1 anton 78: personal machines. This manual corresponds to version 0.1.
79: @end ifinfo
80:
81: @menu
1.4 anton 82: * License::
1.17 anton 83: * Goals:: About the Gforth Project
1.4 anton 84: * Other Books:: Things you might want to read
1.17 anton 85: * Invocation:: Starting Gforth
86: * Words:: Forth words available in Gforth
1.4 anton 87: * ANS conformance:: Implementation-defined options etc.
1.17 anton 88: * Model:: The abstract machine of Gforth
89: * Emacs and Gforth:: The Gforth Mode
1.4 anton 90: * Internals:: Implementation details
91: * Bugs:: How to report them
1.17 anton 92: * Pedigree:: Ancestors of Gforth
1.4 anton 93: * Word Index:: An item for each Forth word
94: * Node Index:: An item for each node
1.1 anton 95: @end menu
96:
97: @node License, Goals, Top, Top
98: @unnumbered License
99: !! Insert GPL here
100:
101: @iftex
102: @unnumbered Preface
1.17 anton 103: This manual documents Gforth. The reader is expected to know
1.1 anton 104: Forth. This manual is primarily a reference manual. @xref{Other Books},
105: for introductory material.
106: @end iftex
107:
108: @node Goals, Other Books, License, Top
109: @comment node-name, next, previous, up
1.17 anton 110: @chapter Goals of Gforth
1.1 anton 111: @cindex Goals
1.17 anton 112: The goal of the Gforth Project is to develop a standard model for
1.1 anton 113: ANSI Forth. This can be split into several subgoals:
114:
115: @itemize @bullet
116: @item
1.17 anton 117: Gforth should conform to the ANSI Forth standard.
1.1 anton 118: @item
119: It should be a model, i.e. it should define all the
120: implementation-dependent things.
121: @item
122: It should become standard, i.e. widely accepted and used. This goal
123: is the most difficult one.
124: @end itemize
125:
1.17 anton 126: To achieve these goals, Gforth should be:
1.1 anton 127: @itemize @bullet
128: @item
129: Similar to previous models (fig-Forth, F83)
130: @item
131: Powerful. It should provide for all the things that are considered
132: necessary today and even some that are not yet considered necessary.
133: @item
134: Efficient. It should not get the reputation of being exceptionally
135: slow.
136: @item
137: Free.
138: @item
139: Available on many machines/easy to port.
140: @end itemize
141:
1.17 anton 142: Have we achieved these goals? Gforth conforms to the ANS Forth
143: standard. It may be considered a model, but we have not yet documented
1.1 anton 144: which parts of the model are stable and which parts we are likely to
1.17 anton 145: change. It certainly has not yet become a de facto standard. It has some
146: similarities and some differences to previous models. It has some
147: powerful features, but not yet everything that we envisioned. We
148: certainly have achieved our execution speed goals (@pxref{Performance}).
149: It is free and available on many machines.
1.1 anton 150:
151: @node Other Books, Invocation, Goals, Top
152: @chapter Other books on ANS Forth
153:
154: As the standard is relatively new, there are not many books out yet. It
1.17 anton 155: is not recommended to learn Forth by using Gforth and a book that is
1.1 anton 156: not written for ANS Forth, as you will not know your mistakes from the
157: deviations of the book.
158:
159: There is, of course, the standard, the definitive reference if you want to
1.19 ! anton 160: write ANS Forth programs. It is available in printed form from the
! 161: American National Standards Institute Sales Department (Tel.: USA (212) 642-4900;
! 162: Fax.: USA (212) 302-1286) as document @cite{X3.215-1994} for about $200. You
! 163: can also get it from Global Engineering Documents (Tel.: USA (800)
! 164: 854-7179; Fax.: (303) 843-9880) for about $300.
! 165:
! 166: @cite{dpANS6}, the last draft of the standard, which was then submitted to ANSI
! 167: for publication, is available electronically and for free in MS Word
! 168: format, and it has been converted to HTML. Some pointers to these
! 169: versions can be found through
! 170: http://www.complang.tuwien.ac.at/projects/forth.html.
1.1 anton 171:
172: @cite{Forth: The new model} by Jack Woehr (!! Publisher) is an
173: introductory book based on a draft version of the standard. It does not
174: cover the whole standard. It also contains interesting background
175: information (Jack Woehr was a member of the ANS Forth Technical Committee). It is
176: not appropriate for complete newbies, but programmers experienced in
177: other languages should find it OK.
178:
179: @node Invocation, Words, Other Books, Top
180: @chapter Invocation
181:
182: You will usually just say @code{gforth}. In many other cases the default
1.17 anton 183: Gforth image will be invoked like this:
1.1 anton 184:
185: @example
186: gforth [files] [-e forth-code]
187: @end example
188:
189: executing the contents of the files and the Forth code in the order they
190: are given.
191:
192: In general, the command line looks like this:
193:
194: @example
195: gforth [initialization options] [image-specific options]
196: @end example
197:
198: The initialization options must come before the rest of the command
199: line. They are:
200:
201: @table @code
202: @item --image-file @var{file}
203: Loads the Forth image @var{file} instead of the default
204: @file{gforth.fi}.
205:
206: @item --path @var{path}
207: Uses @var{path} for searching the image file and Forth source code
208: files instead of the default in the environment variable
209: @code{GFORTHPATH} or the path specified at installation time (typically
210: @file{/usr/local/lib/gforth:.}). A path is given as a @code{:}-separated
211: list.
212:
213: @item --dictionary-size @var{size}
214: @itemx -m @var{size}
215: Allocate @var{size} space for the Forth dictionary instead of
216: using the default specified in the image (typically 256K). The
217: @var{size} specification consists of an integer and a unit (e.g.,
218: @code{4M}). The unit can be one of @code{b} (bytes), @code{e} (element
219: size, in this case Cells), @code{k} (kilobytes), and @code{M}
220: (Megabytes). If no unit is specified, @code{e} is used.
221:
222: @item --data-stack-size @var{size}
223: @itemx -d @var{size}
224: Allocate @var{size} space for the data stack instead of using the
225: default specified in the image (typically 16K).
226:
227: @item --return-stack-size @var{size}
228: @itemx -r @var{size}
229: Allocate @var{size} space for the return stack instead of using the
230: default specified in the image (typically 16K).
231:
232: @item --fp-stack-size @var{size}
233: @itemx -f @var{size}
234: Allocate @var{size} space for the floating point stack instead of
235: using the default specified in the image (typically 16K). In this case
236: the unit specifier @code{e} refers to floating point numbers.
237:
238: @item --locals-stack-size @var{size}
239: @itemx -l @var{size}
240: Allocate @var{size} space for the locals stack instead of using the
241: default specified in the image (typically 16K).
242:
243: @end table
244:
245: As explained above, the image-specific command-line arguments for the
246: default image @file{gforth.fi} consist of a sequence of filenames and
247: @code{-e @var{forth-code}} options that are interpreted in the sequence
248: in which they are given. The @code{-e @var{forth-code}} or
249: @code{--evaluate @var{forth-code}} option evaluates the Forth
250: code. This option takes only one argument; if you want to evaluate more
251: Forth words, you have to quote them or use several @code{-e}s. To exit
252: after processing the command line (instead of entering interactive mode),
253: append @code{-e bye} to the command line.
254:
255: Not yet implemented:
256: On startup the system first executes the system initialization file
257: (unless the option @code{--no-init-file} is given; note that the system
258: resulting from using this option may not be ANS Forth conformant). Then
259: the user initialization file @file{.gforth.fs} is executed, unless the
260: option @code{--no-rc} is given; this file is searched for first in @file{.},
261: then in @file{~}, then in the normal path (see above).
262:
1.4 anton 263: @node Words, ANS conformance, Invocation, Top
1.1 anton 264: @chapter Forth Words
265:
266: @menu
1.4 anton 267: * Notation::
268: * Arithmetic::
269: * Stack Manipulation::
270: * Memory access::
271: * Control Structures::
272: * Locals::
273: * Defining Words::
274: * Wordlists::
275: * Files::
276: * Blocks::
277: * Other I/O::
278: * Programming Tools::
1.18 anton 279: * Assembler and Code words::
1.4 anton 280: * Threading Words::
1.1 anton 281: @end menu
282:
283: @node Notation, Arithmetic, Words, Words
284: @section Notation
285:
286: The Forth words are described in this section in the glossary notation
287: that has become a de-facto standard for Forth texts, i.e.
288:
1.4 anton 289: @format
1.1 anton 290: @var{word} @var{Stack effect} @var{wordset} @var{pronunciation}
1.4 anton 291: @end format
1.1 anton 292: @var{Description}
293:
294: @table @var
295: @item word
1.17 anton 296: The name of the word. BTW, Gforth is case insensitive, so you can
1.14 anton 297: type in the words in lower case (however, @pxref{core-idef}).
1.1 anton 298:
299: @item Stack effect
300: The stack effect is written in the notation @code{@var{before} --
301: @var{after}}, where @var{before} and @var{after} describe the top of
302: stack entries before and after the execution of the word. The rest of
303: the stack is not touched by the word. The top of stack is rightmost,
1.17 anton 304: i.e., a stack sequence is written as it is typed in. Note that Gforth
1.1 anton 305: uses a separate floating point stack, but a unified stack
306: notation. Also, return stack effects are not shown in @var{stack
307: effect}, but in @var{Description}. The name of a stack item describes
308: the type and/or the function of the item. See below for a discussion of
309: the types.
310:
1.19 ! anton 311: All words have two stack effects: A compile-time stack effect and a
! 312: run-time stack effect. The compile-time stack-effect of most words is
! 313: @code{--}. If the compile-time stack-effect of a word deviates from
! 314: this standard behaviour, or the word does other unusual things at
! 315: compile time, both stack effects are shown; otherwise only the run-time
! 316: stack effect is shown.
! 317:
1.1 anton 318: @item pronunciation
319: How the word is pronounced.
320:
321: @item wordset
322: The ANS Forth standard is divided into several wordsets. A standard
323: system need not support all of them. So, the fewer wordsets your program
324: uses, the more portable it will be in theory. However, we suspect that
325: most ANS Forth systems on personal machines will feature all
326: wordsets. Words that are not defined in the ANS standard have
1.19 ! anton 327: @code{gforth} or @code{gforth-internal} as wordset. @code{gforth}
! 328: describes words that will work in future releases of Gforth;
! 329: @code{gforth-internal} words are more volatile. Environmental query
! 330: strings are also displayed like words; you can recognize them by the
! 331: @code{environment} in the wordset field.
1.1 anton 332:
333: @item Description
334: A description of the behaviour of the word.
335: @end table
336:
1.4 anton 337: The type of a stack item is specified by the character(s) the name
338: starts with:
1.1 anton 339:
340: @table @code
341: @item f
342: Bool, i.e. @code{false} or @code{true}.
343: @item c
344: Char
345: @item w
346: Cell, can contain an integer or an address
347: @item n
348: signed integer
349: @item u
350: unsigned integer
351: @item d
352: double sized signed integer
353: @item ud
354: double sized unsigned integer
355: @item r
356: Float
357: @item a_
358: Cell-aligned address
359: @item c_
360: Char-aligned address (note that a Char is two bytes in Windows NT)
361: @item f_
362: Float-aligned address
363: @item df_
364: Address aligned for IEEE double precision float
365: @item sf_
366: Address aligned for IEEE single precision float
367: @item xt
368: Execution token, same size as Cell
369: @item wid
370: Wordlist ID, same size as Cell
371: @item f83name
372: Pointer to a name structure
373: @end table
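As an illustration of the type prefixes, here is a sketch of a small definition written in this notation. The name @code{within-range?} is hypothetical; the word is built on the standard word @code{within}:

@example
\ n: signed integer, f: flag
: within-range? ( n nlow nhigh -- f )  \ true if nlow <= n <= nhigh
  1+ within ;
@end example

E.g., @code{5 1 5 within-range?} yields true and @code{6 1 5 within-range?} yields false. The sketch assumes that @var{nhigh} is less than the largest signed integer; otherwise @code{1+} would wrap around.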
374:
1.4 anton 375: @node Arithmetic, Stack Manipulation, Notation, Words
1.1 anton 376: @section Arithmetic
377: Forth arithmetic is not checked, i.e., you will not hear about integer
378: overflow on addition or multiplication, you may hear about division by
379: zero if you are lucky. The operator is written after the operands, but
380: the operands are still in the original order. I.e., the infix @code{2-1}
381: corresponds to @code{2 1 -}. Forth offers a variety of division
382: operators. If you perform division with potentially negative operands,
383: you do not want to use @code{/} or @code{/mod}, whose rounding is
384: implementation-defined, but rather @code{fm/mod} or @code{sm/rem} (probably the
1.4 anton 385: former, @pxref{Mixed precision}).
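The following sketch shows postfix order and the two division styles by dividing -7 by 2. The divisor is a single cell, but @code{fm/mod} and @code{sm/rem} take a double-cell dividend, which is why @code{s>d} appears. Floored division rounds the quotient towards negative infinity, and symmetric division rounds it towards zero:

@example
2 1 - .              \ postfix for 2-1: prints 1
-7 s>d 2 fm/mod . .  \ floored: prints -4 1 (quotient, then remainder)
-7 s>d 2 sm/rem . .  \ symmetric: prints -3 -1
@end example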
386:
387: @menu
388: * Single precision::
389: * Bitwise operations::
390: * Mixed precision:: operations with single and double-cell integers
391: * Double precision:: Double-cell integer arithmetic
392: * Floating Point::
393: @end menu
1.1 anton 394:
1.4 anton 395: @node Single precision, Bitwise operations, Arithmetic, Arithmetic
1.1 anton 396: @subsection Single precision
397: doc-+
398: doc--
399: doc-*
400: doc-/
401: doc-mod
402: doc-/mod
403: doc-negate
404: doc-abs
405: doc-min
406: doc-max
407:
1.4 anton 408: @node Bitwise operations, Mixed precision, Single precision, Arithmetic
1.1 anton 409: @subsection Bitwise operations
410: doc-and
411: doc-or
412: doc-xor
413: doc-invert
414: doc-2*
415: doc-2/
416:
1.4 anton 417: @node Mixed precision, Double precision, Bitwise operations, Arithmetic
1.1 anton 418: @subsection Mixed precision
419: doc-m+
420: doc-*/
421: doc-*/mod
422: doc-m*
423: doc-um*
424: doc-m*/
425: doc-um/mod
426: doc-fm/mod
427: doc-sm/rem
428:
1.4 anton 429: @node Double precision, Floating Point, Mixed precision, Arithmetic
1.1 anton 430: @subsection Double precision
1.16 anton 431:
432: The outer (aka text) interpreter converts numbers containing a dot into
433: a double precision number. Note that only numbers with the dot as last
434: character are standard-conforming.
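For example, the following lines can be typed at the Gforth prompt. The trailing dot makes each literal a double-cell integer:

@example
1. 2. d+ d.     \ prints 3
-5. dabs d.     \ prints 5
12345. drop .   \ the high cell is on top; the low cell alone prints 12345
@end example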
435:
1.1 anton 436: doc-d+
437: doc-d-
438: doc-dnegate
439: doc-dabs
440: doc-dmin
441: doc-dmax
442:
1.4 anton 443: @node Floating Point, , Double precision, Arithmetic
444: @subsection Floating Point
1.16 anton 445:
446: The format of floating point numbers recognized by the outer (aka text)
447: interpreter is: a signed decimal number, possibly containing a decimal
448: point (@code{.}), followed by @code{E} or @code{e}, optionally followed
449: by a signed integer (the exponent). E.g., @code{1e} is the same as
450: @code{+1.0e+0}. Note that a number without @code{e}
451: is not interpreted as floating-point number, but as double (if the
452: number contains a @code{.}) or single precision integer. Also,
453: conversions between string and floating point numbers always use base
454: 10, irrespective of the value of @code{BASE}. If @code{BASE} contains a
455: value greater than 14, the @code{E} may be interpreted as a digit and the
456: number will be interpreted as integer, unless it has a signed exponent
457: (both @code{+} and @code{-} are allowed as signs).
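The following lines, typed at the prompt with @code{BASE} set to decimal, sketch how the text interpreter classifies numbers:

@example
1e +1.0e+0 f- f0= .   \ prints -1 (true): both literals denote 1.0
15. d.                \ a dot but no e: a double-cell integer, prints 15
15 .                  \ neither: a single-cell integer, prints 15
@end example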
1.4 anton 458:
459: Angles in floating point operations are given in radians (a full circle
1.17 anton 460: has 2 pi radians). Note that Gforth has a separate floating point
1.4 anton 461: stack, but we use the unified notation.
462:
463: Floating point numbers have a number of unpleasant surprises for the
464: unwary (e.g., floating point addition is not associative) and even a few
465: for the wary. You should not use them unless you know what you are doing
466: or you don't care that the results you get are totally bogus. If you
467: want to learn about the problems of floating point numbers (and how to
1.11 anton 468: avoid them), you might start with @cite{David Goldberg, What Every
1.6 anton 469: Computer Scientist Should Know About Floating-Point Arithmetic, ACM
470: Computing Surveys 23(1):5@minus{}48, March 1991}.
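As a small illustration of non-associativity, assume IEEE double-precision floats with round-to-nearest, as on typical Gforth platforms. Adding @code{1e-16} to 1.0 rounds back to 1.0, because @code{1e-16} is less than half the distance from 1.0 to the next float. The value @code{2e-16} is larger than that half distance, so adding it does change the result:

@example
1e 1e-16 f+ 1e-16 f+   \ (1 + 1e-16) + 1e-16: each f+ rounds back to 1.0
1e 1e-16 1e-16 f+ f+   \ 1 + (1e-16 + 1e-16): the next float above 1.0
f- f0= .               \ prints 0 (false): the two sums differ
@end example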
1.4 anton 471:
472: doc-f+
473: doc-f-
474: doc-f*
475: doc-f/
476: doc-fnegate
477: doc-fabs
478: doc-fmax
479: doc-fmin
480: doc-floor
481: doc-fround
482: doc-f**
483: doc-fsqrt
484: doc-fexp
485: doc-fexpm1
486: doc-fln
487: doc-flnp1
488: doc-flog
1.6 anton 489: doc-falog
1.4 anton 490: doc-fsin
491: doc-fcos
492: doc-fsincos
493: doc-ftan
494: doc-fasin
495: doc-facos
496: doc-fatan
497: doc-fatan2
498: doc-fsinh
499: doc-fcosh
500: doc-ftanh
501: doc-fasinh
502: doc-facosh
503: doc-fatanh
504:
505: @node Stack Manipulation, Memory access, Arithmetic, Words
1.1 anton 506: @section Stack Manipulation
507:
1.17 anton 508: Gforth has a data stack (aka parameter stack) for characters, cells,
1.1 anton 509: addresses, and double cells, a floating point stack for floating point
510: numbers, a return stack for storing the return addresses of colon
511: definitions and other data, and a locals stack for storing local
512: variables. Note that while every sane Forth has a separate floating
513: point stack, this is not strictly required; an ANS Forth system could
514: theoretically keep floating point numbers on the data stack. As an
515: additional difficulty, you don't know how many cells a floating point
516: number takes. It is reportedly possible to write words in a way that
517: they work also for a unified stack model, but we do not recommend trying
1.4 anton 518: it. Instead, just say that your program has an environmental dependency
519: on a separate FP stack.
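The following sketch shows what that dependency looks like in practice. With a separate FP stack, as in Gforth, integer and floating-point operands can be interleaved without getting in each other's way:

@example
2 3e 4 +   \ data stack: 6; FP stack: 3.0
. f.       \ prints the integer 6, then the float 3.0
@end example

On a system with a unified stack, the @code{+} would likely see the float (or part of it) instead of the 2, and the program would break.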
520:
521: Also, a Forth system is allowed to keep the local variables on the
1.1 anton 522: return stack. This is reasonable, as local variables usually eliminate
523: the need to use the return stack explicitly. So, if you want to produce
524: a standard complying program and if you are using local variables in a
525: word, forget about return stack manipulations in that word (see the
526: standard document for the exact rules).
527:
1.4 anton 528: @menu
529: * Data stack::
530: * Floating point stack::
531: * Return stack::
532: * Locals stack::
533: * Stack pointer manipulation::
534: @end menu
535:
536: @node Data stack, Floating point stack, Stack Manipulation, Stack Manipulation
1.1 anton 537: @subsection Data stack
538: doc-drop
539: doc-nip
540: doc-dup
541: doc-over
542: doc-tuck
543: doc-swap
544: doc-rot
545: doc--rot
546: doc-?dup
547: doc-pick
548: doc-roll
549: doc-2drop
550: doc-2nip
551: doc-2dup
552: doc-2over
553: doc-2tuck
554: doc-2swap
555: doc-2rot
556:
1.4 anton 557: @node Floating point stack, Return stack, Data stack, Stack Manipulation
1.1 anton 558: @subsection Floating point stack
559: doc-fdrop
560: doc-fnip
561: doc-fdup
562: doc-fover
563: doc-ftuck
564: doc-fswap
565: doc-frot
566:
1.4 anton 567: @node Return stack, Locals stack, Floating point stack, Stack Manipulation
1.1 anton 568: @subsection Return stack
569: doc->r
570: doc-r>
571: doc-r@
572: doc-rdrop
573: doc-2>r
574: doc-2r>
575: doc-2r@
576: doc-2rdrop
577:
1.4 anton 578: @node Locals stack, Stack pointer manipulation, Return stack, Stack Manipulation
1.1 anton 579: @subsection Locals stack
580:
1.4 anton 581: @node Stack pointer manipulation, , Locals stack, Stack Manipulation
1.1 anton 582: @subsection Stack pointer manipulation
583: doc-sp@
584: doc-sp!
585: doc-fp@
586: doc-fp!
587: doc-rp@
588: doc-rp!
589: doc-lp@
590: doc-lp!
591:
1.4 anton 592: @node Memory access, Control Structures, Stack Manipulation, Words
1.1 anton 593: @section Memory access
594:
1.4 anton 595: @menu
596: * Stack-Memory transfers::
597: * Address arithmetic::
598: * Memory block access::
599: @end menu
600:
601: @node Stack-Memory transfers, Address arithmetic, Memory access, Memory access
1.1 anton 602: @subsection Stack-Memory transfers
603:
604: doc-@
605: doc-!
606: doc-+!
607: doc-c@
608: doc-c!
609: doc-2@
610: doc-2!
611: doc-f@
612: doc-f!
613: doc-sf@
614: doc-sf!
615: doc-df@
616: doc-df!
617:
1.4 anton 618: @node Address arithmetic, Memory block access, Stack-Memory transfers, Memory access
1.1 anton 619: @subsection Address arithmetic
620:
621: ANS Forth does not specify the sizes of the data types. Instead, it
622: offers a number of words for computing sizes and doing address
623: arithmetic. Basically, address arithmetic is performed in terms of
624: address units (aus); on most systems the address unit is one byte. Note
625: that a character may have more than one au, so @code{chars} is not necessarily a noop
626: (on systems where it is a noop, it compiles to nothing).
627:
628: ANS Forth also defines words for aligning addresses for specific
629: data types. Many computers require that accesses to certain data types
630: must only occur at specific addresses; e.g., that cells may only be
631: accessed at addresses divisible by 4. Even if a machine allows unaligned
632: accesses, it can usually perform aligned accesses faster.
633:
1.17 anton 634: For the performance-conscious: alignment operations are usually only
1.1 anton 635: necessary during the definition of a data structure, not during the
636: (more frequent) accesses to it.
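The following sketch uses a hypothetical record consisting of two cells and one character. The field offsets and the record size are computed once, at definition time, so later accesses only add a constant offset:

@example
0                       constant point-x    \ offset of the first cell
point-x cell+           constant point-y    \ offset of the second cell
point-y cell+           constant point-tag  \ offset of the character
point-tag char+ aligned constant /point     \ size, padded to cell alignment
create my-point /point allot                \ CREATEd, so cell-aligned
@end example

Then @code{my-point point-y +} is a cell-aligned address for the second field. Applying @code{aligned} to an offset rather than to an address is a common idiom. It relies on the base address (here the @code{CREATE}d one) being aligned itself.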
637:
638: ANS Forth defines no words for character-aligning addresses. This is not
639: an oversight, but reflects the fact that addresses that are not
640: char-aligned have no use in the standard and therefore will not be
641: created.
642:
643: The standard guarantees that addresses returned by @code{CREATE}d words
1.17 anton 644: are cell-aligned; in addition, Gforth guarantees that these addresses
1.1 anton 645: are aligned for all purposes.
646:
1.9 anton 647: Note that the standard defines a word @code{char}, which has nothing to
648: do with address arithmetic.
649:
1.1 anton 650: doc-chars
651: doc-char+
652: doc-cells
653: doc-cell+
654: doc-align
655: doc-aligned
656: doc-floats
657: doc-float+
658: doc-falign
659: doc-faligned
660: doc-sfloats
661: doc-sfloat+
662: doc-sfalign
663: doc-sfaligned
664: doc-dfloats
665: doc-dfloat+
666: doc-dfalign
667: doc-dfaligned
1.10 anton 668: doc-maxalign
669: doc-maxaligned
670: doc-cfalign
671: doc-cfaligned
1.1 anton 672: doc-address-unit-bits
673:
1.4 anton 674: @node Memory block access, , Address arithmetic, Memory access
1.1 anton 675: @subsection Memory block access
676:
677: doc-move
678: doc-erase
679:
680: While the previous words work on address units, the following words work on
681: characters.
682:
683: doc-cmove
684: doc-cmove>
685: doc-fill
686: doc-blank
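Here is a short sketch using the character-oriented words; @code{buf} is an arbitrary name:

@example
create buf 8 chars allot
buf 8 blank              \ fill all 8 characters with spaces
s" abc" buf swap cmove   \ copy the 3 characters of abc to the start of buf
buf 8 type               \ prints abc followed by five spaces
@end example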
687:
1.4 anton 688: @node Control Structures, Locals, Memory access, Words
1.1 anton 689: @section Control Structures
690:
691: Control structures in Forth cannot be used in interpret state, only in
692: compile state, i.e., in a colon definition. We do not like this
693: limitation, but have not seen a satisfying way around it yet, although
694: many schemes have been proposed.
695:
1.4 anton 696: @menu
697: * Selection::
698: * Simple Loops::
699: * Counted Loops::
700: * Arbitrary control structures::
701: * Calls and returns::
702: * Exception Handling::
703: @end menu
704:
705: @node Selection, Simple Loops, Control Structures, Control Structures
1.1 anton 706: @subsection Selection
707:
708: @example
709: @var{flag}
710: IF
711: @var{code}
712: ENDIF
713: @end example
714: or
715: @example
716: @var{flag}
717: IF
718: @var{code1}
719: ELSE
720: @var{code2}
721: ENDIF
722: @end example
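The following sketch shows nested selection, using a hypothetical word @code{classify} that maps a number to its sign. On a system that only supplies @code{THEN}, write @code{THEN} for each @code{ENDIF}:

@example
: classify ( n -- n2 )  \ -1, 0 or 1 for negative, zero, positive n
  dup 0< IF
    drop -1
  ELSE
    0= IF 0 ELSE 1 ENDIF
  ENDIF ;
@end example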
723:
1.4 anton 724: You can use @code{THEN} instead of @code{ENDIF}. Indeed, @code{THEN} is
1.1 anton 725: standard, and @code{ENDIF} is not, although it is quite popular. We
726: recommend using @code{ENDIF}, because it is less confusing for people
727: who also know other languages (and is not prone to reinforcing negative
728: prejudices against Forth in these people). Adding @code{ENDIF} to a
729: system that only supplies @code{THEN} is simple:
730: @example
731: : endif POSTPONE then ; immediate
732: @end example
733:
734: [According to @cite{Webster's New Encyclopedic Dictionary}, @dfn{then
735: (adv.)} has the following meanings:
736: @quotation
737: ... 2b: following next after in order ... 3d: as a necessary consequence
738: (if you were there, then you saw them).
739: @end quotation
740: Forth's @code{THEN} has the meaning 2b, whereas @code{THEN} in Pascal
741: and many other programming languages has the meaning 3d.]
742:
743: We also provide the words @code{?dup-if} and @code{?dup-0=-if}, so you
744: can avoid using @code{?dup}.
745:
746: @example
747: @var{n}
748: CASE
749: @var{n1} OF @var{code1} ENDOF
750: @var{n2} OF @var{code2} ENDOF
1.4 anton 751: @dots{}
1.1 anton 752: ENDCASE
753: @end example
754:
755: Executes the first @var{codei} for which @var{ni} is equal to
756: @var{n}. A default case can be added by simply writing the code after
757: the last @code{ENDOF}. It may use @var{n}, which is on top of the stack,
758: but must not consume it.
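For example, the following sketch maps a number to a name string with the hypothetical word @code{digit-name}. Its default case moves @var{n} out of the way with @code{>r} and back with @code{r>}, so @var{n} is still on top when @code{ENDCASE} removes it:

@example
: digit-name ( n -- c-addr u )
  CASE
    0 OF s" zero" ENDOF
    1 OF s" one"  ENDOF
    >r s" many" r>   \ default case: keep n on top for ENDCASE
  ENDCASE ;
@end example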
759:
1.4 anton 760: @node Simple Loops, Counted Loops, Selection, Control Structures
1.1 anton 761: @subsection Simple Loops
762:
763: @example
764: BEGIN
765: @var{code1}
766: @var{flag}
767: WHILE
768: @var{code2}
769: REPEAT
770: @end example
771:
772: @var{code1} is executed and @var{flag} is computed. If it is true,
773: @var{code2} is executed and the loop is restarted; if @var{flag} is false, execution continues after the @code{REPEAT}.
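Euclid's algorithm for the greatest common divisor fits this pattern, as the following sketch shows. Here @var{flag} is simply a copy of the second number, which is true (non-zero) as long as there is something left to divide by:

@example
: gcd ( u1 u2 -- u )
  BEGIN
    dup            \ flag: is u2 non-zero?
  WHILE
    tuck mod       \ u1 u2 -- u2 r, where r is u1 mod u2
  REPEAT
  drop ;
@end example

E.g., @code{12 18 gcd} leaves 6.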
774:
775: @example
776: BEGIN
777: @var{code}
778: @var{flag}
779: UNTIL
780: @end example
781:
782: @var{code} is executed. The loop is restarted if @var{flag} is false.
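Because the test comes after @var{code}, the body always runs at least once. The following sketch relies on that to give 0 one digit. The name @code{count-digits} is hypothetical, and the word assumes a non-negative argument:

@example
: count-digits ( n -- u )   \ number of decimal digits of n >= 0
  0 swap                    \ count n
  BEGIN
    swap 1+ swap            \ count one more digit
    10 /                    \ remove the last digit
    dup 0=
  UNTIL
  drop ;
@end example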
783:
784: @example
785: BEGIN
786: @var{code}
787: AGAIN
788: @end example
789:
790: This is an endless loop.
791:
1.4 anton 792: @node Counted Loops, Arbitrary control structures, Simple Loops, Control Structures
1.1 anton 793: @subsection Counted Loops
794:
795: The basic counted loop is:
796: @example
797: @var{limit} @var{start}
798: ?DO
799: @var{body}
800: LOOP
801: @end example
802:
803: This performs one iteration for every integer, starting from @var{start}
804: and up to, but excluding @var{limit}. The counter, aka index, can be
805: accessed with @code{i}. E.g., the loop
806: @example
807: 10 0 ?DO
808: i .
809: LOOP
810: @end example
811: prints
812: @example
813: 0 1 2 3 4 5 6 7 8 9
814: @end example
815: The index of the innermost loop can be accessed with @code{i}, the index
816: of the next loop with @code{j}, and the index of the third loop with
817: @code{k}.
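The following sketch uses nested counted loops. Inside the inner loop, @code{i} is the inner index and @code{j} is the outer one. The hypothetical word below therefore sums @code{i*j} over all pairs from 0 to 2, which gives 9:

@example
: sum-of-products ( -- n )
  0
  3 0 ?DO        \ outer loop, its index is j inside the inner loop
    3 0 ?DO      \ inner loop, its index is i
      i j * +
    LOOP
  LOOP ;
@end example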
818:
819: The loop control data are kept on the return stack, so there are some
820: restrictions on mixing return stack accesses and counted loop
821: words. E.g., if you put values on the return stack outside the loop, you
822: cannot read them inside the loop. If you put values on the return stack
823: within a loop, you have to remove them before the end of the loop and
824: before accessing the index of the loop.
825:
826: There are several variations on the counted loop:
827:
828: @code{LEAVE} leaves the innermost counted loop immediately.
829:
1.18 anton 830: If @var{start} is greater than @var{limit}, a @code{?DO} loop is entered
831: (and @code{LOOP} iterates until they become equal by wrap-around
832: arithmetic). This behaviour is usually not what you want. Therefore,
833: Gforth offers @code{+DO} and @code{U+DO} (as replacements for
834: @code{?DO}), which do not enter the loop if @var{start} is greater than
835: @var{limit}; @code{+DO} is for signed loop parameters, @code{U+DO} for
836: unsigned loop parameters. These words can be implemented easily on
837: standard systems, so using them does not make your programs hard to
838: port; e.g.:
839: @example
840: : +DO ( compile-time: -- do-sys; run-time: n1 n2 -- )
841: POSTPONE over POSTPONE min POSTPONE ?DO ; immediate
842: @end example
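The following sketch shows the difference in practice. The hypothetical word @code{iterations} counts how often the body of a @code{+DO} loop runs:

@example
: iterations ( limit start -- n )
  0 -rot         \ counter below the loop parameters
  +DO 1+ LOOP ;
@end example

Here @code{5 0 iterations} gives 5 and @code{0 5 iterations} gives 0. With @code{?DO} in place of @code{+DO}, the second call would run through the wrap-around described above.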
843:
1.1 anton 844: @code{LOOP} can be replaced with @code{@var{n} +LOOP}; this updates the
845: index by @var{n} instead of by 1. The loop is terminated when the border
846: between @var{limit-1} and @var{limit} is crossed. E.g.:
847:
1.18 anton 848: @code{4 0 +DO i . 2 +LOOP} prints @code{0 2}
1.1 anton 849:
1.18 anton 850: @code{4 1 +DO i . 2 +LOOP} prints @code{1 3}
1.1 anton 851:
852: The behaviour of @code{@var{n} +LOOP} is peculiar when @var{n} is negative:
853:
1.2 anton 854: @code{-1 0 ?DO i . -1 +LOOP} prints @code{0 -1}
1.1 anton 855:
1.2 anton 856: @code{ 0 0 ?DO i . -1 +LOOP} prints nothing
1.1 anton 857:
1.18 anton 858: Therefore we recommend avoiding @code{@var{n} +LOOP} with negative
859: @var{n}. One alternative is @code{@var{u} -LOOP}, which reduces the
860: index by @var{u} each iteration. The loop is terminated when the border
861: between @var{limit+1} and @var{limit} is crossed. Gforth also provides
862: @code{-DO} and @code{U-DO} for down-counting loops. E.g.:
1.1 anton 863:
1.18 anton 864: @code{-2 0 -DO i . 1 -LOOP} prints @code{0 -1}
1.1 anton 865:
1.18 anton 866: @code{-1 0 -DO i . 1 -LOOP} prints @code{0}
1.1 anton 867:
1.18 anton 868: @code{ 0 0 -DO i . 1 -LOOP} prints nothing
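
Inside a colon definition, @code{-DO} and @code{-LOOP} produce a loop whose index counts down. As a sketch (@code{sum-down} is a hypothetical name), the following adds up @var{n}, @var{n}@minus{}1, @dots{}, 1. The index 0 is not visited, because the step from 1 to 0 crosses the border between @var{limit+1} and @var{limit}:
@example
: sum-down ( n -- sum )
  0 swap          \ accumulator below n
  0 swap          \ limit 0, start n
  -DO  i +  1 -LOOP ;
@end example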
1.1 anton 869:
1.18 anton 870: Another alternative is @code{@var{n} S+LOOP}, where the negative
case behaves symmetrically to the positive case:
1.1 anton 872:
1.18 anton 873: @code{-2 0 -DO i . -1 S+LOOP} prints @code{0 -1}
874:
875: The loop is terminated when the border between @var{limit@minus{}sgn(n)}
876: and @var{limit} is crossed. Unfortunately, neither @code{-LOOP} nor
@code{S+LOOP} is part of the ANS Forth standard, and they are not easy
878: to implement using standard words. If you want to write standard
879: programs, just avoid counting down.
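
To avoid a down-counting loop, you can usually count up and derive the value you need from the index. This sketch uses only ANS Forth words (@code{down-sum} is a hypothetical name). It visits @var{n}, @var{n}@minus{}1, @dots{}, 1 by computing @var{n}@minus{}@code{i} in each iteration:
@example
: down-sum ( n -- sum )
  0 swap          \ acc n
  dup 0 ?DO       \ i runs from 0 to n-1
    dup i -       \ acc n k, where k = n-i counts down from n to 1
    rot + swap    \ acc+k n
  LOOP
  drop ;
@end example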
880:
881: @code{?DO} can also be replaced by @code{DO}. @code{DO} always enters
882: the loop, independent of the loop parameters. Do not use @code{DO}, even
883: if you know that the loop is entered in any case. Such knowledge tends
884: to become invalid during maintenance of a program, and then the
@code{DO} will cause trouble.
1.1 anton 886:
887: @code{UNLOOP} is used to prepare for an abnormal loop exit, e.g., via
888: @code{EXIT}. @code{UNLOOP} removes the loop control parameters from the
889: return stack so @code{EXIT} can get to its return address.
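
A typical use is a search loop that returns as soon as it finds a match. In this sketch (@code{first-div7} is a hypothetical name), @code{UNLOOP} drops the loop parameters before @code{EXIT}. The fall-through path after @code{LOOP} needs no @code{UNLOOP}:
@example
\ return the first i in [start,limit) divisible by 7, or -1
: first-div7 ( limit start -- i | -1 )
  ?DO
    i 7 mod 0= IF  i UNLOOP EXIT  THEN
  LOOP
  -1 ;
@end example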
890:
891: Another counted loop is
892: @example
893: @var{n}
894: FOR
895: @var{body}
896: NEXT
897: @end example
898: This is the preferred loop of native code compiler writers who are too
1.17 anton 899: lazy to optimize @code{?DO} loops properly. In Gforth, this loop
1.1 anton 900: iterates @var{n+1} times; @code{i} produces values starting with @var{n}
901: and ending with 0. Other Forth systems may behave differently, even if
902: they support @code{FOR} loops.
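
Here is a small sketch of the Gforth behaviour (@code{for-sum} is a hypothetical name). The body runs for @code{i} = @var{n}, @var{n}@minus{}1, @dots{}, 0, so @code{3 for-sum} returns 6 and @code{0 for-sum} still executes the body once:
@example
: for-sum ( n -- sum )
  0 swap       \ accumulator below the count
  FOR  i +  NEXT ;
@end example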
903:
1.4 anton 904: @node Arbitrary control structures, Calls and returns, Counted Loops, Control Structures
1.2 anton 905: @subsection Arbitrary control structures
906:
907: ANS Forth permits and supports using control structures in a non-nested
908: way. Information about incomplete control structures is stored on the
909: control-flow stack. This stack may be implemented on the Forth data
1.17 anton 910: stack, and this is what we have done in Gforth.
1.2 anton 911:
An @i{orig} entry represents an unresolved forward branch; a @i{dest}
913: entry represents a backward branch target. A few words are the basis for
914: building any control structure possible (except control structures that
915: need storage, like calls, coroutines, and backtracking).
916:
1.3 anton 917: doc-if
918: doc-ahead
919: doc-then
920: doc-begin
921: doc-until
922: doc-again
923: doc-cs-pick
924: doc-cs-roll
1.2 anton 925:
On many systems control-flow stack items take one cell; in Gforth they
1.2 anton 927: currently take three (this may change in the future). Therefore it is a
928: really good idea to manipulate the control flow stack with
929: @code{cs-pick} and @code{cs-roll}, not with data stack manipulation
930: words.
931:
932: Some standard control structure words are built from these words:
933:
1.3 anton 934: doc-else
935: doc-while
936: doc-repeat
1.2 anton 937:
938: Counted loop words constitute a separate group of words:
939:
1.3 anton 940: doc-?do
1.18 anton 941: doc-+do
942: doc-u+do
943: doc--do
944: doc-u-do
1.3 anton 945: doc-do
946: doc-for
947: doc-loop
948: doc-s+loop
949: doc-+loop
1.18 anton 950: doc--loop
1.3 anton 951: doc-next
952: doc-leave
953: doc-?leave
954: doc-unloop
1.10 anton 955: doc-done
1.2 anton 956:
957: The standard does not allow using @code{cs-pick} and @code{cs-roll} on
958: @i{do-sys}. Our system allows it, but it's your job to ensure that for
959: every @code{?DO} etc. there is exactly one @code{UNLOOP} on any path
1.3 anton 960: through the definition (@code{LOOP} etc. compile an @code{UNLOOP} on the
961: fall-through path). Also, you have to ensure that all @code{LEAVE}s are
1.7 pazsan 962: resolved (by using one of the loop-ending words or @code{DONE}).
1.2 anton 963:
Another group of control structure words is:
965:
1.3 anton 966: doc-case
967: doc-endcase
968: doc-of
969: doc-endof
1.2 anton 970:
971: @i{case-sys} and @i{of-sys} cannot be processed using @code{cs-pick} and
972: @code{cs-roll}.
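
As a sketch (@code{classify} is a hypothetical name), @code{CASE} compares the selector with the value before each @code{OF}. If no @code{OF} matches, the code before @code{ENDCASE} runs with the selector still on the stack, and @code{ENDCASE} drops it:
@example
: classify ( n -- m )
  CASE
    0 OF  10 ENDOF
    1 OF  20 ENDOF
    99 swap      \ default: put the result below the selector
  ENDCASE ;
@end example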
973:
1.3 anton 974: @subsubsection Programming Style
975:
976: In order to ensure readability we recommend that you do not create
977: arbitrary control structures directly, but define new control structure
978: words for the control structure you want and use these words in your
979: program.
980:
981: E.g., instead of writing
982:
983: @example
984: begin
985: ...
986: if [ 1 cs-roll ]
987: ...
988: again then
989: @end example
990:
991: we recommend defining control structure words, e.g.,
992:
993: @example
994: : while ( dest -- orig dest )
995: POSTPONE if
996: 1 cs-roll ; immediate
997:
998: : repeat ( orig dest -- )
999: POSTPONE again
1000: POSTPONE then ; immediate
1001: @end example
1002:
1003: and then using these to create the control structure:
1004:
1005: @example
1006: begin
1007: ...
1008: while
1009: ...
1010: repeat
1011: @end example
1012:
That's much easier to read, isn't it? Of course, @code{WHILE} and
@code{REPEAT} are predefined, so in this example it would not be
1015: necessary to define them.
1016:
1.4 anton 1017: @node Calls and returns, Exception Handling, Arbitrary control structures, Control Structures
1.3 anton 1018: @subsection Calls and returns
1019:
A definition can be called simply by writing the name of the
1.17 anton 1021: definition. When the end of the definition is reached, it returns. An
1022: earlier return can be forced using
1.3 anton 1023:
1024: doc-exit
1025:
1026: Don't forget to clean up the return stack and @code{UNLOOP} any
1027: outstanding @code{?DO}...@code{LOOP}s before @code{EXIT}ing. The
1028: primitive compiled by @code{EXIT} is
1029:
1030: doc-;s
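
For example, the following definition returns early for non-negative arguments (@code{my-abs} is a hypothetical name; the standard @code{ABS} already does this):
@example
: my-abs ( n -- u )
  dup 0< 0= IF EXIT THEN   \ n >= 0: return it unchanged
  negate ;
@end example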
1031:
1.4 anton 1032: @node Exception Handling, , Calls and returns, Control Structures
1.3 anton 1033: @subsection Exception Handling
1034:
1035: doc-catch
1036: doc-throw
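
Here is a minimal sketch of the usual pattern (the word names are hypothetical). @code{CATCH} executes an xt. If the xt returns normally, @code{CATCH} pushes 0. If a @code{THROW} with a non-zero code occurs inside, @code{CATCH} restores the data stack depth it had before the xt was passed and pushes the code. The code -24 is the standard ``invalid numeric argument'' code:
@example
: check-positive ( n -- n )
  dup 0> 0= IF  -24 THROW  THEN ;

\ returns n if it is positive, 0 otherwise
: try-positive ( n -- n | 0 )
  ['] check-positive CATCH
  IF  drop 0  THEN ;   \ on a throw, discard the restored argument
@end example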
1037:
1.4 anton 1038: @node Locals, Defining Words, Control Structures, Words
1.1 anton 1039: @section Locals
1040:
1.2 anton 1041: Local variables can make Forth programming more enjoyable and Forth
1042: programs easier to read. Unfortunately, the locals of ANS Forth are
1043: laden with restrictions. Therefore, we provide not only the ANS Forth
1044: locals wordset, but also our own, more powerful locals wordset (we
1045: implemented the ANS Forth locals wordset through our locals wordset).
1046:
1047: @menu
1.17 anton 1048: * Gforth locals::
1.4 anton 1049: * ANS Forth locals::
1.2 anton 1050: @end menu
1051:
1.17 anton 1052: @node Gforth locals, ANS Forth locals, Locals, Locals
1053: @subsection Gforth locals
1.2 anton 1054:
1055: Locals can be defined with
1056:
1057: @example
1058: @{ local1 local2 ... -- comment @}
1059: @end example
1060: or
1061: @example
1062: @{ local1 local2 ... @}
1063: @end example
1064:
1065: E.g.,
1066: @example
1067: : max @{ n1 n2 -- n3 @}
1068: n1 n2 > if
1069: n1
1070: else
1071: n2
1072: endif ;
1073: @end example
1074:
1075: The similarity of locals definitions with stack comments is intended. A
1076: locals definition often replaces the stack comment of a word. The order
1077: of the locals corresponds to the order in a stack comment and everything
1078: after the @code{--} is really a comment.
1079:
1080: This similarity has one disadvantage: It is too easy to confuse locals
1081: declarations with stack comments, causing bugs and making them hard to
1082: find. However, this problem can be avoided by appropriate coding
1083: conventions: Do not use both notations in the same program. If you do,
1084: they should be distinguished using additional means, e.g. by position.
1085:
1086: The name of the local may be preceded by a type specifier, e.g.,
1087: @code{F:} for a floating point value:
1088:
1089: @example
1090: : CX* @{ F: Ar F: Ai F: Br F: Bi -- Cr Ci @}
1091: \ complex multiplication
1092: Ar Br f* Ai Bi f* f-
1093: Ar Bi f* Ai Br f* f+ ;
1094: @end example
1095:
1.17 anton 1096: Gforth currently supports cells (@code{W:}, @code{W^}), doubles
1.2 anton 1097: (@code{D:}, @code{D^}), floats (@code{F:}, @code{F^}) and characters
1098: (@code{C:}, @code{C^}) in two flavours: a value-flavoured local (defined
1099: with @code{W:}, @code{D:} etc.) produces its value and can be changed
1100: with @code{TO}. A variable-flavoured local (defined with @code{W^} etc.)
1101: produces its address (which becomes invalid when the variable's scope is
left). E.g., the standard word @code{emit} can be defined in terms of
1103: @code{type} like this:
1104:
1105: @example
1106: : emit @{ C^ char* -- @}
1107: char* 1 type ;
1108: @end example
1109:
1110: A local without type specifier is a @code{W:} local. Both flavours of
1111: locals are initialized with values from the data or FP stack.
1112:
1113: Currently there is no way to define locals with user-defined data
1114: structures, but we are working on it.
1115:
1.17 anton 1116: Gforth allows defining locals everywhere in a colon definition. This
1.7 pazsan 1117: poses the following questions:
1.2 anton 1118:
1.4 anton 1119: @menu
1120: * Where are locals visible by name?::
1.14 anton 1121: * How long do locals live?::
1.4 anton 1122: * Programming Style::
1123: * Implementation::
1124: @end menu
1125:
1.17 anton 1126: @node Where are locals visible by name?, How long do locals live?, Gforth locals, Gforth locals
1.2 anton 1127: @subsubsection Where are locals visible by name?
1128:
1129: Basically, the answer is that locals are visible where you would expect
them in block-structured languages, and sometimes a little longer. If you
1131: want to restrict the scope of a local, enclose its definition in
1132: @code{SCOPE}...@code{ENDSCOPE}.
1133:
1134: doc-scope
1135: doc-endscope
1136:
1137: These words behave like control structure words, so you can use them
1138: with @code{CS-PICK} and @code{CS-ROLL} to restrict the scope in
1139: arbitrary ways.
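
Here is a sketch (@code{scope-demo} is a hypothetical name). The local @code{x} is visible only between @code{SCOPE} and @code{ENDSCOPE}. A use of @code{x} after @code{ENDSCOPE} would normally be reported as an undefined word:
@example
: scope-demo ( n -- m )
  SCOPE
    @{ x @}     \ take n into the local x
    x 2*
  ENDSCOPE     \ x is no longer visible here
  1+ ;
@end example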
1140:
1141: If you want a more exact answer to the visibility question, here's the
1142: basic principle: A local is visible in all places that can only be
1143: reached through the definition of the local@footnote{In compiler
1144: construction terminology, all places dominated by the definition of the
1145: local.}. In other words, it is not visible in places that can be reached
1146: without going through the definition of the local. E.g., locals defined
1147: in @code{IF}...@code{ENDIF} are visible until the @code{ENDIF}, locals
1148: defined in @code{BEGIN}...@code{UNTIL} are visible after the
1149: @code{UNTIL} (until, e.g., a subsequent @code{ENDSCOPE}).
1150:
1151: The reasoning behind this solution is: We want to have the locals
1152: visible as long as it is meaningful. The user can always make the
1153: visibility shorter by using explicit scoping. In a place that can
1154: only be reached through the definition of a local, the meaning of a
1155: local name is clear. In other places it is not: How is the local
1156: initialized at the control flow path that does not contain the
1157: definition? Which local is meant, if the same name is defined twice in
1158: two independent control flow paths?
1159:
1160: This should be enough detail for nearly all users, so you can skip the
rest of this section. If you really must know all the gory details and
1162: options, read on.
1163:
1164: In order to implement this rule, the compiler has to know which places
1165: are unreachable. It knows this automatically after @code{AHEAD},
1166: @code{AGAIN}, @code{EXIT} and @code{LEAVE}; in other cases (e.g., after
1167: most @code{THROW}s), you can use the word @code{UNREACHABLE} to tell the
1168: compiler that the control flow never reaches that place. If
1169: @code{UNREACHABLE} is not used where it could, the only consequence is
1170: that the visibility of some locals is more limited than the rule above
1171: says. If @code{UNREACHABLE} is used where it should not (i.e., if you
1172: lie to the compiler), buggy code will be produced.
1173:
1174: Another problem with this rule is that at @code{BEGIN}, the compiler
1.3 anton 1175: does not know which locals will be visible on the incoming
1176: back-edge. All problems discussed in the following are due to this
1177: ignorance of the compiler (we discuss the problems using @code{BEGIN}
1178: loops as examples; the discussion also applies to @code{?DO} and other
1.2 anton 1179: loops). Perhaps the most insidious example is:
1180: @example
1181: AHEAD
1182: BEGIN
1183: x
1184: [ 1 CS-ROLL ] THEN
1.4 anton 1185: @{ x @}
1.2 anton 1186: ...
1187: UNTIL
1188: @end example
1189:
1190: This should be legal according to the visibility rule. The use of
1191: @code{x} can only be reached through the definition; but that appears
1192: textually below the use.
1193:
1194: From this example it is clear that the visibility rules cannot be fully
1195: implemented without major headaches. Our implementation treats common
1196: cases as advertised and the exceptions are treated in a safe way: The
1197: compiler makes a reasonable guess about the locals visible after a
1198: @code{BEGIN}; if it is too pessimistic, the
1199: user will get a spurious error about the local not being defined; if the
1200: compiler is too optimistic, it will notice this later and issue a
1201: warning. In the case above the compiler would complain about @code{x}
1202: being undefined at its use. You can see from the obscure examples in
1203: this section that it takes quite unusual control structures to get the
1204: compiler into trouble, and even then it will often do fine.
1205:
1206: If the @code{BEGIN} is reachable from above, the most optimistic guess
1207: is that all locals visible before the @code{BEGIN} will also be
1208: visible after the @code{BEGIN}. This guess is valid for all loops that
1209: are entered only through the @code{BEGIN}, in particular, for normal
1210: @code{BEGIN}...@code{WHILE}...@code{REPEAT} and
1211: @code{BEGIN}...@code{UNTIL} loops and it is implemented in our
1212: compiler. When the branch to the @code{BEGIN} is finally generated by
1213: @code{AGAIN} or @code{UNTIL}, the compiler checks the guess and
warns the user if it was too optimistic:
1215: @example
1216: IF
1.4 anton 1217: @{ x @}
1.2 anton 1218: BEGIN
1219: \ x ?
1220: [ 1 cs-roll ] THEN
1221: ...
1222: UNTIL
1223: @end example
1224:
1225: Here, @code{x} lives only until the @code{BEGIN}, but the compiler
1226: optimistically assumes that it lives until the @code{THEN}. It notices
1227: this difference when it compiles the @code{UNTIL} and issues a
1228: warning. The user can avoid the warning, and make sure that @code{x}
1229: is not used in the wrong area by using explicit scoping:
1230: @example
1231: IF
1232: SCOPE
1.4 anton 1233: @{ x @}
1.2 anton 1234: ENDSCOPE
1235: BEGIN
1236: [ 1 cs-roll ] THEN
1237: ...
1238: UNTIL
1239: @end example
1240:
1241: Since the guess is optimistic, there will be no spurious error messages
1242: about undefined locals.
1243:
1244: If the @code{BEGIN} is not reachable from above (e.g., after
1245: @code{AHEAD} or @code{EXIT}), the compiler cannot even make an
1246: optimistic guess, as the locals visible after the @code{BEGIN} may be
1247: defined later. Therefore, the compiler assumes that no locals are
1.17 anton 1248: visible after the @code{BEGIN}. However, the user can use
1.2 anton 1249: @code{ASSUME-LIVE} to make the compiler assume that the same locals are
visible at the @code{BEGIN} as at the point where the top control-flow stack
1251: item was created.
1.2 anton 1252:
1253: doc-assume-live
1254:
1255: E.g.,
1256: @example
1.4 anton 1257: @{ x @}
1.2 anton 1258: AHEAD
1259: ASSUME-LIVE
1260: BEGIN
1261: x
1262: [ 1 CS-ROLL ] THEN
1263: ...
1264: UNTIL
1265: @end example
1266:
1267: Other cases where the locals are defined before the @code{BEGIN} can be
1268: handled by inserting an appropriate @code{CS-ROLL} before the
1269: @code{ASSUME-LIVE} (and changing the control-flow stack manipulation
1270: behind the @code{ASSUME-LIVE}).
1271:
1272: Cases where locals are defined after the @code{BEGIN} (but should be
1273: visible immediately after the @code{BEGIN}) can only be handled by
1274: rearranging the loop. E.g., the ``most insidious'' example above can be
1275: arranged into:
1276: @example
1277: BEGIN
1.4 anton 1278: @{ x @}
1.2 anton 1279: ... 0=
1280: WHILE
1281: x
1282: REPEAT
1283: @end example
1284:
1.17 anton 1285: @node How long do locals live?, Programming Style, Where are locals visible by name?, Gforth locals
1.2 anton 1286: @subsubsection How long do locals live?
1287:
1288: The right answer for the lifetime question would be: A local lives at
1289: least as long as it can be accessed. For a value-flavoured local this
1290: means: until the end of its visibility. However, a variable-flavoured
1291: local could be accessed through its address far beyond its visibility
1292: scope. Ultimately, this would mean that such locals would have to be
1293: garbage collected. Since this entails un-Forth-like implementation
1294: complexities, I adopted the same cowardly solution as some other
1295: languages (e.g., C): The local lives only as long as it is visible;
1296: afterwards its address is invalid (and programs that access it
1297: afterwards are erroneous).
1298:
1.17 anton 1299: @node Programming Style, Implementation, How long do locals live?, Gforth locals
1.2 anton 1300: @subsubsection Programming Style
1301:
1302: The freedom to define locals anywhere has the potential to change
1303: programming styles dramatically. In particular, the need to use the
1304: return stack for intermediate storage vanishes. Moreover, all stack
1305: manipulations (except @code{PICK}s and @code{ROLL}s with run-time
1306: determined arguments) can be eliminated: If the stack items are in the
1307: wrong order, just write a locals definition for all of them; then
1308: write the items in the order you want.
1309:
1310: This seems a little far-fetched and eliminating stack manipulations is
1.4 anton 1311: unlikely to become a conscious programming objective. Still, the number
1312: of stack manipulations will be reduced dramatically if local variables
1.17 anton 1313: are used liberally (e.g., compare @code{max} in @ref{Gforth locals} with
1.4 anton 1314: a traditional implementation of @code{max}).
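
For comparison, a traditional implementation of @code{max} without locals might look like this. It is renamed @code{max-stack} here (a hypothetical name) so that it does not redefine the standard word:
@example
: max-stack ( n1 n2 -- n3 )
  2dup < IF swap THEN   \ larger value ends up second on the stack
  drop ;
@end example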
1.2 anton 1315:
1316: This shows one potential benefit of locals: making Forth programs more
1317: readable. Of course, this benefit will only be realized if the
1318: programmers continue to honour the principle of factoring instead of
1319: using the added latitude to make the words longer.
1320:
1321: Using @code{TO} can and should be avoided. Without @code{TO},
1322: every value-flavoured local has only a single assignment and many
1323: advantages of functional languages apply to Forth. I.e., programs are
1324: easier to analyse, to optimize and to read: It is clear from the
1325: definition what the local stands for, it does not turn into something
1326: different later.
1327:
1328: E.g., a definition using @code{TO} might look like this:
1329: @example
1330: : strcmp @{ addr1 u1 addr2 u2 -- n @}
1331: u1 u2 min 0
1332: ?do
addr1 c@@ addr2 c@@ - ?dup
1334: if
1335: unloop exit
1336: then
1337: addr1 char+ TO addr1
1338: addr2 char+ TO addr2
1339: loop
1340: u1 u2 - ;
1341: @end example
1342: Here, @code{TO} is used to update @code{addr1} and @code{addr2} at
1343: every loop iteration. @code{strcmp} is a typical example of the
1344: readability problems of using @code{TO}. When you start reading
1345: @code{strcmp}, you think that @code{addr1} refers to the start of the
string. Only near the end of the loop do you realize that it is something
1347: else.
1348:
1349: This can be avoided by defining two locals at the start of the loop that
1350: are initialized with the right value for the current iteration.
1351: @example
1352: : strcmp @{ addr1 u1 addr2 u2 -- n @}
1353: addr1 addr2
1354: u1 u2 min 0
1355: ?do @{ s1 s2 @}
s1 c@@ s2 c@@ - ?dup
1357: if
1358: unloop exit
1359: then
1360: s1 char+ s2 char+
1361: loop
1362: 2drop
1363: u1 u2 - ;
1364: @end example
1365: Here it is clear from the start that @code{s1} has a different value
1366: in every loop iteration.
1367:
1.17 anton 1368: @node Implementation, , Programming Style, Gforth locals
1.2 anton 1369: @subsubsection Implementation
1370:
1.17 anton 1371: Gforth uses an extra locals stack. The most compelling reason for
1.2 anton 1372: this is that the return stack is not float-aligned; using an extra stack
1373: also eliminates the problems and restrictions of using the return stack
1374: as locals stack. Like the other stacks, the locals stack grows toward
1375: lower addresses. A few primitives allow an efficient implementation:
1376:
1377: doc-@local#
1378: doc-f@local#
1379: doc-laddr#
1380: doc-lp+!#
1381: doc-lp!
1382: doc->l
1383: doc-f>l
1384:
1385: In addition to these primitives, some specializations of these
1386: primitives for commonly occurring inline arguments are provided for
1387: efficiency reasons, e.g., @code{@@local0} as specialization of
1388: @code{@@local#} for the inline argument 0. The following compiling words
1389: compile the right specialized version, or the general version, as
1390: appropriate:
1391:
1.12 anton 1392: doc-compile-@local
1393: doc-compile-f@local
1.2 anton 1394: doc-compile-lp+!
1395:
1396: Combinations of conditional branches and @code{lp+!#} like
1397: @code{?branch-lp+!#} (the locals pointer is only changed if the branch
1398: is taken) are provided for efficiency and correctness in loops.
1399:
1400: A special area in the dictionary space is reserved for keeping the
1401: local variable names. @code{@{} switches the dictionary pointer to this
1402: area and @code{@}} switches it back and generates the locals
1403: initializing code. @code{W:} etc.@ are normal defining words. This
1404: special area is cleared at the start of every colon definition.
1405:
1.17 anton 1406: A special feature of Gforth's dictionary is used to implement the
1.2 anton 1407: definition of locals without type specifiers: every wordlist (aka
1408: vocabulary) has its own methods for searching
1.4 anton 1409: etc. (@pxref{Wordlists}). For the present purpose we defined a wordlist
1.2 anton 1410: with a special search method: When it is searched for a word, it
1411: actually creates that word using @code{W:}. @code{@{} changes the search
1412: order to first search the wordlist containing @code{@}}, @code{W:} etc.,
1413: and then the wordlist for defining locals without type specifiers.
1414:
1415: The lifetime rules support a stack discipline within a colon
definition: The lifetime of a local is either nested with the lifetimes
of other locals or does not overlap them.
1418:
1419: At @code{BEGIN}, @code{IF}, and @code{AHEAD} no code for locals stack
1420: pointer manipulation is generated. Between control structure words
1421: locals definitions can push locals onto the locals stack. @code{AGAIN}
1422: is the simplest of the other three control flow words. It has to
1423: restore the locals stack depth of the corresponding @code{BEGIN}
1424: before branching. The code looks like this:
1425: @format
1426: @code{lp+!#} current-locals-size @minus{} dest-locals-size
1427: @code{branch} <begin>
1428: @end format
1429:
1430: @code{UNTIL} is a little more complicated: If it branches back, it
1431: must adjust the stack just like @code{AGAIN}. But if it falls through,
1432: the locals stack must not be changed. The compiler generates the
1433: following code:
1434: @format
1435: @code{?branch-lp+!#} <begin> current-locals-size @minus{} dest-locals-size
1436: @end format
1437: The locals stack pointer is only adjusted if the branch is taken.
1438:
1439: @code{THEN} can produce somewhat inefficient code:
1440: @format
1441: @code{lp+!#} current-locals-size @minus{} orig-locals-size
1442: <orig target>:
1443: @code{lp+!#} orig-locals-size @minus{} new-locals-size
1444: @end format
1445: The second @code{lp+!#} adjusts the locals stack pointer from the
1.4 anton 1446: level at the @var{orig} point to the level after the @code{THEN}. The
1.2 anton 1447: first @code{lp+!#} adjusts the locals stack pointer from the current
1448: level to the level at the orig point, so the complete effect is an
1449: adjustment from the current level to the right level after the
1450: @code{THEN}.
1451:
1452: In a conventional Forth implementation a dest control-flow stack entry
1453: is just the target address and an orig entry is just the address to be
1454: patched. Our locals implementation adds a wordlist to every orig or dest
1455: item. It is the list of locals visible (or assumed visible) at the point
1456: described by the entry. Our implementation also adds a tag to identify
1457: the kind of entry, in particular to differentiate between live and dead
1458: (reachable and unreachable) orig entries.
1459:
1460: A few unusual operations have to be performed on locals wordlists:
1461:
1462: doc-common-list
1463: doc-sub-list?
1464: doc-list-size
1465:
1466: Several features of our locals wordlist implementation make these
1467: operations easy to implement: The locals wordlists are organised as
1468: linked lists; the tails of these lists are shared, if the lists
1469: contain some of the same locals; and the address of a name is greater
1470: than the address of the names behind it in the list.
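
The last property is what makes @code{common-list} cheap to compute. The following sketch illustrates the idea under an assumed, simplified layout that is not Gforth's actual name-field layout. In this layout, each node's first cell holds the address of the next node, 0 ends a list, and nodes created later lie at higher addresses. Repeatedly following the link of the higher-addressed node eventually reaches the shared tail (or 0). All names are hypothetical:
@example
: my-common-list ( list1 list2 -- list3 )
  BEGIN  2dup <>  WHILE
    2dup u< IF swap THEN   \ higher address is now second
    swap @@                \ follow the link of the higher node
  REPEAT
  drop ;

\ hypothetical nodes: n3 -> n2 -> n1 -> 0 and n4 -> n1 -> 0
create n1 0 ,
create n2 n1 ,
create n3 n2 ,
create n4 n1 ,
n3 n4 my-common-list n1 = .   \ prints -1 (true)
@end example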
1471:
1472: Another important implementation detail is the variable
1473: @code{dead-code}. It is used by @code{BEGIN} and @code{THEN} to
1474: determine if they can be reached directly or only through the branch
1475: that they resolve. @code{dead-code} is set by @code{UNREACHABLE},
1476: @code{AHEAD}, @code{EXIT} etc., and cleared at the start of a colon
1477: definition, by @code{BEGIN} and usually by @code{THEN}.
1478:
1479: Counted loops are similar to other loops in most respects, but
1480: @code{LEAVE} requires special attention: It performs basically the same
1481: service as @code{AHEAD}, but it does not create a control-flow stack
1482: entry. Therefore the information has to be stored elsewhere;
1483: traditionally, the information was stored in the target fields of the
1484: branches created by the @code{LEAVE}s, by organizing these fields into a
1485: linked list. Unfortunately, this clever trick does not provide enough
1486: space for storing our extended control flow information. Therefore, we
1487: introduce another stack, the leave stack. It contains the control-flow
1488: stack entries for all unresolved @code{LEAVE}s.
1489:
1490: Local names are kept until the end of the colon definition, even if
1491: they are no longer visible in any control-flow path. In a few cases
1492: this may lead to increased space needs for the locals name area, but
1493: usually less than reclaiming this space would cost in code size.
1494:
1495:
1.17 anton 1496: @node ANS Forth locals, , Gforth locals, Locals
1.2 anton 1497: @subsection ANS Forth locals
1498:
1499: The ANS Forth locals wordset does not define a syntax for locals, but
1500: words that make it possible to define various syntaxes. One of the
1.17 anton 1501: possible syntaxes is a subset of the syntax we used in the Gforth locals
1.2 anton 1502: wordset, i.e.:
1503:
1504: @example
1505: @{ local1 local2 ... -- comment @}
1506: @end example
1507: or
1508: @example
1509: @{ local1 local2 ... @}
1510: @end example
1511:
1512: The order of the locals corresponds to the order in a stack comment. The
1513: restrictions are:
1.1 anton 1514:
1.2 anton 1515: @itemize @bullet
1516: @item
1.17 anton 1517: Locals can only be cell-sized values (no type specifiers are allowed).
1.2 anton 1518: @item
1519: Locals can be defined only outside control structures.
1520: @item
1521: Locals can interfere with explicit usage of the return stack. For the
1522: exact (and long) rules, see the standard. If you don't use return stack
1.17 anton 1523: accessing words in a definition using locals, you will be all right. The
1.2 anton 1524: purpose of this rule is to make locals implementation on the return
1525: stack easier.
1526: @item
1527: The whole definition must be in one line.
1528: @end itemize
1529:
1530: Locals defined in this way behave like @code{VALUE}s
(@pxref{Values}). I.e., they are initialized from the stack. Using their
1.2 anton 1532: name produces their value. Their value can be changed using @code{TO}.
1533:
1.17 anton 1534: Since this syntax is supported by Gforth directly, you need not do
1.2 anton 1535: anything to use it. If you want to port a program using this syntax to
1536: another ANS Forth system, use @file{anslocal.fs} to implement the syntax
1537: on the other system.
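
For example, the following definition stays within all the restrictions above (@code{sum-squares} is a hypothetical name). It uses only cell-sized locals, declares them outside control structures, uses no return-stack words and fits on one line. It should therefore also work with @file{anslocal.fs} on another system:
@example
: sum-squares @{ a b -- n @} a a * b b * + ;
@end example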
1538:
Note that a syntax shown in the standard (section A.13) looks
1540: similar, but is quite different in having the order of locals
1541: reversed. Beware!
1542:
The ANS Forth locals wordset itself consists of the following word:
1544:
1545: doc-(local)
1546:
1547: The ANS Forth locals extension wordset defines a syntax, but it is so
awful that we strongly recommend not using it. We have implemented this
1.17 anton 1549: syntax to make porting to Gforth easy, but do not document it here. The
1.2 anton 1550: problem with this syntax is that the locals are defined in an order
1551: reversed with respect to the standard stack comment notation, making
1552: programs harder to read, and easier to misread and miswrite. The only
1553: merit of this syntax is that it is easy to implement using the ANS Forth
1554: locals wordset.
1.3 anton 1555:
1.4 anton 1556: @node Defining Words, Wordlists, Locals, Words
1557: @section Defining Words
1558:
1.14 anton 1559: @menu
1560: * Values::
1561: @end menu
1562:
1.4 anton 1563: @node Values, , Defining Words, Defining Words
1564: @subsection Values
1565:
1566: @node Wordlists, Files, Defining Words, Words
1567: @section Wordlists
1568:
1569: @node Files, Blocks, Wordlists, Words
1570: @section Files
1571:
1572: @node Blocks, Other I/O, Files, Words
1573: @section Blocks
1574:
1575: @node Other I/O, Programming Tools, Blocks, Words
1576: @section Other I/O
1577:
1.18 anton 1578: @node Programming Tools, Assembler and Code words, Other I/O, Words
1.4 anton 1579: @section Programming Tools
1580:
1.5 anton 1581: @menu
1582: * Debugging:: Simple and quick.
1583: * Assertions:: Making your programs self-checking.
1584: @end menu
1585:
1586: @node Debugging, Assertions, Programming Tools, Programming Tools
1.4 anton 1587: @subsection Debugging
1588:
1589: The simple debugging aids provided in @file{debugging.fs}
1590: are meant to support a different style of debugging than the
1591: tracing/stepping debuggers used in languages with long turn-around
1592: times.
1593:
A much better (faster) way in fast-compiling languages is to add
1595: printing code at well-selected places, let the program run, look at
1596: the output, see where things went wrong, add more printing code, etc.,
1597: until the bug is found.
1598:
1599: The word @code{~~} is easy to insert. It just prints debugging
1600: information (by default the source location and the stack contents). It
1601: is also easy to remove (@kbd{C-x ~} in the Emacs Forth mode to
1602: query-replace them with nothing). The deferred words
1603: @code{printdebugdata} and @code{printdebugline} control the output of
1604: @code{~~}. The default source location output format works well with
1605: Emacs' compilation mode, so you can step through the program at the
1.5 anton 1606: source level using @kbd{C-x `} (the advantage over a stepping debugger
1607: is that you can step in any direction and you know where the crash has
1608: happened or where the strange data has occurred).
1.4 anton 1609:
1610: Note that the default actions clobber the contents of the pictured
1611: numeric output string, so you should not use @code{~~}, e.g., between
1612: @code{<#} and @code{#>}.
1613:
1614: doc-~~
1615: doc-printdebugdata
1616: doc-printdebugline
1617:
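As a minimal sketch (the word @code{sum-squares} is a hypothetical
example, not part of Gforth), @code{~~} can be placed between any two
words of a definition. With the default settings, each occurrence prints
the source location and the stack contents and leaves the stack
unchanged, so the definition computes the same result with or without
the @code{~~}s:

@example
: sum-squares ( n1 n2 -- n3 )  \ hypothetical example word
  dup * ~~          \ show the stack after squaring n2
  swap dup * ~~     \ show the stack after squaring n1
  + ;
3 4 sum-squares .   \ two lines of debugging output, then 25
@end example
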
1.5 anton 1618: @node Assertions, , Debugging, Programming Tools
1.4 anton 1619: @subsection Assertions
1620:
1.5 anton 1621: It is a good idea to make your programs self-checking, especially if
1622: you rely on an assumption (e.g., that a certain field of a data structure is
1.17 anton 1623: never zero) that may become invalid during maintenance. Gforth supports
1.5 anton 1624: assertions for this purpose. They are used like this:
1625:
1626: @example
1627: assert( @var{flag} )
1628: @end example
1629:
1630: The code between @code{assert(} and @code{)} should compute a flag that
1631: is true if everything is all right and false otherwise. It should
1632: not change anything else on the stack. The overall stack effect of the
1633: assertion is @code{( -- )}. For example:
1634:
1635: @example
1636: assert( 1 1 + 2 = ) \ what we learn in school
1637: assert( dup 0<> ) \ assert that the top of stack is not zero
1638: assert( false ) \ this code should not be reached
1639: @end example
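
A sketch of an assertion that guards a precondition inside a colon
definition (the word @code{checked-div} is hypothetical):

@example
: checked-div ( n1 n2 -- n3 )  \ hypothetical example word
  assert( dup 0<> )            \ the divisor n2 must not be zero
  / ;
7 2 checked-div .              \ prints 3
@end example

With assertions turned on (the default), @code{7 0 checked-div} reports
a failed assertion instead of attempting the division.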
1640:
1641: The need for assertions varies over time. During
1642: debugging, we want more checking; in production we sometimes care more
1643: about speed. Therefore, assertions can be turned off, i.e., the assertion
1644: becomes a comment. Depending on the importance of an assertion and the
1645: time it takes to check it, you may want to turn off some assertions and
1.17 anton 1646: keep others turned on. Gforth provides several levels of assertions for
1.5 anton 1647: this purpose:
1648:
1649: doc-assert0(
1650: doc-assert1(
1651: doc-assert2(
1652: doc-assert3(
1653: doc-assert(
1654: doc-)
1655:
1656: @code{Assert(} is the same as @code{assert1(}. The variable
1657: @code{assert-level} specifies the highest assertion level that is turned
1658: on. I.e., at the default @code{assert-level} of one, @code{assert0(} and
1659: @code{assert1(} assertions perform checking, while @code{assert2(} and
1660: @code{assert3(} assertions are treated as comments.
1661:
1662: Note that the @code{assert-level} is evaluated at compile-time, not at
1663: run-time. I.e., you cannot turn assertions on or off at run-time; you
1664: have to set the @code{assert-level} appropriately before compiling a
1665: piece of code. You can compile several pieces of code at several
1666: @code{assert-level}s (e.g., a trusted library at level 1 and newly
1667: written code at level 3).
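
Since the level is checked when an assertion is compiled, a definition
keeps the checks that were enabled at the time it was compiled. A sketch
(the word @code{halve} is hypothetical) that compiles one definition
with all levels enabled and then returns to the default level of one:

@example
3 assert-level !              \ enable all assertion levels for new code
: halve ( n1 -- n2 )          \ hypothetical example word
  assert3( dup 2 mod 0= )     \ level-3 check: n1 must be even
  2/ ;
1 assert-level !              \ back to the default level
\ halve still performs its level-3 check, because the check was
\ compiled while assert-level was 3
@end example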
1668:
1669: doc-assert-level
1670:
1671: If an assertion fails, a message compatible with Emacs' compilation mode
1672: is produced and execution is aborted (currently with @code{ABORT"}; if
1673: there is interest, we will introduce a special throw code). However, if you
1674: intend to @code{catch} a specific condition, using @code{throw} is
1675: probably more appropriate than an assertion.
1676:
1.18 anton 1677: @node Assembler and Code words, Threading Words, Programming Tools, Words
1678: @section Assembler and Code words
1679:
1680: Gforth provides some words for defining primitives (words written in
1681: machine code), and for defining the machine-code equivalent of
1682: @code{DOES>}-based defining words. However, the machine-independent
1683: nature of Gforth poses a few problems: First of all, Gforth runs on
1684: several architectures, so it can provide no standard assembler. What's
1685: worse, the register allocation depends not only on the processor,
1686: but also on the gcc version and options used.
1687:
1688: The words Gforth offers encapsulate some system dependencies (e.g., the
1689: header structure), so a system-independent assembler may be used in
1690: Gforth. If you do not have an assembler, you can compile machine code
1691: directly with @code{,} and @code{c,}.
1692:
1693: doc-assembler
1694: doc-code
1695: doc-end-code
1696: doc-;code
1697: doc-flush-icache
1698:
1699: If @code{flush-icache} does not work correctly, @code{code} words
1700: etc. will not work (reliably), either.
1701:
1702: These words are rarely used. Therefore they reside in @code{code.fs},
1703: which is usually not loaded (except @code{flush-icache}, which is always
1.19 ! anton 1704: present). You can load them with @code{require code.fs}.
1.18 anton 1705:
1706: Another option for implementing normal and defining words efficiently
1707: is to add the desired functionality to the source of Gforth. For normal
1708: words you just have to edit @file{primitives}, defining words (for fast
1709: defined words) probably require changes in @file{engine.c},
1710: @file{kernal.fs}, @file{prims2x.fs}, and possibly @file{cross.fs}.
1711:
1712:
1713: @node Threading Words, , Assembler and Code words, Words
1.4 anton 1714: @section Threading Words
1715:
1716: These words provide access to code addresses and other threading stuff
1.17 anton 1717: in Gforth (and, possibly, other interpretive Forths). This wordset more or less
1.4 anton 1718: abstracts away the differences between direct and indirect threading
1719: (and, for direct threading, the machine dependences). However, at
1720: present this wordset is still incomplete. It is also pretty low-level;
1721: some day it will hopefully be made unnecessary by an internals word set
1722: that abstracts implementation details away completely.
1723:
1724: doc->code-address
1725: doc->does-code
1726: doc-code-address!
1727: doc-does-code!
1728: doc-does-handler!
1729: doc-/does-handler
1730:
1.18 anton 1731: The code addresses that the various defining words install in the words
1732: they define are produced by the following words:
1.14 anton 1733:
1.18 anton 1734: doc-docol:
1735: doc-docon:
1736: doc-dovar:
1737: doc-douser:
1738: doc-dodefer:
1739: doc-dofield:
1740:
1741: Currently there is no installation-independent way of recognizing words
1742: defined by a @code{CREATE}...@code{DOES>} word; however, once you know
1743: that a word is defined by a @code{CREATE}...@code{DOES>} word, you can
1744: use @code{>DOES-CODE}.
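
As an illustration of these words, the following sketch tells colon
definitions, constants and primitives apart by their code address. It is
written against the words documented above and assumes that colon
definitions and constants use the @code{docol:} and @code{docon:} code
addresses; @code{colon-def?}, @code{constant?} and @code{five} are
hypothetical names:

@example
: colon-def? ( xt -- flag )    \ hypothetical helper
  >code-address docol: = ;     \ colon definitions use docol:
: constant? ( xt -- flag )     \ hypothetical helper
  >code-address docon: = ;     \ constants use docon:
5 constant five                \ hypothetical example constant
' colon-def? colon-def? .      \ a colon definition: true
' five constant? .             \ a constant: true
' dup colon-def? .             \ a primitive: false
@end example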
1.14 anton 1745:
1.4 anton 1746: @node ANS conformance, Model, Words, Top
1747: @chapter ANS conformance
1748:
1.17 anton 1749: To the best of our knowledge, Gforth is an
1.14 anton 1750:
1.15 anton 1751: ANS Forth System
1752: @itemize
1753: @item providing the Core Extensions word set
1754: @item providing the Block word set
1755: @item providing the Block Extensions word set
1756: @item providing the Double-Number word set
1757: @item providing the Double-Number Extensions word set
1758: @item providing the Exception word set
1759: @item providing the Exception Extensions word set
1760: @item providing the Facility word set
1761: @item providing @code{MS} and @code{TIME&DATE} from the Facility Extensions word set
1762: @item providing the File Access word set
1763: @item providing the File Access Extensions word set
1764: @item providing the Floating-Point word set
1765: @item providing the Floating-Point Extensions word set
1766: @item providing the Locals word set
1767: @item providing the Locals Extensions word set
1768: @item providing the Memory-Allocation word set
1769: @item providing the Memory-Allocation Extensions word set (that one's easy)
1770: @item providing the Programming-Tools word set
1.18 anton 1771: @item providing @code{;code}, @code{AHEAD}, @code{ASSEMBLER}, @code{BYE}, @code{CODE}, @code{CS-PICK}, @code{CS-ROLL}, @code{STATE}, @code{[ELSE]}, @code{[IF]}, @code{[THEN]} from the Programming-Tools Extensions word set
1.15 anton 1772: @item providing the Search-Order word set
1773: @item providing the Search-Order Extensions word set
1774: @item providing the String word set
1775: @item providing the String Extensions word set (another easy one)
1776: @end itemize
1777:
1778: In addition, ANS Forth systems are required to document certain
1779: implementation choices. This chapter tries to meet these
1780: requirements. In many cases it gives a way to ask the system for the
1781: information instead of providing the information directly, in
1782: particular, if the information depends on the processor, the operating
1783: system or the installation options chosen, or if they are likely to
1.17 anton 1784: change during the maintenance of Gforth.
1.15 anton 1785:
1.14 anton 1786: @comment The framework for the rest has been taken from pfe.
1787:
1788: @menu
1789: * The Core Words::
1790: * The optional Block word set::
1791: * The optional Double Number word set::
1792: * The optional Exception word set::
1793: * The optional Facility word set::
1794: * The optional File-Access word set::
1795: * The optional Floating-Point word set::
1796: * The optional Locals word set::
1797: * The optional Memory-Allocation word set::
1798: * The optional Programming-Tools word set::
1799: * The optional Search-Order word set::
1800: @end menu
1801:
1802:
1803: @c =====================================================================
1804: @node The Core Words, The optional Block word set, ANS conformance, ANS conformance
1805: @comment node-name, next, previous, up
1806: @section The Core Words
1807: @c =====================================================================
1808:
1809: @menu
1.15 anton 1810: * core-idef:: Implementation Defined Options
1811: * core-ambcond:: Ambiguous Conditions
1812: * core-other:: Other System Documentation
1.14 anton 1813: @end menu
1814:
1815: @c ---------------------------------------------------------------------
1816: @node core-idef, core-ambcond, The Core Words, The Core Words
1817: @subsection Implementation Defined Options
1818: @c ---------------------------------------------------------------------
1819:
1820: @table @i
1821:
1822: @item (Cell) aligned addresses:
1.17 anton 1823: processor-dependent. Gforth's alignment words perform natural alignment
1.14 anton 1824: (e.g., an address aligned for a datum of size 8 is divisible by
1825: 8). Unaligned accesses usually result in a @code{-23 THROW}.
1826:
1827: @item @code{EMIT} and non-graphic characters:
1828: The character is output using the C library function (actually, macro)
1829: @code{putchar}.
1830:
1831: @item character editing of @code{ACCEPT} and @code{EXPECT}:
1832: This is modeled on the GNU readline library (@pxref{Readline
1833: Interaction, , Command Line Editing, readline, The GNU Readline
1834: Library}) with Emacs-like key bindings. @kbd{Tab} deviates a little by
1835: producing a full word completion every time you type it (instead of
1836: producing the common prefix of all completions).
1837:
1838: @item character set:
1839: The character set of your computer and display device. Gforth is
1840: 8-bit-clean (but some other component in your system may cause trouble).
1841:
1842: @item Character-aligned address requirements:
1843: installation-dependent. Currently a character is represented by a C
1844: @code{unsigned char}; in the future we might switch to @code{wchar_t}
1845: (Comments on that requested).
1846:
1847: @item character-set extensions and matching of names:
1.17 anton 1848: Any character except the ASCII NUL character can be used in a
1849: name. Matching is case-insensitive. The matching is performed using the
1850: C function @code{strncasecmp}, whose function is probably influenced by
1851: the locale. E.g., the @code{C} locale does not know about accents and
1.14 anton 1852: umlauts, so they are matched case-sensitively in that locale. For
1853: portability reasons it is best to write programs such that they work in
1854: the @code{C} locale. Then one can use libraries written by a Polish
1855: programmer (who might use words containing ISO Latin-2 encoded
1856: characters) and by a French programmer (ISO Latin-1) in the same program
1857: (of course, @code{WORDS} will produce funny results for some of the
1858: words (which ones depends on the font you are using)). Also, the locale
1859: you prefer may not be available in other operating systems. Hopefully,
1860: Unicode will solve these problems one day.
1861:
1862: @item conditions under which control characters match a space delimiter:
1863: If @code{WORD} is called with the space character as a delimiter, all
1864: white-space characters (as identified by the C macro @code{isspace()})
1865: are delimiters. @code{PARSE}, on the other hand, treats space like other
1866: delimiters. @code{PARSE-WORD} treats space like @code{WORD}, but behaves
1867: like @code{PARSE} otherwise. @code{(NAME)}, which is used by the outer
1868: interpreter (aka text interpreter) by default, treats all white-space
1869: characters as delimiters.
1870:
1871: @item format of the control flow stack:
1872: The data stack is used as control flow stack. The size of a control flow
1873: stack item in cells is given by the constant @code{cs-item-size}. At the
1874: time of this writing, an item consists of a (pointer to a) locals list
1875: (third), an address in the code (second), and a tag for identifying the
1876: item (TOS). The following tags are used: @code{defstart},
1877: @code{live-orig}, @code{dead-orig}, @code{dest}, @code{do-dest},
1878: @code{scopestart}.
1879:
1880: @item conversion of digits > 35:
1881: The characters @code{[\]^_`} are the digits with the decimal value
1882: 36@minus{}41. There is no way to input many of the larger digits.
1883:
1884: @item display after input terminates in @code{ACCEPT} and @code{EXPECT}:
1885: The cursor is moved to the end of the entered string. If the input is
1886: terminated using the @kbd{Return} key, a space is typed.
1887:
1888: @item exception abort sequence of @code{ABORT"}:
1889: The error string is stored into the variable @code{"error} and a
1890: @code{-2 throw} is performed.
1891:
1892: @item input line terminator:
1893: For interactive input, @kbd{C-m} and @kbd{C-j} terminate lines. One of
1894: these characters is typically produced when you type the @kbd{Enter} or
1895: @kbd{Return} key.
1896:
1897: @item maximum size of a counted string:
1898: @code{s" /counted-string" environment? drop .}. Currently 255 characters
1899: on all ports, but this may change.
1900:
1901: @item maximum size of a parsed string:
1902: Given by the constant @code{/line}. Currently 255 characters.
1903:
1904: @item maximum size of a definition name, in characters:
1905: 31
1906:
1907: @item maximum string length for @code{ENVIRONMENT?}, in characters:
1908: 31
1909:
1910: @item method of selecting the user input device:
1.17 anton 1911: The user input device is the standard input. There is currently no way to
1912: change it from within Gforth. However, the input can typically be
1913: redirected in the command line that starts Gforth.
1.14 anton 1914:
1915: @item method of selecting the user output device:
1916: The user output device is the standard output. It cannot be redirected
1.17 anton 1917: from within Gforth, but typically from the command line that starts
1918: Gforth. Gforth uses buffered output, so output on a terminal does not
1.14 anton 1919: become visible before the next newline or buffer overflow. Output on
1920: non-terminals is invisible until the buffer overflows.
1921:
1922: @item methods of dictionary compilation:
1.17 anton 1923: What are we expected to document here?
1.14 anton 1924:
1925: @item number of bits in one address unit:
1926: @code{s" address-unit-bits" environment? drop .}. 8 in all current
1927: ports.
1928:
1929: @item number representation and arithmetic:
1930: Processor-dependent. Binary two's complement on all current ports.
1931:
1932: @item ranges for integer types:
1933: Installation-dependent. Make environmental queries for @code{MAX-N},
1934: @code{MAX-U}, @code{MAX-D} and @code{MAX-UD}. The lower bound for
1935: unsigned (and positive) types is 0. The lower bound for signed types on
1936: two's complement and one's complement machines can be computed
1937: by adding 1 to the upper bound (with wraparound).
1938:
1939: @item read-only data space regions:
1940: The whole Forth data space is writable.
1941:
1942: @item size of buffer at @code{WORD}:
1943: @code{PAD HERE - .}. 104 characters on 32-bit machines. The buffer is
1944: shared with the pictured numeric output string. If overwriting
1945: @code{PAD} is acceptable, it is as large as the remaining dictionary
1946: space, although only as much can be sensibly used as fits in a counted
1947: string.
1948:
1949: @item size of one cell in address units:
1950: @code{1 cells .}.
1951:
1952: @item size of one character in address units:
1953: @code{1 chars .}. 1 on all current ports.
1954:
1955: @item size of the keyboard terminal buffer:
1956: Varies. You can determine the size at a specific time using @code{lp@
1957: tib - .}. It is shared with the locals stack and TIBs of files that
1958: include the current file. You can change the amount of space for TIBs
1.17 anton 1959: and locals stack at Gforth startup with the command line option
1.14 anton 1960: @code{-l}.
1961:
1962: @item size of the pictured numeric output buffer:
1963: @code{PAD HERE - .}. 104 characters on 32-bit machines. The buffer is
1964: shared with @code{WORD}.
1965:
1966: @item size of the scratch area returned by @code{PAD}:
1967: The remainder of dictionary space. You can even use the unused part of
1968: the data stack space. The current size can be computed with @code{sp@
1969: pad - .}.
1970:
1971: @item system case-sensitivity characteristics:
1972: Dictionary searches are case insensitive. However, as explained above
1973: under @i{character-set extensions}, the matching for non-ASCII
1974: characters is determined by the locale you are using. In the default
1975: @code{C} locale all non-ASCII characters are matched case-sensitively.
1976:
1977: @item system prompt:
1978: @code{ ok} in interpret state, @code{ compiled} in compile state.
1979:
1980: @item division rounding:
1981: installation dependent. @code{s" floored" environment? drop .}. We leave
1982: the choice to gcc (what to use for @code{/}) and to you (whether to use
1983: @code{fm/mod}, @code{sm/rem} or simply @code{/}).
1984:
1985: @item values of @code{STATE} when true:
1986: -1.
1987:
1988: @item values returned after arithmetic overflow:
1989: On two's complement machines, arithmetic is performed modulo
1990: 2**bits-per-cell for single arithmetic and 4**bits-per-cell for double
1991: arithmetic (with appropriate mapping for signed types). Division by zero
1992: typically results in a @code{-55 throw} (floating-point unidentified
1993: fault), although a @code{-10 throw} (divide by zero) would be more
1994: appropriate.
1995:
1996: @item whether the current definition can be found after @t{DOES>}:
1997: No.
1998:
1999: @end table
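
Several of the entries above are answered by environmental queries. The
following sketch (@code{.query}, @code{my-max-n} and @code{my-min-n} are
hypothetical helpers, not part of Gforth) prints a few single-cell
queries, computes the lowest signed number by wraparound as described
under ranges for integer types, and shows how floored and symmetric
division differ:

@example
: .query ( c-addr u -- )       \ hypothetical: print a single-cell query
  2dup type ." : "
  environment? if . else ." unknown" then cr ;
s" /COUNTED-STRING" .query
s" ADDRESS-UNIT-BITS" .query
s" FLOORED" .query             \ -1 if floored division is the default
s" MAX-N" environment? drop constant my-max-n  \ hypothetical name
my-max-n 1+ constant my-min-n  \ wraps around to the lowest signed number
-7 s>d 2 fm/mod . .            \ floored:   prints -4 1
-7 s>d 2 sm/rem . .            \ symmetric: prints -3 -1
@end example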
2000:
2001: @c ---------------------------------------------------------------------
2002: @node core-ambcond, core-other, core-idef, The Core Words
2003: @subsection Ambiguous conditions
2004: @c ---------------------------------------------------------------------
2005:
2006: @table @i
2007:
2008: @item a name is neither a word nor a number:
2009: @code{-13 throw} (Undefined word)
2010:
2011: @item a definition name exceeds the maximum length allowed:
2012: @code{-19 throw} (Word name too long)
2013:
2014: @item addressing a region not inside the various data spaces of the forth system:
2015: The stacks, code space and name space are accessible. Machine code space is
2016: typically readable. Accessing other addresses gives results dependent on
2017: the operating system. On decent systems: @code{-9 throw} (Invalid memory
2018: address).
2019:
2020: @item argument type incompatible with parameter:
2021: This is usually not caught. Some words perform checks, e.g., the control
2022: flow words, and issue an @code{ABORT"} or @code{-12 THROW} (Argument type
2023: mismatch).
2024:
2025: @item attempting to obtain the execution token of a word with undefined execution semantics:
2026: You get an execution token representing the compilation semantics
2027: instead.
2028:
2029: @item dividing by zero:
2030: typically results in a @code{-55 throw} (floating-point unidentified
2031: fault), although a @code{-10 throw} (divide by zero) would be more
2032: appropriate.
2033:
2034: @item insufficient data stack or return stack space:
2035: Not checked. This typically results in mysterious illegal memory
2036: accesses, producing @code{-9 throw} (Invalid memory address) or
2037: @code{-23 throw} (Address alignment exception).
2038:
2039: @item insufficient space for loop control parameters:
2040: like other return stack overflows.
2041:
2042: @item insufficient space in the dictionary:
2043: Not checked. The results are similar to those of stack overflows. However, typically the
2044: error appears at a different place when one inserts or removes code.
2045:
2046: @item interpreting a word with undefined interpretation semantics:
2047: For some words, we defined interpretation semantics. For the others:
2048: @code{-14 throw} (Interpreting a compile-only word). Note that this is
2049: checked only by the outer (aka text) interpreter; if the word is
2050: @code{execute}d in some other way, it will typically perform its
2051: compilation semantics even in interpret state. (We could change @code{'}
2052: and relatives not to give the xt of such words, but we think that would
2053: be too restrictive).
2054:
2055: @item modifying the contents of the input buffer or a string literal:
2056: These are located in writable memory and can be modified.
2057:
2058: @item overflow of the pictured numeric output string:
2059: Not checked.
2060:
2061: @item parsed string overflow:
2062: @code{PARSE} cannot overflow. @code{WORD} does not check for overflow.
2063:
2064: @item producing a result out of range:
2065: On two's complement machines, arithmetic is performed modulo
2066: 2**bits-per-cell for single arithmetic and 4**bits-per-cell for double
2067: arithmetic (with appropriate mapping for signed types). Division by zero
2068: typically results in a @code{-55 throw} (floating-point unidentified
2069: fault), although a @code{-10 throw} (divide by zero) would be more
2070: appropriate. @code{convert} and @code{>number} currently overflow
2071: silently.
2072:
2073: @item reading from an empty data or return stack:
2074: The data stack is checked by the outer (aka text) interpreter after
2075: every word executed. If it has underflowed, a @code{-4 throw} (Stack
2076: underflow) is performed. Apart from that, the stacks are not checked and
2077: underflows can result in similar behaviour as overflows (of adjacent
2078: stacks).
2079:
2080: @item unexpected end of the input buffer, resulting in an attempt to use a zero-length string as a name:
2081: @code{Create} and its descendants perform a @code{-16 throw} (Attempt to
2082: use zero-length string as a name). Words like @code{'} probably will not
2083: find what they search. Note that it is possible to create zero-length
2084: names with @code{nextname} (should it not?).
2085:
2086: @item @code{>IN} greater than input buffer:
2087: The next invocation of a parsing word returns a string with length 0.
2088:
2089: @item @code{RECURSE} appears after @code{DOES>}:
2090: Compiles a recursive call to the defining word, not to the defined word.
2091:
2092: @item argument input source different than current input source for @code{RESTORE-INPUT}:
2093: !!???If the argument input source is a valid input source then it gets
1.19 ! anton 2094: restored. Otherwise, it causes @code{-12 THROW}, which, unless caught, issues
1.14 anton 2095: the message "argument type mismatch" and aborts.
2096:
2097: @item data space containing definitions gets de-allocated:
2098: Deallocation with @code{allot} is not checked. This typically results in
2099: memory access faults or execution of illegal instructions.
2100:
2101: @item data space read/write with incorrect alignment:
2102: Processor-dependent. Typically results in a @code{-23 throw} (Address
2103: alignment exception). Under Linux on a 486 or later processor with
2104: alignment turned on, incorrect alignment results in a @code{-9 throw}
2105: (Invalid memory address). There are reportedly some processors with
2106: alignment restrictions that do not report them.
2107:
2108: @item data space pointer not properly aligned, @code{,}, @code{C,}:
2109: Like other alignment errors.
2110:
2111: @item less than u+2 stack items (@code{PICK} and @code{ROLL}):
2112: Not checked. May cause an illegal memory access.
2113:
2114: @item loop control parameters not available:
2115: Not checked. The counted loop words simply assume that the top of return
2116: stack items are loop control parameters and behave accordingly.
2117:
2118: @item most recent definition does not have a name (@code{IMMEDIATE}):
2119: @code{abort" last word was headerless"}.
2120:
2121: @item name not defined by @code{VALUE} used by @code{TO}:
2122: @code{-32 throw} (Invalid name argument)
2123:
1.15 anton 2124: @item name not found (@code{'}, @code{POSTPONE}, @code{[']}, @code{[COMPILE]}):
1.14 anton 2125: @code{-13 throw} (Undefined word)
2126:
2127: @item parameters are not of the same type (@code{DO}, @code{?DO}, @code{WITHIN}):
2128: Gforth behaves as if they were of the same type. I.e., you can predict
2129: the behaviour by interpreting all parameters as, e.g., signed.
2130:
2131: @item @code{POSTPONE} or @code{[COMPILE]} applied to @code{TO}:
2132: Assume @code{: X POSTPONE TO ; IMMEDIATE}. @code{X} is equivalent to
2133: @code{TO}.
2134:
2135: @item String longer than a counted string returned by @code{WORD}:
2136: Not checked. The string will be ok, but the count will, of course,
2137: contain only the least significant bits of the length.
2138:
1.15 anton 2139: @item u greater than or equal to the number of bits in a cell (@code{LSHIFT}, @code{RSHIFT}):
1.14 anton 2140: Processor-dependent. Typical behaviours are returning 0 and using only
2141: the low bits of the shift count.
2142:
2143: @item word not defined via @code{CREATE}:
2144: @code{>BODY} produces the PFA of the word no matter how it was defined.
2145:
2146: @code{DOES>} changes the execution semantics of the last defined word no
2147: matter how it was defined. E.g., @code{CONSTANT DOES>} is equivalent to
2148: @code{CREATE , DOES>}.
2149:
2150: @item words improperly used outside @code{<#} and @code{#>}:
2151: Not checked. As usual, you can expect memory faults.
2152:
2153: @end table
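
The throw codes listed above can be observed with @code{CATCH}. A
sketch, using the hypothetical helper @code{try-evaluate}, which
interprets a string and returns the throw code (0 if nothing was
thrown):

@example
: try-evaluate ( i*x c-addr u -- j*x n )  \ hypothetical helper
  ['] evaluate catch
  dup if nip nip then ;   \ after a throw, drop the two restored cells
s" no-such-word-xyz" try-evaluate .   \ prints -13 (Undefined word)
s" 1 2 +" try-evaluate . .            \ prints 0 3
@end example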
2154:
2155:
2156: @c ---------------------------------------------------------------------
2157: @node core-other, , core-ambcond, The Core Words
2158: @subsection Other system documentation
2159: @c ---------------------------------------------------------------------
2160:
2161: @table @i
2162:
2163: @item nonstandard words using @code{PAD}:
2164: None.
2165:
2166: @item operator's terminal facilities available:
2167: !!??
2168:
2169: @item program data space available:
2170: @code{sp@ here - .} gives the space remaining for dictionary and data
2171: stack together.
2172:
2173: @item return stack space available:
2174: !!??
2175:
2176: @item stack space available:
2177: @code{sp@ here - .} gives the space remaining for dictionary and data
2178: stack together.
2179:
2180: @item system dictionary space required, in address units:
2181: Type @code{here forthstart - .} after startup. At the time of this
2182: writing, this gives 70108 (bytes) on a 32-bit system.
2183: @end table
2184:
2185:
2186: @c =====================================================================
2187: @node The optional Block word set, The optional Double Number word set, The Core Words, ANS conformance
2188: @section The optional Block word set
2189: @c =====================================================================
2190:
2191: @menu
1.15 anton 2192: * block-idef:: Implementation Defined Options
2193: * block-ambcond:: Ambiguous Conditions
2194: * block-other:: Other System Documentation
1.14 anton 2195: @end menu
2196:
2197:
2198: @c ---------------------------------------------------------------------
2199: @node block-idef, block-ambcond, The optional Block word set, The optional Block word set
2200: @subsection Implementation Defined Options
2201: @c ---------------------------------------------------------------------
2202:
2203: @table @i
2204:
2205: @item the format for display by @code{LIST}:
2206: First the screen number is displayed, then 16 lines of 64 characters,
2207: each line preceded by the line number.
2208:
2209: @item the length of a line affected by @code{\}:
2210: 64 characters.
2211: @end table
2212:
2213:
2214: @c ---------------------------------------------------------------------
2215: @node block-ambcond, block-other, block-idef, The optional Block word set
2216: @subsection Ambiguous conditions
2217: @c ---------------------------------------------------------------------
2218:
2219: @table @i
2220:
2221: @item correct block read was not possible:
2222: Typically results in a @code{throw} of some OS-derived value (between
2223: -512 and -2048). If the blocks file was just not long enough, blanks are
2224: supplied for the missing portion.
2225:
2226: @item I/O exception in block transfer:
2227: Typically results in a @code{throw} of some OS-derived value (between
2228: -512 and -2048).
2229:
2230: @item invalid block number:
2231: @code{-35 throw} (Invalid block number)
2232:
2233: @item a program directly alters the contents of @code{BLK}:
2234: The input stream is switched to that other block, at the same
2235: position. If the storing to @code{BLK} happens when interpreting
2236: non-block input, the system will get quite confused when the block ends.
2237:
2238: @item no current block buffer for @code{UPDATE}:
2239: @code{UPDATE} has no effect.
2240:
2241: @end table
2242:
2243:
2244: @c ---------------------------------------------------------------------
2245: @node block-other, , block-ambcond, The optional Block word set
2246: @subsection Other system documentation
2247: @c ---------------------------------------------------------------------
2248:
2249: @table @i
2250:
2251: @item any restrictions a multiprogramming system places on the use of buffer addresses:
2252: No restrictions (yet).
2253:
2254: @item the number of blocks available for source and data:
2255: depends on your disk space.
2256:
2257: @end table
2258:
2259:
2260: @c =====================================================================
2261: @node The optional Double Number word set, The optional Exception word set, The optional Block word set, ANS conformance
2262: @section The optional Double Number word set
2263: @c =====================================================================
2264:
2265: @menu
1.15 anton 2266: * double-ambcond:: Ambiguous Conditions
1.14 anton 2267: @end menu
2268:
2269:
2270: @c ---------------------------------------------------------------------
1.15 anton 2271: @node double-ambcond, , The optional Double Number word set, The optional Double Number word set
1.14 anton 2272: @subsection Ambiguous conditions
2273: @c ---------------------------------------------------------------------
2274:
2275: @table @i
2276:
1.15 anton 2277: @item @var{d} outside of range of @var{n} in @code{D>S}:
1.14 anton 2278: The least significant cell of @var{d} is produced.
2279:
2280: @end table


@c =====================================================================
@node The optional Exception word set, The optional Facility word set, The optional Double Number word set, ANS conformance
@section The optional Exception word set
@c =====================================================================

@menu
* exception-idef:: Implementation Defined Options
@end menu


@c ---------------------------------------------------------------------
@node exception-idef, , The optional Exception word set, The optional Exception word set
@subsection Implementation Defined Options
@c ---------------------------------------------------------------------

@table @i
@item @code{THROW}-codes used in the system:
The codes -256@minus{}-511 are used for reporting signals (see
@file{errore.fs}). The codes -512@minus{}-2047 are used for OS errors
(for file and memory allocation operations). The mapping from OS error
numbers to throw codes is -512@minus{}@var{errno}. One side effect of
this mapping is that undefined OS errors produce a message with a
strange number; e.g., @code{-1000 THROW} results in @code{Unknown error
488} on my system.
@end table
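The OS-error part of this mapping can be sketched in C as follows. The function names are hypothetical; only the formula -512@minus{}@var{errno} and the range -512@minus{}-2047 come from the description above. The inverse mapping explains the example: @code{-1000 THROW} corresponds to error number 488.

```c
/* Hypothetical helpers illustrating the throw-code mapping for OS
   errors: throw code = -512 - errno, within the range -512..-2047. */

/* OS error number -> throw code (also used as ior). */
int errno_to_throw(int err)
{
    return -512 - err;
}

/* Throw code -> OS error number; returns -1 for codes outside the
   OS-error range (e.g. the signal codes -256..-511, or -55). */
int throw_to_errno(int code)
{
    if (code > -512 || code < -2047)
        return -1;
    return -512 - code;
}
```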

@c =====================================================================
@node The optional Facility word set, The optional File-Access word set, The optional Exception word set, ANS conformance
@section The optional Facility word set
@c =====================================================================

@menu
* facility-idef:: Implementation Defined Options
* facility-ambcond:: Ambiguous Conditions
@end menu


@c ---------------------------------------------------------------------
@node facility-idef, facility-ambcond, The optional Facility word set, The optional Facility word set
@subsection Implementation Defined Options
@c ---------------------------------------------------------------------

@table @i

@item encoding of keyboard events (@code{EKEY}):
Not yet implemented.

@item duration of a system clock tick:
System dependent. With respect to @code{MS}, the time is specified in
microseconds. How well the OS and the hardware implement this is
another question.

@item repeatability to be expected from the execution of @code{MS}:
System dependent. On Unix, a lot depends on load. If the system is
lightly loaded, and the delay is short enough that Gforth does not get
swapped out, the performance should be acceptable. Under MS-DOS and
other single-tasking systems, it should be good.

@end table


@c ---------------------------------------------------------------------
@node facility-ambcond, , facility-idef, The optional Facility word set
@subsection Ambiguous conditions
@c ---------------------------------------------------------------------

@table @i

@item @code{AT-XY} can't be performed on user output device:
Largely terminal dependent. No range checks are done on the arguments.
No errors are reported. You may see some garbage appear, or you may
see nothing happen at all.

@end table


@c =====================================================================
@node The optional File-Access word set, The optional Floating-Point word set, The optional Facility word set, ANS conformance
@section The optional File-Access word set
@c =====================================================================

@menu
* file-idef:: Implementation Defined Options
* file-ambcond:: Ambiguous Conditions
@end menu


@c ---------------------------------------------------------------------
@node file-idef, file-ambcond, The optional File-Access word set, The optional File-Access word set
@subsection Implementation Defined Options
@c ---------------------------------------------------------------------

@table @i

@item File access methods used:
@code{R/O}, @code{R/W} and @code{BIN} work as you would
expect. @code{W/O} translates into the C file opening mode @code{w} (or
@code{wb}): The file is truncated if it exists and created if it does
not (with both @code{open-file} and @code{create-file}). Under Unix,
@code{create-file} creates a file with 666 permissions modified by your
umask.

@item file exceptions:
The file words do not raise exceptions (except, perhaps, memory access
faults when you pass illegal addresses or file-ids).

@item file line terminator:
System-dependent. Gforth uses C's newline character as line
terminator. The actual character code(s) of this terminator are
system-dependent.

@item file name format:
System dependent. Gforth just uses the file name format of your OS.

@item information returned by @code{FILE-STATUS}:
@code{FILE-STATUS} returns the most powerful file access mode allowed
for the file: Either @code{R/O}, @code{W/O} or @code{R/W}. If the file
cannot be accessed, @code{R/O BIN} is returned. @code{BIN} is applicable
along with the returned mode.

@item input file state after an exception when including source:
All files that are left via the exception are closed.

@item @var{ior} values and meaning:
The @var{ior}s returned by the file and memory allocation words are
intended as throw codes. They typically are in the range
-512@minus{}-2047 of OS errors. The mapping from OS error numbers to
@var{ior}s is -512@minus{}@var{errno}.

@item maximum depth of file input nesting:
Limited by the amount of return stack, locals/TIB stack, and the number
of open files available. This should not cause you any trouble.

@item maximum size of input line:
@code{/line}. Currently 255.

@item methods of mapping block ranges to files:
Currently, the block words automatically access the file
@file{blocks.fb} in the current working directory. More sophisticated
methods could be implemented if there is demand (and a volunteer).

@item number of string buffers provided by @code{S"}:
1

@item size of string buffer used by @code{S"}:
@code{/line}. Currently 255.

@end table
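The following C sketch restates two of the points above: the @code{fopen()} mode string used for each access method, and the permissions that result from creating a file with mode 666 under a given umask. The enumeration and function names are hypothetical; the strings for @code{W/O} come from the text, while @code{"r"} for @code{R/O} and @code{"r+"} for @code{R/W} are assumptions.

```c
#include <sys/types.h>   /* mode_t */

/* Hypothetical encoding of the Forth file access methods. */
enum fam { FAM_RO, FAM_WO, FAM_RW };

/* fopen() mode string for a file access method, with or without BIN.
   W/O -> "w"/"wb" as described above; "r" for R/O and "r+" for R/W
   are assumptions made for this sketch. */
const char *fam_to_mode(enum fam f, int bin)
{
    static const char *const modes[3][2] = {
        { "r",  "rb"  },
        { "w",  "wb"  },
        { "r+", "r+b" },
    };
    return modes[f][bin != 0];
}

/* Permission bits of a file created with mode 0666: every bit that
   is set in the umask is cleared. */
mode_t created_mode(mode_t umask_bits)
{
    return 0666 & ~umask_bits;
}
```

For example, with the common umask 022 a file created by @code{create-file} gets mode 644.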

@c ---------------------------------------------------------------------
@node file-ambcond, , file-idef, The optional File-Access word set
@subsection Ambiguous conditions
@c ---------------------------------------------------------------------

@table @i

@item attempting to position a file outside its boundaries:
@code{REPOSITION-FILE} is performed as usual: Afterwards,
@code{FILE-POSITION} returns the value given to @code{REPOSITION-FILE}.

@item attempting to read from file positions not yet written:
End-of-file, i.e., zero characters are read and no error is reported.

@item @var{file-id} is invalid (@code{INCLUDE-FILE}):
An appropriate exception may be thrown, but a memory fault or other
problem is more probable.

@item I/O exception reading or closing @var{file-id} (@code{include-file}, @code{included}):
The @var{ior} produced by the operation that discovered the problem is
thrown.

@item named file cannot be opened (@code{included}):
The @var{ior} produced by @code{open-file} is thrown.

@item requesting an unmapped block number:
There are no unmapped legal block numbers. On some operating systems,
writing a block with a large number may overflow the file system and
result in an error message.

@item using @code{source-id} when @code{blk} is non-zero:
@code{source-id} performs its function. Typically it will give the id of
the source which loaded the block. (Better ideas?)

@end table
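Gforth's file words are built on the C library (see the @code{W/O} entry above), and plain C stdio shows the same two behaviours: seeking past the end of a file succeeds and the new position is reported back, and reading there yields end-of-file without an error. The sketch below checks this with @code{fseek()}, @code{ftell()} and @code{fread()} on a temporary file. The function name is hypothetical, and it is an assumption that Gforth's words map directly onto these calls.

```c
#include <stdio.h>

/* Returns 1 if, after seeking to pos in a 3-byte file, the reported
   position equals pos and a read there returns 0 bytes without
   setting the error indicator; returns 0 otherwise, and -1 if no
   temporary file could be created. */
int seek_past_end_behaves(long pos)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    fputs("abc", f);                       /* file is now 3 bytes long */
    /* fseek is also the positioning call C requires between output
       and input on the same stream. */
    int ok = fseek(f, pos, SEEK_SET) == 0  /* like REPOSITION-FILE */
             && ftell(f) == pos;           /* like FILE-POSITION   */
    char buf[16];
    size_t n = fread(buf, 1, sizeof buf, f);
    ok = ok && n == 0 && !ferror(f);       /* end-of-file, no error */
    fclose(f);
    return ok;
}
```

Seeking to position 100 of the 3-byte file returns 1; seeking to position 1 returns 0, because two characters can still be read there.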


@c =====================================================================
@node The optional Floating-Point word set, The optional Locals word set, The optional File-Access word set, ANS conformance
@section The optional Floating-Point word set
@c =====================================================================

@menu
* floating-idef:: Implementation Defined Options
* floating-ambcond:: Ambiguous Conditions
@end menu


@c ---------------------------------------------------------------------
@node floating-idef, floating-ambcond, The optional Floating-Point word set, The optional Floating-Point word set
@subsection Implementation Defined Options
@c ---------------------------------------------------------------------

@table @i

@item format and range of floating point numbers:
System-dependent; the @code{double} type of C.

@item results of @code{REPRESENT} when @var{float} is out of range:
System dependent; @code{REPRESENT} is implemented using the C library
function @code{ecvt()} and inherits its behaviour in this respect.

@item rounding or truncation of floating-point numbers:
What's the question?!!

@item size of floating-point stack:
@code{s" FLOATING-STACK" environment? drop .}. Can be changed at startup
with the command-line option @code{-f}.

@item width of floating-point stack:
@code{1 floats}.

@end table


@c ---------------------------------------------------------------------
@node floating-ambcond, , floating-idef, The optional Floating-Point word set
@subsection Ambiguous conditions
@c ---------------------------------------------------------------------

@table @i

@item @code{df@@} or @code{df!} used with an address that is not double-float aligned:
System-dependent. Typically results in an alignment fault like other
alignment violations.

@item @code{f@@} or @code{f!} used with an address that is not float aligned:
System-dependent. Typically results in an alignment fault like other
alignment violations.

@item Floating-point result out of range:
System-dependent. Can result in a @code{-55 THROW} (Floating-point
unidentified fault), or can produce a special value representing, e.g.,
Infinity.

@item @code{sf@@} or @code{sf!} used with an address that is not single-float aligned:
System-dependent. Typically results in an alignment fault like other
alignment violations.

@item @code{BASE} is not decimal (@code{REPRESENT}, @code{F.}, @code{FE.}, @code{FS.}):
The floating-point number is converted into decimal nonetheless.

@item Both arguments are equal to zero (@code{FATAN2}):
System-dependent. @code{FATAN2} is implemented using the C library
function @code{atan2()}.

@item Using @code{ftan} on an argument @var{r1} where cos(@var{r1}) is zero:
System-dependent. In practice, cos(@var{r1}) is typically not exactly
zero because of rounding errors (e.g., pi/2 cannot be represented
exactly), so the result is a number of very large magnitude (positive
or negative), but finite.

@item @var{d} cannot be represented precisely as a float in @code{D>F}:
The result is rounded to the nearest float.

@item dividing by zero:
@code{-55 throw} (Floating-point unidentified fault)

@item exponent too big for conversion (@code{DF!}, @code{DF@@}, @code{SF!}, @code{SF@@}):
System dependent. On IEEE-FP based systems the number is converted into
an infinity.

@item @var{float}<1 (@code{facosh}):
@code{-55 throw} (Floating-point unidentified fault)

@item @var{float}=<-1 (@code{flnp1}):
@code{-55 throw} (Floating-point unidentified fault). On IEEE-FP systems
negative infinity is typically produced for @var{float}=-1.

@item @var{float}=<0 (@code{fln}, @code{flog}):
@code{-55 throw} (Floating-point unidentified fault). On IEEE-FP systems
negative infinity is typically produced for @var{float}=0.

@item @var{float}<0 (@code{fasinh}, @code{fsqrt}):
@code{-55 throw} (Floating-point unidentified fault). However, @code{fasinh}
produces values for these inputs on my Linux box; this is not a bug in
the C library, since the inverse hyperbolic sine is defined for all real
numbers.

@item |@var{float}|>1 (@code{facos}, @code{fasin}, @code{fatanh}):
@code{-55 throw} (Floating-point unidentified fault).

@item integer part of float cannot be represented by @var{d} in @code{f>d}:
@code{-55 throw} (Floating-point unidentified fault).

@item string larger than pictured numeric output area (@code{f.}, @code{fe.}, @code{fs.}):
This does not happen.
@end table
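Several of the entries above describe what IEEE 754 systems do. The following C sketch checks three of them, assuming C99 Annex F semantics (IEEE 754 @code{double} and @code{float}, no floating-point traps): an out-of-range result becomes infinity, @code{D>F} rounds a 64-bit integer to the nearest double, and storing a value whose exponent is too big for the single format yields infinity. The function names are hypothetical; @code{volatile} only keeps the compiler from folding the computations at compile time.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>

/* Floating-point result out of range: the overflowing product
   becomes +infinity instead of trapping. */
int overflow_gives_inf(void)
{
    volatile double big = DBL_MAX;
    double r = big * 2.0;
    return isinf(r) && r > 0;
}

/* D>F: 2^53 + 1 needs 54 significant bits, more than a double has,
   so the conversion rounds to the nearest double (ties to even),
   which is 2^53. */
int d_to_f_rounds(void)
{
    volatile int64_t d = ((int64_t)1 << 53) + 1;
    return (double)d == 9007199254740992.0;
}

/* SF!: a double whose exponent is too big for the single format
   is converted into +infinity. */
int sf_store_overflows_to_inf(void)
{
    volatile double big = 1e300;
    float f = (float)big;
    return isinf(f) && f > 0.0f;
}
```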



@c =====================================================================
@node The optional Locals word set, The optional Memory-Allocation word set, The optional Floating-Point word set, ANS conformance
@section The optional Locals word set
@c =====================================================================

@menu
* locals-idef:: Implementation Defined Options
* locals-ambcond:: Ambiguous Conditions
@end menu


@c ---------------------------------------------------------------------
@node locals-idef, locals-ambcond, The optional Locals word set, The optional Locals word set
@subsection Implementation Defined Options
@c ---------------------------------------------------------------------

@table @i

@item maximum number of locals in a definition:
@code{s" #locals" environment? drop .}. Currently 15. This is a lower
bound, e.g., on a 32-bit machine there can be 41 locals of up to 8
characters. The number of locals in a definition is bounded by the size
of locals-buffer, which contains the names of the locals.

@end table


@c ---------------------------------------------------------------------
@node locals-ambcond, , locals-idef, The optional Locals word set
@subsection Ambiguous conditions
@c ---------------------------------------------------------------------

@table @i

@item executing a named local in interpretation state:
@code{-14 throw} (Interpreting a compile-only word).

@item @var{name} not defined by @code{VALUE} or @code{(LOCAL)} (@code{TO}):
@code{-32 throw} (Invalid name argument)

@end table


@c =====================================================================
@node The optional Memory-Allocation word set, The optional Programming-Tools word set, The optional Locals word set, ANS conformance
@section The optional Memory-Allocation word set
@c =====================================================================

@menu
* memory-idef:: Implementation Defined Options
@end menu


@c ---------------------------------------------------------------------
@node memory-idef, , The optional Memory-Allocation word set, The optional Memory-Allocation word set
@subsection Implementation Defined Options
@c ---------------------------------------------------------------------

@table @i

@item values and meaning of @var{ior}:
The @var{ior}s returned by the file and memory allocation words are
intended as throw codes. They typically are in the range
-512@minus{}-2047 of OS errors. The mapping from OS error numbers to
@var{ior}s is -512@minus{}@var{errno}.

@end table

@c =====================================================================
@node The optional Programming-Tools word set, The optional Search-Order word set, The optional Memory-Allocation word set, ANS conformance
@section The optional Programming-Tools word set
@c =====================================================================

@menu
* programming-idef:: Implementation Defined Options
* programming-ambcond:: Ambiguous Conditions
@end menu


@c ---------------------------------------------------------------------
@node programming-idef, programming-ambcond, The optional Programming-Tools word set, The optional Programming-Tools word set
@subsection Implementation Defined Options
@c ---------------------------------------------------------------------

@table @i

@item ending sequence for input following @code{;code} and @code{code}:
Not implemented (yet).

@item manner of processing input following @code{;code} and @code{code}:
Not implemented (yet).

@item search order capability for @code{EDITOR} and @code{ASSEMBLER}:
Not implemented (yet). If they were implemented, they would use the
search order word set.

@item source and format of display by @code{SEE}:
The source for @code{see} is the intermediate code used by the inner
interpreter. The current @code{see} tries to output Forth source code
as well as possible.

@end table

@c ---------------------------------------------------------------------
@node programming-ambcond, , programming-idef, The optional Programming-Tools word set
@subsection Ambiguous conditions
@c ---------------------------------------------------------------------

@table @i

@item deleting the compilation wordlist (@code{FORGET}):
Not implemented (yet).

@item fewer than @var{u}+1 items on the control flow stack (@code{CS-PICK}, @code{CS-ROLL}):
This typically results in an @code{abort"} with a descriptive error
message (may change into a @code{-22 throw} (Control structure mismatch)
in the future). You may also get a memory access error. If you are
unlucky, this ambiguous condition is not caught.

@item @var{name} can't be found (@code{forget}):
Not implemented (yet).

@item @var{name} not defined via @code{CREATE}:
@code{;code} is not implemented (yet). If it were, it would behave like
@code{DOES>} in this respect, i.e., change the execution semantics of
the last defined word no matter how it was defined.

@item @code{POSTPONE} applied to @code{[IF]}:
After defining @code{: X POSTPONE [IF] ; IMMEDIATE}, @code{X} is
equivalent to @code{[IF]}.

@item reaching the end of the input source before matching @code{[ELSE]} or @code{[THEN]}:
Continue in the same state of conditional compilation in the next outer
input source. Currently there is no warning to the user about this.

@item removing a needed definition (@code{FORGET}):
Not implemented (yet).

@end table


@c =====================================================================
@node The optional Search-Order word set, , The optional Programming-Tools word set, ANS conformance
@section The optional Search-Order word set
@c =====================================================================

@menu
* search-idef:: Implementation Defined Options
* search-ambcond:: Ambiguous Conditions
@end menu


@c ---------------------------------------------------------------------
@node search-idef, search-ambcond, The optional Search-Order word set, The optional Search-Order word set
@subsection Implementation Defined Options
@c ---------------------------------------------------------------------

@table @i

@item maximum number of word lists in search order:
@code{s" wordlists" environment? drop .}. Currently 16.

@item minimum search order:
@code{root root}.

@end table

@c ---------------------------------------------------------------------
@node search-ambcond, , search-idef, The optional Search-Order word set
@subsection Ambiguous conditions
@c ---------------------------------------------------------------------

@table @i

@item changing the compilation wordlist (during compilation):
The definition is put into the wordlist that is the compilation wordlist
when @code{REVEAL} is executed (by @code{;}, @code{DOES>},
@code{RECURSIVE}, etc.).

@item search order empty (@code{previous}):
@code{abort" Vocstack empty"}.

@item too many word lists in search order (@code{also}):
@code{abort" Vocstack full"}.

@end table
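The two abort conditions above can be pictured with a bounded stack of word lists. The following C sketch is a hypothetical model, not Gforth's data structure: word lists are plain integers, the capacity of 16 is the current limit given above, the minimum search order @code{root root} is the starting point, and the behaviour of @code{ALSO} on an empty search order is an assumption.

```c
#include <stddef.h>   /* NULL */

#define MAX_WORDLISTS 16  /* current limit stated above */
#define ROOT 0            /* hypothetical id of the root word list */

/* Hypothetical model of the search-order stack; wid[depth - 1] is
   the first word list searched. */
struct search_order {
    int wid[MAX_WORDLISTS];
    int depth;
};

/* Reset to the minimum search order: root root. */
void only(struct search_order *so)
{
    so->wid[0] = ROOT;
    so->wid[1] = ROOT;
    so->depth = 2;
}

/* ALSO: duplicate the first word list of the search order.
   Returns NULL on success, or the abort" message on failure. */
const char *also(struct search_order *so)
{
    if (so->depth >= MAX_WORDLISTS)
        return "Vocstack full";
    if (so->depth == 0)              /* not covered above; assumption */
        return "Vocstack empty";
    so->wid[so->depth] = so->wid[so->depth - 1];
    so->depth++;
    return NULL;
}

/* PREVIOUS: remove the first word list from the search order. */
const char *previous(struct search_order *so)
{
    if (so->depth == 0)
        return "Vocstack empty";
    so->depth--;
    return NULL;
}
```

Starting from @code{root root}, fourteen @code{ALSO}s fill the 16 slots, and the fifteenth reports @code{Vocstack full}.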


@node Model, Emacs and Gforth, ANS conformance, Top
@chapter Model

@node Emacs and Gforth, Internals, Model, Top
@chapter Emacs and Gforth

Gforth comes with @file{gforth.el}, an improved version of
@file{forth.el} by Goran Rydqvist (included in the TILE package). The
improvements are a better (but still not perfect) handling of
indentation. I have also added comment paragraph filling (@kbd{M-q}),
commenting (@kbd{C-x \}) and uncommenting (@kbd{C-u C-x \}) regions and
removing debugging tracers (@kbd{C-x ~}, @pxref{Debugging}). I left the
features I do not use unchanged, even though some of them only make
sense for TILE. To get a description of these features, enter Forth
mode and type @kbd{C-h m}.

In addition, Gforth supports Emacs quite well: The source code locations
1.4 anton 2785: given in error messages, debugging output (from @code{~~}) and failed
2786: assertion messages are in the right format for Emacs' compilation mode
2787: (@pxref{Compilation, , Running Compilations under Emacs, emacs, Emacs
2788: Manual}) so the source location corresponding to an error or other
2789: message is only a few keystrokes away (@kbd{C-x `} for the next error,
2790: @kbd{C-c C-c} for the error under the cursor).
2791:
2792: Also, if you @code{include} @file{etags.fs}, a new @file{TAGS} file
2793: (@pxref{Tags, , Tags Tables, emacs, Emacs Manual}) will be produced that
2794: contains the definitions of all words defined afterwards. You can then
find the source for a word using @kbd{M-.}. Note that Emacs can use
several tags files at the same time (e.g., one for the Gforth sources
and one for your program).

To get all these benefits, add the following lines to your @file{.emacs}
file:

@example
(autoload 'forth-mode "gforth.el")
(setq auto-mode-alist (cons '("\\.fs\\'" . forth-mode) auto-mode-alist))
@end example

@node Internals, Bugs, Emacs and Gforth, Top
@chapter Internals

Reading this section is not necessary for programming with Gforth. It
should be helpful for finding your way in the Gforth sources.

@menu
* Portability::
* Threading::
* Primitives::
* System Architecture::
* Performance::
@end menu

@node Portability, Threading, Internals, Internals
@section Portability

One of the main goals of the effort is availability across a wide range
of personal machines. fig-Forth, and, to a lesser extent, F83, achieved
this goal by manually coding the engine in assembly language for several
then-popular processors. This approach is very labor-intensive and the
results are short-lived due to progress in computer architecture.

Others have avoided this problem by coding in C, e.g., Mitch Bradley
(cforth), Mikael Patel (TILE) and Dirk Zoller (pfe). This approach is
particularly popular for UNIX-based Forths due to the large variety of
architectures of UNIX machines. Unfortunately an implementation in C
does not mix well with the goals of efficiency and with using
traditional techniques: Indirect or direct threading cannot be expressed
in C, and switch threading, the fastest technique available in C, is
significantly slower. Another problem with C is that it's very
cumbersome to express double integer arithmetic.

Fortunately, there is a portable language that does not have these
limitations: GNU C, the version of C processed by the GNU C compiler
(@pxref{C Extensions, , Extensions to the C Language Family, gcc.info,
GNU C Manual}). Its labels as values feature (@pxref{Labels as Values, ,
Labels as Values, gcc.info, GNU C Manual}) makes direct and indirect
threading possible, its @code{long long} type (@pxref{Long Long, ,
Double-Word Integers, gcc.info, GNU C Manual}) corresponds to Forth's
double numbers. GNU C is available for free on all important (and many
unimportant) UNIX machines, VMS, 80386s running MS-DOS, the Amiga, and
the Atari ST, so a Forth written in GNU C can run on all these
machines.
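To illustrate the point about double numbers: with @code{long long}, double-cell words need no multi-precision code. The sketch below assumes 32-bit cells and a 64-bit @code{unsigned long long}; the function names are hypothetical and only loosely follow the Forth words @code{UM*} and @code{UM/MOD}.

```c
#include <stdint.h>

typedef uint32_t ucell;            /* assumed 32-bit cell */
typedef unsigned long long udcell; /* double cell via long long */

/* UM* ( u1 u2 -- ud ): full double-cell product of two cells. */
udcell um_star(ucell u1, ucell u2)
{
    return (udcell)u1 * u2;
}

/* Low and high cells of a double-cell number (on the Forth data
   stack the high cell is on top). */
ucell ud_low(udcell ud)  { return (ucell)ud; }
ucell ud_high(udcell ud) { return (ucell)(ud >> 32); }

/* UM/MOD ( ud u -- urem uquot ); assumes u != 0 and that the
   quotient fits into one cell. */
struct um_result { ucell rem, quot; };

struct um_result um_slash_mod(udcell ud, ucell u)
{
    struct um_result r = { (ucell)(ud % u), (ucell)(ud / u) };
    return r;
}
```

For instance, @code{um_star(0x10000, 0x10000)} produces the double-cell number with high cell 1 and low cell 0.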

Writing in a portable language has the reputation of producing code that
is slower than assembly. For our Forth engine we repeatedly looked at
the code produced by the compiler and eliminated most compiler-induced
inefficiencies by appropriate changes in the source-code.

However, register allocation cannot be portably influenced by the
programmer, leading to some inefficiencies on register-starved
machines. We use explicit register declarations (@pxref{Explicit Reg
Vars, , Variables in Specified Registers, gcc.info, GNU C Manual}) to
improve the speed on some machines. They are turned on by using the
@code{gcc} switch @code{-DFORCE_REG}. Unfortunately, this feature
depends not only on the machine, but also on the compiler version: On
some machines some compiler versions produce incorrect code when certain
explicit register declarations are used. Therefore @code{-DFORCE_REG}
is not used by default.

@node Threading, Primitives, Portability, Internals
@section Threading

GNU C's labels as values extension (available since @code{gcc-2.0},
@pxref{Labels as Values, , Labels as Values, gcc.info, GNU C Manual})
makes it possible to take the address of @var{label} by writing
@code{&&@var{label}}. This address can then be used in a statement like
@code{goto *@var{address}}. I.e., @code{goto *&&x} is the same as
@code{goto x}.

With this feature an indirect threaded NEXT looks like:
@example
cfa = *ip++;
ca = *cfa;
goto *ca;
@end example
For those unfamiliar with the names: @code{ip} is the Forth instruction
pointer; the @code{cfa} (code-field address) corresponds to ANS Forth's
execution token and points to the code field of the next word to be
executed; the @code{ca} (code address) fetched from there points to some
executable code, e.g., a primitive or the colon definition handler
@code{docol}.

Direct threading is even simpler:
@example
ca = *ip++;
goto *ca;
@end example

Of course we have packaged the whole thing neatly in macros called
@code{NEXT} and @code{NEXT1} (the part of NEXT after fetching the cfa).
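Putting the pieces together, here is a small, runnable sketch of an indirect-threaded inner interpreter in GNU C (it needs @code{gcc} or another compiler that supports the labels-as-values extension). The @code{NEXT} and @code{NEXT1} bodies follow the code shown above, and @code{+} is the primitive shown under Scheduling. The function @code{run_demo}, its tiny program, the @code{lit}, @code{dup} and @code{halt} primitives and the fixed-size stack are hypothetical simplifications, not Gforth's engine.

```c
#include <stdint.h>

/* NEXT1: the part of NEXT after fetching the cfa. */
#define NEXT1 do { ca = *cfa; goto *ca; } while (0)
/* NEXT: fetch the next cfa from the threaded code, then dispatch. */
#define NEXT  do { cfa = (void **)*ip++; NEXT1; } while (0)

/* Computes (a + b) * 2 by interpreting the threaded code
   "lit a lit b + dup + halt". */
long run_demo(long a, long b)
{
    /* Code fields: each holds the code address (ca) of a primitive. */
    static void *lit_cf[]  = { &&lit };
    static void *plus_cf[] = { &&plus };
    static void *dup_cf[]  = { &&dup };
    static void *halt_cf[] = { &&halt };

    /* Threaded code: code-field addresses, plus inline literals. */
    intptr_t code[] = {
        (intptr_t)lit_cf, a,
        (intptr_t)lit_cf, b,
        (intptr_t)plus_cf,
        (intptr_t)dup_cf,
        (intptr_t)plus_cf,
        (intptr_t)halt_cf,
    };
    long stack[8];
    long *sp = stack + 8;   /* empty; grows downward, sp[0] is TOS */
    intptr_t *ip = code;    /* Forth instruction pointer */
    void **cfa;             /* code-field address */
    void *ca;               /* code address */
    long n;

    NEXT;

lit:                        /* push the literal that follows in the code */
    *--sp = (long)*ip++;
    NEXT;
plus:                       /* + as in the text */
    n = sp[0] + sp[1];
    sp++;
    sp[0] = n;
    NEXT;
dup:
    sp--;
    sp[0] = sp[1];
    NEXT;
halt:
    return sp[0];
}
```

Calling @code{run_demo(3, 4)} runs the seven threaded instructions and returns 14.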

@menu
* Scheduling::
* Direct or Indirect Threaded?::
* DOES>::
@end menu

@node Scheduling, Direct or Indirect Threaded?, Threading, Threading
@subsection Scheduling

There is a little complication: Pipelined and superscalar processors,
i.e., RISC and some modern CISC machines, can process independent
instructions while waiting for the results of an instruction. The
compiler usually reorders (schedules) the instructions in a way that
achieves good usage of these delay slots. However, on our first tries
the compiler did not do well on scheduling primitives. E.g., for
@code{+} implemented as
@example
n=sp[0]+sp[1];
sp++;
sp[0]=n;
NEXT;
@end example
the NEXT comes strictly after the other code, i.e., there is nearly no
scheduling. After a little thought the problem becomes clear: The
compiler cannot know that sp and ip point to different addresses (and
the version of @code{gcc} we used would not know it even if it was
possible), so it could not move the load of the cfa above the store to
the TOS. Indeed the pointers could be the same, if code on or very near
the top of stack were executed. In the interest of speed we chose to
forbid this probably unused ``feature'' and helped the compiler in
scheduling: NEXT is divided into the loading part (@code{NEXT_P1}) and
the goto part (@code{NEXT_P2}). @code{+} now looks like:
@example
n=sp[0]+sp[1];
sp++;
NEXT_P1;
sp[0]=n;
NEXT_P2;
@end example
This can be scheduled optimally by the compiler.

This division can be turned off with the switch @code{-DCISC_NEXT}. This
switch is on by default on machines that do not profit from scheduling
(e.g., the 80386), in order to preserve registers.
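A sketch of how the split might look as macros, together with the @code{-DCISC_NEXT} switch: with the switch, @code{NEXT_P1} does nothing and @code{NEXT_P2} performs the whole NEXT. Only the macro names and the body of @code{+} come from the text; for brevity the sketch uses direct threading, and the function @code{add3} and the @code{halt} label are hypothetical. Whether the compiler actually moves the load earlier still depends on the compiler and the target machine.

```c
#include <stdint.h>

/* Direct threading: the threaded code holds code addresses. */
#ifdef CISC_NEXT
#define NEXT_P1 ((void)0)                          /* no split */
#define NEXT_P2 do { ca = (void *)*ip++; goto *ca; } while (0)
#else
#define NEXT_P1 (ca = (void *)*ip++)               /* loading part */
#define NEXT_P2 goto *ca                           /* goto part */
#endif

/* Adds three numbers by running the threaded code "+ + halt" on a
   stack that initially holds x y z (z on top). */
long add3(long x, long y, long z)
{
    long stack[3] = { z, y, x };   /* sp[0] is the top of stack */
    long *sp = stack;
    intptr_t code[] = { (intptr_t)&&plus, (intptr_t)&&plus,
                        (intptr_t)&&halt };
    intptr_t *ip = code;
    void *ca;
    long n;

    NEXT_P1;
    NEXT_P2;

plus:
    n = sp[0] + sp[1];
    sp++;
    NEXT_P1;      /* load the next code address before the store */
    sp[0] = n;
    NEXT_P2;
halt:
    return sp[0];
}
```

Compiling with @code{-DCISC_NEXT} gives the same results; only the placement of the load changes.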

@node Direct or Indirect Threaded?, DOES>, Scheduling, Threading
@subsection Direct or Indirect Threaded?

Both! After packaging the nasty details in macro definitions we
realized that we could switch between direct and indirect threading by
simply setting a compilation flag (@code{-DDIRECT_THREADED}) and
defining a few machine-specific macros for the direct-threading case.
On the Forth level we also offer access words that hide the
differences between the threading methods (@pxref{Threading Words}).

Indirect threading is implemented completely
machine-independently. Direct threading needs routines for creating
jumps to the executable code (e.g., to @code{docol} or @code{dodoes}). These routines
are inherently machine-dependent, but they do not amount to many source
lines. I.e., even porting direct threading to a new machine is a small
effort.

@node DOES>, , Direct or Indirect Threaded?, Threading
@subsection DOES>
One of the most complex parts of a Forth engine is @code{dodoes}, i.e.,
the chunk of code executed by every word defined by a
@code{CREATE}...@code{DOES>} pair. The main problem here is: How to find
2967: the Forth code to be executed, i.e. the code after the @code{DOES>} (the
2968: DOES-code)? There are two solutions:
2969:
2970: In fig-Forth the code field points directly to @code{dodoes} and the
2971: DOES-code address is stored in the cell after the code address
2972: (i.e., at cfa cell+). It may seem that this solution is illegal in
2973: Forth-79 and all later standards, because in fig-Forth this address
2974: lies in the body (which is illegal in these standards). However, by
2975: making the code field larger for all words this solution becomes legal
2976: again. We use this approach for the indirect threaded version. Leaving
2977: a cell unused in most words is a bit wasteful, but on the machines we
2978: are targeting this is hardly a problem. The other reason for having a
2979: code field size of two cells is to avoid having different image files
1.4 anton 2980: for direct and indirect threaded systems (@pxref{System Architecture}).
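A minimal sketch of the two-cell code field used for DOES> words with indirect threading. It assumes that C function pointers stand in for code addresses and for the DOES-code, which in Gforth is really Forth code. The names @code{Word}, @code{fetch_does}, @code{execute} and @code{run_constant} are hypothetical; @code{run_constant} models a constant-like word whose DOES-code fetches the first cell of its body.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t Cell;
typedef struct Word Word;
typedef void (*DoesCode)(Cell **sp);     /* stands in for Forth DOES-code */

struct Word {
    void (*code)(Word *cfa, Cell **sp);  /* first code-field cell: code address */
    DoesCode does;                       /* second cell: DOES-code address */
    Cell body[1];                        /* body (parameter field) */
};

/* dodoes: push the body address, then run the DOES-code whose address
   sits in the second code-field cell ("cfa cell+"), not in the body. */
static void dodoes(Word *cfa, Cell **sp)
{
    *--*sp = (Cell)cfa->body;
    cfa->does(sp);
}

/* DOES-code of a constant-like word: replace the address by its contents. */
static void fetch_does(Cell **sp)
{
    Cell *addr = (Cell *)**sp;
    **sp = *addr;
}

static void execute(Word *w, Cell **sp) { w->code(w, sp); }

/* Builds a DOES> word whose body holds value, executes it, returns the top item. */
Cell run_constant(Cell value)
{
    Cell stack[4];
    Cell *sp = stack + 4;                /* stack grows downward */
    Word w = { dodoes, fetch_does, { value } };

    execute(&w, &sp);
    assert(sp == stack + 3);             /* exactly one item was pushed */
    return *sp;
}
```

Because the DOES-code address lives in the code field rather than in the body, the body starts right after the two code-field cells for every word, which is what keeps this layout legal in Forth-79 and later standards.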
1.3 anton 2981:
2982: The other approach is that the code field points or jumps to the cell
2983: after @code{DOES>}. In this variant there is a jump to @code{dodoes} at
2984: this address. @code{dodoes} can then get the DOES-code address by
2985: computing the code address, i.e., the address of the jump to @code{dodoes},
2986: and adding the length of that jump field. A variant of this is to have a
2987: call to @code{dodoes} after the @code{DOES>}; then the return address
2988: (which can be found in the return register on RISCs) is the DOES-code
2989: address. Since the two cells available in the code field are usually
2990: used up by the jump to the code address in direct threading, we use
2991: this approach for direct threading. We did not want to add another
2992: cell to the code field.
2993:
1.4 anton 2994: @node Primitives, System Architecture, Threading, Internals
1.3 anton 2995: @section Primitives
2996:
1.4 anton 2997: @menu
2998: * Automatic Generation::
2999: * TOS Optimization::
3000: * Produced code::
3001: @end menu
3002:
3003: @node Automatic Generation, TOS Optimization, Primitives, Primitives
1.3 anton 3004: @subsection Automatic Generation
3005:
3006: Since the primitives are implemented in a portable language, there is no
3007: longer any need to minimize the number of primitives. On the contrary,
3008: having many primitives is an advantage: speed. In order to reduce the
3009: number of errors in primitives and to make programming them easier, we
3010: provide a tool, the primitive generator (@file{prims2x.fs}), that
3011: automatically generates most (and sometimes all) of the C code for a
3012: primitive from the stack effect notation. The source for a primitive
3013: has the following form:
3014:
3015: @format
3016: @var{Forth-name} @var{stack-effect} @var{category} [@var{pronounc.}]
3017: [@code{""}@var{glossary entry}@code{""}]
3018: @var{C code}
3019: [@code{:}
3020: @var{Forth code}]
3021: @end format
3022:
3023: The items in brackets are optional. The category and glossary fields
3024: are there for generating the documentation; the Forth code is there
3025: for manual implementations on machines without GNU C. E.g., the source
3026: for the primitive @code{+} is:
3027: @example
3028: + n1 n2 -- n core plus
3029: n = n1+n2;
3030: @end example
3031:
3032: This looks like a specification, but in fact @code{n = n1+n2} is C
3033: code. Our primitive generation tool extracts a lot of information from
3034: the stack effect notations@footnote{We use a one-stack notation, even
3035: though we have separate data and floating-point stacks; the separate
3036: notation can be generated easily from the unified notation.}: The number
3037: of items popped from and pushed on the stack, their type, and by what
3038: name they are referred to in the C code. It then generates a C code
3039: prelude and postlude for each primitive. The final C code for @code{+}
3040: looks like this:
3041:
3042: @example
3043: I_plus: /* + ( n1 n2 -- n ) */ /* label, stack effect */
3044: /* */ /* documentation */
1.4 anton 3045: @{
1.3 anton 3046: DEF_CA /* definition of variable ca (indirect threading) */
3047: Cell n1; /* definitions of variables */
3048: Cell n2;
3049: Cell n;
3050: n1 = (Cell) sp[1]; /* input */
3051: n2 = (Cell) TOS;
3052: sp += 1; /* stack adjustment */
3053: NAME("+") /* debugging output (with -DDEBUG) */
1.4 anton 3054: @{
1.3 anton 3055: n = n1+n2; /* C code taken from the source */
1.4 anton 3056: @}
1.3 anton 3057: NEXT_P1; /* NEXT part 1 */
3058: TOS = (Cell)n; /* output */
3059: NEXT_P2; /* NEXT part 2 */
1.4 anton 3060: @}
1.3 anton 3061: @end example
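The real generator, @file{prims2x.fs}, is written in Forth and also handles types, the floating-point stack and documentation. As a rough sketch of the prelude/postlude idea only, the following C function (hypothetical name @code{gen_primitive}) parses a stack effect such as @code{n1 n2 -- n} and emits code in the shape shown above. It assumes that all items are cells, that @code{TOS} names the top-of-stack item, and it performs no store elimination; it does emit the TOS store or load needed when a primitive has no inputs or no outputs.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAX_ITEMS 16
#define MAX_NAME 32

typedef struct { char *buf; size_t size, len; } Out;

/* Appends formatted text; on overflow len reaches or exceeds size. */
static void emit(Out *o, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    if (o->len < o->size) {
        int k = vsnprintf(o->buf + o->len, o->size - o->len, fmt, ap);
        if (k > 0)
            o->len += (size_t)k;
    }
    va_end(ap);
}

/* effect is "in1 .. inx -- out1 .. outy" (all Cells).  Returns 1 on
   success, 0 on a parse error or if buf is too small. */
int gen_primitive(const char *effect, const char *code, char *buf, size_t size)
{
    char tmp[256], in[MAX_ITEMS][MAX_NAME], out[MAX_ITEMS][MAX_NAME];
    int x = 0, y = 0, dashes = 0, i, j;
    Out o = { buf, size, 0 };

    if (size == 0 || strlen(effect) >= sizeof tmp)
        return 0;
    strcpy(tmp, effect);
    for (char *t = strtok(tmp, " "); t != NULL; t = strtok(NULL, " ")) {
        if (strcmp(t, "--") == 0) { dashes = 1; continue; }
        if (strlen(t) >= MAX_NAME || (dashes ? y : x) == MAX_ITEMS)
            return 0;
        strcpy(dashes ? out[y++] : in[x++], t);
    }
    if (!dashes)
        return 0;
    buf[0] = '\0';

    /* Declare each distinct name once ("w -- w w" needs one variable). */
    for (i = 0; i < x + y; i++) {
        const char *name = i < x ? in[i] : out[i - x];
        int seen = 0;
        for (j = 0; j < i && !seen; j++)
            seen = strcmp(name, j < x ? in[j] : out[j - x]) == 0;
        if (!seen)
            emit(&o, "Cell %s;\n", name);
    }
    if (x == 0 && y > 0)
        emit(&o, "sp[0] = TOS;\n");              /* spill the old TOS first */
    for (i = 0; i < x; i++) {                     /* inputs, deepest first */
        int depth = x - 1 - i;
        if (depth == 0)
            emit(&o, "%s = (Cell) TOS;\n", in[i]);
        else
            emit(&o, "%s = (Cell) sp[%d];\n", in[i], depth);
    }
    if (x != y)
        emit(&o, "sp += %d;\n", x - y);           /* stack adjustment */
    emit(&o, "{\n%s\n}\nNEXT_P1;\n", code);       /* user code, NEXT part 1 */
    for (j = 0; j < y; j++) {                     /* outputs */
        int depth = y - 1 - j;
        if (depth == 0)
            emit(&o, "TOS = (Cell)%s;\n", out[j]);
        else
            emit(&o, "sp[%d] = (Cell)%s;\n", depth, out[j]);
    }
    if (y == 0 && x > 0)
        emit(&o, "TOS = sp[0];\n");               /* reload the new TOS last */
    emit(&o, "NEXT_P2;\n");
    return o.len < size;
}
```

For @code{+} the emitted input and stack-adjustment lines are @code{n1 = (Cell) sp[1];}, @code{n2 = (Cell) TOS;} and @code{sp += 1;}, matching the generated code above.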
3062:
3063: This looks long and inefficient, but the GNU C compiler optimizes quite
3064: well and produces optimal code for @code{+} on, e.g., the R3000 and the
3065: HP RISC machines: Defining the @code{n}s does not produce any code, and
3066: using them as intermediate storage also adds no cost.
3067:
3068: There are also other optimizations that are not illustrated by this
3069: example: Assignments between simple variables are usually free (copy
3070: propagation). If one of the stack items is not used by the primitive
3071: (e.g., in @code{drop}), the compiler eliminates the load from the stack
3072: (dead code elimination). On the other hand, there are some things that
3073: the compiler does not do, so they are performed by
3074: @file{prims2x.fs}: The compiler does not optimize away code that stores
3075: a stack item to the place where it just came from (e.g., @code{over}).
3076:
3077: While programming a primitive is usually easy, there are a few cases
3078: where the programmer has to take the actions of the generator into
3079: account, most notably @code{?dup}, but also words that do not (always)
3080: fall through to NEXT.
3081:
1.4 anton 3082: @node TOS Optimization, Produced code, Automatic Generation, Primitives
1.3 anton 3083: @subsection TOS Optimization
3084:
3085: An important optimization for stack machine emulators, e.g., Forth
3086: engines, is keeping one or more of the top stack items in
1.4 anton 3087: registers. If a word has the stack effect @var{in1}...@var{inx} @code{--}
3088: @var{out1}...@var{outy}, keeping the top @var{n} items in registers
1.3 anton 3089: @itemize
3090: @item
3091: is better than keeping @var{n-1} items, if @var{x>=n} and @var{y>=n},
3092: due to fewer loads from and stores to the stack.
3093: @item is slower than keeping @var{n-1} items, if @var{x<>y} and @var{x<n} and
3094: @var{y<n}, due to additional moves between registers.
3095: @end itemize
3096:
3097: In particular, keeping one item in a register is never a disadvantage,
3098: if there are enough registers. Keeping two items in registers is a
3099: disadvantage for frequent words like @code{?branch}, constants,
3100: variables, literals and @code{i}. Therefore our generator only produces
3101: code that keeps zero or one items in registers. The generated C code
3102: covers both cases; the selection between these alternatives is made at
3103: C-compile time using the switch @code{-DUSE_TOS}. @code{TOS} in the C
3104: code for @code{+} is just a simple variable name in the one-item case,
3105: otherwise it is a macro that expands into @code{sp[0]}. Note that the
3106: GNU C compiler tries to keep simple variables like @code{TOS} in
3107: registers, and it usually succeeds if there are enough registers.
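To make the difference concrete, here is a minimal sketch of @code{+} in both configurations, assuming a downward-growing stack of @code{Cell}s. The struct @code{VM} and the function names are hypothetical; in the generated engine the choice is made by @code{-DUSE_TOS} rather than by having two functions. With zero items in registers, @code{+} needs two loads and one store; with the TOS in a variable that the compiler can keep in a register, it needs one load and no store.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t Cell;

/* Hypothetical VM state for the one-item case: tos is a plain variable
   the compiler can keep in a register; the memory slot sp[0] is stale. */
typedef struct { Cell *sp; Cell tos; } VM;

/* + with zero items in registers (TOS is just sp[0]):
   two loads and one store. */
static Cell *plus_no_tos(Cell *sp)
{
    Cell n = sp[1] + sp[0];
    sp += 1;
    sp[0] = n;
    return sp;
}

/* + with one item in a register: one load, no store. */
static VM plus_tos(VM vm)
{
    Cell n1 = vm.sp[1];
    vm.sp += 1;
    vm.tos = n1 + vm.tos;
    return vm;
}
```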
3108:
3109: The primitive generator performs the TOS optimization for the
3110: floating-point stack, too (@code{-DUSE_FTOS}). For floating-point
3111: operations the benefit of this optimization is even larger:
3112: floating-point operations take a long time on most processors, but can be
3113: performed in parallel with other operations as long as their results are
3114: not used. If the FP-TOS is kept in a register, this works. If
3115: it is kept on the stack, i.e., in memory, the store into memory has to
3116: wait for the result of the floating-point operation, lengthening the
3117: execution time of the primitive considerably.
3118:
3119: The TOS optimization makes the automatic generation of primitives a
3120: bit more complicated. Just replacing all occurrences of @code{sp[0]} by
3121: @code{TOS} is not sufficient. There are some special cases to
3122: consider:
3123: @itemize
3124: @item In the case of @code{dup ( w -- w w )} the generator must not
3125: eliminate the store to the original location of the item on the stack,
3126: if the TOS optimization is turned on.
1.4 anton 3127: @item Primitives with stack effects of the form @code{--}
3128: @var{out1}...@var{outy} must store the TOS to the stack at the start.
3129: Likewise, primitives with the stack effect @var{in1}...@var{inx} @code{--}
1.3 anton 3130: must load the TOS from the stack at the end. But for the null stack
3131: effect @code{--} no stores or loads should be generated.
3132: @end itemize
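The store elimination mentioned for @code{over}, and its exception for @code{dup} under the TOS optimization, can be sketched as a predicate. This is an assumption-based sketch with a hypothetical name (@code{store_is_redundant}), not code from @file{prims2x.fs}. After the stack adjustment @code{sp += x-y}, output item @var{j} at depth @var{d} occupies the same memory cell as the input at depth @var{d+x-y}, so its store can be skipped if that input has the same name, unless that input or the output lives in the TOS register rather than in memory.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the memory store of output j can be omitted.
   in[0..x-1] and out[0..y-1] are item names, deepest first;
   use_tos says whether the top item is kept in a register. */
int store_is_redundant(const char *const in[], int x,
                       const char *const out[], int y,
                       int j, int use_tos)
{
    int d_out = y - 1 - j;        /* depth of the output after adjustment */
    int d_in = d_out + x - y;     /* depth of the same cell before it */

    if (d_in < 0 || d_in >= x)
        return 0;                 /* that cell held no input item */
    if (strcmp(in[x - 1 - d_in], out[j]) != 0)
        return 0;                 /* a different item goes there */
    if (use_tos && (d_in == 0 || d_out == 0))
        return 0;                 /* value is in, or goes to, the register */
    return 1;
}
```

For @code{over ( w1 w2 -- w1 w2 w1 )} the store of the deeper @code{w1} is always redundant, while the store of @code{w2} is redundant only without the TOS optimization, which is the same situation as the store in @code{dup}.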
3133:
1.4 anton 3134: @node Produced code, , TOS Optimization, Primitives
1.3 anton 3135: @subsection Produced code
3136:
3137: To see what assembly code is produced for the primitives on your machine
3138: with your compiler and your flag settings, type @code{make engine.s} and
1.4 anton 3139: look at the resulting file @file{engine.s}.
1.3 anton 3140:
1.17 anton 3141: @node System Architecture, Performance, Primitives, Internals
1.3 anton 3142: @section System Architecture
3143:
3144: Our Forth system consists not only of primitives, but also of
3145: definitions written in Forth. Since the Forth compiler itself belongs
3146: to those definitions, it is not possible to start the system with the
3147: primitives and the Forth source alone. Therefore we provide the Forth
3148: code as an image file in nearly executable form. At the start of the
3149: system a C routine loads the image file into memory, sets up the
3150: memory (stacks etc.) according to information in the image file, and
3151: starts executing Forth code.
3152:
3153: The image file format is a compromise between the goals of making it
3154: easy to generate image files and making them portable. The easiest way
3155: to generate an image file is to just generate a memory dump. However,
3156: this kind of image file cannot be used on a different machine or on
3157: the next version of the engine on the same machine; it might not even
3158: work with the same engine compiled by a different version of the C
3159: compiler. We would like to have as few versions of the image file as
3160: possible, because we do not want to distribute many versions of the
3161: same image file, and because we want to make it easy for users to use
3162: their image files on many machines. We currently need to create a different
3163: image file for machines with different cell sizes and different byte orders
1.17 anton 3164: (little- or big-endian)@footnote{We are considering adding information to the
1.3 anton 3165: image file that enables the loader to change the byte order.}.
3166:
3167: Forth code that is going to end up in a portable image file has to
1.4 anton 3168: comply with some restrictions: addresses have to be stored in memory with
3169: special words (@code{A!}, @code{A,}, etc.) in order to make the code
3170: relocatable. Cells, floats, etc., have to be stored at the natural
3171: alignment boundaries@footnote{E.g., store floats (8 bytes) at an address
3172: divisible by 8. This happens automatically in our system when you use
3173: the ANS Forth alignment words.}, in order to avoid alignment faults on
3174: machines with stricter alignment. The image file is produced by a
3175: metacompiler (@file{cross.fs}).
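The alignment rule from the footnote amounts to rounding an address up to a multiple of the item size. A minimal sketch in C, assuming power-of-two alignments; @code{align_up} is a hypothetical helper, not a Gforth word (the corresponding ANS Forth words are @code{ALIGNED} and @code{FALIGNED}).

```c
#include <assert.h>
#include <stdint.h>

/* Rounds addr up to the next multiple of align; align must be a
   power of two (e.g., 8 for 8-byte floats). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```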
1.3 anton 3176:
3177: So, unlike the image file of Mitch Bradley's @code{cforth}, our image
3178: file is not directly executable, but has to undergo some manipulations
3179: during loading. Address relocation is performed at image load-time, not
3180: at run-time. The loader also has to replace tokens standing for
3181: primitive calls with the appropriate code-field addresses (or code
3182: addresses in the case of direct threading).
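Load-time relocation and token replacement can be sketched as a single pass over the image. The layout here is purely hypothetical, since the text does not specify Gforth's image format: every cell carries a tag saying whether it holds plain data, an address stored as a byte offset from the image start, or a primitive token that indexes a table of code (field) addresses. @code{relocate_image} and the @code{TAG_} names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t Cell;

enum { TAG_DATA, TAG_ADDR, TAG_PRIM };   /* hypothetical per-cell tags */

/* Relocates an image in place once, at load time. */
void relocate_image(Cell *image, const unsigned char *tags, size_t ncells,
                    void *const *prim_code, size_t nprims)
{
    for (size_t i = 0; i < ncells; i++) {
        switch (tags[i]) {
        case TAG_ADDR:   /* byte offset from image start -> absolute address */
            image[i] = (Cell)((char *)image + image[i]);
            break;
        case TAG_PRIM:   /* primitive token -> code (field) address */
            assert(image[i] >= 0 && (size_t)image[i] < nprims);
            image[i] = (Cell)prim_code[image[i]];
            break;
        default:         /* plain data stays as it is */
            break;
        }
    }
}
```

Doing this once at load time means the running code uses absolute addresses directly, instead of adding a base address on every access.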
1.4 anton 3183:
1.17 anton 3184: @node Performance, , System Architecture, Internals
3185: @section Performance
3186:
3187: On RISCs the Gforth engine is very close to optimal; i.e., it is usually
3188: impossible to write a significantly faster engine.
3189:
3190: On register-starved machines like the 386 architecture processors
3191: improvements are possible, because @code{gcc} does not utilize the
3192: registers as well as a human, even with explicit register declarations;
3193: e.g., Bernd Beuster wrote a Forth system fragment in assembly language
3194: and hand-tuned it for the 486; this system is 1.19 times faster on the
3195: Sieve benchmark on a 486DX2/66 than Gforth compiled with
3196: @code{gcc-2.6.3} with @code{-DFORCE_REG}.
3197:
3198: However, this potential advantage of assembly language implementations
3199: is not necessarily realized in complete Forth systems: We compared
3200: Gforth (compiled with @code{gcc-2.6.3} and @code{-DFORCE_REG}) with
1.18 anton 3201: Win32Forth 1.2093 and LMI's NT Forth (Beta, May 1994), two systems
3202: written in assembly, and with two systems written in C: PFE-0.9.11
3203: (compiled with @code{gcc-2.6.3} with the default configuration for
3204: Linux: @code{-O2 -fomit-frame-pointer -DUSE_REGS}) and ThisForth Beta
3205: (compiled with @code{gcc-2.6.3 -O3 -fomit-frame-pointer}). We benchmarked
3206: Gforth, PFE and ThisForth on a 486DX2/66 under Linux. Kenneth O'Heskin
3207: kindly provided the results for Win32Forth and NT Forth on a 486DX2/66
3208: with similar memory performance under Windows NT.
1.17 anton 3209:
3210: We used four small benchmarks: the ubiquitous Sieve; bubble-sorting and
3211: matrix multiplication, which come from the Stanford integer benchmarks and
3212: were translated into Forth by Martin Fraeman (we used the versions
3213: included in the TILE Forth package); and a recursive Fibonacci number
3214: computation for benchmarking calling performance. The following table shows
3215: the time taken for the benchmarks scaled by the time taken by Gforth (in
3216: other words, it shows the speedup factor that Gforth achieved over the
3217: other systems).
3218:
3219: @example
3220: relative          Win32-      NT           This-
3221:   time    Gforth   Forth   Forth     PFE   Forth
3222: sieve       1.00    1.30    1.07    1.67    2.98
3223: bubble      1.00    1.30    1.40    1.66
3224: matmul      1.00    1.40    1.29    2.24
3225: fib         1.00    1.44    1.26    1.82    2.82
3226: @end example
3227:
3228: You may find the good performance of Gforth compared with the systems
3229: written in assembly language quite surprising. One important reason for
3230: the disappointing performance of these systems is probably that they are
3231: not written optimally for the 486 (e.g., they use the @code{lods}
3232: instruction). In addition, Win32Forth uses a comfortable, but costly
3233: method for relocating the Forth image: like @code{cforth}, it computes
3234: the actual addresses at run time, resulting in two address computations
3235: per NEXT (@pxref{System Architecture}).
3236:
3237: The speedup of Gforth over PFE and ThisForth can be easily explained
3238: by their self-imposed restriction to standard C, which makes efficient
3239: threading impossible (although the measured implementation of PFE uses
3240: a GNU C extension: global register variables). Moreover,
3241: current C compilers have a hard time optimizing other aspects of the
3242: ThisForth source.
3243:
3244: Note that the performance of Gforth on 386 architecture processors
3245: varies widely with the version of @code{gcc} used. E.g., @code{gcc-2.5.8}
3246: failed to allocate any of the virtual machine registers into real
3247: machine registers by itself and would not work correctly with explicit
3248: register declarations, giving a 1.3 times slower engine (on a 486DX2/66
3249: running the Sieve) than the one measured above.
3250:
1.4 anton 3251: @node Bugs, Pedigree, Internals, Top
3252: @chapter Bugs
3253:
1.17 anton 3254: Known bugs are described in the file BUGS in the Gforth distribution.
3255:
3256: If you find a bug, please send a bug report to !!. A bug report should
3257: describe the Gforth version used (it is announced at the start of an
3258: interactive Gforth session), the machine and operating system (on Unix
3259: systems you can use @code{uname -a} to produce this information), the
3260: installation options (!! a way to find them out), and a complete list of
3261: changes you (or your installer) have made to the Gforth sources (if
3262: any); it should contain a program (or a sequence of keyboard commands)
3263: that reproduces the bug and a description of what you think constitutes
3264: the buggy behaviour.
3265:
3266: For a thorough guide on reporting bugs read @ref{Bug Reporting, , How
3267: to Report Bugs, gcc.info, GNU C Manual}.
3268:
3269:
1.4 anton 3270: @node Pedigree, Word Index, Bugs, Top
3271: @chapter Pedigree
3272:
1.17 anton 3273: Gforth descends from BigForth (1993) and fig-Forth. Gforth and PFE (by
3274: Dirk Zoller) will cross-fertilize each other. Of course, a significant part of the design of Gforth was prescribed by ANS Forth.
3275:
3276: Bernd Paysan wrote BigForth, a child of VolksForth.
3277:
3278: VolksForth descends from F83. !! Authors? When?
3279:
3280: Laxen and Perry wrote F83 as a model implementation of the
3281: Forth-83 standard. !! Pedigree? When?
3282:
3283: A team led by Bill Ragsdale implemented fig-Forth on many processors in
3284: 1979. Dean Sanderson and Bill Ragsdale developed the original
3285: implementation of fig-Forth based on microForth.
3286:
3287: !! microForth pedigree
3288:
3289: A part of the information in this section comes from @cite{The Evolution
3290: of Forth} by Elizabeth D. Rather, Donald R. Colburn and Charles
3291: H. Moore, presented at the HOPL-II conference and preprinted in SIGPLAN
3292: Notices 28(3), 1993. You can find more historical and genealogical
3293: information about Forth there.
3294:
1.4 anton 3295: @node Word Index, Node Index, Pedigree, Top
3296: @chapter Word Index
3297:
1.18 anton 3298: This index is as incomplete as the manual. Each word is listed with
3299: stack effect and wordset.
1.17 anton 3300:
3301: @printindex fn
3302:
1.4 anton 3303: @node Node Index, , Word Index, Top
3304: @chapter Node Index
1.17 anton 3305:
3306: This index is even less complete than the manual.
1.1 anton 3307:
3308: @contents
3309: @bye
3310: