--- gforth/Attic/gforth.ds 1996/08/21 14:58:40 1.34 +++ gforth/Attic/gforth.ds 1996/09/23 08:52:47 1.36 @@ -755,11 +755,11 @@ double sized signed integer @item ud double sized unsigned integer @item r -Float +Float (on the FP stack) @item a_ Cell-aligned address @item c_ -Char-aligned address (note that a Char is two bytes in Windows NT) +Char-aligned address (note that a Char may have two bytes in Windows NT) @item f_ Float-aligned address @item df_ @@ -772,6 +772,11 @@ Execution token, same size as Cell Wordlist ID, same size as Cell @item f83name Pointer to a name structure +@item " +string in the input stream (not the stack). The terminating character is +a blank by default. If it is not a blank, it is shown in @code{<>} +quotes. + @end table @node Arithmetic, Stack Manipulation, Notation, Words @@ -849,7 +854,7 @@ The format of floating point numbers rec interpreter is: a signed decimal number, possibly containing a decimal point (@code{.}), followed by @code{E} or @code{e}, optionally followed by a signed integer (the exponent). E.g., @code{1e} ist the same as -@code{+1.0e+1}. Note that a number without @code{e} +@code{+1.0e+0}. Note that a number without @code{e} is not interpreted as floating-point number, but as double (if the number contains a @code{.}) or single precision integer. Also, conversions between string and floating point numbers always use base @@ -1734,7 +1739,7 @@ E.g., a definition using @code{TO} might : strcmp @{ addr1 u1 addr2 u2 -- n @} u1 u2 min 0 ?do - addr1 c@ addr2 c@ - + addr1 c@@ addr2 c@@ - ?dup-if unloop exit then @@ -1757,7 +1762,7 @@ are initialized with the right value for addr1 addr2 u1 u2 min 0 ?do @{ s1 s2 @} - s1 c@ s2 c@ - + s1 c@@ s2 c@@ - ?dup-if unloop exit then @@ -1931,8 +1936,8 @@ stack easier. The whole definition must be in one line. @end itemize -Locals defined in this way behave like @code{VALUE}s -(@xref{Values}). I.e., they are initialized from the stack. 
Using their +Locals defined in this way behave like @code{VALUE}s (@xref{Simple +Defining Words}). I.e., they are initialized from the stack. Using their name produces their value. Their value can be changed using @code{TO}. Since this syntax is supported by Gforth directly, you need not do @@ -1961,11 +1966,406 @@ locals wordset. @section Defining Words @menu -* Values:: +* Simple Defining Words:: +* Colon Definitions:: +* User-defined Defining Words:: +* Supplying names:: +* Interpretation and Compilation Semantics:: @end menu -@node Values, , Defining Words, Defining Words -@subsection Values +@node Simple Defining Words, Colon Definitions, Defining Words, Defining Words +@subsection Simple Defining Words + +doc-constant +doc-2constant +doc-fconstant +doc-variable +doc-2variable +doc-fvariable +doc-create +doc-user +doc-value +doc-to +doc-defer +doc-is + +@node Colon Definitions, User-defined Defining Words, Simple Defining Words, Defining Words +@subsection Colon Definitions + +@example +: name ( ... -- ... ) + word1 word2 word3 ; +@end example + +creates a word called @code{name}, that, upon execution, executes +@code{word1 word2 word3}. @code{name} is a @dfn{(colon) definition}. + +The explanation above is somewhat superficial. @xref{Interpretation and +Compilation Semantics} for an in-depth discussion of some of the issues +involved. + +doc-: +doc-; + +@node User-defined Defining Words, Supplying names, Colon Definitions, Defining Words +@subsection User-defined Defining Words + +You can create new defining words simply by wrapping defining-time code +around existing defining words and putting the sequence in a colon +definition. + +If you want the words defined with your defining words to behave +differently from words defined with standard defining words, you can +write your defining word like this: + +@example +: def-word ( "name" -- ) + Create @var{code1} +DOES> ( ... -- ... 
) + @var{code2} ; + +def-word name +@end example + +Technically, this fragment defines a defining word @code{def-word}, and +a word @code{name}; when you execute @code{name}, the address of the +body of @code{name} is put on the data stack and @var{code2} is executed +(the address of the body of @code{name} is the address @code{HERE} +returns immediately after the @code{CREATE}). + +In other words, if you make the following definitions: + +@example +: def-word1 ( "name" -- ) + Create @var{code1} ; + +: action1 ( ... -- ... ) + @var{code2} ; + +def-word1 name1 +@end example + +Using @code{name1 action1} is equivalent to using @code{name}. + +E.g., you can implement @code{Constant} in this way: + +@example +: constant ( w "name" -- ) + create , +DOES> ( -- w ) + @@ ; +@end example + +When you create a constant with @code{5 constant five}, first a new word +@code{five} is created, then the value 5 is laid down in the body of +@code{five} with @code{,}. When @code{five} is invoked, the address of +the body is put on the stack, and @code{@@} retrieves the value 5. + +In the example above the stack comment after the @code{DOES>} specifies +the stack effect of the defined words, not the stack effect of the +following code (the following code expects the address of the body on +the top of stack, which is not reflected in the stack comment). This is +the convention that I use and recommend (it clashes a bit with using +locals declarations for stack effect specification, though). + +@subsubsection Applications of @code{CREATE..DOES>} + +You may wonder how to use this feature. Here are some usage patterns: + +When you see a sequence of code occurring several times, and you can +identify a meaning, you will factor it out as a colon definition. When +you see similar colon definitions, you can factor them using +@code{CREATE..DOES>}.
E.g., an assembler usually defines several words +that look very similar: +@example +: ori, ( reg-target reg-source n -- ) + 0 asm-reg-reg-imm ; +: andi, ( reg-target reg-source n -- ) + 1 asm-reg-reg-imm ; +@end example + +This could be factored with: +@example +: reg-reg-imm ( op-code -- ) + create , +DOES> ( reg-target reg-source n -- ) + @@ asm-reg-reg-imm ; + +0 reg-reg-imm ori, +1 reg-reg-imm andi, +@end example + +Another view of @code{CREATE..DOES>} is to consider it as a crude way to +supply a part of the parameters for a word (known as @dfn{currying} in +the functional language community). E.g., @code{+} needs two +parameters. Creating versions of @code{+} with one parameter fixed can +be done like this: +@example +: curry+ ( n1 -- ) + create , +DOES> ( n2 -- n1+n2 ) + @@ + ; + + 3 curry+ 3+ +-2 curry+ 2- +@end example + +@subsubsection The gory details of @code{CREATE..DOES>} + +doc-does> + +This means that you need not use @code{CREATE} and @code{DOES>} in the +same definition; e.g., you can put the @code{DOES>}-part in a separate +definition. This allows us to, e.g., select among different DOES>-parts: +@example +: does1 +DOES> ( ... -- ... ) + ... ; + +: does2 +DOES> ( ... -- ... ) + ... ; + +: def-word ( ... -- ... ) + create ... + IF + does1 + ELSE + does2 + ENDIF ; +@end example + +In a standard program you can apply a @code{DOES>}-part only if the last +word was defined with @code{CREATE}. In Gforth, the @code{DOES>}-part +will override the behaviour of the last word defined in any case. In a +standard program, you can use @code{DOES>} only in a colon +definition. In Gforth, you can also use it in interpretation state, in a +kind of one-shot mode: +@example +CREATE name ( ... -- ... ) + @var{initialization} +DOES> + @var{code} ; +@end example +This is equivalent to the standard +@example +:noname +DOES> + @var{code} ; +CREATE name EXECUTE ( ... -- ... 
) + @var{initialization} +@end example + +You can get the address of the body of a word with + +doc->body + +@node Supplying names, Interpretation and Compilation Semantics, User-defined Defining Words, Defining Words +@subsection Supplying names for the defined words + +By default, defining words take the names for the defined words from the +input stream. Sometimes you want to supply the name from a string. You +can do this with + +doc-nextname + +E.g., + +@example +s" foo" nextname create +@end example +is equivalent to +@example +create foo +@end example + +Sometimes you want to define a word without a name. You can do this with + +doc-noname + +To make any use of the newly defined word, you need its execution +token. You can get it with + +doc-lastxt + +E.g., you can initialize a deferred word with an anonymous colon +definition: +@example +Defer deferred +noname : ( ... -- ... ) + ... ; +lastxt IS deferred +@end example + +@code{lastxt} also works when the last word was not defined as +@code{noname}. + +The standard has also recognized the need for anonymous words and +provides + +doc-:noname + +This leaves the execution token for the word on the stack after the +closing @code{;}. You can rewrite the last example with @code{:noname}: +@example +Defer deferred +:noname ( ... -- ... ) + ... ; +IS deferred +@end example + +@node Interpretation and Compilation Semantics, , Supplying names, Defining Words +@subsection Interpretation and Compilation Semantics + +The @dfn{interpretation semantics} of a word are what the text +interpreter does when it encounters the word in interpret state. It also +appears in some other contexts, e.g., the execution token returned by +@code{' @var{word}} identifies the interpretation semantics of +@var{word} (in other words, @code{' @var{word} execute} is equivalent to +interpret-state text interpretation of @code{@var{word}}). 
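+
+A minimal sketch of this equivalence, assuming the text interpreter is in
+interpret state and using only the standard words @code{dup} and
+@code{.}; both lines should leave the same values on the stack before
+printing them:
+
+@example
+5 ' dup execute . .  \ ticks dup, executes its xt, prints 5 5
+5 dup . .            \ interpret-state text interpretation, prints 5 5
+@end example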
+ +The @dfn{compilation semantics} of a word are what the text interpreter +does when it encounters the word in compile state. It also appears in +other contexts, e.g., @code{POSTPONE @var{word}} compiles@footnote{In +standard terminology, ``appends to the current definition''.} the +compilation semantics of @var{word}. + +The standard also talks about @dfn{execution semantics}. They are used +only for defining the interpretation and compilation semantics of many +words. By default, the interpretation semantics of a word are to +@code{execute} its execution semantics, and the compilation semantics of +a word are to @code{compile,} its execution semantics.@footnote{In +standard terminology: The default interpretation semantics are its +execution semantics; the default compilation semantics are to append its +execution semantics to the execution semantics of the current +definition.} + +You can change the compilation semantics into @code{execute}ing the +execution semantics with + +doc-immediate + +You can remove the interpretation semantics of a word with + +doc-compile-only +doc-restrict + +Note that ticking (@code{'}) compile-only words gives an error +(``Interpreting a compile-only word''). + +Gforth also allows you to define words with arbitrary combinations of +interpretation and compilation semantics. + +doc-interpret/compile: + +This feature was introduced for implementing @code{TO} and @code{S"}. I +recommend that you do not define such words, as cute as they may be: +they make it hard to get at both parts of the word in some contexts. +E.g., assume you want to get an execution token for the compilation +part. Instead, define two words, one that embodies the interpretation +part, and one that embodies the compilation part. + +There is, however, a potentially useful application of this feature: +Providing differing implementations for the default semantics. 
While +this introduces redundancy and is therefore usually a bad idea, a +performance improvement may be worth the trouble. E.g., consider the +word @code{foobar}: + +@example +: foobar + foo bar ; +@end example + +Let us assume that @code{foobar} is called so frequently that the +calling overhead would take a significant amount of the run-time. We can +optimize it with @code{interpret/compile:}: + +@example +:noname + foo bar ; +:noname + POSTPONE foo POSTPONE bar ; +interpret/compile: foobar +@end example + +This definition has the same interpretation semantics and essentially +the same compilation semantics as the simple definition of +@code{foobar}, but the implementation of the compilation semantics is +more efficient with respect to run-time. + +Some people try to use state-smart words to emulate the feature provided +by @code{interpret/compile:} (words are state-smart if they check +@code{STATE} during execution). E.g., they would try to code +@code{foobar} like this: + +@example +: foobar + STATE @@ + IF ( compilation state ) + POSTPONE foo POSTPONE bar + ELSE + foo bar + ENDIF ; immediate +@end example + +While this works if @code{foobar} is processed only by the text +interpreter, it does not work in other contexts (like @code{'} or +@code{POSTPONE}). E.g., @code{' foobar} will produce an execution token +for a state-smart word, not for the interpretation semantics of the +original @code{foobar}; when you execute this execution token (directly +with @code{EXECUTE} or indirectly through @code{COMPILE,}) in compile +state, the result will not be what you expected (i.e., it will not +perform @code{foo bar}). State-smart words are a bad idea. Simply don't +write them! + +It is also possible to write defining words that define words with +arbitrary combinations of interpretation and compilation semantics (or, +preferably, arbitrary combinations of implementations of the default +semantics). 
In general, this looks like: + +@example +: def-word + create-interpret/compile + @var{code1} +interpretation> + @var{code2} + + @var{code3} + ( -- n ) + @@ + ( compilation. -- ; run-time. -- n ) + @@ postpone literal + +doc- +doc-body} also gives you the body of a word created with +@code{create-interpret/compile}. @node Wordlists, Files, Defining Words, Words @section Wordlists @@ -2141,9 +2541,10 @@ and use that in your assembly code. Another option for implementing normal and defining words efficiently is: adding the wanted functionality to the source of Gforth. For normal -words you just have to edit @file{primitives}, defining words (for fast -defined words) may require changes in @file{engine.c}, -@file{kernal.fs}, @file{prims2x.fs}, and possibly @file{cross.fs}. +words you just have to edit @file{primitives} (@pxref{Automatic +Generation}), defining words (equivalent to @code{;CODE} words, for fast +defined words) may require changes in @file{engine.c}, @file{kernal.fs}, +@file{prims2x.fs}, and possibly @file{cross.fs}. @node Threading Words, , Assembler and Code words, Words @@ -2174,10 +2575,10 @@ doc-douser: doc-dodefer: doc-dofield: -Currently there is no installation-independent way for recogizing words -defined by a @code{CREATE}...@code{DOES>} word; however, once you know -that a word is defined by a @code{CREATE}...@code{DOES>} word, you can -use @code{>DOES-CODE}. +You can recognize words defined by a @code{CREATE}...@code{DOES>} word +with @code{>DOES-CODE}. If the word was defined in that way, the value +returned is different from 0 and identifies the @code{DOES>} used by the +defining word. @node ANS conformance, Model, Words, Top @chapter ANS conformance @@ -2262,7 +2663,7 @@ processor-dependent. Gforth's alignment @item @code{EMIT} and non-graphic characters: The character is output using the C library function (actually, macro) -@code{putchar}. +@code{putc}. 
@item character editing of @code{ACCEPT} and @code{EXPECT}: This is modeled on the GNU readline library (@pxref{Readline @@ -2282,18 +2683,18 @@ installation-dependent. Currently a char @item character-set extensions and matching of names: Any character except the ASCII NUL character can be used in a -name. Matching is case-insensitive. The matching is performed using the -C function @code{strncasecmp}, whose function is probably influenced by -the locale. E.g., the @code{C} locale does not know about accents and -umlauts, so they are matched case-sensitively in that locale. For -portability reasons it is best to write programs such that they work in -the @code{C} locale. Then one can use libraries written by a Polish -programmer (who might use words containing ISO Latin-2 encoded -characters) and by a French programmer (ISO Latin-1) in the same program -(of course, @code{WORDS} will produce funny results for some of the -words (which ones, depends on the font you are using)). Also, the locale -you prefer may not be available in other operating systems. Hopefully, -Unicode will solve these problems one day. +name. Matching is case-insensitive (except in @code{TABLE}s). The +matching is performed using the C function @code{strncasecmp}, whose +function is probably influenced by the locale. E.g., the @code{C} locale +does not know about accents and umlauts, so they are matched +case-sensitively in that locale. For portability reasons it is best to +write programs such that they work in the @code{C} locale. Then one can +use libraries written by a Polish programmer (who might use words +containing ISO Latin-2 encoded characters) and by a French programmer +(ISO Latin-1) in the same program (of course, @code{WORDS} will produce +funny results for some of the words (which ones, depends on the font you +are using)). Also, the locale you prefer may not be available in other +operating systems. Hopefully, Unicode will solve these problems one day. 
@item conditions under which control characters match a space delimiter: If @code{WORD} is called with the space character as a delimiter, all @@ -2326,9 +2727,9 @@ The error string is stored into the vari @code{-2 throw} is performed. @item input line terminator: -For interactive input, @kbd{C-m} and @kbd{C-j} terminate lines. One of -these characters is typically produced when you type the @kbd{Enter} or -@kbd{Return} key. +For interactive input, @kbd{C-m} (CR) and @kbd{C-j} (LF) terminate +lines. One of these characters is typically produced when you type the +@kbd{Enter} or @kbd{Return} key. @item maximum size of a counted string: @code{s" /counted-string" environment? drop .}. Currently 255 characters @@ -2349,11 +2750,11 @@ change it from within Gforth. However, t redirected in the command line that starts Gforth. @item method of selecting the user output device: -The user output device is the standard output. It cannot be redirected -from within Gforth, but typically from the command line that starts -Gforth. Gforth uses buffered output, so output on a terminal does not -become visible before the next newline or buffer overflow. Output on -non-terminals is invisible until the buffer overflows. +@code{EMIT} and @code{TYPE} output to the file-id stored in the value +@code{outfile-id} (@code{stdout} by default). Gforth uses buffered +output, so output on a terminal does not become visible before the next +newline or buffer overflow. Output on non-terminals is invisible until +the buffer overflows. @item methods of dictionary compilation: What are we expected to document here? @@ -2389,7 +2790,7 @@ string. @code{1 chars .}. 1 on all current ports. @item size of the keyboard terminal buffer: -Varies. You can determine the size at a specific time using @code{lp@ +Varies. You can determine the size at a specific time using @code{lp@@ tib - .}. It is shared with the locals stack and TIBs of files that include the current file. 
You can change the amount of space for TIBs and locals stack at Gforth startup with the command line option @@ -2401,14 +2802,15 @@ shared with @code{WORD}. @item size of the scratch area returned by @code{PAD}: The remainder of dictionary space. You can even use the unused part of -the data stack space. The current size can be computed with @code{sp@ +the data stack space. The current size can be computed with @code{sp@@ pad - .}. @item system case-sensitivity characteristics: -Dictionary searches are case insensitive. However, as explained above -under @i{character-set extensions}, the matching for non-ASCII -characters is determined by the locale you are using. In the default -@code{C} locale all non-ASCII characters are matched case-sensitively. +Dictionary searches are case insensitive (except in +@code{TABLE}s). However, as explained above under @i{character-set +extensions}, the matching for non-ASCII characters is determined by the +locale you are using. In the default @code{C} locale all non-ASCII +characters are matched case-sensitively. @item system prompt: @code{ ok} in interpret state, @code{ compiled} in compile state. @@ -2425,7 +2827,7 @@ the choice to @code{gcc} (what to use fo On two's complement machines, arithmetic is performed modulo 2**bits-per-cell for single arithmetic and 4**bits-per-cell for double arithmetic (with appropriate mapping for signed types). Division by zero -typically results in a @code{-55 throw} (floatingpoint unidentified +typically results in a @code{-55 throw} (Floating-point unidentified fault), although a @code{-10 throw} (divide by zero) would be more appropriate. @@ -2442,7 +2844,9 @@ No. @table @i @item a name is neither a word nor a number: -@code{-13 throw} (Undefined word) +@code{-13 throw} (Undefined word). Actually, @code{-13 bounce}, which +preserves the data and FP stack, so you don't lose more work than +necessary. 
@item a definition name exceeds the maximum length allowed: @code{-19 throw} (Word name too long) @@ -2459,8 +2863,9 @@ flow words, and issue a @code{ABORT"} or mismatch). @item attempting to obtain the execution token of a word with undefined execution semantics: -You get an execution token representing the compilation semantics -instead. +@code{-14 throw} (Interpreting a compile-only word). In some cases, you +get an execution token for @code{compile-only-error} (which performs a +@code{-14 throw} when executed). @item dividing by zero: typically results in a @code{-55 throw} (floating point unidentified @@ -2481,12 +2886,7 @@ error appears at a different place when @item interpreting a word with undefined interpretation semantics: For some words, we defined interpretation semantics. For the others: -@code{-14 throw} (Interpreting a compile-only word). Note that this is -checked only by the outer (aka text) interpreter; if the word is -@code{execute}d in some other way, it will typically perform it's -compilation semantics even in interpret state. (We could change @code{'} -and relatives not to give the xt of such words, but we think that would -be too restrictive). +@code{-14 throw} (Interpreting a compile-only word). @item modifying the contents of the input buffer or a string literal: These are located in writable memory and can be modified. @@ -2513,7 +2913,7 @@ underflow) is performed. Apart from that underflows can result in similar behaviour as overflows (of adjacent stacks). -@item unexepected end of the input buffer, resulting in an attempt to use a zero-length string as a name: +@item unexpected end of the input buffer, resulting in an attempt to use a zero-length string as a name: @code{Create} and its descendants perform a @code{-16 throw} (Attempt to use zero-length string as a name). Words like @code{'} probably will not find what they search. 
Note that it is possible to create zero-length @@ -2523,7 +2923,7 @@ names with @code{nextname} (should it no The next invocation of a parsing word returns a string with length 0. @item @code{RECURSE} appears after @code{DOES>}: -Compiles a recursive call to the defining word not to the defined word. +Compiles a recursive call to the defining word, not to the defined word. @item argument input source different than current input source for @code{RESTORE-INPUT}: @code{-12 THROW}. Note that, once an input file is closed (e.g., because @@ -2532,7 +2932,7 @@ reused. Therefore, restoring an input so closed file may lead to unpredictable results instead of a @code{-12 THROW}. -In the future, Gforth may be able to retore input source specifications +In the future, Gforth may be able to restore input source specifications from other than the current input source. @item data space containing definitions gets de-allocated: @@ -2560,7 +2960,8 @@ stack items are loop control parameters @code{abort" last word was headerless"}. @item name not defined by @code{VALUE} used by @code{TO}: -@code{-32 throw} (Invalid name argument) +@code{-32 throw} (Invalid name argument) (unless name was defined by +@code{CONSTANT}; then it just changes the constant). @item name not found (@code{'}, @code{POSTPONE}, @code{[']}, @code{[COMPILE]}): @code{-13 throw} (Undefined word) @@ -2570,8 +2971,8 @@ Gforth behaves as if they were of the sa the behaviour by interpreting all parameters as, e.g., signed. @item @code{POSTPONE} or @code{[COMPILE]} applied to @code{TO}: -Assume @code{: X POSTPONE TO ; IMMEDIATE}. @code{X} is equivalent to -@code{TO}. +Assume @code{: X POSTPONE TO ; IMMEDIATE}. @code{X} performs the +compilation semantics of @code{TO}. @item String longer than a counted string returned by @code{WORD}: Not checked. The string will be ok, but the count will, of course, @@ -2610,7 +3011,7 @@ and you can give commands to Gforth inte available depend on how you invoke Gforth. 
@item program data space available: -@code{sp@ here - .} gives the space remaining for dictionary and data +@code{sp@@ here - .} gives the space remaining for dictionary and data stack together. @item return stack space available: @@ -2618,7 +3019,7 @@ By default 16 KBytes. The default can be switch (@pxref{Invocation}) when Gforth starts up. @item stack space available: -@code{sp@ here - .} gives the space remaining for dictionary and data +@code{sp@@ here - .} gives the space remaining for dictionary and data stack together. @item system dictionary space required, in address units: @@ -3229,7 +3630,7 @@ that are otherwise written in C, C++, or The Forth system ATLAST provides facilities for embedding it into applications; unfortunately it has several disadvantages: most -implorantly, it is not based on ANS Forth, and it is apparently dead +importantly, it is not based on ANS Forth, and it is apparently dead (i.e., not developed further and not supported). The facilities provided by Gforth in this area are inspired by ATLAST's facilities, so making the switch should not be hard. @@ -3246,7 +3647,7 @@ prefix @code{forth_}. (Global symbols th prefix @code{gforth_}). You can include the declarations of Forth types and the functions and -variables of the interface with @code{include }. +variables of the interface with @code{#include }. Types.