--- gforth/Attic/gforth.ds 1994/10/24 19:15:57 1.1
+++ gforth/Attic/gforth.ds 1994/11/14 19:01:16 1.2
@@ -689,27 +689,27 @@ There are several variations on the coun
 index by @var{n} instead of by 1. The loop is terminated when the border
 between @var{limit-1} and @var{limit} is crossed. E.g.:

-4 0 ?DO i . 2 +LOOP prints 0 2
+@code{4 0 ?DO i . 2 +LOOP} prints @code{0 2}

-4 1 ?DO i . 2 +LOOP prints 1 3
+@code{4 1 ?DO i . 2 +LOOP} prints @code{1 3}

 The behaviour of @code{@var{n} +LOOP} is peculiar when @var{n} is
 negative:

--1 0 ?DO i . -1 +LOOP prints 0 -1
+@code{-1 0 ?DO i . -1 +LOOP} prints @code{0 -1}

- 0 0 ?DO i . -1 +LOOP prints nothing
+@code{ 0 0 ?DO i . -1 +LOOP} prints nothing

 Therefore we recommend avoiding using @code{@var{n} +LOOP} with
 negative @var{n}. One alternative is @code{@var{n} S+LOOP}, where the
 negative case behaves symmetrical to the positive case:

--2 0 ?DO i . -1 +LOOP prints 0 -1
+@code{-2 0 ?DO i . -1 +LOOP} prints @code{0 -1}

--1 0 ?DO i . -1 +LOOP prints 0
+@code{-1 0 ?DO i . -1 +LOOP} prints @code{0}

- 0 0 ?DO i . -1 +LOOP prints nothing
+@code{ 0 0 ?DO i . -1 +LOOP} prints nothing

-The loop is terminated when the border between @var{limit-sgn(n)} and
+The loop is terminated when the border between @var{limit@minus{}sgn(n)} and
 @var{limit} is crossed. However, @code{S+LOOP} is not part of the ANS
 Forth standard.
@@ -734,10 +734,570 @@ iterates @var{n+1} times; @code{i} produ
 and ending with 0. Other Forth systems may behave differently, even if
 they support @code{FOR} loops.

+@subsection Arbitrary control structures
+
+ANS Forth permits and supports using control structures in a non-nested
+way. Information about incomplete control structures is stored on the
+control-flow stack. This stack may be implemented on the Forth data
+stack, and this is what we have done in gforth.
+
+An @i{orig} entry represents an unresolved forward branch, a @i{dest}
+entry represents a backward branch target. A few words are the basis for
+building any control structure possible (except control structures that
+need storage, like calls, coroutines, and backtracking).
+
+if
+ahead
+then
+begin
+until
+again
+cs-pick
+cs-roll
+
+On many systems control-flow stack items take one word, in gforth they
+currently take three (this may change in the future). Therefore it is a
+really good idea to manipulate the control flow stack with
+@code{cs-pick} and @code{cs-roll}, not with data stack manipulation
+words.
+
+Some standard control structure words are built from these words:
+
+else
+while
+repeat
+
+Counted loop words constitute a separate group of words:
+
+?do
+do
+for
+loop
+s+loop
++loop
+next
+leave
+?leave
+unloop
+undo
+
+The standard does not allow using @code{cs-pick} and @code{cs-roll} on
+@i{do-sys}. Our system allows it, but it's your job to ensure that for
+every @code{?DO} etc. there is exactly one @code{UNLOOP} on any path
+through the program (@code{LOOP} etc. compile an @code{UNLOOP}). Also,
+you have to ensure that all @code{LEAVE}s are resolved (by using one of
+the loop-ending words or @code{UNDO}).
+
+Another group of control structure words are
+
+case
+endcase
+of
+endof
+
+@i{case-sys} and @i{of-sys} cannot be processed using @code{cs-pick} and
+@code{cs-roll}.
+
 @node Locals
 @section Locals

+Local variables can make Forth programming more enjoyable and Forth
+programs easier to read. Unfortunately, the locals of ANS Forth are
+laden with restrictions. Therefore, we provide not only the ANS Forth
+locals wordset, but also our own, more powerful locals wordset (we
+implemented the ANS Forth locals wordset through our locals wordset).
+
+@menu
+@end menu
+
+@subsection gforth locals
+
+Locals can be defined with
+
+@example
+@{ local1 local2 ... -- comment @}
+@end example
+or
+@example
+@{ local1 local2 ... @}
+@end example
+
+E.g.,
+@example
+: max @{ n1 n2 -- n3 @}
+ n1 n2 > if
+   n1
+ else
+   n2
+ endif ;
+@end example
+
+The similarity of locals definitions with stack comments is intended. A
+locals definition often replaces the stack comment of a word. The order
+of the locals corresponds to the order in a stack comment and everything
+after the @code{--} is really a comment.
+
+This similarity has one disadvantage: It is too easy to confuse locals
+declarations with stack comments, causing bugs and making them hard to
+find. However, this problem can be avoided by appropriate coding
+conventions: Do not use both notations in the same program. If you do,
+they should be distinguished using additional means, e.g. by position.
+
+The name of the local may be preceded by a type specifier, e.g.,
+@code{F:} for a floating point value:
+
+@example
+: CX* @{ F: Ar F: Ai F: Br F: Bi -- Cr Ci @}
+\ complex multiplication
+ Ar Br f* Ai Bi f* f-
+ Ar Bi f* Ai Br f* f+ ;
+@end example
+
+GNU Forth currently supports cells (@code{W:}, @code{W^}), doubles
+(@code{D:}, @code{D^}), floats (@code{F:}, @code{F^}) and characters
+(@code{C:}, @code{C^}) in two flavours: a value-flavoured local (defined
+with @code{W:}, @code{D:} etc.) produces its value and can be changed
+with @code{TO}. A variable-flavoured local (defined with @code{W^} etc.)
+produces its address (which becomes invalid when the variable's scope is
+left). E.g., the standard word @code{emit} can be defined in terms of
+@code{type} like this:
+
+@example
+: emit @{ C^ char* -- @}
+    char* 1 type ;
+@end example
+
+A local without type specifier is a @code{W:} local. Both flavours of
+locals are initialized with values from the data or FP stack.
+
+Currently there is no way to define locals with user-defined data
+structures, but we are working on it.
+
+GNU Forth allows defining locals everywhere in a colon definition. This poses the following questions:
+
+@subsubsection Where are locals visible by name?
+
+Basically, the answer is that locals are visible where you would expect
+it in block-structured languages, and sometimes a little longer. If you
+want to restrict the scope of a local, enclose its definition in
+@code{SCOPE}...@code{ENDSCOPE}.
+
+doc-scope
+doc-endscope
+
+These words behave like control structure words, so you can use them
+with @code{CS-PICK} and @code{CS-ROLL} to restrict the scope in
+arbitrary ways.
+
+If you want a more exact answer to the visibility question, here's the
+basic principle: A local is visible in all places that can only be
+reached through the definition of the local@footnote{In compiler
+construction terminology, all places dominated by the definition of the
+local.}. In other words, it is not visible in places that can be reached
+without going through the definition of the local. E.g., locals defined
+in @code{IF}...@code{ENDIF} are visible until the @code{ENDIF}, locals
+defined in @code{BEGIN}...@code{UNTIL} are visible after the
+@code{UNTIL} (until, e.g., a subsequent @code{ENDSCOPE}).
+
+The reasoning behind this solution is: We want to have the locals
+visible as long as it is meaningful. The user can always make the
+visibility shorter by using explicit scoping. In a place that can
+only be reached through the definition of a local, the meaning of a
+local name is clear. In other places it is not: How is the local
+initialized at the control flow path that does not contain the
+definition? Which local is meant, if the same name is defined twice in
+two independent control flow paths?
+
+This should be enough detail for nearly all users, so you can skip the
+rest of this section. If you really must know all the gory details and
+options, read on.
+
+In order to implement this rule, the compiler has to know which places
+are unreachable. It knows this automatically after @code{AHEAD},
+@code{AGAIN}, @code{EXIT} and @code{LEAVE}; in other cases (e.g., after
+most @code{THROW}s), you can use the word @code{UNREACHABLE} to tell the
+compiler that the control flow never reaches that place. If
+@code{UNREACHABLE} is not used where it could be, the only consequence is
+that the visibility of some locals is more limited than the rule above
+says. If @code{UNREACHABLE} is used where it should not be (i.e., if you
+lie to the compiler), buggy code will be produced.
+
+Another problem with this rule is that at @code{BEGIN}, the compiler
+does not know which locals will be visible on the incoming back-edge.
+All problems discussed in the following are due to this ignorance of
+the compiler (we discuss the problems using @code{BEGIN} loops as
+examples; the discussion also applies to @code{?DO} and other
+loops). Perhaps the most insidious example is:
+@example
+AHEAD
+BEGIN
+  x
+[ 1 CS-ROLL ] THEN
+  @{ x @}
+  ...
+UNTIL
+@end example
+
+This should be legal according to the visibility rule. The use of
+@code{x} can only be reached through the definition; but that appears
+textually below the use.
+
+From this example it is clear that the visibility rules cannot be fully
+implemented without major headaches. Our implementation treats common
+cases as advertised and the exceptions are treated in a safe way: The
+compiler makes a reasonable guess about the locals visible after a
+@code{BEGIN}; if it is too pessimistic, the
+user will get a spurious error about the local not being defined; if the
+compiler is too optimistic, it will notice this later and issue a
+warning. In the case above the compiler would complain about @code{x}
+being undefined at its use. You can see from the obscure examples in
+this section that it takes quite unusual control structures to get the
+compiler into trouble, and even then it will often do fine.
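+
+As a rough illustration of such a common case (a sketch that is not
+taken from the gforth sources; the word name @code{stars} is made up for
+this example), a local defined before an ordinary
+@code{BEGIN}...@code{WHILE}...@code{REPEAT} loop can be used inside the
+loop without any special measures:
+@example
+: stars @{ n -- @}
+  0 BEGIN
+    dup n <          \ n was defined before the BEGIN
+  WHILE
+    [char] * emit  1+
+  REPEAT
+  drop ;
+@end example
+With this definition, @code{5 stars} should print five asterisks.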
+
+If the @code{BEGIN} is reachable from above, the most optimistic guess
+is that all locals visible before the @code{BEGIN} will also be
+visible after the @code{BEGIN}. This guess is valid for all loops that
+are entered only through the @code{BEGIN}, in particular, for normal
+@code{BEGIN}...@code{WHILE}...@code{REPEAT} and
+@code{BEGIN}...@code{UNTIL} loops and it is implemented in our
+compiler. When the branch to the @code{BEGIN} is finally generated by
+@code{AGAIN} or @code{UNTIL}, the compiler checks the guess and
+warns the user if it was too optimistic:
+@example
+IF
+  @{ x @}
+BEGIN
+  \ x ?
+[ 1 cs-roll ] THEN
+  ...
+UNTIL
+@end example
+
+Here, @code{x} lives only until the @code{BEGIN}, but the compiler
+optimistically assumes that it lives until the @code{THEN}. It notices
+this difference when it compiles the @code{UNTIL} and issues a
+warning. The user can avoid the warning, and make sure that @code{x}
+is not used in the wrong area by using explicit scoping:
+@example
+IF
+  SCOPE
+  @{ x @}
+  ENDSCOPE
+BEGIN
+[ 1 cs-roll ] THEN
+  ...
+UNTIL
+@end example
+
+Since the guess is optimistic, there will be no spurious error messages
+about undefined locals.
+
+If the @code{BEGIN} is not reachable from above (e.g., after
+@code{AHEAD} or @code{EXIT}), the compiler cannot even make an
+optimistic guess, as the locals visible after the @code{BEGIN} may be
+defined later. Therefore, the compiler assumes that no locals are
+visible after the @code{BEGIN}. However, the user can use
+@code{ASSUME-LIVE} to make the compiler assume that the same locals are
+visible at the @code{BEGIN} as at the point where the top control-flow
+stack item was created.
+
+doc-assume-live
+
+E.g.,
+@example
+@{ x @}
+AHEAD
+ASSUME-LIVE
+BEGIN
+  x
+[ 1 CS-ROLL ] THEN
+  ...
+UNTIL
+@end example
+
+Other cases where the locals are defined before the @code{BEGIN} can be
+handled by inserting an appropriate @code{CS-ROLL} before the
+@code{ASSUME-LIVE} (and changing the control-flow stack manipulation
+behind the @code{ASSUME-LIVE}).
+
+Cases where locals are defined after the @code{BEGIN} (but should be
+visible immediately after the @code{BEGIN}) can only be handled by
+rearranging the loop. E.g., the ``most insidious'' example above can be
+rearranged into:
+@example
+BEGIN
+  @{ x @}
+  ... 0=
+WHILE
+  x
+REPEAT
+@end example
+
+@subsubsection How long do locals live?
+
+The right answer for the lifetime question would be: A local lives at
+least as long as it can be accessed. For a value-flavoured local this
+means: until the end of its visibility. However, a variable-flavoured
+local could be accessed through its address far beyond its visibility
+scope. Ultimately, this would mean that such locals would have to be
+garbage collected. Since this entails un-Forth-like implementation
+complexities, I adopted the same cowardly solution as some other
+languages (e.g., C): The local lives only as long as it is visible;
+afterwards its address is invalid (and programs that access it
+afterwards are erroneous).
+
+@subsubsection Programming Style
+
+The freedom to define locals anywhere has the potential to change
+programming styles dramatically. In particular, the need to use the
+return stack for intermediate storage vanishes. Moreover, all stack
+manipulations (except @code{PICK}s and @code{ROLL}s with run-time
+determined arguments) can be eliminated: If the stack items are in the
+wrong order, just write a locals definition for all of them; then
+write the items in the order you want.
+
+This seems a little far-fetched and eliminating stack manipulations is
+unlikely to become a conscious programming objective. Still, the
+number of stack manipulations will be reduced dramatically if local
+variables are used liberally (e.g., compare @code{max} as defined above
+with a traditional implementation of @code{max}).
+
+This shows one potential benefit of locals: making Forth programs more
+readable. Of course, this benefit will only be realized if the
+programmers continue to honour the principle of factoring instead of
+using the added latitude to make the words longer.
+
+Using @code{TO} can and should be avoided. Without @code{TO},
+every value-flavoured local has only a single assignment and many
+advantages of functional languages apply to Forth. I.e., programs are
+easier to analyse, to optimize and to read: It is clear from the
+definition what the local stands for, it does not turn into something
+different later.
+
+E.g., a definition using @code{TO} might look like this:
+@example
+: strcmp @{ addr1 u1 addr2 u2 -- n @}
+ u1 u2 min 0
+ ?do
+   addr1 c@@ addr2 c@@ - ?dup
+   if
+     unloop exit
+   then
+   addr1 char+ TO addr1
+   addr2 char+ TO addr2
+ loop
+ u1 u2 - ;
+@end example
+Here, @code{TO} is used to update @code{addr1} and @code{addr2} at
+every loop iteration. @code{strcmp} is a typical example of the
+readability problems of using @code{TO}. When you start reading
+@code{strcmp}, you think that @code{addr1} refers to the start of the
+string. Only near the end of the loop do you realize that it is
+something else.
+
+This can be avoided by defining two locals at the start of the loop that
+are initialized with the right value for the current iteration.
+@example
+: strcmp @{ addr1 u1 addr2 u2 -- n @}
+ addr1 addr2
+ u1 u2 min 0
+ ?do @{ s1 s2 @}
+   s1 c@@ s2 c@@ - ?dup
+   if
+     unloop exit
+   then
+   s1 char+ s2 char+
+ loop
+ 2drop
+ u1 u2 - ;
+@end example
+Here it is clear from the start that @code{s1} has a different value
+in every loop iteration.
+
+@subsubsection Implementation
+
+GNU Forth uses an extra locals stack. The most compelling reason for
+this is that the return stack is not float-aligned; using an extra stack
+also eliminates the problems and restrictions of using the return stack
+as a locals stack. Like the other stacks, the locals stack grows toward
+lower addresses. A few primitives allow an efficient implementation:
+
+doc-@local#
+doc-f@local#
+doc-laddr#
+doc-lp+!#
+doc-lp!
+doc->l
+doc-f>l
+
+In addition to these primitives, some specializations of these
+primitives for commonly occurring inline arguments are provided for
+efficiency reasons, e.g., @code{@@local0} as specialization of
+@code{@@local#} for the inline argument 0. The following compiling words
+compile the right specialized version, or the general version, as
+appropriate:
+
+doc-compile-@@local
+doc-compile-f@@local
+doc-compile-lp+!
+
+Combinations of conditional branches and @code{lp+!#} like
+@code{?branch-lp+!#} (the locals pointer is only changed if the branch
+is taken) are provided for efficiency and correctness in loops.
+
+A special area in the dictionary space is reserved for keeping the
+local variable names. @code{@{} switches the dictionary pointer to this
+area and @code{@}} switches it back and generates the locals
+initializing code. @code{W:} etc.@ are normal defining words. This
+special area is cleared at the start of every colon definition.
+
+A special feature of GNU Forth's dictionary is used to implement the
+definition of locals without type specifiers: every wordlist (aka
+vocabulary) has its own methods for searching
+etc. (@pxref{dictionary}). For the present purpose we defined a wordlist
+with a special search method: When it is searched for a word, it
+actually creates that word using @code{W:}. @code{@{} changes the search
+order to first search the wordlist containing @code{@}}, @code{W:} etc.,
+and then the wordlist for defining locals without type specifiers.
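+
+To illustrate the visible effect of this mechanism (a sketch only; the
+word names @code{add1} and @code{add2} are made up for this example),
+the following two definitions should declare the same kind of locals,
+because a name without a type specifier is created as if by @code{W:}:
+@example
+: add1 @{ a b -- n @}        a b + ;  \ untyped locals
+: add2 @{ W: a W: b -- n @}  a b + ;  \ explicit cell-sized value locals
+@end example
+Both @code{3 4 add1 .} and @code{3 4 add2 .} should print @code{7}.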
+
+The lifetime rules support a stack discipline within a colon
+definition: The lifetime of a local is either nested with other locals'
+lifetimes or it does not overlap them.
+
+At @code{BEGIN}, @code{IF}, and @code{AHEAD} no code for locals stack
+pointer manipulation is generated. Between control structure words
+locals definitions can push locals onto the locals stack. @code{AGAIN}
+is the simplest of the other three control flow words. It has to
+restore the locals stack depth of the corresponding @code{BEGIN}
+before branching. The code looks like this:
+@format
+@code{lp+!#} current-locals-size @minus{} dest-locals-size
+@code{branch}
+@end format
+
+@code{UNTIL} is a little more complicated: If it branches back, it
+must adjust the stack just like @code{AGAIN}. But if it falls through,
+the locals stack must not be changed. The compiler generates the
+following code:
+@format
+@code{?branch-lp+!#} current-locals-size @minus{} dest-locals-size
+@end format
+The locals stack pointer is only adjusted if the branch is taken.
+
+@code{THEN} can produce somewhat inefficient code:
+@format
+@code{lp+!#} current-locals-size @minus{} orig-locals-size
+:
+@code{lp+!#} orig-locals-size @minus{} new-locals-size
+@end format
+The second @code{lp+!#} adjusts the locals stack pointer from the
+level at the @i{orig} point to the level after the @code{THEN}. The
+first @code{lp+!#} adjusts the locals stack pointer from the current
+level to the level at the @i{orig} point, so the complete effect is an
+adjustment from the current level to the right level after the
+@code{THEN}.
+
+In a conventional Forth implementation a dest control-flow stack entry
+is just the target address and an orig entry is just the address to be
+patched. Our locals implementation adds a wordlist to every orig or dest
+item. It is the list of locals visible (or assumed visible) at the point
+described by the entry. Our implementation also adds a tag to identify
+the kind of entry, in particular to differentiate between live and dead
+(reachable and unreachable) orig entries.
+
+A few unusual operations have to be performed on locals wordlists:
+
+doc-common-list
+doc-sub-list?
+doc-list-size
+
+Several features of our locals wordlist implementation make these
+operations easy to implement: The locals wordlists are organised as
+linked lists; the tails of these lists are shared, if the lists
+contain some of the same locals; and the address of a name is greater
+than the address of the names behind it in the list.
+
+Another important implementation detail is the variable
+@code{dead-code}. It is used by @code{BEGIN} and @code{THEN} to
+determine if they can be reached directly or only through the branch
+that they resolve. @code{dead-code} is set by @code{UNREACHABLE},
+@code{AHEAD}, @code{EXIT} etc., and cleared at the start of a colon
+definition, by @code{BEGIN} and usually by @code{THEN}.
+
+Counted loops are similar to other loops in most respects, but
+@code{LEAVE} requires special attention: It performs basically the same
+service as @code{AHEAD}, but it does not create a control-flow stack
+entry. Therefore the information has to be stored elsewhere;
+traditionally, the information was stored in the target fields of the
+branches created by the @code{LEAVE}s, by organizing these fields into a
+linked list. Unfortunately, this clever trick does not provide enough
+space for storing our extended control flow information. Therefore, we
+introduce another stack, the leave stack. It contains the control-flow
+stack entries for all unresolved @code{LEAVE}s.
+
+Local names are kept until the end of the colon definition, even if
+they are no longer visible in any control-flow path. In a few cases
+this may lead to increased space needs for the locals name area, but
+usually less than reclaiming this space would cost in code size.
+
+
+@subsection ANS Forth locals
+
+The ANS Forth locals wordset does not define a syntax for locals, but
+words that make it possible to define various syntaxes. One of the
+possible syntaxes is a subset of the syntax we used in the gforth locals
+wordset, i.e.:
+
+@example
+@{ local1 local2 ... -- comment @}
+@end example
+or
+@example
+@{ local1 local2 ... @}
+@end example
+
+The order of the locals corresponds to the order in a stack comment. The
+restrictions are:
+@itemize @bullet
+@item
+Locals can only be cell-sized values (no type specifiers are allowed).
+@item
+Locals can be defined only outside control structures.
+@item
+Locals can interfere with explicit usage of the return stack. For the
+exact (and long) rules, see the standard. If you don't use return stack
+accessing words in a definition using locals, you will be all right. The
+purpose of this rule is to make locals implementation on the return
+stack easier.
+@item
+The whole definition must be in one line.
+@end itemize
+
+Locals defined in this way behave like @code{VALUE}s
+(@pxref{values}). I.e., they are initialized from the stack. Using their
+name produces their value. Their value can be changed using @code{TO}.
+
+Since this syntax is supported by gforth directly, you need not do
+anything to use it. If you want to port a program using this syntax to
+another ANS Forth system, use @file{anslocal.fs} to implement the syntax
+on the other system.
+
+Note that a syntax shown in the standard, section A.13 looks
+similar, but is quite different in having the order of locals
+reversed. Beware!
+
+The ANS Forth locals wordset itself consists of the following word:
+
+doc-(local)
+
+The ANS Forth locals extension wordset defines a syntax, but it is so
+awful that we strongly recommend not using it. We have implemented this
+syntax to make porting to gforth easy, but do not document it here. The
+problem with this syntax is that the locals are defined in an order
+reversed with respect to the standard stack comment notation, making
+programs harder to read, and easier to misread and miswrite. The only
+merit of this syntax is that it is easy to implement using the ANS Forth
+locals wordset.

 @contents
 @bye