A single-cell datum may be treated by a Standard Program as a signed integer. Moving and storing such data is performed as for any single-cell data. In addition to the universally-applicable operators for single-cell data specified above, the following mathematical and comparison operators are valid for single-cell signed integers:
* */ */MOD /MOD MOD + +! - / 1+ 1- ABS MAX MIN NEGATE 0< 0> < >
Given the same number of bits, unsigned integers usually represent twice the number of absolute values representable by signed integers.
A single-cell datum may be treated by a Standard Program as an unsigned integer. Moving and storing such data is performed as for any single-cell data. In addition, the following mathematical and comparison operators are valid for single-cell unsigned integers:
UM* UM/MOD + +! - 1+ 1- * U< U>
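As a minimal sketch of the distinction, assuming a two's-complement system (which the standard does not require), the same pair of cells can compare differently under the signed and unsigned operators. The word names SIGNED-LESS? and UNSIGNED-LESS? are hypothetical and exist only for this illustration.

```forth
\ Assumption: two's complement, so -1 has all bits set and, read as
\ an unsigned number, is the largest value a cell can hold.
: SIGNED-LESS?   ( n1 n2 -- flag )  < ;    \ treats both cells as signed
: UNSIGNED-LESS? ( u1 u2 -- flag )  U< ;   \ treats both cells as unsigned

-1 1 SIGNED-LESS?   .    \ true flag: -1 is less than 1
-1 1 UNSIGNED-LESS? .    \ false flag: all-bits-set is greater than 1
```

The choice of operator, not the datum itself, determines which interpretation applies.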
An address is uniquely represented as a single-cell unsigned number and can be treated as such when being moved to, from, or upon the stack. Conversely, each unsigned number represents a unique address (which is not necessarily an address of accessible memory). This one-to-one relationship between addresses and unsigned numbers forces an equivalence between address arithmetic and the corresponding operations on unsigned numbers.
Several operators are provided specifically for address arithmetic:
CHAR+ CHARS CELL+ CELLS
and, if the floating-point word set is present:
FLOAT+ FLOATS SFLOAT+ SFLOATS DFLOAT+ DFLOATS
A Standard Program may never assume a particular correspondence between a Forth address and the physical address to which it is mapped.
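A short sketch of portable address arithmetic using CELLS and CELL+ follows. The names SAMPLES and SAMPLE are hypothetical; the point is that the program never assumes how many address units a cell occupies.

```forth
CREATE SAMPLES  4 CELLS ALLOT                 \ hypothetical 4-cell array

: SAMPLE ( index -- a-addr )  CELLS SAMPLES + ;   \ address of the index-th cell

10 0 SAMPLE !                                 \ store into cell 0
20 1 SAMPLE !                                 \ store into cell 1
SAMPLES CELL+ @ .                             \ CELL+ steps to cell 1; prints 20
```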
The trend in ANS Forth is toward consistent use of the c-addr u representation of strings on the stack. Use of the alternative stack representation, the address of a counted string, is discouraged. The traditional Forth words WORD and FIND continue to use the address-of-counted-string representation for historical reasons. The new word C" , added as a porting aid for existing programs, also uses the counted string representation.
Counted strings remain useful as a way to store strings in memory. This use is not discouraged, but when references to such strings appear on the stack, it is preferable to use the c-addr u representation.
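The following sketch illustrates the recommended division of labor: a string is kept in memory as a counted string, and COUNT converts it to c-addr u before it is passed on the stack. GREETING and SHOW are hypothetical names chosen for this example.

```forth
\ GREETING holds a counted string in memory:
\ one length character followed by the characters themselves.
CREATE GREETING  5 C,  CHAR H C,  CHAR E C,  CHAR L C,  CHAR L C,  CHAR O C,

: SHOW ( c-addr u -- )  TYPE ;        \ takes the preferred c-addr u form

GREETING COUNT SHOW                   \ COUNT yields c-addr u; prints HELLO
```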
The association between an execution token and a definition is static. Once made, it does not change with changes in the search order or anything else. However it may not be unique, e.g., the phrases
' 1+ and ' CHAR+ might return the same value.
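A small sketch of the static association: an execution token obtained before a name is redefined continues to identify the earlier definition. ANSWER and OLD-XT are hypothetical names; some systems may print a redefinition warning.

```forth
: ANSWER ( -- n )  1 ;
' ANSWER CONSTANT OLD-XT      \ token for the first ANSWER

: ANSWER ( -- n )  2 ;        \ a later definition with the same name

OLD-XT EXECUTE .              \ still runs the first definition; prints 1
ANSWER .                      \ the name now finds the new one; prints 2
```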
a) Storage and retrieval
Two operators are provided to fetch and store cell pairs:
2@ 2!
b) Manipulation on the stack
Additionally, these operators may be used to move cell pairs from, to and upon the stack:
2>R 2DROP 2DUP 2OVER 2R> 2SWAP 2ROT
c) Comparison
The following comparison operations are universally valid for cell pairs:
D= D0=
If a double-cell integer is to be treated as signed, the following comparison and mathematical operations are valid:
D+ D- D< D0< DABS DMAX DMIN DNEGATE M*/ M+
If a double-cell integer is to be treated as unsigned, the following comparison and mathematical operations are valid:
D+ D- UM/MOD DU<
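As a sketch of cell-pair storage combined with double-cell arithmetic, the following keeps a running total in a two-cell buffer. It assumes the Double-Number word set is present (for D+ and D.); TOTAL is a hypothetical name.

```forth
CREATE TOTAL  2 CELLS ALLOT          \ hypothetical storage for one cell pair

30000 S>D  TOTAL 2!                  \ S>D widens a signed single to a double
TOTAL 2@  30000 S>D  D+  TOTAL 2!    \ add another 30000 in double precision
TOTAL 2@  D.                         \ prints 60000
```

On a system with 16-bit cells the sum 60000 would not fit in a signed single cell, which is the kind of situation double-cell integers are meant for.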
See: A.3.1.3.4 Counted Strings.
Traditionally, Forth has been implemented on two's-complement machines where there is a one-to-one mapping of signed numbers to unsigned numbers; any single-cell item can be viewed as either a signed or an unsigned number. Indeed, the signed representation of any positive number is identical to the equivalent unsigned representation. Further, addresses are treated as unsigned numbers: there is no distinct pointer type. Arithmetic ordering on two's-complement machines allows + and - to work on both signed and unsigned numbers. This arithmetic behavior is deeply embedded in common Forth practice. As a consequence of these behaviors, the likely ranges of signed and unsigned numbers for implementations hosted on each of the permissible arithmetic architectures are:
Arithmetic architecture    signed numbers    unsigned numbers
Two's complement           -n-1 to n         0 to 2n+1
One's complement           -n to n           0 to n
Signed magnitude           -n to n           0 to n
where n is the largest positive signed number. For all three architectures, signed numbers in the 0 to n range are bitwise identical to the corresponding unsigned number. Note that unsigned numbers on a signed magnitude machine are equivalent to signed non-negative numbers as a consequence of the forced correspondence between addresses and unsigned numbers and of the required behavior of + and -.
For reference, these number representations may be defined by the way that NEGATE is implemented:
    two's complement:    : NEGATE  INVERT 1+ ;
    one's complement:    : NEGATE  INVERT ;
    signed-magnitude:    : NEGATE  HIGH-BIT XOR ;
where HIGH-BIT is a bit mask with only the most-significant bit set. Note that all of these number systems agree on the representation of non-negative numbers.
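Following from these definitions, a program can probe which representation it is running on. The sketch below relies only on the observation that -1 has all bits set in two's complement alone; TWOS-COMPLEMENT? is a hypothetical name.

```forth
\ In two's complement -1 is all bits set, so inverting it gives 0.
\ In one's complement and signed magnitude, -1 has at least one
\ clear bit, so inverting it leaves a non-zero value.
: TWOS-COMPLEMENT? ( -- flag )  -1 INVERT 0= ;

TWOS-COMPLEMENT? .       \ true flag on a two's-complement system
```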
Per 3.2.1.1 Internal number representation and 6.1.0270 0=, the implementor must ensure that no standard or supported word return negative zero for any numeric (non-Boolean or flag) result. Many existing programmer assumptions will be violated otherwise.
There is no requirement to implement circular unsigned arithmetic, nor to set the range of unsigned numbers to the full size of a cell. There is historical precedent for limiting the range of u to that of +n, which is permissible when the cell size is greater than 16 bits.
For example, an implementation might convert the characters a through z identically to the characters A through Z, or it might treat the characters [ through ~ as additional digits with decimal values 36 through 71, respectively.
The Forth-79 Standard specifies that the signed division operators (/, /MOD, MOD, */MOD, and */) round non-integer quotients towards zero (symmetric division). Forth-83 changed the semantics of these operators to round towards negative infinity (floored division). Some in the Forth community have declined to convert systems and applications from the Forth-79 to the Forth-83 divide. To resolve this issue, an ANS Forth system is permitted to supply either floored or symmetric operators. In addition, ANS Forth systems must provide a floored division primitive (FM/MOD), a symmetric division primitive (SM/REM), and a mixed precision multiplication operator (M*).
This compromise protects the investment made in current Forth applications; Forth-79 and Forth-83 programs are automatically compliant with ANS Forth with respect to division. In practice, the rounding direction rarely matters to applications. However, if a program requires a specific rounding direction, it can use the floored division primitive FM/MOD or the symmetric division primitive SM/REM to construct a division operator of the desired flavor. This simple technique can be used to convert Forth-79 and Forth-83 programs to ANS Forth without any analysis of the original programs.
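The technique can be sketched as follows. The names /F, MODF, /S and REMS are hypothetical; each widens the dividend to a double with S>D and then applies the required primitive, so the rounding direction no longer depends on the host system's choice for / and MOD.

```forth
: /F   ( n1 n2 -- quot )  >R S>D R> FM/MOD NIP ;    \ floored quotient
: MODF ( n1 n2 -- rem )   >R S>D R> FM/MOD DROP ;   \ floored remainder
: /S   ( n1 n2 -- quot )  >R S>D R> SM/REM NIP ;    \ symmetric quotient
: REMS ( n1 n2 -- rem )   >R S>D R> SM/REM DROP ;   \ symmetric remainder

-7 2 /F   .    \ prints -4 (rounds toward negative infinity)
-7 2 MODF .    \ prints 1
-7 2 /S   .    \ prints -3 (rounds toward zero)
-7 2 REMS .    \ prints -1
```

A program ported from Forth-83 could use words like /F and MODF in place of / and MOD; one ported from Forth-79 could use /S and REMS.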
Whether underflow occurs depends on the data type of the result. For example, the phrase 1 2 - underflows if the result is treated as unsigned, but produces the valid result -1 if it is treated as signed.
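A program that intends an unsigned result can test for this condition before subtracting. The sketch below uses the fact that an unsigned difference u1 - u2 underflows exactly when u1 is less than u2; U-UNDERFLOWS? is a hypothetical name.

```forth
\ Unsigned u1 - u2 underflows exactly when u1 is less than u2.
: U-UNDERFLOWS? ( u1 u2 -- flag )  U< ;

1 2 U-UNDERFLOWS? .     \ true flag: 1 2 - has no valid unsigned result
1 2 - .                 \ prints -1, the valid signed result
```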
The only data type in Forth which has concrete rather than abstract existence is the stack entry. Forth enforces even this primitive typing only through the hard reality of stack underflow or overflow. The programmer must have a clear idea of the number of stack entries to be consumed by the execution of a word and the number of entries that will be pushed back to a stack by the execution of a word. The observation of anomalous occurrences on the data stack is the first line of defense whereby the programmer may recognize errors in an application program. It is also worth remembering that multiple stack errors caused by erroneous application code are frequently of equal and opposite magnitude, causing complementary (and deceptive) results.
For these and a host of other reasons, the one unambiguous, uncontroversial, and indispensable programming discipline observed since the earliest days of Forth is that of providing a stack diagram for all additions to the application dictionary with the exception of static constructs such as VARIABLEs and CONSTANTs.
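By way of illustration, each definition below carries a stack diagram stating how many entries it consumes and how many it leaves. SQUARED and SUM-SQ are hypothetical words chosen for this sketch.

```forth
: SQUARED ( n -- n*n )     DUP * ;                   \ one entry in, one out
: SUM-SQ  ( n1 n2 -- n3 )  SQUARED SWAP SQUARED + ;  \ n3 = n1*n1 + n2*n2

3 4 SUM-SQ .                                         \ prints 25
```

With the diagrams in place, a reader can check the stack effect of SUM-SQ by composing the diagrams of the words it calls, without executing anything.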
The simplest use of control-flow words is to implement the basic control structures shown in figure A.1.
Figure A.1 - The basic control-flow patterns.
In control flow every branch, or transfer of control, must terminate at some destination. A natural implementation uses a stack to remember the origin of forward branches and the destination of backward branches. At a minimum, only the location of each origin or destination must be indicated, although other implementation-dependent information also may be maintained.
An origin is the location of the branch itself. A destination is where control would continue if the branch were taken. A destination is needed to resolve the branch address for each origin, and conversely, if every control-flow path is completed no unused destinations can remain.
With the addition of just three words (AHEAD, CS-ROLL and CS-PICK), the basic control-flow words supply the primitives necessary to compile a variety of transportable control structures. The abilities required are compilation of forward and backward conditional and unconditional branches and compile-time management of branch origins and destinations. Table A.1 shows the desired behavior.
The requirement that control-flow words are properly balanced by other control-flow words makes reasonable the description of a compile-time implementation-defined control-flow stack. There is no prescription as to how the control-flow stack is implemented, e.g., data stack, linked list, special array. Each element of the control-flow stack mentioned above is the same size.
Table A.1 - Compilation behavior of control-flow words
at compile time,
word:      supplies:   resolves:   is used to:
IF         orig                    mark origin of forward conditional branch
THEN                   orig        resolve IF or AHEAD
BEGIN      dest                    mark backward destination
AGAIN                  dest        resolve with backward unconditional branch
UNTIL                  dest        resolve with backward conditional branch
AHEAD      orig                    mark origin of forward unconditional branch
CS-PICK                            copy item on control-flow stack
CS-ROLL                            reorder items on control-flow stack
With these tools, the remaining basic control-structure elements, shown in figure A.2, can be defined. The stack notation used here for immediate words is ( compilation / execution ).
    : WHILE ( dest -- orig dest / flag -- )
       \ conditional exit from loops
       POSTPONE IF     \ conditional forward branch
       1 CS-ROLL       \ keep dest on top
    ; IMMEDIATE

    : REPEAT ( orig dest -- / -- )
       \ resolve a single WHILE and return to BEGIN
       POSTPONE AGAIN  \ uncond. backward branch to dest
       POSTPONE THEN   \ resolve forward branch from orig
    ; IMMEDIATE

    : ELSE ( orig1 -- orig2 / -- )
       \ resolve IF supplying alternate execution
       POSTPONE AHEAD  \ unconditional forward branch orig2
       1 CS-ROLL       \ put orig1 back on top
       POSTPONE THEN   \ resolve forward branch from orig1
    ; IMMEDIATE
Figure A.2 - Additional basic control-flow patterns.
Forth control flow provides a solution for well-known problems with strictly structured programming.
The basic control structures can be supplemented, as shown in the examples in figure A.3, with additional WHILEs in BEGIN ... UNTIL and BEGIN ... WHILE ... REPEAT structures. However, for each additional WHILE there must be a THEN at the end of the structure. THEN completes the syntax with WHILE and indicates where to continue execution when the WHILE transfers control. The use of more than one additional WHILE is possible but not common. Note that if the user finds this use of THEN undesirable, an alias with a more likable name could be defined.
Additional actions may be performed between the control flow word (the REPEAT or UNTIL) and the THEN that matches the additional WHILE. Further, if additional actions are desired for normal termination and early termination, the alternative actions may be separated by the ordinary Forth ELSE. The termination actions are all specified after the body of the loop.
[[[ figure missing ]]]
Figure A.3 - Extended control-flow pattern examples.
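As an illustration of the pattern just described (a sketch written for this purpose, not a reproduction of figure A.3), the following loop has one additional WHILE and separate termination actions for its two exits. FIRST7 is a hypothetical word that counts down from n until it reaches a multiple of 7 or runs out of positive numbers.

```forth
: FIRST7 ( n -- n' flag )
   BEGIN
      DUP 0>          \ still positive?
   WHILE              \ additional WHILE, matched by ELSE ... THEN below
      DUP 7 MOD       \ non-zero remainder: not yet a multiple of 7
   WHILE              \ this WHILE is resolved by REPEAT
      1-
   REPEAT
      TRUE            \ second WHILE exited: n' is a positive multiple of 7
   ELSE
      FALSE           \ first WHILE exited: no positive numbers left
   THEN ;

10 FIRST7 . .         \ prints the true flag, then 7
5  FIRST7 . .         \ prints the false flag, then 0
```

The code between REPEAT and ELSE runs when the loop is left through the WHILE that REPEAT resolves; the code between ELSE and THEN runs when it is left through the additional WHILE.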
Note that REPEAT creates an anomaly when matching the WHILE with ELSE or THEN, most notable when compared with the BEGIN...UNTIL case. That is, there will be one fewer ELSE or THEN than there are WHILEs, because REPEAT itself supplies the THEN for one WHILE. As above, if the user finds this count mismatch undesirable, REPEAT could be replaced in-line by its own definition.
Other loop-exit control-flow words, and even other loops, can be defined. The only requirements are that the control-flow stack is properly maintained and manipulated.
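One possible loop-control word of this kind is sketched below, assuming CS-PICK from the Programming-Tools extensions is available. ?REPEAT copies the loop's dest and compiles a conditional backward branch to it, leaving the original dest for the final UNTIL. NEXT4 is a hypothetical word used only to exercise it.

```forth
: ?REPEAT ( dest -- dest / flag -- )
   0 CS-PICK          \ duplicate dest on the control-flow stack
   POSTPONE UNTIL     \ conditional backward branch to the copy
; IMMEDIATE

: NEXT4 ( u1 -- u2 )  \ smallest multiple of 4 greater than u1
   BEGIN
      1+
      DUP 1 AND 0=  ?REPEAT   \ odd: branch straight back to BEGIN
      DUP 3 AND 0=
   UNTIL ;                    \ even but not a multiple of 4: loop again

1 NEXT4 .             \ prints 4
4 NEXT4 .             \ prints 8
```

The control-flow stack holds exactly one dest before and after ?REPEAT, so the structure remains properly balanced.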
The simple implementation of the ANS Forth CASE structure below is an example of control structure extension. Note the maintenance of the data stack to prevent interference with the possible control-flow stack usage.
    0 CONSTANT CASE IMMEDIATE  ( init count of OFs )

    : OF  ( #of -- orig #of+1 / x -- )
       1+    ( count OFs )
       >R    ( move off the stack in case the control-flow )
             ( stack is the data stack. )
       POSTPONE OVER  POSTPONE =  ( copy and test case value )
       POSTPONE IF    ( add orig to control flow stack )
       POSTPONE DROP  ( discards case value if = )
       R>             ( we can bring count back now )
    ; IMMEDIATE

    : ENDOF  ( orig1 #of -- orig2 #of )
       >R    ( move off the stack in case the control-flow )
             ( stack is the data stack. )
       POSTPONE ELSE
       R>    ( we can bring count back now )
    ; IMMEDIATE

    : ENDCASE  ( orig1..orign #of -- )
       POSTPONE DROP  ( discard case value )
       0 ?DO
          POSTPONE THEN
       LOOP
    ; IMMEDIATE
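A brief usage sketch follows; it should behave the same with the definitions above or with a system's built-in CASE. CLASSIFY is a hypothetical word. Note that the default clause must leave the case value on top of the stack, because ENDCASE compiles a DROP.

```forth
: CLASSIFY ( n -- code )
   CASE
      0 OF  100  ENDOF      \ OF has already dropped n when it matches
      1 OF  200  ENDOF
      300 SWAP              \ default: keep n on top for ENDCASE to drop
   ENDCASE ;

0 CLASSIFY .                \ prints 100
1 CLASSIFY .                \ prints 200
9 CLASSIFY .                \ prints 300
```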