A.6.2 Core extension words

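The words in this collection fall into several categories:

a) Words that are in common use but are deemed less essential than Core words (e.g., 6.2.0855 C");

b) Words that are in common use but can be trivially defined from Core words (e.g., 6.2.2298 TRUE);

c) Words that are primarily useful in narrowly defined types of applications or are in less frequent use (e.g., 6.2.2008 PARSE);

d) Words that are being deprecated in favor of new words introduced to solve specific problems (e.g., 6.2.0970 CONVERT).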

Because of the varied justifications for inclusion of these words, the Technical Committee does not encourage implementors to offer the complete collection, but to select those words deemed most valuable to their clientele.

A.6.2.0060 #TIB

The function of #TIB has been superseded by SOURCE.
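
As a minimal sketch of the replacement, code that formerly fetched the character count from #TIB can take the second item returned by SOURCE instead. The name INPUT-LENGTH is hypothetical and is used only for illustration:

	: INPUT-LENGTH ( -- u )   \ number of characters in the input buffer
	    SOURCE NIP ;          \ SOURCE returns c-addr u; keep only u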

A.6.2.0200 .(

Typical use: .( ccc)

A.6.2.0210 .R

In .R, R is short for RIGHT.
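
For illustration, a small sketch that displays a number and its square in right-aligned columns. The name .SQUARE-ROW and the field widths are hypothetical:

	: .SQUARE-ROW ( n -- )
	    DUP 4 .R          \ n, right-aligned in a 4-character field
	    DUP * 8 .R CR ;   \ n squared, right-aligned in an 8-character field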

A.6.2.0340 2>R

Historically, 2>R has been used to implement DO. Hence the order of parameters on the return stack.

The primary advantage of 2>R is that it puts the top stack entry on the top of the return stack. For instance, a double-cell number may be transferred to the return stack and still have the most significant cell accessible on the top of the return stack.
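
A minimal sketch of the double-cell case described above. The name TOP-CELL is hypothetical:

	: TOP-CELL ( d -- d n )   \ copy the most significant cell of d
	    2>R  R@               \ R@ sees the cell that was on top of the data stack
	    2R> ROT ;             \ restore d, then move the copy above it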

A.6.2.0410 2R>

Note that 2R> is not equivalent to R> R>. Instead, it mirrors the action of 2>R (see A.6.2.0340).
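
The difference can be sketched with three small definitions (the names are hypothetical). The first two leave the cells in their original order; the third reverses them, which is why R> R> alone is not a substitute for 2R>:

	: KEEP-ORDER    ( x1 x2 -- x1 x2 )  2>R  2R> ;
	: KEEP-ORDER-2  ( x1 x2 -- x1 x2 )  2>R  R> R> SWAP ;
	: REVERSED      ( x1 x2 -- x2 x1 )  2>R  R> R> ;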

A.6.2.0455 :NONAME

:NONAME allows a user to create an execution token with the semantics of a colon definition without an associated name. Previously, only : (colon) could create an execution token with these semantics. Thus, Forth code could only be compiled using the syntax of :, that is:

	: NAME  ...  ;

:NONAME removes this constraint and places the Forth compiler in the hands of the programmer.

:NONAME can be used to create application-specific programming languages. One technique is to mix Forth code fragments with application-specific constructs. The application-specific constructs use :NONAME to compile the Forth code and store the corresponding execution tokens in data structures.

The functionality of :NONAME can be built on any Forth system. For years, expert Forth programmers have exploited intimate knowledge of their systems to generate unnamed code fragments. Now, this function has been named and can be used in a portable program.

For example, :NONAME can be used to build a table of code fragments where indexing into the table allows executing a particular fragment. The declaration syntax of the table is:

:NONAME .. code for command 0 .. ;  0 CMD !

:NONAME .. code for command 1 .. ;  1 CMD !
   ...

:NONAME .. code for command 99 .. ; 99 CMD !

   ... 5 CMD @ EXECUTE ...

The definitions of the table building words are:

CREATE CMD-TABLE  \ table for command execution tokens
100 CELLS ALLOT

: CMD ( n -- a-addr ) \ nth element address in table
    CELLS CMD-TABLE + ;
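
A small illustration of the table in use, assuming the CMD-TABLE and CMD definitions above have already been loaded. The messages and the name RUN-CMD are hypothetical, and only slots 0 and 1 are filled here, so RUN-CMD must not be given any other index:

	:NONAME ( -- ) ." stop" ;   0 CMD !
	:NONAME ( -- ) ." go" ;     1 CMD !

	: RUN-CMD ( n -- )          \ execute the fragment stored in slot n
	    CMD @ EXECUTE ;

	1 RUN-CMD                   \ displays: go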

As a further example, a defining word can be created to allow performance monitoring. In the example below, the number of times a word is executed is counted. The original : must first be given another name (OLD: below) so that it remains available for defining the new : and ;.

: DOCOLON ( -- )     \ Modify CREATEd word to execute like a colon def
     DOES> ( i*x a-addr -- j*x )
     1 OVER +!         \ count executions
     CELL+ @ EXECUTE   \ execute :NONAME definition
;

: OLD: : ;           \ just an alias

OLD: : ( "name" -- a-addr xt colon-sys )
                     \ begins an execution-counting colon definition
     CREATE  HERE 0 ,  \ storage for execution counter
     0 ,               \ storage for execution token
     DOCOLON           \ set run time for CREATEd word
     :NONAME           \ begin unnamed colon definition
;

( Note the placement of DOES>: DOES> must modify the CREATEd word and not the :NONAME definition, so DOES> must execute before :NONAME.)

OLD: ; ( a-addr xt colon-sys -- )
                      \ ends an execution-counting colon definition
    POSTPONE ;        \ complete compilation of colon def
    SWAP CELL+ !      \ save execution token
;  IMMEDIATE

The new : and ; are used just like the standard ones to define words:

	... : xxx  ... ;  ...  xxx  ...

Now, however, these words may be ticked to retrieve the count (and execution token):

	... ' xxx >BODY ? ...

A.6.2.0620 ?DO

Typical use: : FACTORIAL ( +n1 -- +n2 ) 1 SWAP 1+ 1 ?DO I * LOOP ;

This word was added in response to many requests for a resolution of the difficulty introduced by Forth-83's DO, which on a 16-bit system will loop 65,536 times if given equal arguments. As this Standard also encourages 32-bit systems, this behavior can be intolerable. The Technical Committee considered applying these semantics to DO, but declined on the grounds that it might break existing code.
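
A minimal sketch of the zero-trip behavior. The name STARS is hypothetical:

	: STARS ( n -- )          \ display n asterisks
	    0 ?DO  [CHAR] * EMIT  LOOP ;

	3 STARS                   \ displays: ***
	0 STARS                   \ displays nothing; ?DO skips the loop body

With DO in place of ?DO, the phrase 0 STARS would enter the loop with equal limit and index instead of skipping it.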

A.6.2.0700 AGAIN

Typical use: : X ... BEGIN ... AGAIN ... ;

Unless the word sequence between BEGIN and AGAIN has a way to terminate, this is an endless loop.
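
One way to terminate such a loop is EXIT, as in this sketch. The name COUNT-UP is hypothetical:

	: COUNT-UP ( -- )                     \ displays: 1 2 3 4 5
	    0
	    BEGIN
	        1+ DUP .
	        DUP 5 = IF  DROP EXIT  THEN   \ the only way out of the loop
	    AGAIN ;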

A.6.2.0855 C"

Typical use: : X ... C" ccc" ... ;

It is easy to convert counted strings to pointer/length but hard to do the opposite. C" is the only new word that uses the "address of counted string" stack representation. It is provided as an aid to porting existing programs to ANS Forth systems. It is relatively difficult to implement C" in terms of other standard words, given that its semantics compile the string into the current definition.

Users of C" are encouraged to migrate their application code toward the consistent use of the preferred c-addr u stack representation with the alternate word S". This may be accomplished by converting application words with counted string input arguments to use the preferred c-addr u representation, thus eliminating the need for C".
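
The two representations can be compared in a short sketch (the names are hypothetical). The first definition receives the address of a counted string and converts it with COUNT; the second receives c-addr u directly:

	: GREET-COUNTED ( -- )  C" hello" COUNT TYPE ;   \ counted string, converted by COUNT
	: GREET         ( -- )  S" hello" TYPE ;         \ c-addr u, no conversion needed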

See: A.3.1.3.4 Counted strings.

A.6.2.0873 CASE

Typical use:

   : X ...
       CASE
       test1 OF ... ENDOF
       testn OF ... ENDOF
       ... ( default )
       ENDCASE ...
   ;
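
A concrete instance of the pattern, offered as a sketch. The name .SMALL and the messages are hypothetical:

	: .SMALL ( n -- )
	    CASE
	        0 OF  ." zero"  ENDOF
	        1 OF  ." one"   ENDOF
	        ." many"            \ default; n is still on the stack here
	    ENDCASE ;               \ ENDCASE discards n

	1 .SMALL                    \ displays: one
	7 .SMALL                    \ displays: many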

A.6.2.0945 COMPILE,

COMPILE, is the compilation equivalent of EXECUTE. In many cases, it is possible to compile a word by using POSTPONE without resorting to the use of COMPILE,. However, the use of POSTPONE requires that the name of the word be known at compile time, whereas COMPILE, allows the word to be located at any time. It is sometimes possible to use EVALUATE to compile a word whose name is not known until run time. This has two possible problems:

a) EVALUATE is slower than COMPILE, because a dictionary search is required.

b) The current search order affects the outcome of EVALUATE.

In traditional threaded-code implementations, compilation is performed by , (comma). This usage is not portable; it doesn't work for subroutine-threaded, native code, or relocatable implementations. Use of COMPILE, is portable.

In most systems it is possible to implement COMPILE, so it will generate code that is optimized to the same extent as code that is generated by the normal compilation process. However, in some implementations there are two different tokens corresponding to a particular definition name: the normal execution token that is used while interpreting or with EXECUTE, and another compilation token that is used while compiling. It is not always possible to obtain the compilation token from the execution token. In these implementations, COMPILE, might not generate code that is as efficient as normally compiled code.
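
A minimal sketch of compiling a word that is located through an execution token instead of a name known at compile time. The names OP-XT, [OP] and COMBINE are hypothetical:

	VARIABLE OP-XT             \ holds an execution token chosen earlier
	' + OP-XT !

	: [OP] ( -- )              \ compile whatever OP-XT currently holds
	    OP-XT @ COMPILE, ; IMMEDIATE

	: COMBINE ( n1 n2 -- n3 )  [OP] ;

	3 4 COMBINE .              \ displays: 7

Storing a different token in OP-XT before COMBINE is defined would compile that word instead.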

A.6.2.0970 CONVERT

CONVERT may be defined as follows:

	: CONVERT   CHAR+ 65535 >NUMBER DROP ;

A.6.2.1342 ENDCASE

Typical use:

   : X ...
       CASE
       test1 OF ... ENDOF
       testn OF ... ENDOF
       ... ( default )
       ENDCASE ...
   ;

A.6.2.1343 ENDOF

Typical use:

   : X ...
       CASE
       test1 OF ... ENDOF
       testn OF ... ENDOF
       ... ( default )
       ENDCASE ...
   ;

A.6.2.1390 EXPECT

Specification of positive integer counts (+n) for EXPECT allows some implementors to continue their practice of using a zero or negative value as a flag to trigger special behavior. Insofar as such behavior is outside the Standard, Standard Programs cannot depend upon it, but the Technical Committee doesn't wish to preclude it unnecessarily. Since actual values are almost always small integers, no functionality is impaired by this restriction.

A.6.2.1850 MARKER

As dictionary implementations have gotten more elaborate and in some cases have used multiple address spaces, FORGET has become prohibitively difficult or impossible to implement on many Forth systems. MARKER greatly eases the problem by making it possible for the system to remember landmark information in advance that specifically marks the spots where the dictionary may at some future time have to be rearranged.
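
A short sketch of typical use. The names SCRATCH and TRIAL are hypothetical:

	MARKER SCRATCH             \ remember the dictionary state at this point

	: TRIAL ( -- )  ." just testing" ;
	TRIAL

	SCRATCH                    \ removes TRIAL and SCRATCH itself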

A.6.2.1950 OF

Typical use:

   : X ...
       CASE
       test1 OF ... ENDOF
       testn OF ... ENDOF
       ... ( default )
       ENDCASE ...
   ;

A.6.2.2000 PAD

PAD has been available as scratch storage for strings since the earliest Forth implementations. It was brought to our attention that many programmers are reluctant to use PAD, fearing incompatibilities with system uses. PAD is specifically intended as a programmer convenience, however, which is why we documented the fact that no standard words use it.
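
As an illustration of PAD as scratch storage, the following sketch copies a string into PAD and appends one character before displaying it. The names EXCLAIM and DEMO are hypothetical, and the string is assumed to be short enough to fit in PAD with one character to spare:

	: EXCLAIM ( c-addr u -- )            \ display the string followed by "!"
	    DUP >R
	    PAD SWAP CHARS MOVE              \ copy the string into PAD
	    [CHAR] !  PAD R@ CHARS +  C!     \ append the extra character
	    PAD R> 1+ TYPE ;

	: DEMO ( -- )  S" hello" EXCLAIM ;   \ DEMO displays: hello!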

A.6.2.2008 PARSE

Typical use: char PARSE ccc<char>

The traditional Forth word for parsing is WORD. PARSE solves the following problems with WORD:

a) WORD always skips leading delimiters. This behavior is appropriate for use by the text interpreter, which looks for sequences of non-blank characters, but is inappropriate for use by words like ( , .( , and ." . Consider the following (flawed) definition of .( :

	: .(   [CHAR] )  WORD COUNT TYPE ;  IMMEDIATE

This works fine when used in a line like:

	.( HELLO)   5 .

but consider what happens if the user enters an empty string:

	.( )   5 .

The definition of .( shown above would treat the ) as a leading delimiter, skip it, and continue consuming characters until it located another ) that followed a non-) character, or until the parse area was empty. In the example shown, the 5 . would be treated as part of the string to be printed.

With PARSE, we could write a correct definition of .( :

	: .(   [CHAR] ) PARSE TYPE ; IMMEDIATE

This definition avoids the empty string anomaly.

b) WORD returns its result as a counted string. This has four bad effects:

1) The characters accepted by WORD must be copied from the input buffer into a temporary buffer, in order to make room for the count character that must be at the beginning of the counted string. The copy step is inefficient, compared to PARSE, which leaves the string in the input buffer and doesn't need to copy it anywhere.

2) WORD must be careful not to store too many characters into the temporary buffer, thus overwriting something beyond the end of the buffer. This adds to the overhead of the copy step. (WORD may have to scan a lot of characters before finding the trailing delimiter.)

3) The count character limits the length of the string returned by WORD to 255 characters (longer strings can easily be stored in blocks!). This limitation does not exist for PARSE.

4) The temporary buffer is typically overwritten by the next use of WORD. This introduces a temporal dependency; the value returned by WORD is only valid for a limited duration. PARSE has a temporal dependency, too, related to the lifetime of the input buffer, but that is less severe in most cases than WORD's temporal dependency.

The behavior of WORD with respect to skipping leading delimiters is useful for parsing blank-delimited names. Many system implementations include an additional word for this purpose, similar to PARSE with respect to the c-addr u return value, but without an explicit delimiter argument (the delimiter set is implicitly white space), and which does skip leading delimiters. A common description for this word is:

	PARSE-WORD  ( <spaces>name -- c-addr u )

Skip leading spaces and parse name delimited by a space. c-addr is the address within the input buffer and u is the length of the selected string. If the parse area is empty, the resulting string has a zero length.

If both PARSE and PARSE-WORD are present, the need for WORD is largely eliminated.
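
One possible definition of PARSE-WORD in terms of PARSE is sketched below. It is a simplification under stated assumptions: only the space character is treated as a delimiter, /STRING is assumed to be available from the String word set, and the helper name SKIP-BLANKS is hypothetical:

	: SKIP-BLANKS ( -- )               \ advance >IN past leading spaces
	    BEGIN
	        SOURCE >IN @ /STRING       \ c-addr u of the remaining parse area
	        DUP IF  DROP C@ BL =  ELSE  2DROP 0  THEN
	    WHILE
	        1 >IN +!
	    REPEAT ;

	: PARSE-WORD ( "<spaces>name" -- c-addr u )
	    SKIP-BLANKS  BL PARSE ;

When the parse area is empty, SKIP-BLANKS does nothing and BL PARSE returns a zero-length string, matching the description above.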

A.6.2.2030 PICK

0 PICK is equivalent to DUP and 1 PICK is equivalent to OVER.
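
Deeper items are reached in the same way. As a sketch, a word that duplicates the top three stack items (the name 3DUP is not a standard word):

	: 3DUP ( x1 x2 x3 -- x1 x2 x3 x1 x2 x3 )
	    2 PICK  2 PICK  2 PICK ;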

A.6.2.2040 QUERY

The function of QUERY may be performed with ACCEPT and EVALUATE.
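
A minimal sketch of that combination. The buffer name, its 80-character size, and the word name are hypothetical:

	CREATE LINE-BUFFER  80 CHARS ALLOT   \ room for one line of input

	: READ-AND-INTERPRET ( i*x -- j*x )
	    LINE-BUFFER DUP 80 ACCEPT        \ c-addr +n : the characters received
	    EVALUATE ;                       \ interpret them as Forth text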

A.6.2.2125 REFILL

This word is a useful generalization of QUERY. Re-defining QUERY to meet this specification would have broken existing code. REFILL is designed to behave reasonably for all possible input sources. If the input source is coming from the user, as with QUERY, REFILL could still return a false value if, for instance, a communication channel closes so that the system knows that no more input will be available.

A.6.2.2150 ROLL

2 ROLL is equivalent to ROT, 1 ROLL is equivalent to SWAP and 0 ROLL is a null operation.

A.6.2.2182 SAVE-INPUT

SAVE-INPUT and RESTORE-INPUT allow the same degree of input source repositioning within a text file as is available with BLOCK input. SAVE-INPUT and RESTORE-INPUT hide the details of the operations necessary to accomplish this repositioning, and are used the same way with all input sources. This makes it easier for programs to reposition the input source, because they do not have to inspect several variables and take different action depending on the values of those variables.

SAVE-INPUT and RESTORE-INPUT are intended for repositioning within a single input source; for example, the following scenario is NOT allowed for a Standard Program:

   : XX
       SAVE-INPUT  CREATE
       S" RESTORE-INPUT" EVALUATE
       ABORT" couldn't restore input"
   ;

This is incorrect because, at the time RESTORE-INPUT is executed, the input source is the string being interpreted by EVALUATE, which is not the same input source that was in effect when SAVE-INPUT was executed.

The following code is allowed:

: XX
    SAVE-INPUT  CREATE
    S" .( Hello)" EVALUATE
    RESTORE-INPUT ABORT" couldn't restore input"
;

After EVALUATE returns, the input source specification is restored to its previous state, so SAVE-INPUT and RESTORE-INPUT are called with the same input source in effect.

In the above examples, the EVALUATE phrase could have been replaced by a phrase involving INCLUDE-FILE and the same rules would apply.

The Standard does not specify what happens if a program violates the above rules. A Standard System might check for the violation and return an exception indication from RESTORE-INPUT, or it might fail in an unpredictable way.

The return value from RESTORE-INPUT is primarily intended to report the case where the program attempts to restore the position of an input source whose position cannot be restored. The keyboard might be such an input source.

Nesting of SAVE-INPUT and RESTORE-INPUT is allowed. For example, the following situation works as expected:

: XX
    SAVE-INPUT
    S" f1" INCLUDED      \ The file "f1" includes:
    \   ... SAVE-INPUT ... RESTORE-INPUT ...
    \ End of file "f1"
    RESTORE-INPUT  ABORT" couldn't restore input"
;

In principle, RESTORE-INPUT could be implemented to always fail, e.g.:

: RESTORE-INPUT  ( x1 ... xn n -- flag )
    0 ?DO DROP LOOP TRUE
;

Such an implementation would not be useful in most cases. It would be preferable for a system to leave SAVE-INPUT and RESTORE-INPUT undefined, rather than to create a useless implementation. In the absence of the words, the application programmer could choose whether to create dummy implementations or to work around the problem in some other way.

Examples of how an implementation might use the return values from SAVE-INPUT to accomplish the save/restore function:

Input Source    possible stack values
------------    ---------------------
block           >IN @  BLK @  2
EVALUATE        >IN @  1
keyboard        >IN @  1
text file       >IN @  lo-pos  hi-pos  3

These are examples only; a Standard Program may not assume any particular meaning for the individual stack items returned by SAVE-INPUT.

A.6.2.2290 TIB

The function of TIB has been superseded by SOURCE.
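
For example, code that formerly combined TIB with >IN to find the unparsed text can be written with SOURCE, as in this sketch. The name REST-OF-LINE is hypothetical, and /STRING is assumed to be available from the String word set:

	: REST-OF-LINE ( -- c-addr u )   \ unparsed remainder of the input buffer
	    SOURCE >IN @ /STRING ;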

A.6.2.2295 TO

Historically, some implementations of TO have not explicitly parsed. Instead, they set a mode flag that is tested by the subsequent execution of name. ANS Forth explicitly requires that TO must parse, so that TO's effect will be predictable when it is used at the end of the parse area.

Typical use: x TO name

A.6.2.2298 TRUE

TRUE is equivalent to the phrase 0 0=.

A.6.2.2405 VALUE

Typical use:

0 VALUE DATA

: EXCHANGE ( n1 -- n2 ) DATA SWAP TO DATA ;

EXCHANGE leaves n1 in DATA and returns the prior value n2.
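
A second sketch, showing a VALUE used as a simple counter. The names HITS and HIT are hypothetical:

	0 VALUE HITS

	: HIT ( -- )  HITS 1+ TO HITS ;   \ read the value, then store the new one

	HIT HIT  HITS .                   \ displays: 2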

A.6.2.2440 WITHIN

We describe WITHIN without mentioning circular number spaces (an undefined term) or providing the code. Here is a number line with the overflow point (o) at the far right and the underflow point (u) at the far left:

u--------------------------------------------------------------o

There are two cases to consider: either the n2|u2..n3|u3 range straddles the overflow/underflow points or it does not. Let's examine the non-straddle case first:

u-------------------[.....................)------------------------o

The [ denotes n2|u2, the ) denotes n3|u3, and the dots and [ are numbers WITHIN the range. n3|u3 is greater than n2|u2, so the following tests will determine if n1|u1 is WITHIN n2|u2 and n3|u3: n2|u2 <= n1|u1 and n1|u1 < n3|u3.

In the case where the comparison range straddles the overflow/underflow points:

u...............)-----------------------------[........................o

n3|u3 is less than n2|u2 and the following tests will determine if n1|u1 is WITHIN n2|u2 and n3|u3:

n2|u2 <= n1|u1 or n1|u1 < n3|u3.

WITHIN must work for both signed and unsigned arguments. One obvious implementation does not work:

: WITHIN  ( test low high -- flag )
    >R  OVER SWAP < 0= ( test flag1 )   \ flag1: test is not less than low
    SWAP R> <          ( flag1 flag2 )  \ flag2: test is less than high
    AND
;

Assume two's-complement arithmetic on a 16-bit machine, and consider the following test:

	33000  32000 34000  WITHIN

The above implementation returns false for that test, even though the unsigned number 33000 is clearly within the range {32000 .. 34000}.

The problem is that, in the incorrect implementation, the signed comparison < gives the wrong answer when 32000 is compared to 33000, because when those numbers are treated as signed numbers, 33000 is treated as negative 32536, while 32000 remains positive.

Replacing < with U< in the above implementation makes it work with unsigned numbers, but causes problems with certain signed number ranges; in particular, the test:

	1  -5  5  WITHIN

would give an incorrect answer.

For two's-complement machines that ignore arithmetic overflow (most machines), the following implementation works in all cases:

:  WITHIN  ( test low high -- flag )   OVER - >R - R>  U<  ;
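
A few spot checks of this definition, as a sketch. The displayed values assume a two's-complement system, where a true flag is displayed by . as -1:

	  5   0  10 WITHIN .   \ displays -1 : 0 <= 5 and 5 < 10
	 10   0  10 WITHIN .   \ displays 0  : the upper limit is excluded
	  1  -5   5 WITHIN .   \ displays -1 : signed range
	-20  10 -10 WITHIN .   \ displays -1 : range straddles the overflow point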

A.6.2.2530 [COMPILE]

Typical use: : name2 ... [COMPILE] name1 ... ; IMMEDIATE
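
A familiar instance of this pattern, offered as a sketch, is a synonym for an immediate word. ENDIF and MAGNITUDE are not standard words and are used only for illustration:

	: ENDIF ( -- )  [COMPILE] THEN ; IMMEDIATE   \ behaves like THEN

	: MAGNITUDE ( n -- u )  DUP 0< IF  NEGATE  ENDIF ;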

A.6.2.2535 \

Typical use: 5 CONSTANT THAT \ THIS IS A COMMENT ABOUT THAT
