This Standard is more extensive than previous industry standards for the Forth language. Several things made this necessary:
The result of the effort to satisfy all of these objectives is a Standard arranged so that the required word set remains small. Thus ANS Forth can be provided for resource-constrained embedded systems. Words beyond those in the required word set are organized into a number of optional word sets and their extensions, enabling implementation of tailored systems that are Standard.
When judging relative merits, the members of the X3J14 Technical Committee were guided by the following goals (listed in alphabetic order):
From the beginning, the X3J14 Technical Committee faced not only conflicting ideas as to what real Forth is, but also conflicting needs of the various groups within the Forth community. At one extreme were those who pressed for a bare Forth. At the other extreme were those who wanted a fat Forth. Many were somewhere in between. All were convinced of the rightness of their own position and of the wrongness of at least one of the two extremes. The committee's composition reflected this full range of interests.
The approach we have taken is to define a Core word set establishing a greatest lower bound for required system functionality and to provide a portfolio of optional word sets for special purposes. This simple approach parallels the fundamental nature of Forth as an extensible language, and thereby achieves a kind of meta-extensibility.
With this key, high-level compromise, regardless of the actual makeup of the individual word sets, a firm and workable framework is established for the long term. One may or may not agree that there should be a Locals word set, or that the word COMPILE, belongs in the Core Extensions word set. But at least there is a mechanism whereby such things can be included in a logical and orderly manner.
Several implications of this scheme of optional word sets are significant.
First, ANS Forth systems can continue to be implemented on a greater range of hardware than could be claimed by almost any other single language. Since only the Core word set is required, very limited hardware will be able to accommodate an ANS Forth implementation.
Second, a greater degree of portability of applications, and of programmers, is anticipated. The optional word sets standardize various functions (e.g., floating point) that were widely implemented before, but not with uniform definition names and methodologies, nor the same levels of completeness. With such words now standardized in the optional word sets, communications between programmers - verbally, via magazine or journal articles, etc. - will leap to a new level of facility, and the shareability of code and applications should rise dramatically.
Third, ANS Forth systems may be designed to offer the user the power to selectively, even dynamically, include or exclude one or more of the optional word sets or portions thereof. Also, lower-priced products may be offered for the user who needs the Core word set and not much more. Thus, virtually unlimited flexibility will be available to the user.
But these advantages have a price. The burden is on the user to decide what capabilities are desired, and to select product offerings accordingly, especially when portability of applications is important. We do not expect most implementors to attempt to provide all word sets, but rather to select those most valuable to their intended markets.
The basic requirement is that if the implementor claims to have a particular optional word set, the entire required portion of that word set must be available. If the implementor wishes to offer only part of an optional word set, it is acceptable to say, for example, "This system offers portions of the [named] word set", particularly if the selected or excluded words are itemized clearly.
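As a rough sketch of how a program might check such a claim at run time, the environmental query mechanism can be asked about a word set by name. The helper names word-set? and report-floating are hypothetical and are introduced only for this illustration; the sketch assumes the system answers word-set queries through ENVIRONMENT? and treats an unrecognized query as "not present".

```forth
\ word-set? is a hypothetical helper, not a standard word.
\ It yields true only when the query is recognized AND the word set is reported present.
: word-set? ( c-addr u -- flag )
   ENVIRONMENT? IF  ( flag )  ELSE  FALSE  THEN ;

\ report-floating is likewise hypothetical; it prints what the system reports.
: report-floating ( -- )
   S" FLOATING" word-set?
   IF    ." Floating-Point word set: present"
   ELSE  ." Floating-Point word set: absent or not reported"
   THEN  CR ;

report-floating
```

A system that offers only portions of a word set would be expected to answer such a query with a false flag, so a program relying on individual words would still need to consult the implementor's itemized list.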
Each optional word set will probably appeal to a particular constituency. For example, scientists performing complex mathematical analysis may place a higher value on the Floating-Point word set than programmers developing simple embedded controllers. As in the case of the core extensions, we expect implementors to offer those word sets they expect will be valued by their users.
Optional word sets may be offered in source form or otherwise factored so that the user may selectively load them.
The extensions to the optional word sets include words which are deemed less essential to performing the primary activity supported by the word set, though clearly relevant to it. As in the case of the Core Extensions, implementors may selectively add itemized subsets of a word set extension providing the labeling doesn't mislead the user into thinking incorrectly that all words are present.
On subroutine threaded Forth systems, everything is object code. There are no traditional code or data fields. Only a word defined by CREATE or by a word that calls CREATE has a data field. Only a data field defined via CREATE can be manipulated portably.
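A minimal sketch of portable data-field use follows. The names counter and bump are hypothetical; the point is only that the data field is reached through the word defined by CREATE (or through >BODY applied to its execution token), never through assumptions about how the system lays out code.

```forth
\ counter: a hypothetical word whose data field holds one cell, initialized to 0.
CREATE counter  0 ,

\ bump: add one to the cell in counter's data field.
: bump ( -- )  1 counter +! ;

bump bump
counter @ .                    \ displays 2
' counter >BODY counter = .    \ displays a true flag: both name the same data field
```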
The Core word set contains the essential body of words in a Forth system. It is the only required word set. Other word sets defined in this Standard are optional additions to make it possible to provide Standard Systems with tailored levels of functionality.
The use of -sys, orig, and dest data types in stack effect diagrams conveys two pieces of information. First, it warns the reader that many implementations use the data stack in unspecified ways for those purposes, so that items underneath on either the control-flow or data stacks are unavailable. Second, in cases where orig and dest are used, explicit pairing rules are documented on the assumption that all systems will implement that model so that its results are equivalent to employment of some stack, and that in fact many implementations do use the data stack for this purpose. However, nothing in this Standard requires that implementations actually employ the data stack (or any other) for this purpose so long as the implied behavior of the model is maintained.
Forth systems are unusually simple to develop, in comparison with compilers for more conventional languages such as C. In addition to Forth systems supported by vendors, public-domain implementations and implementation guides have been widely available for nearly twenty years, and a large number of individuals have developed their own Forth systems. As a result, a variety of implementation approaches have developed, each optimized for a particular platform or target market.
The X3J14 Technical Committee has endeavored to accommodate this diversity by constraining implementors as little as possible, consistent with a goal of defining a standard interface between an underlying Forth System and an application program being developed on it.
Similarly, we will not undertake in this section to tell you how to implement a Forth System, but rather will provide some guidance as to what the minimum requirements are for systems that can properly claim compliance with this Standard.
Most computers deal with arbitrary bit patterns. There is no way to determine by inspection whether a cell contains an address or an unsigned integer. The only meaning a datum possesses is the meaning assigned by an application.
When data are operated upon, the meaning of the result depends on the meaning assigned to the input values. Some combinations of input values produce meaningless results: for instance, what meaning can be assigned to the arithmetic sum of the ASCII representation of the character A and a TRUE flag? The answer may be no meaning; or alternatively, that operation might be the first step in producing a checksum. Context is the determiner.
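The sum mentioned above can be tried directly. This sketch assumes a two's-complement system with ASCII character codes; a+true is a hypothetical name used only here. The system adds the two cells without complaint, and whether the result means anything is left to the program.

```forth
\ a+true: hypothetical word that adds the code for A to a TRUE flag.
: a+true ( -- n )  [CHAR] A TRUE + ;

a+true .    \ displays 64 on a two's-complement system (65 + -1)
```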
The discipline of circumscribing meaning which a program may assign to various combinations of bit patterns is sometimes called data typing. Many computer languages impose explicit data typing and have compilers that prevent ill-defined operations.
Forth rarely explicitly imposes data-type restrictions. Still, data types implicitly do exist, and discipline is required, particularly if portability of programs is a goal. In Forth, it is incumbent upon the programmer (rather than the compiler) to determine that data are accurately typed.
This section attempts to offer guidance regarding de facto data typing in Forth.
The correct identification and proper manipulation of the character data type is beyond the purview of Forth's enforcement of data type by means of stack depth. Characters do not necessarily occupy the entire width of their single stack entry with meaningful data. While the distinction between signed and unsigned character is entirely absent from the formal specification of Forth, the tendency in practice is to treat characters as short positive integers when mathematical operations come into play.
a) Standard Character Set
1) The storage unit for the character data type (C@, C!, FILL, etc.) must be able to contain unsigned numbers from 0 through 255.
2) An implementation is not required to restrict character storage to that range, but a Standard Program without environmental dependencies cannot assume the ability to store numbers outside that range in a char location.
3) The allowed number representations are two's-complement, one's-complement, and signed-magnitude. Note that all of these number systems agree on the representation of positive numbers.
4) Since a char can store small positive numbers and since the character data type is a sub-range of the unsigned integer data type, C! must store the n least-significant bits of a cell (8 <= n <= bits/cell). Given the enumeration of allowed number representations and their known encodings, TRUE xx C! xx C@ must leave a stack item with some number of bits set, which will thus be accepted as non-zero by IF.
5) For the purposes of input (KEY, ACCEPT, etc.) and output (EMIT, TYPE, etc.), the encoding between numbers and human-readable symbols is ISO646/IRV (ASCII) within the range from 32 to 126 (space to ~). EBCDIC is out (most EBCDIC computer systems support ASCII too). Outside that range, it is up to the implementation. The obvious implementation choice is to use ASCII control characters for the range from 0 to 31, at least for the displayable characters in that range (TAB, RETURN, LINEFEED, FORMFEED). However, this is not as clear-cut as it may seem, because of the variation between operating systems on the treatment of those characters. For example, some systems TAB to 4 character boundaries, others to 8 character boundaries, and others to preset tab stops. Some systems perform an automatic linefeed after a carriage return, others perform an automatic carriage return after a linefeed, and others do neither.
The codes from 128 to 255 may eventually be standardized, either formally or informally, for use as international characters, such as the letters with diacritical marks found in many European languages. One such encoding is the 8-bit ISO Latin-1 character set. The computer marketplace at large will eventually decide which encoding set of those characters prevails. For Forth implementations running under an operating system (the majority of those running on standard platforms these days), most Forth implementors will probably choose to do whatever the system does, without performing any remapping within the domain of the Forth system itself.
6) A Standard Program can depend on the ability to receive any character in the range 32 ... 126 through KEY, and similarly to display the same set of characters with EMIT. If a program must be able to receive or display any particular character outside that range, it can declare an environmental dependency on the ability to receive or display that character.
7) A Standard Program cannot use control characters in definition names. However, a Standard System is not required to enforce this prohibition. Thus, existing systems that currently allow control characters in word names from BLOCK source may continue to allow them, and programs running on those systems will continue to work. In text file source, the parsing action with space as a delimiter (e.g., BL WORD) treats control characters the same as spaces. This effectively implies that you cannot use control characters in definition names from text-file source, since the text interpreter will treat the control characters as delimiters. Note that this control-character folding applies only when space is the delimiter, thus the phrase CHAR ) WORD may collect a string containing control characters.
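The phrase TRUE xx C! xx C@ from item 4 can be sketched as follows, with xx taken to be a one-character buffer (an assumption; the text does not define xx) and char-true? a hypothetical test word.

```forth
\ xx: assumed here to be a buffer of one character.
CREATE xx  1 CHARS ALLOT

\ char-true?: hypothetical word that round-trips TRUE through a char location.
: char-true? ( -- flag )
   TRUE xx C!                 \ store the low-order character bits of TRUE
   xx C@                      \ fetch them back; the high-order bits are clear
   IF  TRUE  ELSE  FALSE  THEN ;

char-true? .                  \ displays a true flag
```

The fetched value is not itself a well-formed TRUE (it has only the character-width bits set), but IF accepts it as true because it is non-zero.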
b) Storage and retrieval
Characters are transferred from the data stack to memory by C! and from memory to the data stack by C@. A number of lower-significance bits equivalent to the implementation-dependent width of a character are transferred from a popped data stack entry to an address by the action of C! without affecting any bits which may comprise the higher-significance portion of the cell at the destination address; however, the action of C@ clears all higher-significance bits of the data stack entry which it pushes that are beyond the implementation-dependent width of a character (which may include implementation-defined display information in the higher-significance bits). The programmer should keep in mind that operating upon arbitrary stack entries with words intended for the character data type may result in truncation of such data.
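The truncation described above can be sketched as follows, assuming characters are 8 bits wide (the Standard permits wider characters, in which case the displayed values would differ). The name buf is hypothetical.

```forth
HEX
\ buf: hypothetical buffer of two characters.
CREATE buf  2 CHARS ALLOT
41 buf C!   42 buf CHAR+ C!   \ store the codes for A and B
1243 buf C!                   \ only the low-order character bits are stored
buf C@ .                      \ displays 43 when characters are 8 bits wide
buf CHAR+ C@ .                \ displays 42: the adjacent character is untouched
DECIMAL
```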
c) Manipulation on the stack
In addition to C@ and C!, characters are moved to, from and upon the data stack by the following words:
>R ?DUP DROP DUP OVER PICK R> R@ ROLL ROT SWAP
d) Additional operations
The following mathematical operators are valid for character data:
+ - * / /MOD MOD
The following comparison and bitwise operators may be valid for characters, keeping in mind that display information cached in the most significant bits of characters in an implementation-defined fashion may have to be masked or otherwise dealt with:
AND OR > < U> U< = <> 0= 0<> MAX MIN LSHIFT RSHIFT
A.3.1.3 Single-cell types
A single-cell stack entry viewed without regard to typing is the fundamental data type of Forth. All other data types are actually represented by one or more single-cell stack entries.
a) Storage and retrieval
Single-cell data are transferred from the stack to memory by !; from memory to the stack by @. All bits are transferred in both directions and no type checking of any sort is performed, nor does the Standard System check that a memory address used by ! or @ is properly aligned or properly sized to hold the datum thus transferred.
b) Manipulation on the stack
Here is a selection of the most important words which move single-cell data to, from and upon the data stack:
! @ >R ?DUP DROP DUP OVER PICK R> R@ ROLL ROT SWAP
c) Comparison operators
The following comparison operators are universally valid for one or more single cells:
= <> 0= 0<>
A FALSE flag is a single-cell datum with all bits unset, and a TRUE flag is a single-cell datum with all bits set. While Forth words which test flags accept any non-null bit pattern as true, there exists the concept of the well-formed flag. If an operation whose result is to be used as a flag may produce any bit-mask other than TRUE or FALSE, the recommended discipline is to convert the result to a well-formed flag by means of the Forth word 0<> so that the result of any subsequent logical operations on the flag will be predictable.
In addition to the words which move, fetch and store single-cell items, the following words are valid for operations on one or more flag data residing on the data stack:
AND OR XOR INVERT
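A short sketch of why the well-formed flag matters when these operators are applied. The values 6 and 1 are arbitrary non-zero cells chosen for illustration; each would be accepted as true by IF on its own.

```forth
6 1 AND .              \ displays 0: two "true" values combine to false
6 0<>  1 0<>  AND .    \ displays a true flag once both are normalized with 0<>
6 INVERT 0= .          \ displays 0: INVERT of 6 is still non-zero, hence still "true"
6 0<> INVERT .         \ displays 0: INVERT of a well-formed TRUE is FALSE
```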