D. Compatibility analysis of ANS Forth (informative annex)

Prior to ANS Forth, there were several industry standards for Forth. The most influential are listed here in chronological order, along with the major differences between ANS Forth and the most recent, Forth-83.

D.1 FIG Forth (circa 1978)

FIG Forth was a model implementation of the Forth language developed by the Forth Interest Group (FIG). In FIG Forth, a relatively small number of words were implemented in processor-dependent machine language and the rest of the words were implemented in Forth. The FIG model was placed in the public domain, and was ported to a wide variety of computer systems. Because the bulk of the FIG Forth implementation was the same across all machines, programs written in FIG Forth enjoyed a substantial degree of portability, even for system-level programs that directly manipulate the internals of the Forth system implementation.

FIG Forth implementations were influential in increasing the number of people interested in using Forth. Many people associate the implementation techniques embodied in the FIG Forth model with the nature of Forth.

However, FIG Forth was not necessarily representative of commercial Forth implementations of the same era. Some of the most successful commercial Forth systems used implementation techniques different from the FIG Forth model.

D.2 Forth-79

The Forth-79 Standard resulted from a series of meetings held from 1978 to 1980 by the Forth Standards Team, an international group of Forth users and vendors. The group also released interim versions known as Forth 77 and Forth 78.

Forth-79 described a set of words defined on a 16-bit, two's-complement, unaligned, linear byte-addressing virtual machine. It prescribed an implementation technique known as indirect threaded code, and used the ASCII character set.

The Forth-79 Standard served as the basis for several public domain and commercial implementations, some of which are still available and supported today.

D.3 Forth-83

The Forth-83 Standard, also by the Forth Standards Team, was released in 1983. Forth-83 attempted to fix some of the deficiencies of Forth-79.

Forth-83 was similar to Forth-79 in most respects. However, Forth-83 changed the definition of several well-defined features of Forth-79. For example, the rounding behavior of integer division, the base value of the operands of PICK and ROLL, the meaning of the address returned by ', the compilation behavior of ', the value of a true flag, the meaning of NOT, and the chaining behavior of words defined by VOCABULARY were all changed. Forth-83 relaxed the implementation restrictions of Forth-79 to allow any kind of threaded code, but it did not fully allow compilation to native machine code (this was not specifically prohibited, but rather was an indirect consequence of another provision).

Many new Forth implementations were based on the Forth-83 Standard, but few strictly compliant Forth-83 implementations exist.

Although the incompatibilities resulting from the changes between Forth-79 and Forth-83 were usually relatively easy to fix, a number of successful Forth vendors did not convert their implementations to be Forth-83 compliant. For example, the most successful commercial Forth for Apple Macintosh computers is based on Forth-79.

D.4 Recent developments

Since the Forth-83 Standard was published, the computer industry has undergone rapid and profound changes. The speed, memory capacity, and disk capacity of affordable personal computers have increased by factors of more than 100. 8-bit processors have given way to 16-bit processors, and now 32-bit processors are commonplace.

The operating systems and programming-language environments of small systems are much more powerful than they were in the early 1980s.

The personal-computer marketplace has changed from a predominantly hobbyist market to a mature business and commercial market.

Improved technology for designing custom microprocessors has resulted in the design of numerous Forth chips, computers optimized for the execution of the Forth language.

The market for ROM-based embedded control computers has grown substantially.

In order to take full advantage of this evolving technology, and to better compete with other programming languages, many recent Forth implementations have ignored some of the rules of previous Forth standards. In particular:

generation, and optimization techniques, rather than the traditional threaded code.

Competitive pressure from other programming languages (predominantly C) and from other Forth vendors has led Forth vendors to optimizations that do not fit in well with the virtual machine model implied by existing Forth standards.

D.5 ANS Forth approach

The ANS Forth committee addressed the serious fragmentation of the Forth community caused by the differences between Forth-79 and Forth-83, and the divergence from either of these two industry standards caused by marketplace pressures.

Consequently, the committee has chosen to base its compatibility decisions not upon a strict comparison with the Forth-83 Standard, but instead upon consideration of the variety of existing implementations, especially those with substantial user bases and/or considerable success in the marketplace.

The committee feels that, if ANS Forth prescribes stringent requirements upon the virtual machine model, as did the previous standards, then many implementors will choose not to comply with ANS Forth. The committee hopes that ANS Forth will serve to unify rather than to further divide the Forth community, and thus has chosen to encompass rather than invalidate popular implementation techniques.

Many of the changes from Forth-83 are justified by this rationale. Most fall into the category that an ANS Forth Standard Program may not assume x, where x is an entitlement resulting from the virtual machine model prescribed by the Forth-83 Standard. The committee feels that these restrictions are reasonable, especially considering that a substantial number of existing Forth implementations do not correctly implement the Forth-83 virtual model; thus, the Forth-83 entitlements exist in theory but not in practice.

Another way of looking at this is that while ANS Forth acknowledges the diversity of current Forth practice, it attempts to document the similarity therein. In some sense, ANS Forth is thus a description of reality rather than a prescription for a particular virtual machine.

Since there is no previous American National Standard for Forth, the action requirements prescribed by section 3.4 of X3/SD-9, Policy and Guidelines, regarding previous standards do not apply.

The following discussion describes differences between ANS Forth and Forth-83. In most cases, Forth-83 is representative of Forth-79 and FIG Forth for the purposes of this discussion. In many of these cases, however, ANS Forth is more representative of the existing state of the Forth industry than the previously-published standards.

D.6 Differences from Forth-83

D.6.1 Stack width

Forth-83 specifies that stack items occupy 16 bits. This includes addresses, flags, and numbers. ANS Forth specifies that stack items are at least 16 bits; the actual size must be documented by the implementation.

Words affected: all arithmetic, logical and addressing operators

Reason: 32-bit machines are becoming commonplace. A 16-bit Forth system on a 32-bit machine is not competitive.

Impact: Programs that assume 16-bit stack width will continue to run on 16-bit machines; ANS Forth does not require a different stack width, but simply allows it. Many programs will be unaffected (but see D.6.3 Address units).

Transition/Conversion: Programs which use bit masks with the high bits set may have to be changed, substituting either an implementation-defined bit-mask constant, or a procedure to calculate a bit mask in a stack-width-independent way. Here are some procedures for constructing width-independent bit masks:

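1  CONSTANT LO-BIT                          \ only the lowest bit set
TRUE 1 RSHIFT  INVERT  CONSTANT HI-BIT       \ only the highest bit set

: LO-BITS  ( n -- mask )                     \ the n lowest bits set
    0 SWAP  0 ?DO  1 LSHIFT  LO-BIT OR  LOOP ;

: HI-BITS  ( n -- mask )                     \ the n highest bits set
    0 SWAP  0 ?DO  1 RSHIFT  HI-BIT OR  LOOP ;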

Programs that depend upon the modulo 65536 behavior implicit in 16-bit arithmetic operations will need to be rewritten to explicitly perform the modulus operation in the appropriate places. The committee believes that such assumptions occur infrequently. Examples: some checksum or CRC calculations, some random number generators and most fixed-point fractional math.
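As an illustration of making the modulus explicit, the following sketch computes a 16-bit additive checksum over a character string. The names MASK16, +16, and CHECKSUM16 are hypothetical and chosen only for this example; the sketch assumes the summed values are non-negative, so that masking with 65535 is equivalent to taking the result modulo 65536.

```forth
65535 CONSTANT MASK16                    \ low 16 bits set, on any cell width

: +16  ( u1 u2 -- u3 )                   \ add, then reduce modulo 65536
    + MASK16 AND ;

: CHECKSUM16  ( c-addr u -- sum )        \ 16-bit sum of u characters
    0 SWAP  0 ?DO                        ( c-addr sum )
        OVER I CHARS + C@  +16
    LOOP  NIP ;
```

On a 16-bit system the AND is redundant but harmless; on a wider system it restores the wrap-around that the original program relied upon.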

D.6.2 Number representation

Forth-83 specifies two's-complement number representation and arithmetic. ANS Forth also allows one's-complement and signed-magnitude.

Words affected: all arithmetic and logical operators, LOOP, +LOOP

Reason: Some computers use one's-complement or signed-magnitude. The committee did not wish to force Forth implementations for those machines to emulate two's-complement arithmetic, and thus incur severe performance penalties. The experience of some committee members with such machines indicates that the usage restrictions necessary to support their number representations are not overly burdensome.

Impact: An ANS Forth Standard Program may declare an environmental dependency on two's-complement arithmetic. This means that the otherwise-Standard Program is only guaranteed to work on two's-complement machines. Effectively, this is not a severe restriction, because the overwhelming majority of current computers use two's-complement. The committee knows of no Forth-83 compliant implementations for non-two's-complement machines at present, so existing Forth-83 programs will still work on the same class of machines on which they currently work.

Transition/Conversion: Existing programs wishing to take advantage of the possibility of ANS Forth Standard Systems on non-two's-complement machines may do so by eliminating the use of arithmetic operators to perform logical functions, by deriving bit-mask constants from bit operations as described in the section about stack width, by restricting the usage range of unsigned numbers to the range of positive numbers, and by using the provided operators for conversion from single numbers to double numbers.
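A few of these substitutions can be sketched as follows. The names EXTEND, FLIP-BITS, and ALL-ONES are hypothetical and serve only to label the idioms; the Forth-83 phrases mentioned in the comments are typical examples, not an exhaustive list.

```forth
\ Single to double: use the provided operator rather than the
\ two's-complement idiom  DUP 0<
: EXTEND  ( n -- d )  S>D ;

\ Logical complement: use a logical operator rather than the
\ arithmetic phrase  NEGATE 1-
: FLIP-BITS  ( x1 -- x2 )  INVERT ;

\ All-bits-set mask: derive it from a bit operation rather than
\ writing the number -1
0 INVERT CONSTANT ALL-ONES
```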

D.6.3 Address units

Forth-83 specifies that each unique address refers to an 8-bit byte in memory. ANS Forth specifies that the size of the item referred to by each unique address is implementation-defined, but, by default, is the size of one character. Forth-83 describes many memory operations in terms of a number of bytes. ANS Forth describes those operations in terms of a number of either characters or address units.

Words affected: those with address unit arguments

Reason: Some machines, including the most popular Forth chip, address 16-bit memory locations instead of 8-bit bytes.

Impact: Programs may choose to declare an environmental dependency on byte addressing, and will continue to work on the class of machines for which they now work. In order for a Forth implementation on a word-addressed machine to be Forth-83 compliant, it would have to simulate byte addressing at considerable cost in speed and memory efficiency. The committee knows of no such Forth-83 implementations for such machines; thus, an environmental dependency on byte addressing does not restrict a Standard Program beyond its current de facto restrictions.

Transition/Conversion: The new CHARS and CHAR+ address arithmetic operators should be used for programs that require portability to non-byte-addressed machines. The places where such conversion is necessary may be identified by searching for occurrences of words that accept a number of address units as an argument (e.g., MOVE and ALLOT).
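A minimal sketch of character-oriented address arithmetic follows. NAME-BUF, NTH-CHAR, and FILL-LETTERS are hypothetical names used only for illustration; the letter values assume the ASCII character set.

```forth
CREATE NAME-BUF  32 CHARS ALLOT          \ room for 32 characters

: NTH-CHAR  ( c-addr n -- char )         \ fetch character number n
    CHARS + C@ ;

: FILL-LETTERS  ( c-addr n -- )          \ store A, B, C, ... into n characters
    0 ?DO  [CHAR] A I +  OVER C!  CHAR+  LOOP  DROP ;
```

For example, after NAME-BUF 3 FILL-LETTERS the phrase NAME-BUF 2 NTH-CHAR returns the character C, regardless of how many address units a character occupies.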

D.6.4 Address increment for a cell is no longer two

As a consequence of Forth-83's simultaneous specification of 16-bit stack width and byte addressing, the number two could reliably be used in address calculations involving memory arrays containing items from the stack. Since ANS Forth requires neither 16-bit stack width nor byte addressing, the number two is no longer necessarily appropriate for such calculations.

Words affected: @ ! +! 2+ 2* 2- +LOOP

Reason: See the reasons given for D.6.1 Stack width and D.6.3 Address units.

Impact: In this respect, existing programs will continue to work on machines where a stack cell occupies two address units when stored in memory. This includes most machines for which Forth-83 compliant implementations currently exist. In principle, it would also include 16-bit-word-addressed machines with 32-bit stack width, but the committee knows of no examples of such machines.

Transition/Conversion: The new CELLS and CELL+ address arithmetic operators should be used for portable programs. The places where such conversion is necessary may be identified by searching for the character 2 and determining whether or not it is used as part of an address calculation. The following substitutions are appropriate within address calculations:

Old                     New
---                     ---
2+  or  2 +             CELL+
2*  or  2 *             CELLS
2-  or  2 -             1 CELLS -
2/  or  2 /             1 CELLS /
2                       1 CELLS

The number 2 by itself is sometimes used for address calculations as an argument to +LOOP, when the loop index is an address. When converting uses of 2/ that operate on negative dividends, one should be cognizant of the rounding method used.
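The substitutions above, including the +LOOP case, can be sketched as follows. SAMPLES, SAMPLE, and CLEAR-SAMPLES are hypothetical names used only for this example.

```forth
CREATE SAMPLES  10 CELLS ALLOT           \ was:  20 ALLOT

: SAMPLE  ( i -- a-addr )                \ was:  2* SAMPLES +
    CELLS SAMPLES + ;

: CLEAR-SAMPLES  ( -- )
    SAMPLES 10 CELLS +  SAMPLES          \ limit and start are addresses
    DO  0 I !  1 CELLS +LOOP ;           \ was:  2 +LOOP
```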

D.6.5 Address alignment

Forth-83 imposes no restriction upon the alignment of addresses to any boundary. ANS Forth specifies that a Standard System may require alignment of addresses for use with various @ and ! operators.

Words affected: ! +! 2! 2@ @ ? ,

Reason: Many computers have hardware restrictions that favor the use of aligned addresses. On some machines, the native memory-access instructions will cause an exception trap if used with an unaligned address. Even on machines where unaligned accesses do not cause exception traps, aligned accesses are usually faster.

Impact: All of the ANS Forth words that return addresses suitable for use with aligned @ and ! words must return aligned addresses. In most cases, there will be no problem. Problems can arise from the use of user-defined data structures containing a mixture of character data and cell-sized data.

Many existing Forth systems, especially those currently in use on computers with strong alignment requirements, already require alignment. Much existing Forth code that is currently in use on such machines has already been converted for use in an aligned environment.

Transition/Conversion: There are two possible approaches to conversion of programs for use on a system requiring address alignment.

The easiest approach is to redefine the system's aligned @ and ! operators so that they do not require alignment. For example, on a 16-bit little-endian byte-addressed machine, unaligned @ and ! could be defined:

	: @  ( addr -- x )  DUP C@ SWAP CHAR+ C@ 8 LSHIFT OR  ;
	: !  ( x addr -- )  OVER 8 RSHIFT OVER CHAR+ C! C!  ;

These definitions, and similar ones for +!, 2@, 2!, , (comma), and ? as needed, can be compiled before an unaligned application, which will then work as expected.

This approach may conserve memory if the application uses substantial numbers of data structures containing unaligned fields.

Another approach is to modify the application's source code to eliminate unaligned data fields. The ANS Forth words ALIGN and ALIGNED may be used to force alignment of data fields. The places where such alignment is needed may be determined by inspecting the parts of the application where data structures (other than simple variables) are defined, or by smart compiler techniques (see the Smart Compiler discussion below).

This approach will probably result in faster application execution speed, at the possible expense of increased memory utilization for data structures.
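A minimal sketch of the source-modification approach, assuming a record made of one character field followed by one cell field. REC and CELL-FIELD are hypothetical names used only for illustration.

```forth
CREATE REC
    CHAR X C,                            \ character field
    ALIGN                                \ pad data space to an aligned address
    1234 ,                               \ cell field, now aligned

: CELL-FIELD  ( rec -- a-addr )          \ address of the cell field
    CHAR+ ALIGNED ;
```

ALIGN adjusts the data-space pointer while the record is being built, and ALIGNED performs the matching calculation when the field address is later computed, so that REC CELL-FIELD @ fetches from the address at which 1234 was stored.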

Finally, it is possible to combine the preceding techniques by identifying exactly those data fields that are unaligned, and using unaligned versions of the memory access operators for only those fields. This hybrid approach effects a compromise between execution speed and memory utilization.
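One possible way to write separately named unaligned operators, sketched here under the assumption that copying through an aligned temporary is acceptable, is to use MOVE. Unlike definitions built from C@ and shifts, this form does not depend on cell width or byte order. TEMP-CELL, UNALIGNED@, and UNALIGNED! are hypothetical names, not words defined by ANS Forth.

```forth
VARIABLE TEMP-CELL                       \ aligned scratch cell

: UNALIGNED@  ( addr -- x )              \ fetch a cell from any address
    TEMP-CELL 1 CELLS MOVE  TEMP-CELL @ ;

: UNALIGNED!  ( x addr -- )              \ store a cell at any address
    SWAP TEMP-CELL !  TEMP-CELL SWAP 1 CELLS MOVE ;
```

The ordinary @ and ! remain in use for all aligned fields; only the fields identified as unaligned pay the cost of the copy.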
