Many Forth systems support access to a host file system, and many of these support interpretation of Forth from source text files. The Forth-83 Standard did not address host OS files. Nevertheless, a degree of similarity exists among modern implementations.
For example, files must be opened and closed, created and deleted. Forth file-system implementations differ mostly in the treatment and disposition of the exception codes, and in the format of the file-identification strings. The underlying mechanism for creating file-control blocks might or might not be visible. We have chosen to keep it invisible.
Files must also be read and written. Text files, if supported, must be read and written one line at a time. Interpretation of text files implies that they are somehow integrated into the text interpreter input mechanism. These and other requirements have shaped the file-access extensions word set.
Most of the existing implementations studied use simple English words for common host file functions: OPEN, CLOSE, READ, etc. Although we would have preferred to do likewise, there were so many minor variations in implementation of these words that adopting any particular meaning would have broken much existing code. We have used names with a suffix -FILE for most of these words. We encourage implementors to conform their single-word primitives to the ANS behaviors, and hope that if this is done on a widespread basis we can adopt better definition names in a future standard.
Specific rationales for members of this word set follow.
Many systems reuse file identifiers; when a file is closed, a subsequently opened file may be given the same identifier. If the original file has blocks still in block buffers, these will be incorrectly associated with the newly opened file with disastrous results. The block buffer system must be flushed to avoid this.
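The precaution described above can be sketched as a small wrapper. This is only an illustration: the name CLOSE-FLUSHED is hypothetical, and the sketch assumes the Block word set is present so that FLUSH is available.

```forth
\ Hypothetical helper: write out and unassign all block buffers
\ before the file identifier is released for possible reuse.
: CLOSE-FLUSHED ( fileid -- ior )
   FLUSH          \ Block word set: save and unassign all block buffers
   CLOSE-FILE ;   \ only now may the identifier be handed out again
```

A program that never maps blocks onto files has no need for the extra FLUSH.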
Some operating systems require that files be opened in a different mode to access their contents as an unstructured stream of binary data rather than as a sequence of lines.
The arguments to READ-FILE and WRITE-FILE are arrays of character storage elements, each element consisting of at least 8 bits. The Technical Committee intends that, in BIN mode, the contents of these storage elements can be written to a file and later read back without alteration. The Technical Committee has declined to address issues regarding the impact of wide characters on the File and Block word sets.
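The round-trip intent stated above can be checked with a sketch like the following. The names OUT-BUF, IN-BUF, and ROUND-TRIP and the file name are hypothetical; the sketch assumes COMPARE from the String word set and a host that permits creating a scratch file.

```forth
CREATE OUT-BUF 16 CHARS ALLOT
CREATE IN-BUF  16 CHARS ALLOT

: ROUND-TRIP ( -- flag )   \ true if 16 characters survive unchanged
   16 0 DO  I 16 *  OUT-BUF I CHARS +  C!  LOOP   \ values 0, 16, ... 240
   S" ROUNDTRIP.TMP" R/W BIN CREATE-FILE THROW >R
   OUT-BUF 16 R@ WRITE-FILE THROW
   0 0 R@ REPOSITION-FILE THROW                   \ rewind to the start
   IN-BUF 16 R@ READ-FILE THROW                   ( u )
   R> CLOSE-FILE THROW
   16 =                                           \ all 16 read back?
   OUT-BUF 16 IN-BUF 16 COMPARE 0=  AND           \ and identical?
   S" ROUNDTRIP.TMP" DELETE-FILE THROW ;
```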
Typical use:
: X ... S" TEST.FTH" R/W CREATE-FILE ABORT" CREATE-FILE FAILED" ... ;
Here are two implementation alternatives for saving the input source specification in the presence of text file input:
1) Save the file position (as returned by FILE-POSITION) of the beginning of the line being interpreted. To restore the input source specification, seek to that position and re-read the line into the input buffer.
2) Allocate a separate line buffer for each active text input file, using that buffer as the input buffer. This method avoids the seek and reread step, and allows the use of pseudo-files such as pipes and other sequential-access-only communication channels.
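The first alternative might be sketched as follows. The names SAVE-LINE-START and RESTORE-LINE are hypothetical, and a real system would fold this logic into its own input-source machinery rather than expose it as separate words.

```forth
\ Call before reading a line: remember where the line starts.
: SAVE-LINE-START ( fileid -- ud )
   FILE-POSITION THROW ;

\ Seek back to the remembered position and read the line again.
: RESTORE-LINE ( ud c-addr u1 fileid -- u2 flag )
   >R  2SWAP R@ REPOSITION-FILE THROW   ( c-addr u1 )
   R> READ-LINE THROW ;                 ( u2 flag )
```

This approach requires a seekable file, which is why the second alternative is the one that also works for pipes and similar channels.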
Typical use: ... S" filename" INCLUDED ...
Typical use:
: X ... S" TEST.FTH" R/W OPEN-FILE ABORT" OPEN-FILE FAILED" ... ;
A typical sequential file-processing algorithm might look like:
BEGIN ( ) ... READ-FILE THROW ( length ) ?DUP WHILE ( length ) ... ( ) REPEAT ( )
In this example, THROW is used to handle (unexpected) exception conditions, which are reported as non-zero values of the ior return value from READ-FILE. End-of-file is reported as a zero value of the length return value.
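As one concrete filling-in of the skeleton above, the following sketch counts the characters in an open file. The names CHUNK and COUNT-CHARS and the 256-character chunk size are arbitrary choices made for illustration.

```forth
CREATE CHUNK 256 CHARS ALLOT

: COUNT-CHARS ( fileid -- u )
   >R  0                                  ( total )
   BEGIN
      CHUNK 256 R@ READ-FILE THROW        ( total length )
      ?DUP                                \ zero length means end of file
   WHILE
      +                                   ( total' )
   REPEAT
   R> DROP ;
```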
Implementations are allowed to store the line terminator in the memory buffer in order to allow the use of line reading functions provided by host operating systems, some of which store the terminator. Without this provision, a temporary buffer might be needed. The two-character limitation is sufficient for the vast majority of existing operating systems. Implementations on host operating systems whose line terminator sequence is longer than two characters may have to take special action to prevent the storage of more than two terminator characters.
Standard Programs may not depend on the presence of any such terminator sequence in the buffer.
A typical line-oriented sequential file-processing algorithm might look like:
BEGIN ( ) . . . READ-LINE THROW ( length not-eof-flag ) WHILE ( length ) . . . ( ) REPEAT DROP ( )
In this example, THROW is used to handle (unexpected) I/O exception conditions, which are reported as non-zero values of the ior return value from READ-LINE.
READ-LINE needs a separate end-of-file flag because empty (zero-length) lines are a routine occurrence, so a zero-length line cannot be used to signify end-of-file.
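A concrete version of the line-oriented loop might look like the sketch below, which counts lines. The names LINE-BUF and COUNT-LINES and the 128-character limit are assumptions; the buffer is two characters larger than the requested length to leave room for a stored terminator, and a line longer than 128 characters would be counted once per partial read.

```forth
CREATE LINE-BUF 128 2 + CHARS ALLOT    \ room for a 2-character terminator

: COUNT-LINES ( fileid -- n )
   >R  0                                   ( n )
   BEGIN
      LINE-BUF 128 R@ READ-LINE THROW      ( n length not-eof-flag )
   WHILE
      DROP 1+                              \ empty lines count too
   REPEAT
   DROP  R> DROP ;                         \ discard the final length
```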
Typical use: ... S" ccc" ...
The interpretation semantics for S" are intended to provide a simple mechanism for entering a string in the interpretation state. Since an implementation may choose to provide only one buffer for interpreted strings, an interpreted string is subject to being overwritten by the next execution of S" in interpretation state. It is intended that no standard words other than S" should in themselves cause the interpreted string to be overwritten. However, since words such as EVALUATE, LOAD, INCLUDE-FILE and INCLUDED can result in the interpretation of arbitrary text, possibly including instances of S", the interpreted string may be invalidated by some uses of these words.
When the possibility of overwriting a string can arise, it is prudent to copy the string to a safe buffer allocated by the application.
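A minimal sketch of such a copy is shown here. The names NAME-BUF, NAME-LEN, SAVE-NAME, and SAVED-NAME are hypothetical, and the 80-character capacity is an arbitrary choice; longer strings are silently truncated.

```forth
CREATE NAME-BUF 80 CHARS ALLOT
VARIABLE NAME-LEN

: SAVE-NAME ( c-addr u -- )          \ copy at most 80 characters
   80 MIN  DUP NAME-LEN !
   NAME-BUF SWAP CHARS MOVE ;

: SAVED-NAME ( -- c-addr u )         \ the application's own copy
   NAME-BUF NAME-LEN @ ;
```

For example, S" TEST.FTH" SAVE-NAME keeps the name available through SAVED-NAME even after a later interpreted S" reuses the system's buffer.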
Programs wishing to parse in the fashion of S" are advised to use PARSE or WORD COUNT instead of S", preventing the overwriting of the interpreted string buffer.
The Technical Committee has considered many proposals dealing with the inclusion and makeup of the Floating-Point Word Sets in ANS Forth. Although it has been argued that ANS Forth should not address floating-point arithmetic and numerous Forth applications do not need floating-point, there are a growing number of important Forth applications, from spreadsheets to scientific computations, that require the use of floating-point arithmetic. Initially the Technical Committee adopted proposals that made the Forth Vendors Group Floating-Point Standard, first published in 1984, the framework for inclusion of Floating-Point in ANS Forth. There is substantial common practice and experience with the Forth Vendors Group Floating-Point Standard. Subsequently the Technical Committee adopted proposals that placed the basic floating-point arithmetic, stack and support words in the Floating-Point word set and the floating-point transcendental functions in the Floating-Point Extensions word set. The Technical Committee also adopted proposals that:
Several issues concerning the Floating-Point word set were resolved by consensus in the Technical Committee:
Floating-point stack: By default the floating-point stack is separate from the data and return stacks; however, an implementation may keep floating-point numbers on the data stack. A program can determine whether floating-point numbers are kept on the data stack by passing the string FLOATING-STACK to the ENVIRONMENT? query word. It is the experience of several members of the Technical Committee that with proper coding practices it is possible to write floating-point code that will run identically on systems with a separate floating-point stack and with floating-point numbers kept on the data stack.
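The query might be wrapped as in the following sketch. The name FP-ON-DATA-STACK? is hypothetical. The FLOATING-STACK query returns the maximum depth of the separate floating-point stack, with zero meaning that floating-point numbers live on the data stack; treating an unknown query as "separate stack" is an assumption of this sketch, not a rule of the standard.

```forth
: FP-ON-DATA-STACK? ( -- flag )
   S" FLOATING-STACK" ENVIRONMENT? IF
      0=          \ known: a depth of zero means the data stack is used
   ELSE
      FALSE       \ unknown: assume the default, a separate stack
   THEN ;
```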
Floating-point input: The current base must be DECIMAL. Floating-point input is not allowed in an arbitrary base. All floating-point numbers to be interpreted by an ANS Forth system must contain the exponent indicator E (see 12.3.7 Text interpreter input number conversion). Consensus in the Technical Committee deemed this form of floating-point input to be in more common use than the alternative that would have a floating-point input mode that would allow numbers with embedded decimal points to be treated as floating-point numbers.
Floating-point representation: Although the format and precision of the significand and the format and range of the exponent of a floating-point number are implementation defined in ANS Forth, the Floating-Point Extensions word set contains the words DF@, SF@, DF!, and SF! for fetching and storing double- and single-precision IEEE floating-point-format numbers to memory. The IEEE floating-point format is commonly used by numeric math co-processors and for exchange of floating-point data between programs and systems.
In defining custom floating-point data structures, be aware that CREATE doesn't necessarily leave the data space pointer aligned for various floating-point data types. Programs may comply with the requirement for the various kinds of floating-point alignment by specifying the appropriate alignment both at compile-time and execution time. For example:
: FCONSTANT ( F: r -- ) CREATE FALIGN HERE 1 FLOATS ALLOT F! DOES> ( F: -- r ) FALIGNED F@ ;
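The same compile-time and run-time alignment pattern extends to larger structures. As a further sketch, a hypothetical defining word FARRAY (not part of any word set) creates an array of n floating-point cells:

```forth
: FARRAY ( n "name" -- )
   CREATE  FALIGN  FLOATS ALLOT     \ align, then reserve n floats
   DOES> ( i -- f-addr )
      FALIGNED                      \ repeat the alignment at run time
      SWAP FLOATS + ;               \ address of element i
```

For example, after 4 FARRAY V the phrase 1.5E0 2 V F! stores into element 2 and 2 V F@ fetches it back. No bounds checking is done.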
The Technical Committee has more than once received the suggestion that the text interpreter in Standard Forth systems should treat numbers that have an embedded decimal point, but no exponent, as floating-point numbers rather than double cell numbers. This suggestion, although it has merit, has always been voted down because it would break too much existing code; many existing implementations put the full digit string on the stack as a double number and use other means to inform the application of the location of the decimal point.
>FLOAT enables programs to read floating-point data in legible ASCII format. It accepts a much broader syntax than does the text interpreter since the latter defines rules for composing source programs whereas >FLOAT defines rules for accepting data. >FLOAT is defined as broadly as is feasible to permit input of data from ANS Forth systems as well as other widely used standard programming environments.
This is a synthesis of common FORTRAN practice. Embedded spaces are explicitly forbidden in much scientific usage, as are other field separators such as comma or slash.
While >FLOAT is not required to treat a string of blanks as zero, this behavior is strongly encouraged, since a future version of ANS Forth may include such a requirement.
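A typical way to use >FLOAT on data, as opposed to source text, is sketched below. The name STRING>F and the error message are hypothetical.

```forth
: STRING>F ( c-addr u -- ) ( F: -- r )
   >FLOAT 0= ABORT" not a floating-point number" ;
```

For example, S" 1.5E3" STRING>F leaves 1500 on the floating-point stack. Because the treatment of an all-blank string is not required to be uniform, a portable program should test for that case itself before calling >FLOAT.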
For example, 1E3 F. displays 1000. .
Typical use: r FCONSTANT name
Typical use: : X ... [ ... ( r ) ] FLITERAL ... ;
Typical use: FVARIABLE name
This word provides a primitive for floating-point display. Some floating-point formats, including those specified by IEEE-754, allow representations of numbers outside of an implementation-defined range. These include plus and minus infinities, denormalized numbers, and others. In these cases we expect that REPRESENT will usually be implemented to return appropriate character strings, such as +infinity or nan, possibly truncated.
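As an illustration of building a display word on this primitive, the following sketch prints a number as a fraction with a decimal exponent. The names DIGITS and .SCI and the choice of eight significant digits are assumptions. When REPRESENT reports that the value is out of range, the sketch simply shows whatever string the implementation placed in the buffer.

```forth
CREATE DIGITS 8 CHARS ALLOT

: .SCI ( F: r -- )
   DIGITS 8 REPRESENT              ( n negative? valid? )
   0= IF                           \ not a representable number:
      2DROP  DIGITS 8 TYPE  EXIT   \ show the implementation's string
   THEN
   IF  [CHAR] - EMIT  THEN         \ sign
   ." 0."  DIGITS 8 TYPE           \ significand as a fraction
   ." E"  . ;                      \ decimal exponent
```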
FSINCOS and FATAN2 are a complementary pair of operators which convert angles to 2-vectors and vice-versa. They are essential to most geometric and physical applications since they correctly and unambiguously handle this conversion in all cases except null vectors, even when the tangent of the angle would be infinite.
FSINCOS returns a Cartesian unit vector in the direction of the given angle, measured counter-clockwise from the positive X-axis. The order of results on the stack, namely y underneath x, permits the 2-vector data type to be additionally viewed and used as a ratio approximating the tangent of the angle. Thus the phrase FSINCOS F/ is functionally equivalent to FTAN, but is useful over only a limited and discontinuous range of angles, whereas FSINCOS and FATAN2 are useful for all angles. This ordering has been found convenient for nearly two decades, and has the added benefit of being easy to remember. A corollary to this observation is that vectors in general should appear on the stack in this order.
The argument order for FATAN2 is the same, converting a vector in the conventional representation to a scalar angle. Thus, for all angles, FSINCOS FATAN2 is an identity within the accuracy of the arithmetic and the argument range of FSINCOS. Note that while FSINCOS always returns a valid unit vector, FATAN2 will accept any non-null vector. An ambiguous condition exists if the vector argument to FATAN2 has zero magnitude.
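The stack ordering described above can be seen in the following sketch. The names ANGLE-ROUND-TRIP and POLAR>RECT are hypothetical; only words from the Floating-Point and Floating-Point Extensions word sets are used, so the code does not depend on where floating-point numbers are kept.

```forth
\ Identity within arithmetic accuracy and the argument range of FSINCOS.
: ANGLE-ROUND-TRIP ( F: r1 -- r2 )
   FSINCOS FATAN2 ;

\ Scale the unit vector by a radius, keeping y underneath x.
: POLAR>RECT ( F: radius angle -- y x )
   FSINCOS              ( F: radius sin cos )
   FROT FDUP FROT F*    ( F: sin radius x )
   FROT FROT F*         ( F: x y )
   FSWAP ;              ( F: y x )
```

The result of POLAR>RECT can be passed directly to FATAN2 to recover the angle, provided the radius is not zero.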
This function allows accurate computation when its arguments are close to zero, and provides a useful base for the standard exponential functions. Hyperbolic functions such as sinh(x) can be efficiently and accurately implemented by using FEXPM1; accuracy is lost in this function for small values of x if the word FEXP is used.
An important application of this word is in finance. Say a loan is repaid at 15% per year; what is the daily rate? On a computer with single precision (six decimal digit) accuracy:
1. Using FLN and FEXP:
FLN of 1.15 = 0.139762; divide by 365 = 3.82910E-4; form the exponential using FEXP = 1.00038; subtract one (1) and convert to a percentage = 0.038%.
Thus we only have two digit accuracy.
2. Using FLNP1 and FEXPM1:
FLNP1 of 0.15 = 0.139762 (this is the same value as in the first example, although with an argument closer to zero it may not be so); divide by 365 = 3.82910E-4; form the exponential and subtract one (1) in a single step using FEXPM1 = 3.82983E-4; convert to a percentage = 0.0382983%.
This is full six digit accuracy.
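The two calculations above correspond to the following definitions. The names DAILY-RATE-NAIVE and DAILY-RATE are hypothetical, and the rate is passed as a fraction (0.15E0 for 15%) rather than as a percentage.

```forth
\ Method 1: FLN and FEXP; digits are lost in the final subtraction.
: DAILY-RATE-NAIVE ( F: annual -- daily )
   1E0 F+ FLN  365E0 F/  FEXP 1E0 F- ;

\ Method 2: FLNP1 and FEXPM1; no subtraction of nearly equal values.
: DAILY-RATE ( F: annual -- daily )
   FLNP1  365E0 F/  FEXPM1 ;
```

On a system with more than six digits of precision the difference between the two results is smaller, but it appears in the same way as the rate or the period shrinks.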
The presence of this word allows the hyperbolic functions to be computed with usable accuracy. For example, the hyperbolic sine can be defined as:
: FSINH ( r1 -- r2 ) FEXPM1 FDUP FDUP 1.0E0 F+ F/ F+ 2.0E0 F/ ;
This function allows accurate computation when its arguments are close to zero, and provides a useful base for the standard logarithmic functions. For example, FLN can be implemented as:
: FLN 1.0E0 F- FLNP1 ;
This provides the three types of floating point equality in common use -- close in absolute terms, exact equality as represented, and relatively close.
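The three comparisons are selected by the sign of the third argument to F~, as in the sketch below. The names F-NEAR?, F-SAME?, and F-REL? and the tolerance of 1E-6 are arbitrary choices for illustration.

```forth
\ Positive tolerance: |r1 - r2| < 1E-6 (close in absolute terms).
: F-NEAR? ( -- flag ) ( F: r1 r2 -- )   1E-6 F~ ;

\ Zero tolerance: the two encodings are exactly identical.
: F-SAME? ( -- flag ) ( F: r1 r2 -- )   0E0 F~ ;

\ Negative tolerance: |r1 - r2| < 1E-6 * ( |r1| + |r2| ) (relatively close).
: F-REL?  ( -- flag ) ( F: r1 r2 -- )   -1E-6 F~ ;
```

Note that under exact comparison positive and negative zero are distinct if the implementation gives them distinct encodings.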