The Text Interpreter - Gforth Manual


5.13 The Text Interpreter

The text interpreter¹ is an endless loop that processes input from the current input device. It is also called the outer interpreter, in contrast to the inner interpreter (see Engine) which executes the compiled Forth code on interpretive implementations.

The text interpreter operates in one of two states: interpret state and compile state. The current state is defined by the aptly-named variable state.
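
As a small illustration of the two states, the following sketch defines an immediate word that reports the value of state at the moment it runs. The name show-state is hypothetical; it is not part of Gforth.

     : show-state ( -- )  \ hypothetical helper, not a Gforth word
       state @ if ." compiling " else ." interpreting " then ; immediate

     show-state           \ run from the keyboard: reports interpreting
     : foo show-state ;   \ run while foo is being compiled: reports compiling

Because show-state is immediate, it is executed rather than compiled inside foo, so it sees the non-zero value that state holds in compile state.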

This section starts by describing how the text interpreter behaves when it is in interpret state, processing input from the user input device – the keyboard. This is the mode that a Forth system is in after it starts up.

The text interpreter works from an area of memory called the input buffer², which stores your keyboard input when you press the <RET> key. Starting at the beginning of the input buffer, it skips leading spaces (called delimiters) then parses a string (a sequence of non-space characters) until it reaches either a space character or the end of the buffer. Having parsed a string, it makes two attempts to process it:

It looks for the string in a dictionary of definitions. If the string is found, the string names a definition (also known as a word) and the dictionary search returns information that allows the text interpreter to perform the word's interpretation semantics. In most cases, this simply means that the word will be executed.
If the string is not found in the dictionary, the text interpreter attempts to treat it as a number, using the rules described in Number Conversion. If the string represents a legal number in the current radix, the number is pushed onto a parameter stack (the data stack for integers, the floating-point stack for floating-point numbers).

If both attempts fail, or if the word is found in the dictionary but has no interpretation semantics³, the text interpreter discards the remainder of the input buffer, issues an error message and waits for more input. If one of the attempts succeeds, the text interpreter repeats the parsing process until the whole of the input buffer has been processed, at which point it prints the status message “ ok” and waits for more input.
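
The lookup order described above can be seen in a short sketch. The word named 12 is a hypothetical definition, made only to show that the dictionary is searched before number conversion is tried, and no-such-word is assumed to be undefined. The wording of the error message is omitted because it depends on the system.

     : 12 ." twelve " ;   \ hypothetical word whose name looks like a number
     12                   \ found in the dictionary: prints twelve
     13 .                 \ not in the dictionary: converted to a number, prints 13
     hex 1f decimal .     \ 1f is a legal number in radix 16: prints 31
     no-such-word 1 .     \ both attempts fail: error, and the rest of the line is discarded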

The text interpreter keeps track of its position in the input buffer by updating a variable called >IN (pronounced “to-in”). The value of >IN starts out as 0, indicating an offset of 0 from the start of the input buffer. The region from offset >IN @ to the end of the input buffer is called the parse area⁴. This example shows how >IN changes as the text interpreter parses the input buffer:

     : remaining >IN @ SOURCE 2 PICK - -ROT + SWAP
       CR ." ->" TYPE ." <-" ; IMMEDIATE
     
     1 2 3 remaining + remaining .
     
     : foo 1 2 3 remaining SWAP remaining ;

The result is:

     ->+ remaining .<-
     ->.<-5  ok
     
     ->SWAP remaining ;<-
     ->;<-  ok

The value of >IN can also be modified by a word in the input buffer that is executed by the text interpreter. This means that a word can “trick” the text interpreter into either skipping a section of the input buffer⁵ or into parsing a section twice. For example:

     : lat ." <<foo>>" ;
     : flat ." <<bar>>" >IN DUP @ 3 - SWAP ! ;

When flat is executed as the last word on an input line, this output is produced⁶:

     <<bar>><<foo>>

This technique can be used to work around some of the interoperability problems of parsing words. Of course, it's better to avoid parsing words where possible.
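
As a sketch of the skipping case, a word can move >IN to the end of the input buffer so that the rest of the line is never parsed. The name ignore-rest is hypothetical, and the sketch assumes line-oriented input (keyboard or text file) rather than blocks.

     : ignore-rest ( -- )  \ hypothetical; behaves like a comment to end of line
       source nip >in ! ; immediate

     1 2 + . ignore-rest this text is never parsed

Here source nip leaves the length of the input buffer, and storing that value in >IN makes the parse area empty, so the text interpreter prints 3 and finds nothing more to process on that line.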

Two important notes about the behaviour of the text interpreter:

It processes each input string to completion before parsing additional characters from the input buffer.
It treats the input buffer as a read-only region (and so must your code).

When the text interpreter is in compile state, its behaviour changes in these ways:

If a parsed string is found in the dictionary, the text interpreter will perform the word's compilation semantics. In most cases, this simply means that the execution semantics of the word will be appended to the current definition.
When a number is encountered, it is compiled into the current definition (as a literal) rather than being pushed onto a parameter stack.
If an error occurs, state is modified to put the text interpreter back into interpret state.
Each time a line is entered from the keyboard, Gforth prints “ compiled” rather than “ ok”.
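
A short sketch of these compile-state rules follows; the names add5 and six are hypothetical. It also uses the standard words [ and ], which switch to interpret state and back to compile state, and literal, which compiles the number on top of the data stack into the current definition.

     : add5 ( n -- n+5 )  5 + ;   \ 5 is compiled as a literal, + is appended
     10 add5 .                    \ back in interpret state: prints 15

     : six ( -- 6 )  [ 2 3 * ] literal ;   \ 2 3 * is interpreted while compiling
     six .                        \ prints 6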

When the text interpreter is using an input device other than the keyboard, its behaviour changes in these ways:

When the parse area is empty, the text interpreter attempts to refill the input buffer from the input source. When the input source is exhausted, the input source is set back to the previous input source.
It doesn't print out “ ok” or “ compiled” messages each time the parse area is emptied.
If an error occurs, the input source is set back to the user input device.

You can read about this in more detail in Input Sources.
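
One input source that needs no file is a string passed to evaluate. The sketch below uses the standard word source-id, which returns 0 for the user input device and -1 while a string is being evaluated. It assumes that s" can be used in interpret state, as it can in Gforth.

     source-id .                  \ typed at the keyboard: prints 0
     s" source-id ." evaluate     \ the string is the input source: prints -1
     s" 2 3 + ." evaluate         \ prints 5

No “ ok” is printed when the parse area of the evaluated string becomes empty; the message appears only once the whole keyboard line has been processed.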

>in ( – addr ) core “to-in”

input-var variable – a-addr is the address of a cell containing the char offset from the start of the input buffer to the start of the parse area.

source ( – addr u ) core “source”

Return the address addr and length u of the current input buffer.
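
A quick way to see what source returns is to type out the current input buffer. This is only an illustrative sketch; each code line is meant to be entered alone on a line at the keyboard, since anything else on the same line is part of the buffer too.

     \ Entered alone on a line, this types the line itself:
     source type
     \ Entered alone on a line, this prints the length of the line in characters:
     source nip .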

tib ( – addr ) core-ext-obsolescent “t-i-b”

#tib ( – addr ) core-ext-obsolescent “number-t-i-b”

input-var variable – a-addr is the address of a cell containing the number of characters in the terminal input buffer. OBSOLESCENT: source supersedes the function of this word.

Footnotes

[1] This is an expanded version of the material in Introducing the Text Interpreter.

[2] When the text interpreter is processing input from the keyboard, this area of memory is called the terminal input buffer (TIB) and is addressed by the (obsolescent) words TIB and #TIB.

[3] This happens if the word was defined as COMPILE-ONLY.

[4] In other words, the text interpreter processes the contents of the input buffer by parsing strings from the parse area until the parse area is empty.

[5] This is how parsing words work.

[6] Exercise for the reader: what would happen if the 3 were replaced with 4?