--- gforth/doc/gforth.ds 1999/03/11 22:52:11 1.25 +++ gforth/doc/gforth.ds 1999/03/23 20:24:21 1.26 @@ -32,7 +32,7 @@ Programming style note: @ifinfo This file documents Gforth @value{VERSION} -Copyright @copyright{} 1995-1998 Free Software Foundation, Inc. +Copyright @copyright{} 1995-1999 Free Software Foundation, Inc. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice @@ -71,7 +71,7 @@ Copyright @copyright{} 1995-1998 Free So @center Jens Wilke @center Neal Crook @sp 3 -@center This manual is permanently under construction and was last updated on 16-Feb-1999 +@center This manual is permanently under construction and was last updated on 23-Mar-1999 @comment The following two commands start the copyright page. @page @@ -107,8 +107,8 @@ personal machines. This manual correspon @menu * License:: The GPL -* Introduction:: An introduction to ANS Forth * Goals:: About the Gforth Project +* Introduction:: An introduction to ANS Forth * Invoking Gforth:: Starting (and exiting) Gforth * Words:: Forth words available in Gforth * Error messages:: How to interpret them @@ -129,6 +129,10 @@ personal machines. 
This manual correspon @detailmenu --- The Detailed Node Listing --- +Goals of Gforth + +* Gforth Extensions Sinful?:: + An Introduction to ANS Forth * Introducing the Text Interpreter:: @@ -139,10 +143,6 @@ An Introduction to ANS Forth * Review - elements of a Forth system:: * Exercises:: -Goals of Gforth - -* Gforth Extensions Sinful?:: - Forth Words * Notation:: @@ -152,21 +152,20 @@ Forth Words * Stack Manipulation:: * Memory:: * Control Structures:: -* Locals:: * Defining Words:: * The Text Interpreter:: -* Structures:: -* Object-oriented Forth:: * Tokens for Words:: * Word Lists:: * Environmental Queries:: * Files:: -* Including Files:: * Blocks:: * Other I/O:: * Programming Tools:: * Assembler and Code Words:: * Threading Words:: +* Locals:: +* Structures:: +* Object-oriented Forth:: * Passing Commands to the OS:: * Miscellaneous Words:: @@ -202,18 +201,6 @@ Control Structures * Calls and returns:: * Exception Handling:: -Locals - -* Gforth locals:: -* ANS Forth locals:: - -Gforth locals - -* Where are locals visible by name?:: -* How long do locals live?:: -* Programming Style:: -* Implementation:: - Defining Words * Simple Defining Words:: @@ -229,6 +216,45 @@ The Text Interpreter * Literals:: * Interpreter Directives:: +Word Lists + +* Why use word lists?:: +* Word list examples:: + +Files + +* Forth source files:: +* General files:: +* Search Paths:: +* Forth Search Paths:: +* General Search Paths:: + +Other I/O + +* Simple numeric output:: +* Formatted numeric output:: +* String Formats:: +* Displaying characters and strings:: +* Input:: + +Programming Tools + +* Debugging:: Simple and quick. +* Assertions:: Making your programs self-checking. +* Singlestep Debugger:: Executing your program word by word. 
+ +Locals + +* Gforth locals:: +* ANS Forth locals:: + +Gforth locals + +* Where are locals visible by name?:: +* How long do locals live?:: +* Programming Style:: +* Implementation:: + Structures * Why explicit structure support?:: @@ -274,32 +300,6 @@ The @file{mini-oof.fs} model * Mini-OOF Example:: * Mini-OOF Implementation:: -Word Lists - -* Why use word lists?:: -* Word list examples:: - -Including Files - -* Words for Including:: -* Search Path:: -* Forth Search Paths:: -* General Search Paths:: - -Other I/O - -* Simple numeric output:: Predefined formats -* Formatted numeric output:: Formatted (pictured) output -* String Formats:: How Forth stores strings in memory -* Displaying characters and strings:: Other stuff -* Input:: Input - -Programming Tools - -* Debugging:: Simple and quick. -* Assertions:: Making your programs self-checking. -* Singlestep Debugger:: Executing your program word by word. - Tools * ANS Report:: Report the words used, sorted by wordset. @@ -422,7 +422,7 @@ Other Forth-related information @end detailmenu @end menu -@node License, Introduction, Top, Top +@node License, Goals, Top, Top @unnumbered GNU GENERAL PUBLIC LICENSE @center Version 2, June 1991 @@ -823,8 +823,101 @@ from other Forth compilers. However, thi reference manual. @end iftex -@c ---------------------------------------------------------- -@node Introduction, Goals, License, Top + +@c ****************************************************************** +@node Goals, Introduction, License, Top +@comment node-name, next, previous, up +@chapter Goals of Gforth +@cindex goals of the Gforth project +The goal of the Gforth Project is to develop a standard model for +ANS Forth. This can be split into several subgoals: + +@itemize @bullet +@item +Gforth should conform to the ANS Forth Standard. +@item +It should be a model, i.e. it should define all the +implementation-dependent things. +@item +It should become standard, i.e. widely accepted and used. 
This goal +is the most difficult one. +@end itemize + +To achieve these goals Gforth should be +@itemize @bullet +@item +Similar to previous models (fig-Forth, F83) +@item +Powerful. It should provide for all the things that are considered +necessary today and even some that are not yet considered necessary. +@item +Efficient. It should not get the reputation of being exceptionally +slow. +@item +Free. +@item +Available on many machines/easy to port. +@end itemize + +Have we achieved these goals? Gforth conforms to the ANS Forth +standard. It may be considered a model, but we have not yet documented +which parts of the model are stable and which parts we are likely to +change. It certainly has not yet become a de facto standard, but it +appears to be quite popular. It has some similarities to and some +differences from previous models. It has some powerful features, but not +yet everything that we envisioned. We certainly have achieved our +execution speed goals (@pxref{Performance}). It is free and available +on many machines. + +@menu +* Gforth Extensions Sinful?:: +@end menu + +@node Gforth Extensions Sinful?, , Goals, Goals +@comment node-name, next, previous, up +@section Is it a Sin to use Gforth Extensions? +@cindex Gforth extensions + +If you've been paying attention, you will have realised that there is an +ANS (American National Standard) for Forth. As you read through the rest +of this manual, you will see documentation for @var{Standard} words, and +documentation for some appealing Gforth @var{extensions}. You might ask +yourself the question: @var{``Given that there is a standard, would I be +committing a sin to use (non-Standard) Gforth extensions?''} + +The answer to that question is somewhat pragmatic and somewhat +philosophical. Consider these points: + +@itemize @bullet +@item +A number of the Gforth extensions can be implemented in ANS Forth using +files provided in the @file{compat/} directory. These are mentioned in +the text in passing. 
+@item +Forth has a rich historical precedent for programmers taking advantage +of implementation-dependent features of their tools (for example, +relying on a knowledge of the dictionary structure). Sometimes these +techniques are necessary to extract every last bit of performance from +the hardware, sometimes they are just a programming shorthand. +@item +The best way to break the rules is to know what the rules are. To learn +the rules, there is no substitute for studying the text of the Standard +itself. In particular, Appendix A of the Standard (@var{Rationale}) +provides a valuable insight into the thought processes of the technical +committee. +@item +The best reason to break a rule is because you have to; because it's +more productive to do that, because it makes your code run fast enough +or because you can see no Standard way to achieve what you want to +achieve. +@end itemize + +The tool @file{ans-report.fs} (@pxref{ANS Report}) makes it easy to +analyse your program and determine what non-Standard definitions it +relies upon. + +@c ****************************************************************** +@node Introduction, Invoking Gforth, Goals, Top @comment node-name, next, previous, up @chapter An Introduction to ANS Forth @cindex Forth - an introduction @@ -835,13 +928,13 @@ teaching material, it seems worthwhile t material. @xref{Forth-related information} for other sources of Forth-related information. -The examples in this section should work on any ANS Standard Forth, the -output shown was produced using Gforth. In each example, I have tried to +The examples in this section should work on any ANS Forth; the +output shown was produced using Gforth. Each example attempts to reproduce the exact output that Gforth produces. If you try out the examples (and you should), what you should type is shown @kbd{like this} and Gforth's response is shown @code{like this}. 
The single exception is that, where the example shows @kbd{} it means that you should -press the "carriage return" key. Unfortunatley, some output formats for +press the ``carriage return'' key. Unfortunately, some output formats for this manual cannot show the difference between @kbd{this} and @code{this} which will make trying out the examples harder (but not impossible). @@ -864,7 +957,6 @@ lead to great productivity improvements. * Review - elements of a Forth system:: * Exercises:: @end menu -@comment TODO add these sections to the top xref lists @comment ---------------------------------------------- @node Introducing the Text Interpreter, Stacks and Postfix notation, Introduction, Introduction @@ -876,11 +968,11 @@ When you invoke the Forth image, you wil and nothing else (if you have Gforth installed on your system, try invoking it now, by typing @kbd{gforth}). Forth is now running its command line interpreter, which is called the @var{Text Interpreter} -(also known as the @var{Outer Interpreter}). (@pxref{The Text -Interpreter} describes it in more detail, but we will learn more about -its behaviour as we go through this chapter). +(also known as the @var{Outer Interpreter}). (You will learn a lot +about the text interpreter as you read through this chapter, +but @pxref{The Text Interpreter} for more detail). -Although it may not be obvious, Forth is actually waiting for your +Although it's not obvious, Forth is actually waiting for your input. Type a number and press the key: @example @@ -889,24 +981,28 @@ input. Type a number and press the } +:1: Undefined word +qwer341 ^^^^^^^ -Error: Undefined word +$400D2BA8 Bounce +$400DBDA8 no.extensions @end example -When the text interpreter detects an error, it discards any remaining -text on a line, resets certain internal state and prints an error -message. - -The text interpreter works on input one line at a time. 
Starting at -the beginning of the line, it breaks the line into groups of characters -separated by spaces. For each group of characters in turn, it makes two -attempts to do something: +The exact text, other than the ``Undefined word'', may differ slightly on +your system, but the effect is the same; when the text interpreter +detects an error, it discards any remaining text on a line, resets +certain internal state and prints an error message. + +The text interpreter waits for you to press carriage-return, and then +processes your input line. Starting at the beginning of the line, it +breaks the line into groups of characters separated by spaces. For each +group of characters in turn, it makes two attempts to do something: @itemize @bullet @item @@ -933,9 +1029,10 @@ in the next section). @end itemize If the text interpreter is unable to do either of these things with any -group of characters, it discards the rest of the line and print an error -message. If the text interpreter reaches the end of the line without -error, it prints the status message " ok" followed by carriage-return. +group of characters, it discards the group of characters and the rest of +the line, then prints an error message. If the text interpreter reaches +the end of the line without error, it prints the status message ``@code{ ok}'' +followed by carriage-return. This is the simplest command we can give to the text interpreter: @@ -944,17 +1041,20 @@ This is the simplest command we can give @end example The text interpreter did everything we asked it to do (nothing) without -an error, so it said that everything is "ok". Try a slightly longer +an error, so it said that everything is ``@code{ ok}''. Try a slightly longer command: @example @kbd{12 dup fred dup} +:1: Undefined word +12 dup fred dup ^^^^ -Error: Undefined word +$400D2BA8 Bounce +$400DBDA8 no.extensions @end example -When you pres the key, the text interpreter starts to work its -way along the line. 
+When you press the carriage-return key, the text interpreter starts to +work its way along the line: @itemize @bullet @item @@ -963,12 +1063,12 @@ characters @code{12} and looks them up i dictionary@footnote{We can't tell if it found them or not, but assume for now that it did not}. There is no match for this group of characters in the name dictionary, so it tries to treat them as a number. It is -able to do this successfully, so it puts the number, 12, "on the stack" +able to do this successfully, so it puts the number, 12, ``on the stack'' (whatever that means). @item The text interpreter resumes scanning the line and gets the next group -of characters, @code{dup}. It looks them up in the name dictionary and -(you'll have to take my word for this) finds them, and executes the word +of characters, @code{dup}. It looks it up in the name dictionary and +(you'll have to take my word for this) finds it, and executes the word @code{dup} (whatever that means). @item Once again, the text interpreter resumes scanning the line and gets the @@ -993,18 +1093,20 @@ and executing it a second time. @cindex outer interpreter In procedural programming languages (like C and Pascal), the -building-block of programs is the function or procedure. These -functions or procedures are called with explicit parameters. For +building-block of programs is the @var{function} or @var{procedure}. These +functions or procedures are called with @var{explicit parameters}. For example, in C we might write: @example total = total + new_volume(length,height,depth); @end example -where total, length, height, depth are all variables and new_volume is -a function-call to another piece of code. +@noindent +where new_volume is a function-call to another piece of code, and total, +length, height and depth are all variables. length, height and depth are +parameters to the function-call. 
-In Forth, the equivalent to the function or procedure is the +In Forth, the equivalent of the function or procedure is the @var{definition} and parameters are implicitly passed between definitions using a shared stack that is visible to the programmer. Although Forth does support variables, the existence of the @@ -1015,7 +1117,7 @@ actual number is implementation-dependen used for any operation is implied unambiguously by the operation being performed. The stack used for all integer operations is called the @var{data stack} and, since this is the stack used most commonly, references to -"the data stack" are often abbreviated to "the stack". +``the data stack'' are often abbreviated to ``the stack''. The stacks have a last-in, first-out (LIFO) organisation. If you type: @@ -1023,29 +1125,31 @@ The stacks have a last-in, first-out (LI @kbd{1 2 3} ok @end example -Then you (well, the text interpreter, really) have placed three numbers -on the (data) stack. An analogy for the behaviour of the stack is to -take a pack of playing cards and deal out the ace (1), 2 and 3 into a -pile on the table. The 3 was the last card onto the pile ("last-in") and -if you take a card off the pile then, unless you're prepared to fiddle a -bit, the card that you take off will be the 3 ("first-out"). The number -that will be first-out of the stack is called the "top of stack", which +Then this instructs the text interpreter to place three numbers on the +(data) stack. An analogy for the behaviour of the stack is to take a +pack of playing cards and deal out the ace (1), 2 and 3 into a pile on +the table. The 3 was the last card onto the pile (``last-in'') and if +you take a card off the pile then, unless you're prepared to fiddle a +bit, the card that you take off will be the 3 (``first-out''). The +number that will be first-out of the stack is called the @var{top of +stack}, which +@cindex TOS definition is often abbreviated to @var{TOS}. 
-To see how parameters are passed in Forth, we will consider the -behaviour of the definition @code{+} (pronounced "plus"). You will not be -surprised to learn that this definition performs addition. More +To understand how parameters are passed in Forth, consider the +behaviour of the definition @code{+} (pronounced ``plus''). You will not +be surprised to learn that this definition performs addition. More precisely, it adds two number together and produces a result. Where does -it get the two numbers from? It takes the first two numbers off the +it get the two numbers from? It takes the top two numbers off the stack. Where does it place the result? On the stack. You can act-out the behaviour of @code{+} with your playing cards like this: @itemize @bullet @item -Pick up two cards from the stack +Pick up two cards from the stack on the table @item -Stare at them intently and ask yourself "what *is* the sum of these two -numbers" +Stare at them intently and ask yourself ``what @var{is} the sum of these two +numbers'' @item Decide that the answer is 5 @item @@ -1055,12 +1159,12 @@ Put a 5 on the remaining ace that's on t @end itemize If you don't have a pack of cards handy but you do have Forth running, -you can use the definition .s to show the current state of the stack, +you can use the definition @code{.s} to show the current state of the stack, without affecting the stack. Type: @example @kbd{clearstack 1 2 3} ok -@kbd{.s <3> 1 2 3 } ok +@kbd{.s} <3> 1 2 3 ok @end example The text interpreter looks up the word @code{clearstack} and executes @@ -1068,40 +1172,42 @@ it; it tidies up the stack and removes a left on it by earlier examples. The text interpreter pushes each of the three numbers in turn onto the stack. Finally, the text interpreter looks up the word @code{.s} and executes it. 
The effect of executing -@code{.s} is to print the "<3>" (the total number of items on the stack) -followed by a list of all the items and the item on the far right-hand -side is the TOS. +@code{.s} is to print the ``<3>'' (the total number of items on the stack) +followed by a list of all the items on the stack; the item on the far +right-hand side is the TOS. You can now type: -+ .s <2> 1 5 ok +@example +@kbd{+ .s} <2> 1 5 ok +@end example +@noindent which is correct; there are now 2 items on the stack and the result of the addition is 5. -If you're playing with cards, try doing a second addition; pick up the +If you're playing with cards, try doing a second addition: pick up the two cards, work out that their sum is 6, shuffle them into the pack, -look for a 6 and place that on the table. You now have just one item -on the stack. What happens if you try to do a third addition? Pick up -the first card, pick up the second card - ah. There is no second -card. This is called a "stack underflow" and consitutes an error. If -you try to do the same thing with Forth it will report an error -(probably a Stack Underflow or an Invalid Memory Address error). - -The opposite situation to a stack underflow is a stack overflow, which -simply accepts that there is a finite amount of storage space reserved -for the stack. To stretch the playing card analogy, if you had enough -packs of cards and you piled the cards up on the table, you would -eventually be unable to add another card; you'd hit the -ceiling. Gforth allows you to set the maximum size of the stacks. In -general, the only time that you will get a stack overflow is because a -definition has a bug in it and is generating data on the stack -uncontrollably. +look for a 6 and place that on the table. You now have just one item on +the stack. What happens if you try to do a third addition? Pick up the +first card, pick up the second card -- ah! There is no second card. 
This +is called a @var{stack underflow} and constitutes an error. If you try to +do the same thing with Forth it will report an error (probably a Stack +Underflow or an Invalid Memory Address error). + +The opposite situation to a stack underflow is a @var{stack overflow}, +which simply accepts that there is a finite amount of storage space +reserved for the stack. To stretch the playing card analogy, if you had +enough packs of cards and you piled the cards up on the table, you would +eventually be unable to add another card; you'd hit the ceiling. Gforth +allows you to set the maximum size of the stacks. In general, the only +time that you will get a stack overflow is because a definition has a +bug in it and is generating data on the stack uncontrollably. There's one final use for the playing card analogy. If you model your stack using a pack of playing cards, the maximum number of items on your stack will be 52 (I assume you didn't use the Joker). The maximum -*value* of any item on the stack is 13 (the King). In fact, the only +@var{value} of any item on the stack is 13 (the King). In fact, the only possible numbers are positive integer numbers 1 through 13; you can't have (for example) 0 or 27 or 3.52 or -2. If you change the way you think about some of the cards, you can accommodate different @@ -1112,7 +1218,7 @@ numbers) but the numbers that you can re In that analogy, the limit was the amount of information that a single stack entry could hold, and Forth has a similar limit. In Forth, the -size of a stack entry is called a "cell". The actual size of a cell is +size of a stack entry is called a @var{cell}. The actual size of a cell is implementation dependent and affects the maximum value that a stack entry can hold. A Standard Forth provides a cell size of at least 16-bits, and most desktop systems use a cell size of 32-bits. @@ -1120,40 +1226,51 @@ entry can hold. 
A Standard Forth provide Forth does not do any type checking for you, so you are free to manipulate and combine stack items in any way you wish. A convenient ways of treating stack items is as 2's complement signed integers, and -that is what Standard words like "+" do. Therefore you can type: +that is what Standard words like ``+'' do. Therefore you can type: --5 12 + .s <1> 7 ok +@example +@kbd{-5 12 + .s} <1> 7 ok +@end example -If you use numbers and definitions like "+" in order to turn Forth +If you use numbers and definitions like ``+'' in order to turn Forth into a great big pocket calculator, you will realise that it's rather different from a normal calculator. Rather than typing 2 + 3 = you had -to type 2 3 + (ignore the fact that you had to use .s to see the +to type 2 3 + (ignore the fact that you had to use @code{.s} to see the result). The terminology used to describe this difference is to say -that your calculator uses "Infix Notation" (parameters and operators -are mixed) whilst Forth uses "Postfix Notation" (parameters and -operators are separate), also called "Reverse Polish Notation". +that your calculator uses @var{Infix Notation} (parameters and operators +are mixed) whilst Forth uses @var{Postfix Notation} (parameters and +operators are separate), also called @var{Reverse Polish Notation}. Whilst postfix notation might look confusing to begin with, it has several important advantages: -- it is unambiguous -- it is more concise -- it fits naturally with a stack-based system +@itemize @bullet +@item +it is unambiguous +@item +it is more concise +@item +it fits naturally with a stack-based system +@end itemize To examine these claims in more detail, consider these sums: +@example 6 + 5 * 4 = 4 * 5 + 6 = +@end example If you're just learning maths or your maths is very rusty, you will probably come up with the answer 44 for the first and 26 for the second. 
If you are a bit of a whizz at maths you will remember the -*convention* that multiplication takes precendence over addition, and +@var{convention} that multiplication takes precedence over addition, and you'd come up with the answer 26 both times. To explain the answer 26 to someone who got the answer 44, you'd probably rewrite the first sum like this: +@example 6 + (5 * 4) = +@end example If what you really wanted was to perform the addition before the multiplication, you would have to use parentheses to force it. @@ -1167,26 +1284,28 @@ these keystroke sequences: Postfix notation is unambiguous because the order that the operators are applied is always explicit; that also means that parentheses are -never required. The operators are *active* (the act of quoting the -operator makes the operation occur) which removes the need for "=". +never required. The operators are @var{active} (the act of quoting the +operator makes the operation occur) which removes the need for ``=''. The sum 6 + 5 * 4 can be written (in postfix notation) in two equivalent ways: +@example 6 5 4 * + or: 5 4 * 6 + +@end example An important thing that you should notice about this notation is that the @var{order} of the numbers does not change; if you want to subtract 2 from 10 you type @code{10 2 -}. -The reason why Forth uses postfix notation is very simple to explain: it +The reason that Forth uses postfix notation is very simple to explain: it makes the implementation extremely simple, and it follows naturally from using the stack as a mechanism for passing parameters. Another way of thinking about this is to realise that all Forth definitions are @var{active}; they execute as they are encountered by the text -interpreter. The result of this is that the syntax of Forth is almost -trivially simple. +interpreter. The result of this is that the syntax of Forth is trivially +simple. @@ -1197,14 +1316,14 @@ trivially simple. 
Until now, the examples we've seen have been trivial; we've just been using Forth an a bigger-than-pocket calculator. Also, each calculation -we've shown has been a "one-off" -- to repeat it we'd need to type it in +we've shown has been a ``one-off'' -- to repeat it we'd need to type it in again@footnote{That's not quite true. If you press the up-arrow key on your keyboard you should be able to scroll back to any earlier command, edit it and re-enter it.} In this section we'll see how to add new word to Forth's vocabulary. -The easiest way to create a new word is to use a "colon -definition". We'll define a few and try them out before we worry too +The easiest way to create a new word is to use a @var{colon +definition}. We'll define a few and try them out before we worry too much about how they work. Try typing in these examples; be careful to copy the spaces accurately: @@ -1341,7 +1460,7 @@ magic to make that xt or number get exec at the time that @code{add-two} is @var{executed}. Therefore, when you execute @code{add-two} its @var{run-time effect} is exactly the same as if you had typed @code{2 + .} outside of a definition, and pressed -. +carriage-return. In Forth, every word or number can be described in terms of three properties: @@ -1468,7 +1587,7 @@ example). The effect of executing it are compilation state at this time. If you execute @code{word2} it does nothing at all. -@cindex ." -- how it works +@cindex @code{."}, how it works Before leaving the subject of immediate words, consider the behaviour of @code{."} in the definition of @code{greet}, in the previous section. This word is both a parsing word and an immediate word. Notice @@ -1480,7 +1599,7 @@ the text interpreter can identify it. Th it is a @var{delimiter}. The examples earlier show that, when the string is displayed, there is neither a space before the @code{H} nor after the @code{e}. Since @code{."} is an immediate word, it executes at the time -that @code{greet is defined}. 
When it executes, it searches forward in +that @code{greet} is defined. When it executes, it searches forward in the input line looking for the delimiter. When it finds the delimiter, it updates @code{>in} to point past the delimiter. It also compiles some magic code into the definition of @code{greet}; the xt of a run-time @@ -1506,7 +1625,7 @@ If you have tried out the examples in th have typed them in by hand; when you leave Gforth, your definitions will be deleted. You can avoid this by using a text editor to enter Forth source code into a file, and then load all of the code from the file -using @code{include} (@xref{Including Files}). A Forth source +using @code{include} (@xref{Forth source files}). A Forth source file is processed by the text interpreter, just as though you had typed it in by hand@footnote{Actually, there are some subtle differences, like the fact that it doesn't print @code{ ok} at the end of each line}. @@ -1526,11 +1645,12 @@ long definitions by hand, you can use a the history file into a Forth source file for reuse at a later time. @cindex history file -@cindex .gforth-history -@cindex GFORTHHIST +@cindex @file{.gforth-history} +@cindex @code{GFORTHHIST} environment variable +@cindex environment variables You can find out the name of your history file using @code{history-file type }. On non-Unix systems you can find the location of the file using -@code{history-dir type }@footnote{The environment variable GFORTHHIST +@code{history-dir type }@footnote{The environment variable @code{GFORTHHIST} determines the location of the file.} @@ -1552,7 +1672,7 @@ Forth program development is an interact @item The main command loop that accepts input, and controls both interpretation and compilation, is called the @var{text interpreter} -(also known as the @var{outer interpreter}. +(also known as the @var{outer interpreter}). @item Forth has a very simple syntax, consisting of words and numbers separated by spaces or carriage-return characters. 
Any additional syntax @@ -1573,7 +1693,7 @@ semantics} of a word that it encounters. @item The relationship between the @var{interpretation semantics}, @var{compilation semantics} and @var{execution semantics} for a word depend upon the way in which -the word was defined (for example, whether it is an @var{immediate} word. +the word was defined (for example, whether it is an @var{immediate} word). @item Forth definitions can be implemented in Forth (called @var{high-level definitions}) or in some other way (usually a lower-level language and @@ -1583,7 +1703,7 @@ definitions} or @var{primitives}). Many Forth systems are implemented mainly in Forth. @item You now know enough to read and understand the rest of this manual and -the ANS Forth Standard. +the ANS Forth document. @end itemize @@ -1609,7 +1729,7 @@ provides. Even scarier, you know almost system. However, that's not a good idea just yet.. better to try writing some programs in Gforth. -The large number of Forth words available in ANS Standard Forth and +The large number of Forth words available in ANS Forth and Gforth make learning Forth somewhat daunting. To make the problem easier, use the index of this manual to learn more about these words: @@ -1622,127 +1742,32 @@ all the exercises in a .fs file in the d inspiration from Starting Forth and Kelly&Spies. +@c ****************************************************************** +@node Invoking Gforth, Words, Introduction, Top +@chapter Invoking Gforth +@cindex Gforth - invoking +@cindex invoking Gforth +@cindex running Gforth +@cindex command-line options +@cindex options on the command line +@cindex flags on the command line -@c ---------------------------------------------------------- -@node Goals, Invoking Gforth, Introduction, Top -@comment node-name, next, previous, up -@chapter Goals of Gforth -@cindex Goals -The goal of the Gforth Project is to develop a standard model for -ANS Forth. 
This can be split into several subgoals: - -@itemize @bullet -@item -Gforth should conform to the ANS Forth Standard. -@item -It should be a model, i.e. it should define all the -implementation-dependent things. -@item -It should become standard, i.e. widely accepted and used. This goal -is the most difficult one. -@end itemize - -To achieve these goals Gforth should be -@itemize @bullet -@item -Similar to previous models (fig-Forth, F83) -@item -Powerful. It should provide for all the things that are considered -necessary today and even some that are not yet considered necessary. -@item -Efficient. It should not get the reputation of being exceptionally -slow. -@item -Free. -@item -Available on many machines/easy to port. -@end itemize +You will usually just say @code{gforth}. In many other cases the default +Gforth image will be invoked like this: +@example +gforth [files] [-e forth-code] +@end example +This interprets the contents of the files and the Forth code in the order they +are given. -Have we achieved these goals? Gforth conforms to the ANS Forth -standard. It may be considered a model, but we have not yet documented -which parts of the model are stable and which parts we are likely to -change. It certainly has not yet become a de facto standard, but it -appears to be quite popular. It has some similarities to and some -differences from previous models. It has some powerful features, but not -yet everything that we envisioned. We certainly have achieved our -execution speed goals (@pxref{Performance}). It is free and available -on many machines. +In general, the command line looks like this: -@menu -* Gforth Extensions Sinful?:: -@end menu +@example +gforth [initialization options] [image-specific options] +@end example -@node Gforth Extensions Sinful?, , Goals, Goals -@comment node-name, next, previous, up -@section Is it a Sin to use Gforth Extensions? 
-@cindex Gforth extensions - -If you've been paying attention, you will have realised that there is an -ANS Standard for Forth. As you read through the rest of this manual, you -will see documentation for @var{Standard} words, and documentation for -some appealing Gforth @var{extensions}. You might ask yourself the -question: @var{"Given that there is a standard, would I be committing a -sin to use (non-Standard) Gforth extensions?"} - -The answer to that question is somewhat pragmatic and somewhat -philosophical. Consider these points: - -@itemize @bullet -@item -A number of the Gforth extensions can be implemented in ANS Standard -Forth using files provided in the @file{compat/} directory. These are -mentioned in the text in passing. -@item -Forth has a rich historical precedent for programmers taking advantage -of implementation-dependent features of their tools (for example, -relying on a knowledge of the dictionary structure). Sometimes these -techniques are necessary to extract every last bit of performance from -the hardware, sometimes they are just a programming shorthand. -@item -The best way to break the rules is to know what the rules are. To learn -the rules, there is no substitute for studying the text of the Standard -itself. In particular, Appendix A of the Standard (@var{Rationale}) -provides a valuable insight into the thought processes of the technical -committee. -@item -The best reason to break a rule is because you have to; because it's -more productive to do that, because it makes your code run fast enough -or because you can see no Standard way to achieve what you want to -achieve. -@end itemize - -The tool @file{ans-report.fs} (@pxref{ANS Report}) makes it easy to -analyse your program and determine what non-Standard definitions it -relies upon. 
- - - -@c ---------------------------------------------------------- -@node Invoking Gforth, Words, Goals, Top -@chapter Invoking Gforth -@cindex Gforth - invoking -@cindex invoking Gforth -@cindex running Gforth -@cindex command-line options -@cindex options on the command line -@cindex flags on the command line - -You will usually just say @code{gforth}. In many other cases the default -Gforth image will be invoked like this: -@example -gforth [files] [-e forth-code] -@end example -This interprets the contents of the files and the Forth code in the order they -are given. - -In general, the command line looks like this: - -@example -gforth [initialization options] [image-specific options] -@end example - -The initialization options must come before the rest of the command -line. They are: +The initialization options must come before the rest of the command +line. They are: @table @code @cindex -i, command-line option @@ -1856,7 +1881,7 @@ default image @file{gforth.fi} consist o in which they are given. The @code{-e @var{forth-code}} or @code{--evaluate @var{forth-code}} option evaluates the Forth code. This option takes only one argument; if you want to evaluate more -Forth words, you have to quote them or use several @code{-e}s. To exit +Forth words, you have to quote them or use @code{-e} several times. To exit after processing the command line (instead of entering interactive mode) append @code{-e bye} to the command line. @@ -1900,9 +1925,10 @@ doc-bye @comment some are in .c files. 
+@c ****************************************************************** @node Words, Error messages, Invoking Gforth, Top @chapter Forth Words -@cindex Words +@cindex words @menu * Notation:: @@ -1912,21 +1938,20 @@ doc-bye * Stack Manipulation:: * Memory:: * Control Structures:: -* Locals:: * Defining Words:: * The Text Interpreter:: -* Structures:: -* Object-oriented Forth:: * Tokens for Words:: * Word Lists:: * Environmental Queries:: * Files:: -* Including Files:: * Blocks:: * Other I/O:: * Programming Tools:: * Assembler and Code Words:: * Threading Words:: +* Locals:: +* Structures:: +* Object-oriented Forth:: * Passing Commands to the OS:: * Miscellaneous Words:: @end menu @@ -1948,9 +1973,10 @@ that has become a de-facto standard for @table @var @item word -@cindex case insensitivity -The name of the word. BTW, Gforth is case insensitive, so you can -type the words in in lower case (However, @pxref{core-idef}). +@cindex case-sensitivity +The name of the word. Gforth is case-insensitive, so you can type the +words in in lower case (However, @pxref{core-idef, +Implementation-defined options, Implementation-defined options}). @item Stack effect @cindex stack effect @@ -1982,7 +2008,7 @@ The ANS Forth standard is divided into s system need not support all of them. Therefore, in theory, the fewer word sets your program uses the more portable it will be. However, we suspect that most ANS Forth systems on personal machines will feature -all word sets. Words that are not defined in the ANS standard have +all word sets. Words that are not defined in ANS Forth have @code{gforth} or @code{gforth-internal} as word set. @code{gforth} describes words that will work in future releases of Gforth; @code{gforth-internal} words are more volatile. Environmental query @@ -2056,10 +2082,10 @@ quotes. 
@node Comments, Boolean Flags, Notation, Words @section Comments -@cindex Comments +@cindex comments -Forth supports two styles of comment; the traditional "in-line" comment, -@code{(} and its modern cousin, the "comment to end of line"; @code{\}. +Forth supports two styles of comment; the traditional @var{in-line} comment, +@code{(} and its modern cousin, the @var{comment to end of line}; @code{\}. doc-( doc-\ @@ -2067,11 +2093,11 @@ doc-\G @node Boolean Flags, Arithmetic, Comments, Words @section Boolean Flags -@cindex Boolean Flags +@cindex Boolean flags A Boolean flag is cell-sized. A cell with all bits clear represents the flag @code{false} and a flag with all bits set represents the flag -@code{true}. Words that check a flag (for example, @var{IF}) will treat +@code{true}. Words that check a flag (for example, @code{IF}) will treat a cell that has @var{any} bit set as @code{true}. doc-true @@ -2092,6 +2118,7 @@ operators. If you perform division with you do not want to use @code{/} or @code{/mod} with its undefined behaviour, but rather @code{fm/mod} or @code{sm/mod} (probably the former, @pxref{Mixed precision}). +@comment TODO discuss the different division forms and the std approach @menu * Single precision:: @@ -2107,7 +2134,7 @@ former, @pxref{Mixed precision}). @cindex single precision arithmetic words By default, numbers in Forth are single-precision integers that are 1 -CELL in size. They can be signed or unsigned, depending upon how you +cell in size. They can be signed or unsigned, depending upon how you treat them. @xref{Number Conversion} for the rules used by the text interpreter for recognising single-precision integers. @@ -2148,13 +2175,14 @@ doc-d2/ recognising double-precision integers. A double precision number is represented by a cell pair, with the most -significant digit at the TOS. It is trivial to convert an unsigned single -to an (unsigned) double; simply push a @code{0} onto the TOS. 
Since numbers -are represented by Gforth using 2's complement arithmetic, converting -a signed single to a (signed) double requires sign-extension across the -most significant digit. This can be achieved using @code{s>d}. The moral -of the story is that you cannot convert a number without knowing what that -number represents. +significant digit at the TOS. It is trivial to convert an unsigned +single to an (unsigned) double; simply push a @code{0} onto the +TOS. Since numbers are represented by Gforth using 2's complement +arithmetic, converting a signed single to a (signed) double requires +sign-extension across the most significant digit. This can be achieved +using @code{s>d}. The moral of the story is that you cannot convert a +number without knowing whether it represents an unsigned or a +signed number. doc-s>d doc-d+ @@ -2228,8 +2256,8 @@ recognising floating-point numbers. @cindex angles in trigonometric operations @cindex trigonometric operations Angles in floating point operations are given in radians (a full circle -has 2 pi radians). Note, that Gforth has a separate floating point -stack, but we use the unified notation. +has 2 pi radians). Gforth has a separate floating point +stack, but the documentation uses the unified notation. @cindex floating-point arithmetic, pitfalls Floating point numbers have a number of unpleasant surprises for the @@ -2398,6 +2426,7 @@ doc-2rdrop @node Locals stack, Stack pointer manipulation, Return stack, Stack Manipulation @subsection Locals stack +@comment TODO @node Stack pointer manipulation, , Locals stack, Stack Manipulation @subsection Stack pointer manipulation @@ -2421,7 +2450,7 @@ doc-lp! @node Memory, Control Structures, Stack Manipulation, Words @section Memory -@cindex Memory words +@cindex memory words @menu * Memory Access:: @@ -2475,12 +2504,12 @@ char-aligned have no use in the standard created. 
@cindex @code{CREATE} and alignment -The standard guarantees that addresses returned by @code{CREATE}d words +ANS Forth guarantees that addresses returned by @code{CREATE}d words are cell-aligned; in addition, Gforth guarantees that these addresses are aligned for all purposes. -Note that the standard defines a word @code{char}, which has nothing to -do with address arithmetic. +Note that the ANS Forth word @code{char} has nothing to do with address +arithmetic. doc-chars doc-char+ @@ -2542,7 +2571,7 @@ doc-blank doc-compare doc-search -@node Control Structures, Locals, Memory, Words +@node Control Structures, Defining Words, Memory, Words @section Control Structures @cindex control structures @@ -2605,7 +2634,7 @@ and many other programming languages has Gforth also provides the words @code{?DUP-IF} and @code{?DUP-0=-IF}, so you can avoid using @code{?dup}. Using these alternatives is also more -efficient than using @code{?dup}. Definitions in ANS Standard Forth +efficient than using @code{?dup}. Definitions in ANS Forth for @code{ENDIF}, @code{?DUP-IF} and @code{?DUP-0=-IF} are provided in @file{compat/control.fs}. @@ -2804,14 +2833,13 @@ prints nothing. @end itemize Unfortunately, @code{+DO}, @code{U+DO}, @code{-DO}, @code{U-DO} and -@code{-LOOP} are not in the ANS Forth standard. However, an -implementation for these words that uses only standard words is provided -in @file{compat/loops.fs}. - +@code{-LOOP} are not defined in ANS Forth. However, an implementation +for these words that uses only standard words is provided in +@file{compat/loops.fs}. @cindex @code{FOR} loops -Another counted loop is +Another counted loop is: @example @var{n} FOR @@ -2819,11 +2847,11 @@ FOR NEXT @end example This is the preferred loop of native code compiler writers who are too -lazy to optimize @code{?DO} loops properly. In Gforth, this loop -iterates @var{n+1} times; @code{i} produces values starting with @var{n} -and ending with 0.
Other Forth systems may behave differently, even if -they support @code{FOR} loops. To avoid problems, don't use @code{FOR} -loops. +lazy to optimize @code{?DO} loops properly. This loop structure is not +defined in ANS Forth. In Gforth, this loop iterates @var{n+1} times; +@code{i} produces values starting with @var{n} and ending with 0. Other +Forth systems may behave differently, even if they support @code{FOR} +loops. To avoid problems, don't use @code{FOR} loops. @node Arbitrary control structures, Calls and returns, Counted Loops, Control Structures @subsection Arbitrary control structures @@ -2857,7 +2885,6 @@ would need to know how many stack items entry (many systems use one cell. In Gforth they currently take three, but this may change in the future). - Some standard control structure words are built from these words: doc-else @@ -2895,7 +2922,7 @@ through the definition (@code{LOOP} etc. fall-through path). Also, you have to ensure that all @code{LEAVE}s are resolved (by using one of the loop-ending words or @code{DONE}). -Another group of control structure words are +Another group of control structure words are: doc-case doc-endcase @@ -2910,40 +2937,38 @@ doc-endof In order to ensure readability we recommend that you do not create arbitrary control structures directly, but define new control structure words for the control structure you want and use these words in your -program. - -E.g., instead of writing: +program. For example, instead of writing: @example -begin +BEGIN ... -if [ 1 cs-roll ] +IF [ 1 CS-ROLL ] ... 
-again then +AGAIN THEN @end example @noindent we recommend defining control structure words, e.g., @example -: while ( dest -- orig dest ) - POSTPONE if - 1 cs-roll ; immediate - -: repeat ( orig dest -- ) - POSTPONE again - POSTPONE then ; immediate +: WHILE ( dest -- orig dest ) + POSTPONE IF + 1 CS-ROLL ; immediate + +: REPEAT ( orig dest -- ) + POSTPONE AGAIN + POSTPONE THEN ; immediate @end example @noindent and then using these to create the control structure: @example -begin +BEGIN ... -while +WHILE ... -repeat +REPEAT @end example That's much easier to read, isn't it? Of course, @code{REPEAT} and @@ -2957,14 +2982,12 @@ necessary to define them. @cindex recursive definitions A definition can be called simply by writing the name of the definition -to be called. Note that normally a definition is invisible during its +to be called. Normally a definition is invisible during its own definition. If you want to write a directly recursive definition, you -can use @code{recursive} to make the current definition visible. +can use @code{recursive} to make the current definition visible, or +@code{recurse} to call the current definition directly. doc-recursive - -Another way to perform a recursive call is - doc-recurse @comment TODO add example of the two recursion methods @@ -2993,4103 +3016,4253 @@ defer foo IS foo @end example -When the end of the definition is reached, it returns. An earlier return -can be forced using +The current definition returns control to the calling definition when +the end of the definition is reached or @code{EXIT} is encountered. doc-exit - -Don't forget to clean up the return stack and @code{UNLOOP} any -outstanding @code{?DO}...@code{LOOP}s before @code{EXIT}ing. - doc-;s @node Exception Handling, , Calls and returns, Control Structures @subsection Exception Handling -@cindex Exceptions - -@comment TODO examples and blurb -doc-catch -doc-throw -@comment TODO -- think this will alllcate you a new THROW code?
-@comment for reserving new exception numbers. Note the existence of compat/exception.fs -doc---exception-exception -doc-quit -doc-abort -doc-abort" +@cindex exceptions +If your program detects a fatal error condition, the simplest action +that it can take is to @code{quit}. This resets the return stack and +restarts the text interpreter, but does not print any error message. -@node Locals, Defining Words, Control Structures, Words -@section Locals -@cindex locals - -Local variables can make Forth programming more enjoyable and Forth -programs easier to read. Unfortunately, the locals of ANS Forth are -laden with restrictions. Therefore, we provide not only the ANS Forth -locals wordset, but also our own, more powerful locals wordset (we -implemented the ANS Forth locals wordset through our locals wordset). +The next stage in severity is to execute @code{abort}, which has the +same effect as @code{quit}, with the addition that it resets the data +stack. -The ideas in this section have also been published in the paper -@cite{Automatic Scoping of Local Variables} by M. Anton Ertl, presented -at EuroForth '94; it is available at -@*@url{http://www.complang.tuwien.ac.at/papers/ertl94l.ps.gz}. +A slightly more sophisticated approach is to use @code{abort"}, which +compiles a string to be used as an error message and does a conditional +@code{abort} at run-time. For example: -@menu -* Gforth locals:: -* ANS Forth locals:: -@end menu +@example +@kbd{: checker abort" That flag was true" ." A false flag" ;} ok +@kbd{0 checker} A false flag ok +@kbd{1 checker} +:1: That flag was true +1 checker + ^^^^^^^ +$400D1648 throw +$400E4660 +@end example -@node Gforth locals, ANS Forth locals, Locals, Locals -@subsection Gforth locals -@cindex Gforth locals -@cindex locals, Gforth style +These simple techniques allow a program to react to a fatal error +condition, but they are not exactly user-friendly.
The ANS Forth +Exception word set provides the pair of words @code{throw} and +@code{catch}, which can be used to provide sophisticated error-handling. -Locals can be defined with +@code{catch} has a similar behaviour to @code{execute}, in that it takes +an @var{xt} as a parameter and starts execution of the xt. However, +before passing control to the xt, @code{catch} pushes an +@var{exception frame} onto the @var{exception stack}. This exception +frame is used to restore the system to a known state if a detected error +occurs during the execution of the xt. A typical way to use @code{catch} +would be: @example -@{ local1 local2 ... -- comment @} +... ['] foo catch IF ... @end example -or + +Whilst @code{foo} executes, it can call other words to any level of +nesting, as usual. If @code{foo} (and all the words that it calls) +execute successfully, control will ultimately pass to the word following +the @code{catch}, and there will be a @code{false} flag (0) at +TOS. However, if any word detects an error, it can terminate the +execution of @code{foo} by pushing an error code onto the stack and then +performing a @code{throw}. The execution of @code{throw} will pass +control to the word following the @code{catch}, but this time the TOS +will hold the error code. Therefore, the @code{IF} in the example +can be used to determine whether @code{foo} executed successfully. + +This simple example shows how you can use @code{throw} and @code{catch} +to ``take over'' exception handling from the system: @example -@{ local1 local2 ... @} +: my-div ['] / catch if ." DIVIDE ERROR" else ." OK.. " . then ; @end example -E.g., +The next example is more sophisticated and shows a multi-level +@code{throw} and @code{catch}. To understand this example, start at the +definition of @code{top-level} and work backwards: + @example -: max @{ n1 n2 -- n3 @} - n1 n2 > if - n1 - else - n2 - endif ; +: lowest-level ( -- c ) + key dup 27 = if + 1 throw \ ESCAPE key pressed + else + ."
lowest-level successful" CR + then +; + +: lower-level ( -- c ) + lowest-level + \ at this level consider a CTRL-U to be a fatal error + dup 21 = if \ CTRL-U + 2 throw + else + ." lower-level successful" CR + then +; + +: low-level ( -- c ) + ['] lower-level catch + ?dup if + \ error occurred - do we recognise it? + dup 1 = if + \ ESCAPE key pressed.. pretend it was an E + drop [char] E + else throw \ propagate the error upwards + then + then + ." low-level successful" CR +; + +: top-level ( -- ) + CR ['] low-level catch \ CATCH is used like EXECUTE + ?dup if \ error occurred.. + ." Error " . ." occurred - contact your supplier" + else + ." The '" emit ." ' key was pressed" CR + then +; @end example -The similarity of locals definitions with stack comments is intended. A -locals definition often replaces the stack comment of a word. The order -of the locals corresponds to the order in a stack comment and everything -after the @code{--} is really a comment. +The ANS Forth document assigns @code{throw} codes thus: -This similarity has one disadvantage: It is too easy to confuse locals -declarations with stack comments, causing bugs and making them hard to -find. However, this problem can be avoided by appropriate coding -conventions: Do not use both notations in the same program. If you do, -they should be distinguished using additional means, e.g. by position. +@itemize @bullet +@item +codes in the range -1 -- -255 are reserved to be assigned by the +Standard. Assignments for codes in the range -1 -- -58 are currently +documented in the Standard. In particular, @code{-1 throw} is equivalent +to @code{abort} and @code{-2 throw} is equivalent to @code{abort"}. +@item +codes in the range -256 -- -4095 are reserved to be assigned by the system. +@item +all other codes may be assigned by programs.
+@end itemize -@cindex types of locals -@cindex locals types -The name of the local may be preceded by a type specifier, e.g., -@code{F:} for a floating point value: +Gforth provides the word @code{exception} as a mechanism for assigning +system throw codes to applications. This allows multiple applications to +co-exist in memory without any clash of @code{throw} codes. A definition +of @code{exception} in ANS Forth is provided in +@file{compat/exception.fs}. -@example -: CX* @{ F: Ar F: Ai F: Br F: Bi -- Cr Ci @} -\ complex multiplication - Ar Br f* Ai Bi f* f- - Ar Bi f* Ai Br f* f+ ; -@end example -@cindex flavours of locals -@cindex locals flavours -@cindex value-flavoured locals -@cindex variable-flavoured locals -Gforth currently supports cells (@code{W:}, @code{W^}), doubles -(@code{D:}, @code{D^}), floats (@code{F:}, @code{F^}) and characters -(@code{C:}, @code{C^}) in two flavours: a value-flavoured local (defined -with @code{W:}, @code{D:} etc.) produces its value and can be changed -with @code{TO}. A variable-flavoured local (defined with @code{W^} etc.) -produces its address (which becomes invalid when the variable's scope is -left). E.g., the standard word @code{emit} can be defined in terms of -@code{type} like this: +doc-quit +doc-abort +doc-abort" -@example -: emit @{ C^ char* -- @} - char* 1 type ; -@end example +doc-catch +doc-throw +doc---exception-exception -@cindex default type of locals -@cindex locals, default type -A local without type specifier is a @code{W:} local. Both flavours of -locals are initialized with values from the data or FP stack. -Currently there is no way to define locals with user-defined data -structures, but we are working on it. +@c ------------------------------------------------------------- +@node Defining Words, The Text Interpreter, Control Structures, Words +@section Defining Words +@cindex defining words -Gforth allows defining locals everywhere in a colon definition. 
This -poses the following questions: +@comment TODO much more intro material here. 3 classes: colon defn, variables/constants +@comment values, user-defined defining words. @menu -* Where are locals visible by name?:: -* How long do locals live?:: -* Programming Style:: -* Implementation:: +* Simple Defining Words:: +* Colon Definitions:: +* User-defined Defining Words:: +* Supplying names:: +* Interpretation and Compilation Semantics:: @end menu -@node Where are locals visible by name?, How long do locals live?, Gforth locals, Gforth locals -@subsubsection Where are locals visible by name? -@cindex locals visibility -@cindex visibility of locals -@cindex scope of locals - -Basically, the answer is that locals are visible where you would expect -it in block-structured languages, and sometimes a little longer. If you -want to restrict the scope of a local, enclose its definition in -@code{SCOPE}...@code{ENDSCOPE}. +@node Simple Defining Words, Colon Definitions, Defining Words, Defining Words +@subsection Simple Defining Words +@cindex simple defining words +@cindex defining words, simple -doc-scope -doc-endscope +doc-constant +doc-2constant +doc-fconstant +doc-variable +doc-2variable +doc-fvariable +doc-create +doc-user +doc-value +doc-to +doc-defer +doc-is -These words behave like control structure words, so you can use them -with @code{CS-PICK} and @code{CS-ROLL} to restrict the scope in -arbitrary ways. +Definitions in ANS Forth for @code{defer}, @code{<is>} and +@code{[is]} are provided in @file{compat/defer.fs}. +@comment TODO - what do the two "is" words do? -If you want a more exact answer to the visibility question, here's the -basic principle: A local is visible in all places that can only be -reached through the definition of the local@footnote{In compiler -construction terminology, all places dominated by the definition of the -local.}. In other words, it is not visible in places that can be reached -without going through the definition of the local.
E.g., locals defined -in @code{IF}...@code{ENDIF} are visible until the @code{ENDIF}, locals -defined in @code{BEGIN}...@code{UNTIL} are visible after the -@code{UNTIL} (until, e.g., a subsequent @code{ENDSCOPE}). +@node Colon Definitions, User-defined Defining Words, Simple Defining Words, Defining Words +@subsection Colon Definitions +@cindex colon definitions -The reasoning behind this solution is: We want to have the locals -visible as long as it is meaningful. The user can always make the -visibility shorter by using explicit scoping. In a place that can -only be reached through the definition of a local, the meaning of a -local name is clear. In other places it is not: How is the local -initialized at the control flow path that does not contain the -definition? Which local is meant, if the same name is defined twice in -two independent control flow paths? +@example +: name ( ... -- ... ) + word1 word2 word3 ; +@end example -This should be enough detail for nearly all users, so you can skip the -rest of this section. If you really must know all the gory details and -options, read on. +creates a word called @code{name}, that, upon execution, executes +@code{word1 word2 word3}. @code{name} is a @dfn{(colon) definition}. -In order to implement this rule, the compiler has to know which places -are unreachable. It knows this automatically after @code{AHEAD}, -@code{AGAIN}, @code{EXIT} and @code{LEAVE}; in other cases (e.g., after -most @code{THROW}s), you can use the word @code{UNREACHABLE} to tell the -compiler that the control flow never reaches that place. If -@code{UNREACHABLE} is not used where it could, the only consequence is -that the visibility of some locals is more limited than the rule above -says. If @code{UNREACHABLE} is used where it should not (i.e., if you -lie to the compiler), buggy code will be produced. +The explanation above is somewhat superficial. 
@xref{Interpretation and +Compilation Semantics} for an in-depth discussion of some of the issues +involved. -doc-unreachable +doc-: +doc-; -Another problem with this rule is that at @code{BEGIN}, the compiler -does not know which locals will be visible on the incoming -back-edge. All problems discussed in the following are due to this -ignorance of the compiler (we discuss the problems using @code{BEGIN} -loops as examples; the discussion also applies to @code{?DO} and other -loops). Perhaps the most insidious example is: -@example -AHEAD -BEGIN - x -[ 1 CS-ROLL ] THEN - @{ x @} - ... -UNTIL -@end example +@node User-defined Defining Words, Supplying names, Colon Definitions, Defining Words +@subsection User-defined Defining Words +@cindex user-defined defining words +@cindex defining words, user-defined -This should be legal according to the visibility rule. The use of -@code{x} can only be reached through the definition; but that appears -textually below the use. +You can create new defining words simply by wrapping defining-time code +around existing defining words and putting the sequence in a colon +definition. -From this example it is clear that the visibility rules cannot be fully -implemented without major headaches. Our implementation treats common -cases as advertised and the exceptions are treated in a safe way: The -compiler makes a reasonable guess about the locals visible after a -@code{BEGIN}; if it is too pessimistic, the -user will get a spurious error about the local not being defined; if the -compiler is too optimistic, it will notice this later and issue a -warning. In the case above the compiler would complain about @code{x} -being undefined at its use. You can see from the obscure examples in -this section that it takes quite unusual control structures to get the -compiler into trouble, and even then it will often do fine. 
+@comment TODO example -If the @code{BEGIN} is reachable from above, the most optimistic guess -is that all locals visible before the @code{BEGIN} will also be -visible after the @code{BEGIN}. This guess is valid for all loops that -are entered only through the @code{BEGIN}, in particular, for normal -@code{BEGIN}...@code{WHILE}...@code{REPEAT} and -@code{BEGIN}...@code{UNTIL} loops and it is implemented in our -compiler. When the branch to the @code{BEGIN} is finally generated by -@code{AGAIN} or @code{UNTIL}, the compiler checks the guess and -warns the user if it was too optimistic: -@example -IF - @{ x @} -BEGIN - \ x ? -[ 1 cs-roll ] THEN - ... -UNTIL -@end example +@cindex @code{CREATE} ... @code{DOES>} +If you want the words defined with your defining words to behave +differently from words defined with standard defining words, you can +write your defining word like this: -Here, @code{x} lives only until the @code{BEGIN}, but the compiler -optimistically assumes that it lives until the @code{THEN}. It notices -this difference when it compiles the @code{UNTIL} and issues a -warning. The user can avoid the warning, and make sure that @code{x} -is not used in the wrong area by using explicit scoping: @example -IF - SCOPE - @{ x @} - ENDSCOPE -BEGIN -[ 1 cs-roll ] THEN - ... -UNTIL -@end example +: def-word ( "name" -- ) + Create @var{code1} +DOES> ( ... -- ... ) + @var{code2} ; -Since the guess is optimistic, there will be no spurious error messages -about undefined locals. +def-word name +@end example -If the @code{BEGIN} is not reachable from above (e.g., after -@code{AHEAD} or @code{EXIT}), the compiler cannot even make an -optimistic guess, as the locals visible after the @code{BEGIN} may be -defined later. Therefore, the compiler assumes that no locals are -visible after the @code{BEGIN}. 
However, the user can use -@code{ASSUME-LIVE} to make the compiler assume that the same locals are -visible at the BEGIN as at the point where the top control-flow stack -item was created. +Technically, this fragment defines a defining word @code{def-word}, and +a word @code{name}; when you execute @code{name}, the address of the +body of @code{name} is put on the data stack and @var{code2} is executed +(the address of the body of @code{name} is the address @code{HERE} +returns immediately after the @code{CREATE}). The word @code{name} is +sometimes called a @var{child} of @code{def-word}. -doc-assume-live +In other words, if you make the following definitions: -E.g., @example -@{ x @} -AHEAD -ASSUME-LIVE -BEGIN - x -[ 1 CS-ROLL ] THEN - ... -UNTIL +: def-word1 ( "name" -- ) + Create @var{code1} ; + +: action1 ( ... -- ... ) + @var{code2} ; + +def-word1 name1 @end example -Other cases where the locals are defined before the @code{BEGIN} can be -handled by inserting an appropriate @code{CS-ROLL} before the -@code{ASSUME-LIVE} (and changing the control-flow stack manipulation -behind the @code{ASSUME-LIVE}). +Using @code{name1 action1} is equivalent to using @code{name}. + +The classic example is that you can define @code{Constant} in this way: -Cases where locals are defined after the @code{BEGIN} (but should be -visible immediately after the @code{BEGIN}) can only be handled by -rearranging the loop. E.g., the ``most insidious'' example above can be -arranged into: @example -BEGIN - @{ x @} - ... 0= -WHILE - x -REPEAT +: constant ( w "name" -- ) + create , +DOES> ( -- w ) + @@ ; @end example -@node How long do locals live?, Programming Style, Where are locals visible by name?, Gforth locals -@subsubsection How long do locals live? -@cindex locals lifetime -@cindex lifetime of locals - -The right answer for the lifetime question would be: A local lives at -least as long as it can be accessed. For a value-flavoured local this -means: until the end of its visibility.
However, a variable-flavoured -local could be accessed through its address far beyond its visibility -scope. Ultimately, this would mean that such locals would have to be -garbage collected. Since this entails un-Forth-like implementation -complexities, I adopted the same cowardly solution as some other -languages (e.g., C): The local lives only as long as it is visible; -afterwards its address is invalid (and programs that access it -afterwards are erroneous). +@comment that is the classic example.. maybe it should be earlier. There +@comment is a beautiful description of how this works and what it does in +@comment the Forthwrite 100th edition. -@node Programming Style, Implementation, How long do locals live?, Gforth locals -@subsubsection Programming Style -@cindex locals programming style -@cindex programming style, locals +When you create a constant with @code{5 constant five}, first a new word +@code{five} is created, then the value 5 is laid down in the body of +@code{five} with @code{,}. When @code{five} is invoked, the address of +the body is put on the stack, and @code{@@} retrieves the value 5. -The freedom to define locals anywhere has the potential to change -programming styles dramatically. In particular, the need to use the -return stack for intermediate storage vanishes. Moreover, all stack -manipulations (except @code{PICK}s and @code{ROLL}s with run-time -determined arguments) can be eliminated: If the stack items are in the -wrong order, just write a locals definition for all of them; then -write the items in the order you want. +@cindex stack effect of @code{DOES>}-parts +@cindex @code{DOES>}-parts, stack effect +In the example above the stack comment after the @code{DOES>} specifies +the stack effect of the defined words, not the stack effect of the +following code (the following code expects the address of the body on +the top of stack, which is not reflected in the stack comment). 
This is +the convention that I use and recommend (it clashes a bit with using +locals declarations for stack effect specification, though). -This seems a little far-fetched and eliminating stack manipulations is -unlikely to become a conscious programming objective. Still, the number -of stack manipulations will be reduced dramatically if local variables -are used liberally (e.g., compare @code{max} in @ref{Gforth locals} with -a traditional implementation of @code{max}). +@subsubsection Applications of @code{CREATE..DOES>} +@cindex @code{CREATE} ... @code{DOES>}, applications -This shows one potential benefit of locals: making Forth programs more -readable. Of course, this benefit will only be realized if the -programmers continue to honour the principle of factoring instead of -using the added latitude to make the words longer. +You may wonder how to use this feature. Here are some usage patterns: -@cindex single-assignment style for locals -Using @code{TO} can and should be avoided. Without @code{TO}, -every value-flavoured local has only a single assignment and many -advantages of functional languages apply to Forth. I.e., programs are -easier to analyse, to optimize and to read: It is clear from the -definition what the local stands for, it does not turn into something -different later. +@cindex factoring similar colon definitions +When you see a sequence of code occurring several times, and you can +identify a meaning, you will factor it out as a colon definition. When +you see similar colon definitions, you can factor them using +@code{CREATE..DOES>}. 
E.g., an assembler usually defines several words +that look very similar: +@example +: ori, ( reg-target reg-source n -- ) + 0 asm-reg-reg-imm ; +: andi, ( reg-target reg-source n -- ) + 1 asm-reg-reg-imm ; +@end example -E.g., a definition using @code{TO} might look like this: +@noindent +This could be factored with: @example -: strcmp @{ addr1 u1 addr2 u2 -- n @} - u1 u2 min 0 - ?do - addr1 c@@ addr2 c@@ - - ?dup-if - unloop exit - then - addr1 char+ TO addr1 - addr2 char+ TO addr2 - loop - u1 u2 - ; +: reg-reg-imm ( op-code -- ) + CREATE , +DOES> ( reg-target reg-source n -- ) + @@ asm-reg-reg-imm ; + +0 reg-reg-imm ori, +1 reg-reg-imm andi, @end example -Here, @code{TO} is used to update @code{addr1} and @code{addr2} at -every loop iteration. @code{strcmp} is a typical example of the -readability problems of using @code{TO}. When you start reading -@code{strcmp}, you think that @code{addr1} refers to the start of the -string. Only near the end of the loop you realize that it is something -else. -This can be avoided by defining two locals at the start of the loop that -are initialized with the right value for the current iteration. +@cindex currying +Another view of @code{CREATE..DOES>} is to consider it as a crude way to +supply a part of the parameters for a word (known as @dfn{currying} in +the functional language community). E.g., @code{+} needs two +parameters. Creating versions of @code{+} with one parameter fixed can +be done like this: @example -: strcmp @{ addr1 u1 addr2 u2 -- n @} - addr1 addr2 - u1 u2 min 0 - ?do @{ s1 s2 @} - s1 c@@ s2 c@@ - - ?dup-if - unloop exit - then - s1 char+ s2 char+ - loop - 2drop - u1 u2 - ; -@end example -Here it is clear from the start that @code{s1} has a different value -in every loop iteration. 
+: curry+ ( n1 -- ) + CREATE , +DOES> ( n2 -- n1+n2 ) + @@ + ; -@node Implementation, , Programming Style, Gforth locals -@subsubsection Implementation -@cindex locals implementation -@cindex implementation of locals + 3 curry+ 3+ +-2 curry+ 2- +@end example -@cindex locals stack -Gforth uses an extra locals stack. The most compelling reason for -this is that the return stack is not float-aligned; using an extra stack -also eliminates the problems and restrictions of using the return stack -as locals stack. Like the other stacks, the locals stack grows toward -lower addresses. A few primitives allow an efficient implementation: +@subsubsection The gory details of @code{CREATE..DOES>} +@cindex @code{CREATE} ... @code{DOES>}, details -doc-@local# -doc-f@local# -doc-laddr# -doc-lp+!# -doc-lp! -doc->l -doc-f>l +doc-does> -In addition to these primitives, some specializations of these -primitives for commonly occurring inline arguments are provided for -efficiency reasons, e.g., @code{@@local0} as specialization of -@code{@@local#} for the inline argument 0. The following compiling words -compile the right specialized version, or the general version, as -appropriate: +@cindex @code{DOES>} in a separate definition +This means that you need not use @code{CREATE} and @code{DOES>} in the +same definition; you can put the @code{DOES>}-part in a separate +definition. This allows us to, e.g., select among different DOES>-parts: +@example +: does1 +DOES> ( ... -- ... ) + ... ; -doc-compile-@local -doc-compile-f@local -doc-compile-lp+! +: does2 +DOES> ( ... -- ... ) + ... ; -Combinations of conditional branches and @code{lp+!#} like -@code{?branch-lp+!#} (the locals pointer is only changed if the branch -is taken) are provided for efficiency and correctness in loops. +: def-word ( ... -- ... ) + create ... + IF + does1 + ELSE + does2 + ENDIF ; +@end example -A special area in the dictionary space is reserved for keeping the -local variable names. 
@code{@{} switches the dictionary pointer to this -area and @code{@}} switches it back and generates the locals -initializing code. @code{W:} etc.@ are normal defining words. This -special area is cleared at the start of every colon definition. +In this example, the selection of whether to use @code{does1} or +@code{does2} is made at compile-time; at the time that the child word is +@code{Create}d. -@cindex word list for defining locals -A special feature of Gforth's dictionary is used to implement the -definition of locals without type specifiers: every word list (aka -vocabulary) has its own methods for searching -etc. (@pxref{Word Lists}). For the present purpose we defined a word list -with a special search method: When it is searched for a word, it -actually creates that word using @code{W:}. @code{@{} changes the search -order to first search the word list containing @code{@}}, @code{W:} etc., -and then the word list for defining locals without type specifiers. +@cindex @code{DOES>} in interpretation state +In a standard program you can apply a @code{DOES>}-part only if the last +word was defined with @code{CREATE}. In Gforth, the @code{DOES>}-part +will override the behaviour of the last word defined in any case. In a +standard program, you can use @code{DOES>} only in a colon +definition. In Gforth, you can also use it in interpretation state, in a +kind of one-shot mode; for example: +@example +CREATE name ( ... -- ... ) + @var{initialization} +DOES> + @var{code} ; +@end example -The lifetime rules support a stack discipline within a colon -definition: The lifetime of a local is either nested with other locals -lifetimes or it does not overlap them. +@noindent +is equivalent to the standard: +@example +:noname +DOES> + @var{code} ; +CREATE name EXECUTE ( ... -- ... ) + @var{initialization} +@end example -At @code{BEGIN}, @code{IF}, and @code{AHEAD} no code for locals stack -pointer manipulation is generated. 
Between control structure words -locals definitions can push locals onto the locals stack. @code{AGAIN} -is the simplest of the other three control flow words. It has to -restore the locals stack depth of the corresponding @code{BEGIN} -before branching. The code looks like this: -@format -@code{lp+!#} current-locals-size @minus{} dest-locals-size -@code{branch} -@end format +You can get the address of the body of a word with: -@code{UNTIL} is a little more complicated: If it branches back, it -must adjust the stack just like @code{AGAIN}. But if it falls through, -the locals stack must not be changed. The compiler generates the -following code: -@format -@code{?branch-lp+!#} current-locals-size @minus{} dest-locals-size -@end format -The locals stack pointer is only adjusted if the branch is taken. +doc->body -@code{THEN} can produce somewhat inefficient code: -@format -@code{lp+!#} current-locals-size @minus{} orig-locals-size -: -@code{lp+!#} orig-locals-size @minus{} new-locals-size -@end format -The second @code{lp+!#} adjusts the locals stack pointer from the -level at the @var{orig} point to the level after the @code{THEN}. The -first @code{lp+!#} adjusts the locals stack pointer from the current -level to the level at the orig point, so the complete effect is an -adjustment from the current level to the right level after the -@code{THEN}. +@node Supplying names, Interpretation and Compilation Semantics, User-defined Defining Words, Defining Words +@subsection Supplying names for the defined words +@cindex names for defined words +@cindex defining words, name parameter -@cindex locals information on the control-flow stack -@cindex control-flow stack items, locals information -In a conventional Forth implementation a dest control-flow stack entry -is just the target address and an orig entry is just the address to be -patched. Our locals implementation adds a word list to every orig or dest -item. 
It is the list of locals visible (or assumed visible) at the point -described by the entry. Our implementation also adds a tag to identify -the kind of entry, in particular to differentiate between live and dead -(reachable and unreachable) orig entries. +@cindex defining words, name given in a string +By default, defining words take the names for the defined words from the +input stream. Sometimes you want to supply the name from a string. You +can do this with: -A few unusual operations have to be performed on locals word lists: +doc-nextname -doc-common-list -doc-sub-list? -doc-list-size +For example: -Several features of our locals word list implementation make these -operations easy to implement: The locals word lists are organised as -linked lists; the tails of these lists are shared, if the lists -contain some of the same locals; and the address of a name is greater -than the address of the names behind it in the list. +@example +s" foo" nextname create +@end example +@noindent +is equivalent to: +@example +create foo +@end example -Another important implementation detail is the variable -@code{dead-code}. It is used by @code{BEGIN} and @code{THEN} to -determine if they can be reached directly or only through the branch -that they resolve. @code{dead-code} is set by @code{UNREACHABLE}, -@code{AHEAD}, @code{EXIT} etc., and cleared at the start of a colon -definition, by @code{BEGIN} and usually by @code{THEN}. +@cindex defining words without name +Sometimes you want to define an @var{anonymous word}; a word without a +name. You can do this with: -Counted loops are similar to other loops in most respects, but -@code{LEAVE} requires special attention: It performs basically the same -service as @code{AHEAD}, but it does not create a control-flow stack -entry. 
Therefore the information has to be stored elsewhere; -traditionally, the information was stored in the target fields of the -branches created by the @code{LEAVE}s, by organizing these fields into a -linked list. Unfortunately, this clever trick does not provide enough -space for storing our extended control flow information. Therefore, we -introduce another stack, the leave stack. It contains the control-flow -stack entries for all unresolved @code{LEAVE}s. +doc-:noname -Local names are kept until the end of the colon definition, even if -they are no longer visible in any control-flow path. In a few cases -this may lead to increased space needs for the locals name area, but -usually less than reclaiming this space would cost in code size. +This leaves the execution token for the word on the stack after the +closing @code{;}. Here's an example in which a deferred word is +initialised with an @code{xt} from an anonymous colon definition: +@example +Defer deferred +:noname ( ... -- ... ) + ... ; +IS deferred +@end example +Gforth provides an alternative way of doing this, using two separate +words: -@node ANS Forth locals, , Gforth locals, Locals -@subsection ANS Forth locals -@cindex locals, ANS Forth style +doc-noname +@cindex execution token of last defined word +doc-lastxt -The ANS Forth locals wordset does not define a syntax for locals, but -words that make it possible to define various syntaxes. One of the -possible syntaxes is a subset of the syntax we used in the Gforth locals -wordset, i.e.: +The previous example can be rewritten using @code{noname} and +@code{lastxt}: @example -@{ local1 local2 ... -- comment @} -@end example -@noindent -or -@example -@{ local1 local2 ... @} +Defer deferred +noname : ( ... -- ... ) + ... ; +lastxt IS deferred @end example -The order of the locals corresponds to the order in a stack comment. The -restrictions are: +@code{lastxt} also works when the last word was not defined as +@code{noname}. 
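+
+A minimal sketch of the same idea without @code{Defer}: since
+@code{:noname} leaves an execution token on the stack, the token can
+simply be kept and executed later. The name @code{square-xt} is purely
+illustrative and not part of Gforth.
+
+@example
+:noname ( n -- n*n )  dup * ;   \ leaves an xt on the data stack
+Constant square-xt              \ keep the xt under a hypothetical name
+5 square-xt execute .           \ displays 25
+@end example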
-@itemize @bullet -@item -Locals can only be cell-sized values (no type specifiers are allowed). -@item -Locals can be defined only outside control structures. -@item -Locals can interfere with explicit usage of the return stack. For the -exact (and long) rules, see the standard. If you don't use return stack -accessing words in a definition using locals, you will be all right. The -purpose of this rule is to make locals implementation on the return -stack easier. -@item -The whole definition must be in one line. -@end itemize -Locals defined in this way behave like @code{VALUE}s (@xref{Simple -Defining Words}). I.e., they are initialized from the stack. Using their -name produces their value. Their value can be changed using @code{TO}. - -Since this syntax is supported by Gforth directly, you need not do -anything to use it. If you want to port a program using this syntax to -another ANS Forth system, use @file{compat/anslocal.fs} to implement the -syntax on the other system. +@node Interpretation and Compilation Semantics, , Supplying names, Defining Words +@subsection Interpretation and Compilation Semantics +@cindex semantics, interpretation and compilation -Note that a syntax shown in the standard, section A.13 looks -similar, but is quite different in having the order of locals -reversed. Beware! +@cindex interpretation semantics +The @dfn{interpretation semantics} of a word are what the text +interpreter does when it encounters the word in interpret state. It also +appears in some other contexts, e.g., the execution token returned by +@code{' @var{word}} identifies the interpretation semantics of +@var{word} (in other words, @code{' @var{word} execute} is equivalent to +interpret-state text interpretation of @code{@var{word}}). -The ANS Forth locals wordset itself consists of a word: +@cindex compilation semantics +The @dfn{compilation semantics} of a word are what the text interpreter +does when it encounters the word in compile state. 
It also appears in +other contexts, e.g., @code{POSTPONE @var{word}} compiles@footnote{In +standard terminology, ``appends to the current definition''.} the +compilation semantics of @var{word}. -doc-(local) +@cindex execution semantics +The standard also talks about @dfn{execution semantics}. They are used +only for defining the interpretation and compilation semantics of many +words. By default, the interpretation semantics of a word are to +@code{execute} its execution semantics, and the compilation semantics of +a word are to @code{compile,} its execution semantics.@footnote{In +standard terminology: The default interpretation semantics are its +execution semantics; the default compilation semantics are to append its +execution semantics to the execution semantics of the current +definition.} -The ANS Forth locals extension wordset defines a syntax using @code{locals|}, but it is so -awful that we strongly recommend not to use it. We have implemented this -syntax to make porting to Gforth easy, but do not document it here. The -problem with this syntax is that the locals are defined in an order -reversed with respect to the standard stack comment notation, making -programs harder to read, and easier to misread and miswrite. The only -merit of this syntax is that it is easy to implement using the ANS Forth -locals wordset. +@comment TODO expand, make it co-operate with new sections on text interpreter.
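+
+The two defaults can be sketched with ordinary colon definitions. The
+names @code{five}, @code{compile-five} and @code{ten} are made up for
+this illustration; only standard words are assumed.
+
+@example
+: five ( -- n )  5 ;
+' five execute .      \ performs the execution semantics: displays 5
+
+\ Appends the execution semantics of five to the current definition,
+\ which is what the default compilation semantics of five do.
+: compile-five ( -- )  ['] five compile, ; immediate
+
+: ten ( -- n )  compile-five compile-five + ;
+ten .                 \ displays 10
+@end example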
-@node Defining Words, The Text Interpreter, Locals, Words -@section Defining Words -@cindex defining words +@cindex immediate words +@cindex compile-only words +You can change the semantics of the most-recently defined word: -@menu -* Simple Defining Words:: -* Colon Definitions:: -* User-defined Defining Words:: -* Supplying names:: -* Interpretation and Compilation Semantics:: -@end menu +doc-immediate +doc-compile-only +doc-restrict -@node Simple Defining Words, Colon Definitions, Defining Words, Defining Words -@subsection Simple Defining Words -@cindex simple defining words -@cindex defining words, simple +Note that ticking (@code{'}) a compile-only word gives an error +(``Interpreting a compile-only word''). -doc-constant -doc-2constant -doc-fconstant -doc-variable -doc-2variable -doc-fvariable -doc-create -doc-user -doc-value -doc-to -doc-defer -doc-is +Gforth also allows you to define words with arbitrary combinations of +interpretation and compilation semantics. -Definitions in ANS Standard Forth for @code{defer}, @code{} and -@code{[is]} are provided in @file{compat/defer.fs}. TODO - what do -the two is words do? +doc-interpret/compile: -@node Colon Definitions, User-defined Defining Words, Simple Defining Words, Defining Words -@subsection Colon Definitions -@cindex colon definitions +This feature was introduced for implementing @code{TO} and @code{S"}. I +recommend that you do not define such words, as cute as they may be: +they make it hard to get at both parts of the word in some contexts. +E.g., assume you want to get an execution token for the compilation +part. Instead, define two words, one that embodies the interpretation +part, and one that embodies the compilation part. Once you have done +that, you can define a combined word with @code{interpret/compile:} for +the convenience of your users. +You might try to use this feature to provide an optimizing +implementation of the default compilation semantics of a word. 
For +example, by defining: @example -: name ( ... -- ... ) - word1 word2 word3 ; +:noname + foo bar ; +:noname + POSTPONE foo POSTPONE bar ; +interpret/compile: foobar @end example -creates a word called @code{name}, that, upon execution, executes -@code{word1 word2 word3}. @code{name} is a @dfn{(colon) definition}. - -The explanation above is somewhat superficial. @xref{Interpretation and -Compilation Semantics} for an in-depth discussion of some of the issues -involved. - -doc-: -doc-; - -@node User-defined Defining Words, Supplying names, Colon Definitions, Defining Words -@subsection User-defined Defining Words -@cindex user-defined defining words -@cindex defining words, user-defined +@noindent +as an optimizing version of: -You can create new defining words simply by wrapping defining-time code -around existing defining words and putting the sequence in a colon -definition. +@example +: foobar + foo bar ; +@end example -@comment TODO example +Unfortunately, this does not work correctly with @code{[compile]}, +because @code{[compile]} assumes that the compilation semantics of all +@code{interpret/compile:} words are non-default. I.e., @code{[compile] +foobar} would compile the compilation semantics for the optimizing +@code{foobar}, whereas it would compile the interpretation semantics for +the non-optimizing @code{foobar}. -@cindex @code{CREATE} ... @code{DOES>} -If you want the words defined with your defining words to behave -differently from words defined with standard defining words, you can -write your defining word like this: +@cindex state-smart words (are a bad idea) +Some people try to use @var{state-smart} words to emulate the feature provided +by @code{interpret/compile:} (words are state-smart if they check +@code{STATE} during execution). E.g., they would try to code +@code{foobar} like this: @example -: def-word ( "name" -- ) - Create @var{code1} -DOES> ( ... -- ... 
) - @var{code2} ; - -def-word name +: foobar + STATE @@ + IF ( compilation state ) + POSTPONE foo POSTPONE bar + ELSE + foo bar + ENDIF ; immediate @end example -Technically, this fragment defines a defining word @code{def-word}, and -a word @code{name}; when you execute @code{name}, the address of the -body of @code{name} is put on the data stack and @var{code2} is executed -(the address of the body of @code{name} is the address @code{HERE} -returns immediately after the @code{CREATE}). The word @code{name} is -sometimes called a @var{child} of @code{def-word}. +Although this works if @code{foobar} is only processed by the text +interpreter, it does not work in other contexts (like @code{'} or +@code{POSTPONE}). E.g., @code{' foobar} will produce an execution token +for a state-smart word, not for the interpretation semantics of the +original @code{foobar}; when you execute this execution token (directly +with @code{EXECUTE} or indirectly through @code{COMPILE,}) in compile +state, the result will not be what you expected (i.e., it will not +perform @code{foo bar}). State-smart words are a bad idea. Simply don't +write them@footnote{For a more detailed discussion of this topic, see +@cite{@code{State}-smartness -- Why it is Evil and How to Exorcise it} by Anton +Ertl; presented at EuroForth '98 and available from +@url{http://www.complang.tuwien.ac.at/papers/}}! -In other words, if you make the following definitions: +@cindex defining words with arbitrary semantics combinations +It is also possible to write defining words that define words with +arbitrary combinations of interpretation and compilation semantics. In +general, they look like this: @example -: def-word1 ( "name" -- ) - Create @var{code1} ; - -: action1 ( ... -- ... 
) - @var{code2} ; - -def-word name1 +: def-word + create-interpret/compile + @var{code1} +interpretation> + @var{code2} + + @var{code3} + ( -- w ) - @@ ; +: constant ( n "name" -- ) + create-interpret/compile + , +interpretation> ( -- n ) + @@ + ( compilation. -- ; run-time. -- n ) + @@ postpone literal + +doc- +doc-body} also gives you the body of a word created with +@code{create-interpret/compile}. -@cindex stack effect of @code{DOES>}-parts -@cindex @code{DOES>}-parts, stack effect -In the example above the stack comment after the @code{DOES>} specifies -the stack effect of the defined words, not the stack effect of the -following code (the following code expects the address of the body on -the top of stack, which is not reflected in the stack comment). This is -the convention that I use and recommend (it clashes a bit with using -locals declarations for stack effect specification, though). +@c ---------------------------------------------------------- +@node The Text Interpreter, Tokens for Words, Defining Words, Words +@section The Text Interpreter +@cindex interpreter - outer +@cindex text interpreter +@cindex outer interpreter -@subsubsection Applications of @code{CREATE..DOES>} -@cindex @code{CREATE} ... @code{DOES>}, applications +Intro blah. -You may wonder how to use this feature. Here are some usage patterns: +@comment TODO -@cindex factoring similar colon definitions -When you see a sequence of code occurring several times, and you can -identify a meaning, you will factor it out as a colon definition. When -you see similar colon definitions, you can factor them using -@code{CREATE..DOES>}. 
E.g., an assembler usually defines several words -that look very similar: -@example -: ori, ( reg-target reg-source n -- ) - 0 asm-reg-reg-imm ; -: andi, ( reg-target reg-source n -- ) - 1 asm-reg-reg-imm ; -@end example - -@noindent -This could be factored with: -@example -: reg-reg-imm ( op-code -- ) - CREATE , -DOES> ( reg-target reg-source n -- ) - @@ asm-reg-reg-imm ; - -0 reg-reg-imm ori, -1 reg-reg-imm andi, -@end example - -@cindex currying -Another view of @code{CREATE..DOES>} is to consider it as a crude way to -supply a part of the parameters for a word (known as @dfn{currying} in -the functional language community). E.g., @code{+} needs two -parameters. Creating versions of @code{+} with one parameter fixed can -be done like this: -@example -: curry+ ( n1 -- ) - CREATE , -DOES> ( n2 -- n1+n2 ) - @@ + ; +doc->in +doc-tib +doc-#tib +doc-span +doc-restore-input +doc-save-input +doc-source +doc-source-id - 3 curry+ 3+ --2 curry+ 2- -@end example -@subsubsection The gory details of @code{CREATE..DOES>} -@cindex @code{CREATE} ... @code{DOES>}, details +@menu +* Number Conversion:: +* Interpret/Compile states:: +* Literals:: +* Interpreter Directives:: +@end menu -doc-does> +@comment TODO -@cindex @code{DOES>} in a separate definition -This means that you need not use @code{CREATE} and @code{DOES>} in the -same definition; you can put the @code{DOES>}-part in a separate -definition. This allows us to, e.g., select among different DOES>-parts: -@example -: does1 -DOES> ( ... -- ... ) - ... ; +The text interpreter works on input one line at a time. Starting at +the beginning of the line, it skips leading spaces (called +@var{delimiters}) then parses a string (a sequence of non-space +characters) until it either reaches a space character or it +reaches the end of the line. Having parsed a string, it then makes two +attempts to do something with it: -: does2 -DOES> ( ... -- ... ) - ... ; +* It looks the string up in a dictionary of definitions. 
If the string + is found in the dictionary, the string names a @var{definition} (also + known as a @var{word}) and the dictionary search will return an + @var{Execution token} (xt) for the definition and some flags that show + when the definition can be used legally. If the definition can be + legally executed in @var{Interpret} mode then the text interpreter will + use the xt to execute it, otherwise it will issue an error + message. The dictionary is described in more detail in . -: def-word ( ... -- ... ) - create ... - IF - does1 - ELSE - does2 - ENDIF ; -@end example +* If the string is not found in the dictionary, the text interpreter + attempts to treat it as a number in the current radix (base 10 after + initial startup). If the string represents a legal number in the + current radix, the number is pushed onto the appropriate parameter + stack. Stacks are discussed in more detail in . Number + conversion is described in more detail in
. -In this example, the selection of whether to use @code{does1} or -@code{does2} is made at compile-time; at the time that the child word is -@code{Create}d. +If both of these attempts fail, the remainder of the input line is +discarded and the text interpreter issues an error message. If one of +these attempts succeeds, the text interpreter repeats the parsing +process until the end of the line has been reached. At this point, +it prints the status message `` ok'' and waits for more input. -@cindex @code{DOES>} in interpretation state -In a standard program you can apply a @code{DOES>}-part only if the last -word was defined with @code{CREATE}. In Gforth, the @code{DOES>}-part -will override the behaviour of the last word defined in any case. In a -standard program, you can use @code{DOES>} only in a colon -definition. In Gforth, you can also use it in interpretation state, in a -kind of one-shot mode; for example: -@example -CREATE name ( ... -- ... ) - @var{initialization} -DOES> - @var{code} ; -@end example +There are two important things to note about the behaviour of the text +interpreter: -@noindent -is equivalent to the standard: -@example -:noname -DOES> - @var{code} ; -CREATE name EXECUTE ( ... -- ... ) - @var{initialization} -@end example +* it processes each input string to completion before parsing + additional characters from the input line. -You can get the address of the body of a word with: +* it keeps track of its position in the input line using a variable + (called >IN, pronounced ``to-in''). The value of >IN can be modified + by the execution of definitions in the input line. This means that + definitions can ``trick'' the text interpreter either into skipping + sections of the input line or into parsing a section of the + input line more than once.
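+
+A small sketch of the second point; the word @code{skip-line} is a
+hypothetical name, not a Gforth word. It sets @code{>IN} to the length
+of the current input line, so the text interpreter finds nothing more
+to parse on that line (essentially how a line comment works).
+
+@example
+: skip-line ( -- )  source nip >in ! ;
+1 . skip-line 2 .     \ displays 1; the rest of the line is never parsed
+@end example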
-doc->body -@node Supplying names, Interpretation and Compilation Semantics, User-defined Defining Words, Defining Words -@subsection Supplying names for the defined words -@cindex names for defined words -@cindex defining words, name parameter +@node Number Conversion, Interpret/Compile states, The Text Interpreter, The Text Interpreter +@subsection Number Conversion +@cindex number conversion +@cindex double-cell numbers, input format +@cindex input format for double-cell numbers +@cindex single-cell numbers, input format +@cindex input format for single-cell numbers +@cindex floating-point numbers, input format +@cindex input format for floating-point numbers -@cindex defining words, name given in a string -By default, defining words take the names for the defined words from the -input stream. Sometimes you want to supply the name from a string. You -can do this with: +If the text interpreter fails to find a particular string in the name +dictionary, it attempts to convert it to a number using a set of rules. -doc-nextname +Let <digit> represent any character that is a legal digit in the current +number base (for example, 0-9 when the number base is decimal or 0-9, A-F +when the number base is hexadecimal). -For example: +Let <decimal digit> represent any character in the range 0-9. -@example -s" foo" nextname create -@end example -@noindent -is equivalent to: -@example -create foo -@end example +@comment TODO need to extend the next defn to support fp format +Let @{+ | -@} represent the optional presence of either a @code{+} or +@code{-} character. -@cindex defining words without name -Sometimes you want to define an @var{anonymous word}; a word without a -name. You can do this with: +Let * represent any number of instances of the previous character +(including none). -doc-:noname +Let any other character represent itself. -This leaves the execution token for the word on the stack after the -closing @code{;}.
Here's an example in which a deferred word is -initialised with an @code{xt} from an anonymous colon definition: -@example -Defer deferred -:noname ( ... -- ... ) - ... ; -IS deferred -@end example +Now, the conversion rules are: -Gforth provides an alternative way of doing this, using two separate -words: +@itemize @bullet +@item +A string of the form <digit><digit>* is treated as a single-precision +(CELL-sized) positive integer. Examples are 0 123 6784532 32343212343456 42 +@item +A string of the form -<digit><digit>* is treated as a single-precision +(CELL-sized) negative integer, and is represented using 2's-complement +arithmetic. Examples are -45 -5681 -0 +@item +A string of the form <digit><digit>*.<digit>* is treated as a double-precision +(double-CELL-sized) positive integer. Examples are 3465. 3.465 34.65 +(and note that these all represent the same number). +@item +A string of the form -<digit><digit>*.<digit>* is treated as a +double-precision (double-CELL-sized) negative integer, and is +represented using 2's-complement arithmetic. Examples are -3465. -3.465 +-34.65 (and note that these all represent the same number). +@item +A string of the form @{+ | -@}@{.@}*@{e | E@}@{+ +| -@}* is treated as a floating-point +number. Examples are 1e0 1.e 1.e0 +1e+0 (which all represent the same +number) +12.E-4 +@end itemize -doc-noname -@cindex execution token of last defined word -doc-lastxt +By default, the number base used for integer number conversion is given +by the contents of a variable named @code{BASE}. Base 10 (decimal) is +always used for floating-point number conversion. -The previous example can be rewritten using @code{noname} and -@code{lastxt}: +doc-base +doc-hex +doc-decimal -@example -Defer deferred -noname : ( ... -- ... ) - ...
; -lastxt IS deferred -@end example +@cindex '-prefix for character strings +@cindex &-prefix for decimal numbers +@cindex %-prefix for binary numbers +@cindex $-prefix for hexadecimal numbers +Gforth allows you to override the value of @code{BASE} by using a prefix +before the first digit of an (integer) number. Four prefixes are +supported: -@code{lastxt} also works when the last word was not defined as -@code{noname}. +@itemize @bullet +@item +@code{&} -- decimal number +@item +@code{%} -- binary number +@item +@code{$} -- hexadecimal number +@item +@code{'} -- base 256 number +@end itemize +Here are some examples, with the equivalent decimal number shown after +in braces: -@node Interpretation and Compilation Semantics, , Supplying names, Defining Words -@subsection Interpretation and Compilation Semantics -@cindex semantics, interpretation and compilation +-$41 (-65), %1001101 (205), %1001.0001 (145 - a double-precision number), +'AB (16706; ascii A is 65, ascii B is 66, number is 65*256 + 66), +'ab (24930; ascii a is 97, ascii B is 98, number is 97*256 + 98), +&905 (905), $abc (2478), $ABC (2478). -@cindex interpretation semantics -The @dfn{interpretation semantics} of a word are what the text -interpreter does when it encounters the word in interpret state. It also -appears in some other contexts, e.g., the execution token returned by -@code{' @var{word}} identifies the interpretation semantics of -@var{word} (in other words, @code{' @var{word} execute} is equivalent to -interpret-state text interpretation of @code{@var{word}}). +@cindex number conversion - traps for the unwary +Number conversion has a number of traps for the unwary: -@cindex compilation semantics -The @dfn{compilation semantics} of a word are what the text interpreter -does when it encounters the word in compile state. 
It also appears in -other contexts, e.g, @code{POSTPONE @var{word}} compiles@footnote{In -standard terminology, ``appends to the current definition''.} the -compilation semantics of @var{word}. - -@cindex execution semantics -The standard also talks about @dfn{execution semantics}. They are used -only for defining the interpretation and compilation semantics of many -words. By default, the interpretation semantics of a word are to -@code{execute} its execution semantics, and the compilation semantics of -a word are to @code{compile,} its execution semantics.@footnote{In -standard terminology: The default interpretation semantics are its -execution semantics; the default compilation semantics are to append its -execution semantics to the execution semantics of the current -definition.} +@itemize @bullet +@item +You cannot determine the current number base using the code sequence +@code{BASE @@ .} -- the number base is always 10 in the current number +base. Instead, use something like @code{BASE @@ DECIMAL DUP . BASE !} +@item +If the number base is set to a value greater than 14 (for example, +hexadecimal), the number 123E4 is ambiguous; the conversion rules allow +it to be intepreted as either a single-precision integer or a +floating-point number (Gforth treats it as an integer). The ambiguity +can be resolved by explicitly stating the sign of the mantissa and/or +exponent: 123E+4 or +123E4 -- if the number base is decimal, no +ambiguity arises; either representation will be treated as a +floating-point number. +@item +There is a word @code{bin} but it does @var{not} set the number base! +It is used to specify file types. +@item +ANS Forth requires the @code{.} of a double-precision number to +be the final character in the string. Allowing the @code{.} to be +anywhere after the first digit is a Gforth extension. +@item +The number conversion process does not check for overflow. 
+@item +In Gforth, number conversion to floating-point numbers always use base +10, irrespective of the value of @code{BASE}. In ANS Forth, +conversion to floating-point numbers whilst the value of +@code{BASE} is not 10 is an ambiguous condition. +@end itemize -@comment TODO expand, make it co-operate with new sections on text interpreter. -@cindex immediate words -You can change the compilation semantics into @code{execute}ing the -execution semantics with +@node Interpret/Compile states, Literals, Number Conversion, The Text Interpreter +@subsection Interpret/Compile states +@cindex Interpret/Compile states -doc-immediate +@comment TODO Intro blah. -@cindex compile-only words -You can remove the interpretation semantics of a word with +doc-state +doc-[ +doc-] -doc-compile-only -doc-restrict -Note that ticking (@code{'}) compile-only words gives an error -(``Interpreting a compile-only word''). +@node Literals, Interpreter Directives, Interpret/Compile states, The Text Interpreter +@subsection Literals +@cindex Literals -Gforth also allows you to define words with arbitrary combinations of -interpretation and compilation semantics. +@comment TODO Intro blah. -doc-interpret/compile: +doc-literal +doc-]L +doc-2literal +doc-fliteral -This feature was introduced for implementing @code{TO} and @code{S"}. I -recommend that you do not define such words, as cute as they may be: -they make it hard to get at both parts of the word in some contexts. -E.g., assume you want to get an execution token for the compilation -part. Instead, define two words, one that embodies the interpretation -part, and one that embodies the compilation part. Once you have done -that, you can define a combined word with @code{interpret/compile:} for -the convenience of your users. 
+@node Interpreter Directives, ,Literals, The Text Interpreter +@subsection Interpreter Directives +@cindex interpreter directives -You also might try to with this feature, like this: +These words are usually used outside of definitions; for example, to +control which parts of a source file are processed by the text +interpreter. There are only a few ANS Forth Standard words, but Gforth +supplements these with a rich set of immediate control structure words +to compensate for the fact that the non-immediate versions can only be +used in compile state (@pxref{Control Structures}). -You might try to use this feature to provide an optimizing -implementation of the default compilation semantics of a word. For -example, by defining: -@example -:noname - foo bar ; -:noname - POSTPONE foo POSTPONE bar ; -interpret/compile: foobar -@end example +doc-[IF] +doc-[ELSE] +doc-[THEN] +doc-[ENDIF] -@noindent -as an optimizing version of: +doc-[IFDEF] +doc-[IFUNDEF] -@example -: foobar - foo bar ; -@end example +doc-[?DO] +doc-[DO] +doc-[FOR] +doc-[LOOP] +doc-[+LOOP] +doc-[NEXT] -Unfortunately, this does not work correctly with @code{[compile]}, -because @code{[compile]} assumes that the compilation semantics of all -@code{interpret/compile:} words are non-default. I.e., @code{[compile] -foobar} would compile the compilation semantics for the optimizing -@code{foobar}, whereas it would compile the interpretation semantics for -the non-optimizing @code{foobar}. +doc-[BEGIN] +doc-[UNTIL] +doc-[AGAIN] +doc-[WHILE] +doc-[REPEAT] -@cindex state-smart words (are a bad idea) -Some people try to use @var{state-smart} words to emulate the feature provided -by @code{interpret/compile:} (words are state-smart if they check -@code{STATE} during execution). 
E.g., they would try to code -@code{foobar} like this: +@c ------------------------------------------------------------- +@node Tokens for Words, Word Lists, The Text Interpreter, Words +@section Tokens for Words +@cindex tokens for words -@example -: foobar - STATE @@ - IF ( compilation state ) - POSTPONE foo POSTPONE bar - ELSE - foo bar - ENDIF ; immediate -@end example +This chapter describes the creation and use of tokens that represent +words on the stack (and in data space). -Although this works if @code{foobar} is only processed by the text -interpreter, it does not work in other contexts (like @code{'} or -@code{POSTPONE}). E.g., @code{' foobar} will produce an execution token -for a state-smart word, not for the interpretation semantics of the -original @code{foobar}; when you execute this execution token (directly -with @code{EXECUTE} or indirectly through @code{COMPILE,}) in compile -state, the result will not be what you expected (i.e., it will not -perform @code{foo bar}). State-smart words are a bad idea. Simply don't -write them@footnote{For a more detailed discussion of this topic, see -@cite{@code{State}-smartness -- Why it is Evil and How to Exorcise it} by Anton -Ertl; presented at EuroForth '98 and available from -@url{http://www.complang.tuwien.ac.at/papers/}}! +Named words have interpretation and compilation semantics. Unnamed words +just have execution semantics. -@cindex defining words with arbitrary semantics combinations -It is also possible to write defining words that define words with -arbitrary combinations of interpretation and compilation semantics. In -general, they look like this: +@comment TODO ?normally interpretation semantics are the execution semantics. +@comment this should all be covered in earlier ss -@example -: def-word - create-interpret/compile - @var{code1} -interpretation> - @var{code2} - - @var{code3} - ( -- n ) - @@ - ( compilation. -- ; run-time. 
-- n ) - @@ postpone literal - -doc- -doc-body} also gives you the body of a word created with -@code{create-interpret/compile}. +doc-['] +doc-' -@c ---------------------------------------------------------- -@node The Text Interpreter, Structures, Defining Words, Words -@section The Text Interpreter -@cindex interpreter - outer -@cindex text interpreter -@cindex outer interpreter +For literals, you use @code{'} in interpreted code and @code{[']} in +compiled code. Gforth's @code{'} and @code{[']} behave somewhat unusually +by complaining about compile-only words. To get an execution token for a +compiling word @var{X}, use @code{COMP' @var{X} drop} or @code{[COMP'] +@var{X} drop}. -Intro blah. +@cindex compilation token +The compilation semantics are represented by a @dfn{compilation token} +consisting of two cells: @var{w xt}. The top cell @var{xt} is an +execution token. The compilation semantics represented by the +compilation token can be performed with @code{execute}, which consumes +the whole compilation token, with an additional stack effect determined +by the represented compilation semantics. -@comment TODO +doc-[comp'] +doc-comp' -doc->in -doc-tib -doc-#tib -doc-span -doc-restore-input -doc-save-input -doc-source -doc-source-id +You can compile the compilation semantics with @code{postpone,}. I.e., +@code{COMP' @var{word} POSTPONE,} is equivalent to @code{POSTPONE +@var{word}}. +doc-postpone, -@menu -* Number Conversion:: -* Interpret/Compile states:: -* Literals:: -* Interpreter Directives:: -@end menu +At present, the @var{w} part of a compilation token is an execution +token, and the @var{xt} part represents either @code{execute} or +@code{compile,}. However, don't rely on that knowledge, unless necessary; +we may introduce unusual compilation tokens in the future (e.g., +compilation tokens representing the compilation semantics of literals). 
-@comment TODO +@cindex name token +@cindex name field address +@cindex NFA +Named words are also represented by the @dfn{name token}, (@var{nt}). The abstract +data type @emph{name token} is implemented as a name field address (NFA). -The text interpreter works on input one line at a time. Starting at -the beginning of the line, it skips leading spaces (called -"delimiters") then parses a string (a sequence of non-space -characters) until it either reaches a space character or it -reaches the end of the line. Having parsed a string, it then makes two -attempts to do something with it: +doc-find-name +doc-name>int +doc-name?int +doc-name>comp +doc-name>string -* It looks the string up in a dictionary of definitions. If the string - is found in the dictionary, the string names a "definition" (also - known as a "word") and the dictionary search will return an - "Execution token" (xt) for the definition and some flags that show - when the definition can be used legally. If the definition can be - legally executed in "Interpret" mode then the text interpreter will - use the xt to execute it, otherwise it will issue an error - message. The dictionary is described in more detail in . +@c ------------------------------------------------------------- +@node Word Lists, Environmental Queries, Tokens for Words, Words +@section Word Lists +@cindex word lists +@cindex name dictionary -* If the string is not found in the dictionary, the text interpreter - attempts to treat it as a number in the current radix (base 10 after - initial startup). If the string represents a legal number in the - current radix, the number is pushed onto the appropriate parameter - stack. Stacks are discussed in more detail in . Number - conversion is described in more detail in
. +@cindex wid +All definitions other than those created by @code{:noname} have an entry +in the name dictionary. The name dictionary is fragmented into a number +of parts, called @var{word lists}. A word list is identified by a +cell-sized word list identifier (@var{wid}) in much the same way as a +file is identified by a file handle. The numerical value of the wid has +no (portable) meaning, and might change from session to session. -If both of these attempts fail, the remainer of the input line is -discarded and the text interpreter isses an error message. If one of -these attempts succeeds, the text interpreter repeats the parsing -process until the end of the line has been reached. At this point, -it prints the status message " ok" and waits for more input. +@cindex compilation word list +At any one time, a single word list is defined as the word list to which +all new definitions will be added -- this is called the @var{compilation +word list}. When Gforth is started, the compilation word list is the +word list called @code{FORTH-WORDLIST}. -There are two important things to note about the behaviour of the text -interpreter: +@cindex search order stack +Forth maintains a stack of word lists, representing the @var{search +order}. When the name dictionary is searched (for example, when +attempting to find a word's execution token during compilation), only +those word lists that are currently in the search order are +searched. The most recently-defined word in the word list at the top of +the word list stack is searched first, and the search proceeds until +either the word is located or the oldest definition in the word list at +the bottom of the stack is reached. Definitions of the word may exist in +more than one word lists; the search order determines which version will +be found. -* it processes each input string to completion before parsing - additional characters from the input line. 
+The ANS Forth Standard ``Search order'' word set is intended to provide a +set of low-level tools that allow various different schemes to be +implemented. Gforth provides @code{vocabulary}, a traditional Forth +word. @file{compat/vocabulary.fs} provides an implementation in ANS +Standard Forth. -* it keeps track of its position in the input line using a variable - (called >IN, pronounced "to-in"). The value of >IN can be modified - by the execution of definitions in the input line. This means that - definitions can "trick" the text interpreter either into skipping - sections of the input line or into parsing a section of the - input line more than once. +TODO: locals section refers to here, saying that every word list (aka +vocabulary) has its own methods for searching etc. Need to document that. +doc-forth-wordlist +doc-definitions +doc-get-current +doc-set-current -@node Number Conversion, Interpret/Compile states, The Text Interpreter, The Text Interpreter -@subsection Number Conversion -@cindex Number conversion -@cindex double-cell numbers, input format -@cindex input format for double-cell numbers -@cindex single-cell numbers, input format -@cindex input format for single-cell numbers -@cindex floating-point numbers, input format -@cindex input format for floating-point numbers +@comment TODO when a defn (like set-order) is instanced twice, the second instance gets documented. +@comment In general that might be fine, but in this example (search.fs) the second instance is an +@comment alias, so it would not naturally have documentation +@comment .. the fix to that is to add a specific prefix, like the object-orientation stuff does. -If the text interpreter fails to find a particular string in the name -dictionary, it attempts to convert it to a number using a set of rules. 
+doc-get-order +doc-set-order +doc-wordlist +doc-also +doc-forth +doc-only +doc-order +doc-previous -Let <digit> represent any character that is a legal digit in the current -number base (for example, 0-9 when the number base is decimal or 0-9, A-F -when the number base is hexadecimal). +doc-find +doc-search-wordlist -Let <decimal digit> represent any character in the range 0-9. +doc-words +doc-vlist -@comment TODO need to extend the next defn to support fp format -Let @{+ | -@} represent the optional presence of either a @code{+} or -@code{-} character. +doc-mappedwordlist +doc-root +doc-vocabulary +doc-seal +doc-vocs +doc-current +doc-context -Let * represent any number of instances of the previous character -(including none). +@menu +* Why use word lists?:: +* Word list examples:: +@end menu -Let any other character represent itself. +@node Why use word lists?, Word list examples, Word Lists, Word Lists +@subsection Why use word lists? +@cindex word lists - why use them? -Now, the conversion rules are: +There are several reasons for using multiple word lists: @itemize @bullet @item -A string of the form <digit><digit>* is treated as a single-precision -(CELL-sized) positive integer. Examples are 0 123 6784532 32343212343456 42 -@item -A string of the form -<digit><digit>* is treated as a single-precision -(CELL-sized) negative integer, and is represented using 2's-complement -arithmetic. Examples are -45 -5681 -0 -@item -A string of the form <digit><digit>*.<digit>* is treated as a double-precision -(double-CELL-sized) positive integer. Examples are 3465. 3.465 34.65 -(and note that these all represent the same number). +To improve compilation speed by reducing the number of name dictionary +entries that must be searched. This is achieved by creating a new +word list that contains all of the definitions that are used in the +definition of a Forth system but which would not usually be used by +programs running on that system.
That word list would be on the search +list when the Forth system was compiled but would be removed from the +search list for normal operation. This can be a useful technique for +low-performance systems (for example, 8-bit processors in embedded +systems) but is unlikely to be necessary in high-performance desktop +systems. @item -A string of the form -<digit><digit>*.<digit>* is treated as a -double-precision (double-CELL-sized) negative integer, and is -represented using 2's-complement arithmetic. Examples are -3465. -3.465 --34.65 (and note that these all represent the same number). +To prevent a set of words from being used outside the context in which +they are valid. Two classic examples of this are an integrated editor +(all of the edit commands are defined in a separate word list; the +search order is set to the editor word list when the editor is invoked; +the old search order is restored when the editor is terminated) and an +integrated assembler (the op-codes for the machine are defined in a +separate word list which is used when a @code{CODE} word is defined). @item -A string of the form @{+ | -@}<decimal digit>@{.@}<decimal digit>*@{e | E@}@{+ -| -@}<decimal digit><decimal digit>* is treated as floating-point -number. Examples are 1e0 1.e 1.e0 +1e+0 (which all represent the same -number) +12.E-4 +To prevent a name-space clash between multiple definitions with the same +name. For example, when building a cross-compiler you might have a word +@code{IF} that generates conditional code for your target system. By +placing this definition in a different word list you can control whether +the host system's @code{IF} or the target system's @code{IF} get used in +any particular context by controlling the order of the word lists on the +search order stack. @end itemize -By default, the number base used for integer number conversion is given -by the contents of a variable named @code{BASE}. Base 10 (decimal) is -always used for floating-point number conversion.
- -doc-base -doc-hex -doc-decimal +@node Word list examples, ,Why use word lists?, Word Lists +@subsection Word list examples +@cindex word lists - examples -@cindex '-prefix for character strings -@cindex &-prefix for decimal numbers -@cindex %-prefix for binary numbers -@cindex $-prefix for hexadecimal numbers -Gforth allows you to override the value of @code{BASE} by using a prefix -before the first digit of an (integer) number. Four prefixes are -supported: +Here is an example of creating and using a new wordlist using ANS +Forth Standard words: -@itemize @bullet -@item -@code{&} -- decimal number -@item -@code{%} -- binary number -@item -@code{$} -- hexadecimal number -@item -@code{'} -- base 256 number -@end itemize +@example +wordlist constant my-new-words-wordlist +: my-new-words get-order nip my-new-words-wordlist swap set-order ; -Here are some examples, with the equivalent decimal number shown after -in braces: +\ add it to the search order +also my-new-words --$41 (-65) %1001101 (205) %1001.0001 (145 - a double-precision number) -'AB (16706; ascii A is 65, ascii B is 66, number is 65*256 + 66) -'ab (24930; ascii a is 97, ascii B is 98, number is 97*256 + 98) -&905 (905) $abc (2478) $ABC (2478) +\ alternatively, add it to the search order and make it +\ the compilation word list +also my-new-words definitions +\ type "order" to see the problem +@end example -@cindex Number conversion - traps for the unwary -Number conversion has a number of traps for the unwary: +The problem with this example is that @code{order} has no way to +associate the name @code{my-new-words} with the wid of the word list (in +Gforth, @code{order} and @code{vocs} will display @code{???} for a wid +that has no associated name). There is no Standard way of associating a +name with a wid. -@itemize @bullet -@item -You cannot determine the current number base using the code sequence -@code{BASE @@ .} -- the number base is always 10 in the current number -base. 
Instead, use something like @code{BASE @@ DECIMAL DUP . BASE !} -@item -If the number base is set to a value greater than 14 (for example, -hexadecimal), the number 123E4 is ambiguous; the conversion rules allow -it to be intepreted as either a single-precision integer or a -floating-point number (Gforth treats it as an integer). The ambiguity -can be resolved by explicitly stating the sign of the mantissa and/or -exponent: 123E+4 or +123E4 -- if the number base is decimal, no -ambiguity arises; either representation will be treated as a -floating-point number. -@item -There is a word @code{bin} but it does @var{not} set the number base! -It is used to specify file types. -@item -ANS Forth Standard requires the @code{.} of a double-precision number to -be the final character in the string. Allowing the @code{.} to be -anywhere after the first digit is a Gforth extension. -@item -The number conversion process does not check for overflow. -@item -In Gforth, number conversion to floating-point numbers always use base -10, irrespective of the value of @code{BASE}. For the ANS Forth -Standard, conversion to floating-point numbers whilst the value of -@code{BASE} is not 10 is an ambiguous condition. -@end itemize +In Gforth, this example can be re-coded using @code{vocabulary}, which +associates a name with a wid: +@example +vocabulary my-new-words -@node Interpret/Compile states, Literals, Number Conversion, The Text Interpreter -@subsection Interpret/Compile states -@cindex Interpret/Compile states +\ add it to the search order +my-new-words -@comment TODO -Intro blah. +\ alternatively, add it to the search order and make it +\ the compilation word list +my-new-words definitions +\ type "order" to see that the problem is solved +@end example -doc-state -doc-[ -doc-] - - -@node Literals, Interpreter Directives, Interpret/Compile states, The Text Interpreter -@subsection Literals -@cindex Literals - -@comment TODO -Intro blah. 
- -doc-literal -doc-]L -doc-2literal -doc-fliteral - -@node Interpreter Directives, ,Literals, The Text Interpreter -@subsection Interpreter Directives -@cindex Interpreter Directives - -These words are usually used outside of definitions; for example, to -control which parts of a source file are processed by the text -interpreter. There are only a few ANS Forth Standard words, but Gforth -supplements these with a rich set of immediate control structure words -to compensate for the fact that the non-immediate versions can only be -used in compile state (@pxref{Control Structures}). - -doc-[IF] -doc-[ELSE] -doc-[THEN] -doc-[ENDIF] - -doc-[IFDEF] -doc-[IFUNDEF] - -doc-[?DO] -doc-[DO] -doc-[FOR] -doc-[LOOP] -doc-[+LOOP] -doc-[NEXT] +@c ------------------------------------------------------------- +@node Environmental Queries, Files, Word Lists, Words +@section Environmental Queries +@cindex environmental queries +@comment TODO more index entries -doc-[BEGIN] -doc-[UNTIL] -doc-[AGAIN] -doc-[WHILE] -doc-[REPEAT] +ANS Forth introduced the idea of ``environmental queries'' as a way +for a program running on a system to determine certain characteristics of the system. +The Standard specifies a number of strings that might be recognised by a system. +The Standard requires that the name space used for environmental queries +be distinct from the name space used for definitions. -@c ---------------------------------------------------------- -@node Structures, Object-oriented Forth, The Text Interpreter, Words -@section Structures -@cindex structures -@cindex records +Typically, environmental queries are supported by creating a set of +definitions in a word list that is @var{only} used during environmental +queries; that is what Gforth does. 
There is no Standard way of adding +definitions to the set of recognised environmental queries, but any +implementation that supports the loading of optional word sets must have +some mechanism for doing this (after loading the word set, the +associated environmental query string must return @code{true}). In +Gforth, the word list used to honour environmental queries can be +manipulated just like any other word list. -This section presents the structure package that comes with Gforth. A -version of the package implemented in ANS Standard Forth is available in -@file{compat/struct.fs}. This package was inspired by a posting on -comp.lang.forth in 1989 (unfortunately I don't remember, by whom; -possibly John Hayes). A version of this section has been published in -???. Marcel Hendrix provided helpful comments. +doc-environment? +doc-environment-wordlist -@menu -* Why explicit structure support?:: -* Structure Usage:: -* Structure Naming Convention:: -* Structure Implementation:: -* Structure Glossary:: -@end menu +doc-gforth +doc-os-class -@node Why explicit structure support?, Structure Usage, Structures, Structures -@subsection Why explicit structure support? +Note that, whilst the documentation for (e.g.) @code{gforth} shows it +returning two items on the stack, querying it using @code{environment?} +will return an additional item; the @code{true} flag that shows that the +string was recognised. -@cindex address arithmetic for structures -@cindex structures using address arithmetic -If we want to use a structure containing several fields, we could simply -reserve memory for it, and access the fields using address arithmetic -(@pxref{Address arithmetic}). 
As an example, consider a structure with -the following fields +@comment TODO Document the standard strings or note where they are documented herein -@table @code -@item a -is a float -@item b -is a cell -@item c -is a float -@end table +Here are some examples of using environmental queries: -Given the (float-aligned) base address of the structure we get the -address of the field +@example +s" address-unit-bits" environment? 0= +[IF] + cr .( environmental attribute address-units-bits unknown... ) cr +[THEN] -@table @code -@item a -without doing anything further. -@item b -with @code{float+} -@item c -with @code{float+ cell+ faligned} -@end table +s" block" environment? [IF] DROP include block.fs [THEN] -It is easy to see that this can become quite tiring. +s" gforth" environment? [IF] 2DROP include compat/vocabulary.fs [THEN] -Moreover, it is not very readable, because seeing a -@code{cell+} tells us neither which kind of structure is -accessed nor what field is accessed; we have to somehow infer the kind -of structure, and then look up in the documentation, which field of -that structure corresponds to that offset. +s" gforth" environment? [IF] .( Gforth version ) TYPE + [ELSE] .( Not Gforth..) [THEN] +@end example -Finally, this kind of address arithmetic also causes maintenance -troubles: If you add or delete a field somewhere in the middle of the -structure, you have to find and change all computations for the fields -afterwards. -So, instead of using @code{cell+} and friends directly, how -about storing the offsets in constants: +Here is an example of adding a definition to the environment word list: @example -0 constant a-offset -0 float+ constant b-offset -0 float+ cell+ faligned c-offset +get-current environment-wordlist set-current +true constant block +true constant block-ext +set-current @end example -Now we can get the address of field @code{x} with @code{x-offset -+}. This is much better in all respects. 
Of course, you still -have to change all later offset definitions if you add a field. You can -fix this by declaring the offsets in the following way: +You can see what definitions are in the environment word list like this: @example -0 constant a-offset -a-offset float+ constant b-offset -b-offset cell+ faligned constant c-offset +get-order 1+ environment-wordlist swap set-order words previous @end example -Since we always use the offsets with @code{+}, we could use a defining -word @code{cfield} that includes the @code{+} in the action of the -defined word: -@example -: cfield ( n "name" -- ) - create , -does> ( name execution: addr1 -- addr2 ) - @@ + ; +@c ------------------------------------------------------------- +@node Files, Blocks, Environmental Queries, Words +@section Files -0 cfield a -0 a float+ cfield b -0 b cell+ faligned cfield c -@end example +Gforth provides facilities for accessing files that are stored in the +host operating system's file-system. Files that are processed by Gforth +can be divided into two categories: -Instead of @code{x-offset +}, we now simply write @code{x}. +@itemize @bullet +@item +Files that are processed by the Text Interpreter (@var{Forth source files}). +@item +Files that are processed by some other program (@var{general files}). +@end itemize -The structure field words now can be used quite nicely. However, -their definition is still a bit cumbersome: We have to repeat the -name, the information about size and alignment is distributed before -and after the field definitions etc. The structure package presented -here addresses these problems. 
+@menu +* Forth source files:: +* General files:: +* Search Paths:: +* Forth Search Paths:: +* General Search Paths:: +@end menu -@node Structure Usage, Structure Naming Convention, Why explicit structure support?, Structures -@subsection Structure Usage -@cindex structure usage -@cindex @code{field} usage -@cindex @code{struct} usage -@cindex @code{end-struct} usage -You can define a structure for a (data-less) linked list with: -@example -struct - cell% field list-next -end-struct list% -@end example +@c ------------------------------------------------------------- +@node Forth source files, General files, Files, Files +@subsection Forth source files +@cindex including files +@cindex Forth source files -With the address of the list node on the stack, you can compute the -address of the field that contains the address of the next node with -@code{list-next}. E.g., you can determine the length of a list -with: +The simplest way to interpret the contents of a file is to use one of +these two formats: @example -: list-length ( list -- n ) -\ "list" is a pointer to the first element of a linked list -\ "n" is the length of the list - 0 begin ( list1 n1 ) - over - while ( list1 n1 ) - 1+ swap list-next @@ swap - repeat - nip ; +include mysource.fs +s" mysource.fs" included @end example -You can reserve memory for a list node in the dictionary with -@code{list% %allot}, which leaves the address of the list node on the -stack. For the equivalent allocation on the heap you can use @code{list% -%alloc} (or, for an @code{allocate}-like stack effect (i.e., with ior), -use @code{list% %allocate}). You can get the the size of a list -node with @code{list% %size} and its alignment with @code{list% -%alignment}. +Sometimes you want to include a file only if it is not included already +(by, say, another source file). 
In that case, you can use one of these +formats: -Note that in ANS Forth the body of a @code{create}d word is -@code{aligned} but not necessarily @code{faligned}; -therefore, if you do a: @example -create @emph{name} foo% %allot +require mysource.fs +needs mysource.fs +s" mysource.fs" required @end example -@noindent -then the memory alloted for @code{foo%} is -guaranteed to start at the body of @code{@emph{name}} only if -@code{foo%} contains only character, cell and double fields. +@cindex stack effect of included files +@cindex including files, stack effect +I recommend that you write your source files such that interpreting them +does not change the stack. This allows using these files with +@code{required} and friends without complications. For example: -@cindex strcutures containing structures -You can include a structure @code{foo%} as a field of -another structure, like this: @example -struct -... - foo% field ... -... -end-struct ... +1 require foo.fs drop @end example -@cindex structure extension -@cindex extended records -Instead of starting with an empty structure, you can extend an -existing structure. E.g., a plain linked list without data, as defined -above, is hardly useful; You can extend it to a linked list of integers, -like this:@footnote{This feature is also known as @emph{extended -records}. It is the main innovation in the Oberon language; in other -words, adding this feature to Modula-2 led Wirth to create a new -language, write a new compiler etc. Adding this feature to Forth just -required a few lines of code.} -@example -list% - cell% field intlist-int -end-struct intlist% -@end example +doc-include-file +doc-included +doc-include +@comment TODO describe what happens on error. Describes how the require +@comment stuff works and describe how to clear/reset the history (eg +@comment for debug). Might want to include that in the MARKER example.
+doc-required +doc-require +doc-needs -@code{intlist%} is a structure with two fields: -@code{list-next} and @code{intlist-int}. +A definition in ANS Forth for @code{required} is provided in +@file{compat/required.fs}. -@cindex structures containing arrays -You can specify an array type containing @emph{n} elements of -type @code{foo%} like this: +@c ------------------------------------------------------------- +@node General files, Search Paths, Forth source files, Files +@subsection General files +@cindex general files +@cindex file-handling -@example -foo% @emph{n} * -@end example +Files are opened/created by name and type. The following types are +recognised: -You can use this array type in any place where you can use a normal -type, e.g., when defining a @code{field}, or with -@code{%allot}. +doc-r/o +doc-r/w +doc-w/o +doc-bin -@cindex first field optimization -The first field is at the base address of a structure and the word -for this field (e.g., @code{list-next}) actually does not change -the address on the stack. You may be tempted to leave it away in the -interest of run-time and space efficiency. This is not necessary, -because the structure package optimizes this case and compiling such -words does not generate any code. So, in the interest of readability -and maintainability you should include the word for the field when -accessing the field. +When a file is opened/created, it returns a file identifier, +@var{wfileid} that is used for all other file commands. All file +commands also return a status value, @var{wior}, that is 0 for a +successful operation and an implementation-defined non-zero value in the +case of an error. -@node Structure Naming Convention, Structure Implementation, Structure Usage, Structures -@subsection Structure Naming Convention -@cindex structure naming conventions +doc-open-file +doc-create-file -The field names that come to (my) mind are often quite generic, and, -if used, would cause frequent name clashes. 
E.g., many structures -probably contain a @code{counter} field. The structure names -that come to (my) mind are often also the logical choice for the names -of words that create such a structure. +doc-close-file +doc-delete-file +doc-rename-file +doc-read-file +doc-read-line +doc-write-file +doc-write-line +doc-emit-file +doc-flush-file -Therefore, I have adopted the following naming conventions: +doc-file-status +doc-file-position +doc-reposition-file +doc-file-size +doc-resize-file -@itemize @bullet -@cindex field naming convention -@item -The names of fields are of the form -@code{@emph{struct}-@emph{field}}, where -@code{@emph{struct}} is the basic name of the structure, and -@code{@emph{field}} is the basic name of the field. You can -think of field words as converting the (address of the) -structure into the (address of the) field. +@c --------------------------------------------------------- +@node Search Paths, Forth Search Paths, General files, Files +@subsection Search Paths +@cindex path for @code{included} +@cindex file search path +@cindex @code{include} search path +@cindex search path for files -@cindex structure naming convention -@item -The names of structures are of the form -@code{@emph{struct}%}, where -@code{@emph{struct}} is the basic name of the structure. -@end itemize +@comment what uses these search paths.. just include and friends? +If you specify an absolute filename (i.e., a filename starting with +@file{/} or @file{~}, or with @file{:} in the second position (as in +@samp{C:...})) for @code{included} and friends, that file is included +just as you would expect. -This naming convention does not work that well for fields of extended -structures; e.g., the integer list structure has a field -@code{intlist-int}, but has @code{list-next}, not -@code{intlist-next}. +For relative filenames, Gforth uses a search path similar to Forth's +search order (@pxref{Word Lists}). 
It tries to find the given filename +in the directories present in the path, and includes the first one it +finds. There are separate search paths for Forth source files and +general files. -@node Structure Implementation, Structure Glossary, Structure Naming Convention, Structures -@subsection Structure Implementation -@cindex structure implementation -@cindex implementation of structures +If the search path contains the directory @file{.} (as it should), this +refers to the directory that the present file was @code{included} +from. This allows files to include other files relative to their own +position (irrespective of the current working directory or the absolute +position). This feature is essential for libraries consisting of +several files, where a file may include other files from the library. +It corresponds to @code{#include "..."} in C. If the current input +source is not a file, @file{.} refers to the directory of the innermost +file being included, or, if there is no file being included, to the +current working directory. -The central idea in the implementation is to pass the data about the -structure being built on the stack, not in some global -variable. Everything else falls into place naturally once this design -decision is made. +Use @file{~+} to refer to the current working directory (as in the +@code{bash}). -The type description on the stack is of the form @emph{align -size}. Keeping the size on the top-of-stack makes dealing with arrays -very simple. +If the filename starts with @file{./}, the search path is not searched +(just as with absolute filenames), and the @file{.} has the same meaning +as described above. -@code{field} is a defining word that uses @code{Create} -and @code{DOES>}. 
The body of the field contains the offset -of the field, and the normal @code{DOES>} action is simply: +@c --------------------------------------------------------- +@node Forth Search Paths, General Search Paths, Search Paths, Files +@subsubsection Forth Search Paths +@cindex search path control - forth + +The search path is initialized when you start Gforth (@pxref{Invoking +Gforth}). You can display it and change it using these words: + +doc-.fpath +doc-fpath+ +doc-fpath= +doc-open-fpath-file + +Here is an example of using @code{fpath} and @code{require}: @example -@ + +fpath= /usr/lib/forth/|./ +require timer.fs @end example -@noindent -i.e., add the offset to the address, giving the stack effect -@var{addr1 -- addr2} for a field. +@c --------------------------------------------------------- +@node General Search Paths, , Forth Search Paths, Files +@subsubsection General Search Paths +@cindex search path control - for user applications -@cindex first field optimization, implementation -This simple structure is slightly complicated by the optimization -for fields with offset 0, which requires a different -@code{DOES>}-part (because we cannot rely on there being -something on the stack if such a field is invoked during -compilation). Therefore, we put the different @code{DOES>}-parts -in separate words, and decide which one to invoke based on the -offset. For a zero offset, the field is basically a noop; it is -immediate, and therefore no code is generated when it is compiled. +Your application may need to search files in several directories, like +@code{included} does. 
To facilitate this, Gforth allows you to define +and use your own search paths, by providing generic equivalents of the +Forth search path words: -@node Structure Glossary, , Structure Implementation, Structures -@subsection Structure Glossary -@cindex structure glossary +doc-.path +doc-path+ +doc-path= +doc-open-path-file -doc-%align -doc-%alignment -doc-%alloc -doc-%allocate -doc-%allot -doc-cell% -doc-char% -doc-dfloat% -doc-double% -doc-end-struct -doc-field -doc-float% -doc-nalign -doc-sfloat% -doc-%size -doc-struct +Here's an example of creating a search path: + +@example +\ Make a buffer for the path: +create mypath 100 chars , \ maximum length (is checked) + 0 , \ real len + 100 chars allot \ space for path +@end example @c ------------------------------------------------------------- -@node Object-oriented Forth, Tokens for Words, Structures, Words -@section Object-oriented Forth +@node Blocks, Other I/O, Files, Words +@section Blocks -Gforth comes with three packets for object-oriented programming: -@file{objects.fs}, @file{oof.fs}, and @file{mini-oof.fs}; none of them -is preloaded, so you have to @code{include} them before use. The most -important differences between these packets (and others) are discussed -in @ref{Comparison with other object models}. All packets are written -in ANS Forth and can be used with any other ANS Forth. +This section describes how to use block files within Gforth. + +Block files are the traditional means of data and source storage in +Forth. They were very important in the past on resource-starved computers +without an OS. Gforth does not encourage the use of blocks for +source code, and provides them only for backward compatibility. The ANS +standard requires blocks to be available when files are. + +@comment TODO what about errors on open-blocks?
+doc-open-blocks +doc-use +doc-scr +doc-blk +doc-get-block-fid +doc-block-position +doc-update +doc-save-buffers +doc-save-buffer +doc-empty-buffers +doc-empty-buffer +doc-flush +doc-get-buffer +doc---block-block +doc-buffer +doc-updated? +doc-list +doc-load +doc-thru +doc-+load +doc-+thru +doc---block---> +doc-block-included + +@c ------------------------------------------------------------- +@node Other I/O, Programming Tools, Blocks, Words +@section Other I/O +@comment TODO more index entries @menu -* Why object-oriented programming?:: -* Object-Oriented Terminology:: -* Objects:: -* OOF:: -* Mini-OOF:: -* Comparison with other object models:: +* Simple numeric output:: Predefined formats +* Formatted numeric output:: Formatted (pictured) output +* String Formats:: How Forth stores strings in memory +* Displaying characters and strings:: Other stuff +* Input:: Input @end menu +@node Simple numeric output, Formatted numeric output, Other I/O, Other I/O +@subsection Simple numeric output +@cindex simple numeric output +@comment TODO more index entries -@node Why object-oriented programming?, Object-Oriented Terminology, , Object-oriented Forth -@subsubsection Why object-oriented programming? -@cindex object-oriented programming motivation -@cindex motivation for object-oriented programming +The simplest output functions are those that display numbers from the +data or floating-point stacks. Floating-point output is always displayed +using base 10. Numbers displayed from the data stack use the value stored +in @code{base}. -Often we have to deal with several data structures (@emph{objects}), -that have to be treated similarly in some respects, but differently in -others. Graphical objects are the textbook example: circles, triangles, -dinosaurs, icons, and others, and we may want to add more during program -development. We want to apply some operations to any graphical object, -e.g., @code{draw} for displaying it on the screen. 
However, @code{draw} -has to do something different for every kind of object. -@comment TODO add some other operations eg perimeter, area -@comment and tie in to concrete examples later.. +doc-. +doc-dec. +doc-hex. +doc-u. +doc-.r +doc-u.r +doc-d. +doc-ud. +doc-d.r +doc-ud.r +doc-f. +doc-fe. +doc-fs. -We could implement @code{draw} as a big @code{CASE} -control structure that executes the appropriate code depending on the -kind of object to be drawn. This would be not be very elegant, and, -moreover, we would have to change @code{draw} every time we add -a new kind of graphical object (say, a spaceship). +Examples of printing the number 1234.5678E23 in the different floating-point output +formats are shown below: -What we would rather do is: When defining spaceships, we would tell -the system: "Here's how you @code{draw} a spaceship; you figure -out the rest." - -This is the problem that all systems solve that (rightfully) call -themselves object-oriented; the object-oriented packages presented here -solve this problem (and not much else). -@comment TODO ?list properties of oo systems.. oo vs o-based? - -@node Object-Oriented Terminology, Objects, Why object-oriented programming?, Object-oriented Forth -@subsubsection Object-Oriented Terminology -@cindex object-oriented terminology -@cindex terminology for object-oriented programming - -This section is mainly for reference, so you don't have to understand -all of it right away. The terminology is mainly Smalltalk-inspired. In -short: +@example +f. 123456779999999000000000000. +fe. 123.456779999999E24 +fs. 1.23456779999999E26 +@end example -@table @emph -@cindex class -@item class -a data structure definition with some extras. -@cindex object -@item object -an instance of the data structure described by the class definition. 
+@node Formatted numeric output, String Formats, Simple numeric output, Other I/O +@subsection Formatted numeric output +@cindex Formatted numeric output +@cindex pictured numeric output +@comment TODO more index entries -@cindex instance variables -@item instance variables -fields of the data structure. +Forth traditionally uses a technique called @var{pictured numeric +output} for formatted printing of integers. In this technique, digits +are extracted from the number (using the current output radix defined by +@code{base}), converted to ASCII codes and appended to a string that is +built in a scratch-pad area of memory (@pxref{core-idef, +Implementation-defined options, Implementation-defined +options}). Arbitrary characters can be appended to the string during the +extraction process. The completed string is specified by an address +and length and can be manipulated (@code{TYPE}ed, copied, modified) +under program control. -@cindex selector -@cindex method selector -@cindex virtual function -@item selector -(or @emph{method selector}) a word (e.g., -@code{draw}) that performs an operation on a variety of data -structures (classes). A selector describes @emph{what} operation to -perform. In C++ terminology: a (pure) virtual function. +All of the words described in the previous section for simple numeric +output are implemented in Gforth using pictured numeric output. -@cindex method -@item method -the concrete definition that performs the operation -described by the selector for a specific class. A method specifies -@emph{how} the operation is performed for a specific class. +Three important things to remember about Pictured Numeric Output: -@cindex selector invocation -@cindex message send -@cindex invoking a selector -@item selector invocation -a call of a selector. One argument of the call (the TOS (top-of-stack)) -is used for determining which method is used. 
In Smalltalk terminology: -a message (consisting of the selector and the other arguments) is sent -to the object. +@itemize @bullet +@item +It always operates on double-precision numbers; to display a single-precision number, +convert it first (@pxref{Double precision} for ways of doing this). +@item +It always treats the double-precision number as though it were unsigned. Refer to +the examples below for ways of printing signed numbers. +@item +The string is built up from right to left; least significant digit first. +@end itemize -@cindex receiving object -@item receiving object -the object used for determining the method executed by a selector -invocation. In the @file{objects.fs} model, it is the object that is on -the TOS when the selector is invoked. (@emph{Receiving} comes from -the Smalltalk @emph{message} terminology.) +doc-<# +doc-# +doc-#s +doc-hold +doc-sign +doc-#> -@cindex child class -@cindex parent class -@cindex inheritance -@item child class -a class that has (@emph{inherits}) all properties (instance variables, -selectors, methods) from a @emph{parent class}. In Smalltalk -terminology: The subclass inherits from the superclass. In C++ -terminology: The derived class inherits from the base class. +doc-represent -@end table +Here are some examples of using pictured numeric output: -@c If you wonder about the message sending terminology, it comes from -@c a time when each object had it's own task and objects communicated via -@c message passing; eventually the Smalltalk developers realized that -@c they can do most things through simple (indirect) calls. They kept the -@c terminology. +@example +: my-u. ( u -- ) + \ Simplest use of pns.. behaves like Standard u. 
+ 0 \ convert to unsigned double + <# \ start conversion + #s \ convert all digits + #> \ complete conversion + TYPE SPACE ; \ display, with trailing space +: cents-only ( u -- ) + 0 \ convert to unsigned double + <# \ start conversion + # # \ convert two least-significant digits + #> \ complete conversion, discard other digits + TYPE SPACE ; \ display, with trailing space -@node Objects, OOF, Object-Oriented Terminology, Object-oriented Forth -@subsection The @file{objects.fs} model -@cindex objects -@cindex object-oriented programming +: dollars-and-cents ( u -- ) + 0 \ convert to unsigned double + <# \ start conversion + # # \ convert two least-significant digits + [char] . hold \ insert decimal point + #s \ convert remaining digits + [char] $ hold \ append currency symbol + #> \ complete conversion + TYPE SPACE ; \ display, with trailing space -@cindex @file{objects.fs} -@cindex @file{oof.fs} +: my-. ( n -- ) + \ handling negatives.. behaves like Standard . + s>d \ convert to signed double + swap over dabs \ leave sign byte followed by unsigned double + <# \ start conversion + #s \ convert all digits + rot sign \ get at sign byte, append "-" if needed + #> \ complete conversion + TYPE SPACE ; \ display, with trailing space -This section describes the @file{objects.fs} packet. This material also has been published in @cite{Yet Another Forth Objects Package} by Anton Ertl and appeared in Forth Dimensions 19(2), pages 37--43 (@url{http://www.complang.tuwien.ac.at/forth/objects/objects.html}). -@c McKewan's and Zsoter's packages +: account. 
( n -- ) + \ accountants don't like minus signs, they use parentheses + \ for negative numbers + s>d \ convert to signed double + swap over dabs \ leave sign byte followed by unsigned double + <# \ start conversion + 2 pick \ get copy of sign byte + 0< IF [char] ) hold THEN \ right-most character of output + #s \ convert all digits + rot \ get at sign byte + 0< IF [char] ( hold THEN + #> \ complete conversion + TYPE SPACE ; \ display, with trailing space +@end example -This section assumes that you have read @ref{Structures}. +Here are some examples of using these words: -The techniques on which this model is based have been used to implement -the parser generator, Gray, and have also been used in Gforth for -implementing the various flavours of word lists (hashed or not, -case-sensitive or not, special-purpose word lists for locals etc.). +@example +1 my-u. 1 +hex -1 my-u. decimal FFFFFFFF +1 cents-only 01 +1234 cents-only 34 +2 dollars-and-cents $0.02 +1234 dollars-and-cents $12.34 +123 my-. 123 +-123 my-. -123 +123 account. 123 +-456 account. (456) +@end example -@menu -* Properties of the Objects model:: -* Basic Objects Usage:: -* The Objects base class:: -* Creating objects:: -* Object-Oriented Programming Style:: -* Class Binding:: -* Method conveniences:: -* Classes and Scoping:: -* Object Interfaces:: -* Objects Implementation:: -* Objects Glossary:: -@end menu +@node String Formats, Displaying characters and strings, Formatted numeric output, Other I/O +@subsection String Formats +@cindex string formats -Marcel Hendrix provided helpful comments on this section. Andras Zsoter -and Bernd Paysan helped me with the related works section.
+@comment TODO more index entries -@node Properties of the Objects model, Basic Objects Usage, Objects, Objects -@subsubsection Properties of the @file{objects.fs} model -@cindex @file{objects.fs} properties +Forth commonly uses two different methods for representing a string: @itemize @bullet @item -It is straightforward to pass objects on the stack. Passing -selectors on the stack is a little less convenient, but possible. +@cindex address of counted string +As a @var{counted string}, represented by a @var{c-addr}. The char +addressed by @var{c-addr} contains a character-count, @var{n}, of the +string and the string occupies the subsequent @var{n} char addresses in +memory. +@item +As a cell pair on the stack; @var{c-addr u}, where @var{u} is the length +of the string in characters, and @var{c-addr} is the address of the +first byte of the string. +@end itemize -@item -Objects are just data structures in memory, and are referenced by their -address. You can create words for objects with normal defining words -like @code{constant}. Likewise, there is no difference between instance -variables that contain objects and those that contain other data. +ANS Forth encourages the use of the second format when representing +strings on the stack, whilst conceding that the counted string format +remains useful as a way of storing strings in memory. -@item -Late binding is efficient and easy to use. +doc-count -@item -It avoids parsing, and thus avoids problems with state-smartness -and reduced extensibility; for convenience there are a few parsing -words, but they have non-parsing counterparts. There are also a few -defining words that parse. This is hard to avoid, because all standard -defining words parse (except @code{:noname}); however, such -words are not as bad as many other parsing words, because they are not -state-smart. +@xref{Memory Blocks} for words that move, copy and search +for strings.
@xref{Displaying characters and strings,} for words that +display characters and strings. -@item -It does not try to incorporate everything. It does a few things and does -them well (IMO). In particular, this model was not designed to support -information hiding (although it has features that may help); you can use -a separate package for achieving this. -@item -It is layered; you don't have to learn and use all features to use this -model. Only a few features are necessary (@xref{Basic Objects Usage}, -@xref{The Objects base class}, @xref{Creating objects}.), the others -are optional and independent of each other. +@node Displaying characters and strings, Input, String Formats, Other I/O +@subsection Displaying characters and strings +@cindex displaying characters and strings +@cindex compiling characters and strings +@cindex cursor control -@item -An implementation in ANS Forth is available. +@comment TODO more index entries -@end itemize +This section starts with a glossary of Forth words and ends with a set +of examples. +doc-bl +doc-space +doc-spaces +doc-emit +doc-toupper +doc-." +doc-.( +doc-type +doc-cr +doc-at-xy +doc-page +doc-s" +doc-c" +doc-char +doc-[char] +doc-sliteral -@node Basic Objects Usage, The Objects base class, Properties of the Objects model, Objects -@subsubsection Basic @file{objects.fs} Usage -@cindex basic objects usage -@cindex objects, basic usage - -You can define a class for graphical objects like this: +As an example, consider the following text, stored in a file @file{test.fs}: -@cindex @code{class} usage -@cindex @code{end-class} usage -@cindex @code{selector} usage @example -object class \ "object" is the parent class - selector draw ( x y graphical -- ) -end-class graphical +.( text-1) +: my-word + ." text-2" cr + .( text-3) +; + +." text-4" + +: my-char + [char] ALPHABET emit + char emit +; @end example -This code defines a class @code{graphical} with an -operation @code{draw}. 
We can perform the operation -@code{draw} on any @code{graphical} object, e.g.: +When you load this code into Gforth, the following output is generated: @example -100 100 t-rex draw +@kbd{include test.fs } text-1text-3text-4 ok @end example -@noindent -where @code{t-rex} is a word (say, a constant) that produces a -graphical object. +@itemize @bullet +@item +Messages @code{text-1} and @code{text-3} are displayed because @code{.(} +is an immediate word; it behaves in the same way whether it is used inside +or outside a colon definition. +@item +Message @code{text-4} is displayed because of Gforth's added interpretation +semantics for @code{."}. +@item +Message @code{text-2} is @var{not} displayed, because the text interpreter +performs the compilation semantics for @code{."} within the definition of +@code{my-word}. +@end itemize -@comment nac TODO add a 2nd operation eg perimeter.. and use for -@comment a concrete example +Here are some examples of executing @code{my-word} and @code{my-char}: -@cindex abstract class -How do we create a graphical object? With the present definitions, -we cannot create a useful graphical object. The class -@code{graphical} describes graphical objects in general, but not -any concrete graphical object type (C++ users would call it an -@emph{abstract class}); e.g., there is no method for the selector -@code{draw} in the class @code{graphical}. +@example +@kbd{my-word } text-2 + ok +@kbd{my-char fred } Af ok +@kbd{my-char jim } Aj ok +@end example -For concrete graphical objects, we define child classes of the -class @code{graphical}, e.g.: +@itemize @bullet +@item +Message @code{text-2} is displayed because of the run-time behaviour of +@code{."}. +@item +@code{[char]} compiles the ``A'' from ``ALPHABET'' and puts its display code +on the stack at run-time. @code{emit} always displays the character +when @code{my-char} is executed. 
+@item +@code{char} parses a string at run-time and the second @code{emit} displays +the first character of the string. +@item +If you type @code{see my-char} you can see that @code{[char]} discarded +the text ``LPHABET'' and only compiled the display code for ``A'' into the +definition of @code{my-char}. +@end itemize -@cindex @code{overrides} usage -@cindex @code{field} usage in class definition -@example -graphical class \ "graphical" is the parent class - cell% field circle-radius -:noname ( x y circle -- ) - circle-radius @@ draw-circle ; -overrides draw -:noname ( n-radius circle -- ) - circle-radius ! ; -overrides construct +@node Input, , Displaying characters and strings, Other I/O +@subsection Input +@cindex input +@comment TODO more index entries -end-class circle -@end example +Blah on traditional and recommended string formats. -Here we define a class @code{circle} as a child of @code{graphical}, -with field @code{circle-radius} (which behaves just like a field -(@pxref{Structures}); it defines (using @code{overrides}) new methods -for the selectors @code{draw} and @code{construct} (@code{construct} is -defined in @code{object}, the parent class of @code{graphical}). +doc--trailing +doc-/string +doc-convert +doc->number +doc->float +doc-accept +doc-query +doc-expect +doc-evaluate +doc-key +doc-key? -Now we can create a circle on the heap (i.e., -@code{allocate}d memory) with: +TODO reference the block move stuff elsewhere -@cindex @code{heap-new} usage -@example -50 circle heap-new constant my-circle -@end example +TODO convert and >number might be better in the numeric input section. -@noindent -@code{heap-new} invokes @code{construct}, thus -initializing the field @code{circle-radius} with 50. 
We can draw -this new circle at (100,100) with: +TODO maybe some of these shouldn't be here but should be in a ``parsing'' section -@example -100 100 my-circle draw -@end example -@cindex selector invocation, restrictions -@cindex class definition, restrictions -Note: You can only invoke a selector if the object on the TOS -(the receiving object) belongs to the class where the selector was -defined or one of its descendents; e.g., you can invoke -@code{draw} only for objects belonging to @code{graphical} -or its descendents (e.g., @code{circle}). Immediately before -@code{end-class}, the search order has to be the same as -immediately after @code{class}. +@c ------------------------------------------------------------- +@node Programming Tools, Assembler and Code Words, Other I/O, Words +@section Programming Tools +@cindex programming tools -@node The Objects base class, Creating objects, Basic Objects Usage, Objects -@subsubsection The @file{object.fs} base class -@cindex @code{object} class +@menu +* Debugging:: Simple and quick. +* Assertions:: Making your programs self-checking. +* Singlestep Debugger:: Executing your program word by word. +@end menu -When you define a class, you have to specify a parent class. So how do -you start defining classes? There is one class available from the start: -@code{object}. It is ancestor for all classes and so is the -only class that has no parent. It has two selectors: @code{construct} -and @code{print}. +@node Debugging, Assertions, Programming Tools, Programming Tools +@subsection Debugging +@cindex debugging -@node Creating objects, Object-Oriented Programming Style, The Objects base class, Objects -@subsubsection Creating objects -@cindex creating objects -@cindex object creation -@cindex object allocation options +Languages with a slow edit/compile/link/test development loop tend to +require sophisticated tracing/stepping debuggers to facilitate +productive debugging.
-@cindex @code{heap-new} discussion -@cindex @code{dict-new} discussion -@cindex @code{construct} discussion -You can create and initialize an object of a class on the heap with -@code{heap-new} ( ... class -- object ) and in the dictionary -(allocation with @code{allot}) with @code{dict-new} ( -... class -- object ). Both words invoke @code{construct}, which -consumes the stack items indicated by "..." above. +A much better (faster) way in fast-compiling languages is to add +printing code at well-selected places, let the program run, look at +the output, see where things went wrong, add more printing code, etc., +until the bug is found. -@cindex @code{init-object} discussion -@cindex @code{class-inst-size} discussion -If you want to allocate memory for an object yourself, you can get its -alignment and size with @code{class-inst-size 2@@} ( class -- -align size ). Once you have memory for an object, you can initialize -it with @code{init-object} ( ... class object -- ); -@code{construct} does only a part of the necessary work. +The simple debugging aids provided in @file{debugs.fs} +are meant to support this style of debugging. In addition, there are +words for non-destructively inspecting the stack and memory: -@node Object-Oriented Programming Style, Class Binding, Creating objects, Objects -@subsubsection Object-Oriented Programming Style -@cindex object-oriented programming style +doc-.s +doc-f.s -This section is not exhaustive. +There is a word @code{.r} but it does @var{not} display the return +stack! It is used for formatted numeric output. -@cindex stack effects of selectors -@cindex selectors and stack effects -In general, it is a good idea to ensure that all methods for the -same selector have the same stack effect: when you invoke a selector, -you often have no idea which method will be invoked, so, unless all -methods have the same stack effect, you will not know the stack effect -of the selector invocation. 
+doc-depth +doc-fdepth +doc-clearstack +doc-? +doc-dump -One exception to this rule is methods for the selector -@code{construct}. We know which method is invoked, because we -specify the class to be constructed at the same place. Actually, I -defined @code{construct} as a selector only to give the users a -convenient way to specify initialization. The way it is used, a -mechanism different from selector invocation would be more natural -(but probably would take more code and more space to explain). +The word @code{~~} prints debugging information (by default the source +location and the stack contents). It is easy to insert. If you use Emacs +it is also easy to remove (@kbd{C-x ~} in the Emacs Forth mode to +query-replace them with nothing). The deferred words +@code{printdebugdata} and @code{printdebugline} control the output of +@code{~~}. The default source location output format works well with +Emacs' compilation mode, so you can step through the program at the +source level using @kbd{C-x `} (the advantage over a stepping debugger +is that you can step in any direction and you know where the crash has +happened or where the strange data has occurred). -@node Class Binding, Method conveniences, Object-Oriented Programming Style, Objects -@subsubsection Class Binding -@cindex class binding -@cindex early binding +The default actions of @code{~~} clobber the contents of the pictured +numeric output string, so you should not use @code{~~}, e.g., between +@code{<#} and @code{#>}. -@cindex late binding -Normal selector invocations determine the method at run-time depending -on the class of the receiving object. This run-time selection is called -@var{late binding}. +doc-~~ +doc-printdebugdata +doc-printdebugline -Sometimes it's preferable to invoke a different method. For example, -you might want to use the simple method for @code{print}ing -@code{object}s instead of the possibly long-winded @code{print} method -of the receiver class. 
You can achieve this by replacing the invocation -of @code{print} with: +doc-see +doc-marker + +Here's an example of using @code{marker} at the start of a source file +that you are debugging; it ensures that you only ever have one copy of +the file's definitions compiled at any time: -@cindex @code{[bind]} usage @example -[bind] object print -@end example +[IFDEF] my-code + my-code +[ENDIF] -@noindent -in compiled code or: +marker my-code -@cindex @code{bind} usage -@example -bind object print +\ .. definitions start here +\ . +\ . +\ end @end example -@cindex class binding, alternative to -@noindent -in interpreted code. Alternatively, you can define the method with a -name (e.g., @code{print-object}), and then invoke it through the -name. Class binding is just a (often more convenient) way to achieve -the same effect; it avoids name clutter and allows you to invoke -methods directly without naming them first. - -@cindex superclass binding -@cindex parent class binding -A frequent use of class binding is this: When we define a method -for a selector, we often want the method to do what the selector does -in the parent class, and a little more. There is a special word for -this purpose: @code{[parent]}; @code{[parent] -@emph{selector}} is equivalent to @code{[bind] @emph{parent -selector}}, where @code{@emph{parent}} is the parent -class of the current class. E.g., a method definition might look like: -@cindex @code{[parent]} usage -@example -:noname - dup [parent] foo \ do parent's foo on the receiving object - ... \ do some more -; overrides foo -@end example -@cindex class binding as optimization -In @cite{Object-oriented programming in ANS Forth} (Forth Dimensions, -March 1997), Andrew McKewan presents class binding as an optimization -technique. I recommend not using it for this purpose unless you are in -an emergency. 
Late binding is pretty fast with this model anyway, so the -benefit of using class binding is small; the cost of using class binding -where it is not appropriate is reduced maintainability. +@node Assertions, Singlestep Debugger, Debugging, Programming Tools +@subsection Assertions +@cindex assertions -While we are at programming style questions: You should bind -selectors only to ancestor classes of the receiving object. E.g., say, -you know that the receiving object is of class @code{foo} or its -descendents; then you should bind only to @code{foo} and its -ancestors. +It is a good idea to make your programs self-checking, especially if you +make an assumption that may become invalid during maintenance (for +example, that a certain field of a data structure is never zero). Gforth +supports @var{assertions} for this purpose. They are used like this: -@node Method conveniences, Classes and Scoping, Class Binding, Objects -@subsubsection Method conveniences -@cindex method conveniences +@example +assert( @var{flag} ) +@end example -In a method you usually access the receiving object pretty often. If -you define the method as a plain colon definition (e.g., with -@code{:noname}), you may have to do a lot of stack -gymnastics. To avoid this, you can define the method with @code{m: -... ;m}. E.g., you could define the method for -@code{draw}ing a @code{circle} with +The code between @code{assert(} and @code{)} should compute a flag, that +should be true if everything is alright and false otherwise. It should +not change anything else on the stack. The overall stack effect of the +assertion is @code{( -- )}. E.g. 
-@cindex @code{this} usage -@cindex @code{m:} usage -@cindex @code{;m} usage @example -m: ( x y circle -- ) - ( x y ) this circle-radius @@ draw-circle ;m +assert( 1 1 + 2 = ) \ what we learn in school +assert( dup 0<> ) \ assert that the top of stack is not zero +assert( false ) \ this code should not be reached @end example -@cindex @code{exit} in @code{m: ... ;m} -@cindex @code{exitm} discussion -@cindex @code{catch} in @code{m: ... ;m} -When this method is executed, the receiver object is removed from the -stack; you can access it with @code{this} (admittedly, in this -example the use of @code{m: ... ;m} offers no advantage). Note -that I specify the stack effect for the whole method (i.e. including -the receiver object), not just for the code between @code{m:} -and @code{;m}. You cannot use @code{exit} in -@code{m:...;m}; instead, use -@code{exitm}.@footnote{Moreover, for any word that calls -@code{catch} and was defined before loading -@code{objects.fs}, you have to redefine it like I redefined -@code{catch}: @code{: catch this >r catch r> to-this ;}} +The need for assertions is different at different times. During +debugging, we want more checking, in production we sometimes care more +for speed. Therefore, assertions can be turned off, i.e., the assertion +becomes a comment. Depending on the importance of an assertion and the +time it takes to check it, you may want to turn off some assertions and +keep others turned on. Gforth provides several levels of assertions for +this purpose: -@cindex @code{inst-var} usage -You will frequently use sequences of the form @code{this -@emph{field}} (in the example above: @code{this -circle-radius}). If you use the field only in this way, you can -define it with @code{inst-var} and eliminate the -@code{this} before the field name. 
E.g., the @code{circle} -class above could also be defined with: +doc-assert0( +doc-assert1( +doc-assert2( +doc-assert3( +doc-assert( +doc-) -@example -graphical class - cell% inst-var radius +The variable @code{assert-level} specifies the highest assertions that +are turned on. I.e., at the default @code{assert-level} of one, +@code{assert0(} and @code{assert1(} assertions perform checking, while +@code{assert2(} and @code{assert3(} assertions are treated as comments. + +The value of @code{assert-level} is evaluated at compile-time, not at +run-time. Therefore you cannot turn assertions on or off at run-time; +you have to set the @code{assert-level} appropriately before compiling a +piece of code. You can compile different pieces of code at different +@code{assert-level}s (e.g., a trusted library at level 1 and +newly-written code at level 3). -m: ( x y circle -- ) - radius @@ draw-circle ;m -overrides draw +doc-assert-level -m: ( n-radius circle -- ) - radius ! ;m -overrides construct +If an assertion fails, a message compatible with Emacs' compilation mode +is produced and the execution is aborted (currently with @code{ABORT"}. +If there is interest, we will introduce a special throw code. But if you +intend to @code{catch} a specific condition, using @code{throw} is +probably more appropriate than an assertion). -end-class circle -@end example +Definitions in ANS Forth for these assertion words are provided +in @file{compat/assert.fs}. -@code{radius} can only be used in @code{circle} and its -descendent classes and inside @code{m:...;m}. -@cindex @code{inst-value} usage -You can also define fields with @code{inst-value}, which is -to @code{inst-var} what @code{value} is to -@code{variable}. You can change the value of such a field with -@code{[to-inst]}. 
E.g., we could also define the class -@code{circle} like this: +@node Singlestep Debugger, , Assertions, Programming Tools +@subsection Singlestep Debugger +@cindex singlestep Debugger +@cindex debugging Singlestep +@cindex @code{dbg} +@cindex @code{BREAK:} +@cindex @code{BREAK"} -@example -graphical class - inst-value radius +When you create a new word there's often the need to check whether it +behaves correctly or not. You can do this by typing @code{dbg +badword}. A debug session might look like this: -m: ( x y circle -- ) - radius draw-circle ;m -overrides draw +@example +: badword 0 DO i . LOOP ; ok +2 dbg badword +: badword +Scanning code... -m: ( n-radius circle -- ) - [to-inst] radius ;m -overrides construct +Nesting debugger ready! -end-class circle +400D4738 8049BC4 0 -> [ 2 ] 00002 00000 +400D4740 8049F68 DO -> [ 0 ] +400D4744 804A0C8 i -> [ 1 ] 00000 +400D4748 400C5E60 . -> 0 [ 0 ] +400D474C 8049D0C LOOP -> [ 0 ] +400D4744 804A0C8 i -> [ 1 ] 00001 +400D4748 400C5E60 . -> 1 [ 0 ] +400D474C 8049D0C LOOP -> [ 0 ] +400D4758 804B384 ; -> ok @end example +Each line displayed is one step. You always have to hit return to +execute the next word that is displayed. If you don't want to execute +the next word as a whole, you have to type @kbd{n} for @code{nest}. Here is +an overview of which keys are available: -@node Classes and Scoping, Object Interfaces, Method conveniences, Objects -@subsubsection Classes and Scoping -@cindex classes and scoping -@cindex scoping and classes +@table @i -Inheritance is frequent, unlike structure extension. This exacerbates -the problem with the field name convention (@pxref{Structure Naming -Convention}): One always has to remember in which class the field was -originally defined; changing a part of the class structure would require -changes for renaming in otherwise unaffected code. +@item @key{RET} +Next; Execute the next word.
-@cindex @code{inst-var} visibility -@cindex @code{inst-value} visibility -To solve this problem, I added a scoping mechanism (which was not in my -original charter): A field defined with @code{inst-var} (or -@code{inst-value}) is visible only in the class where it is defined and in -the descendent classes of this class. Using such fields only makes -sense in @code{m:}-defined methods in these classes anyway. +@item n +Nest; Single step through next word. -This scoping mechanism allows us to use the unadorned field name, -because name clashes with unrelated words become much less likely. +@item u +Unnest; Stop debugging and execute rest of word. If we got to this word +with nest, continue debugging with the calling word. -@cindex @code{protected} discussion -@cindex @code{private} discussion -Once we have this mechanism, we can also use it for controlling the -visibility of other words: All words defined after -@code{protected} are visible only in the current class and its -descendents. @code{public} restores the compilation -(i.e. @code{current}) word list that was in effect before. If you -have several @code{protected}s without an intervening -@code{public} or @code{set-current}, @code{public} -will restore the compilation word list in effect before the first of -these @code{protected}s. +@item d +Done; Stop debugging and execute rest. -@node Object Interfaces, Objects Implementation, Classes and Scoping, Objects -@subsubsection Object Interfaces -@cindex object interfaces -@cindex interfaces for objects +@item s +Stop; Abort immediately. -In this model you can only call selectors defined in the class of the -receiving objects or in one of its ancestors. If you call a selector -with a receiving object that is not in one of these classes, the -result is undefined; if you are lucky, the program crashes -immediately. 
+@end table -@cindex selectors common to hardly-related classes -Now consider the case when you want to have a selector (or several) -available in two classes: You would have to add the selector to a -common ancestor class, in the worst case to @code{object}. You -may not want to do this, e.g., because someone else is responsible for -this ancestor class. +Debugging large applications with this mechanism is very difficult, because +you have to nest very deeply into the program before the interesting part +begins. This takes a lot of time. -The solution for this problem is interfaces. An interface is a -collection of selectors. If a class implements an interface, the -selectors become available to the class and its descendents. A class -can implement an unlimited number of interfaces. For the problem -discussed above, we would define an interface for the selector(s), and -both classes would implement the interface. +To do it more directly, put a @code{BREAK:} command into your source code. +When program execution reaches @code{BREAK:} the single step debugger is +invoked and you have all the features described above. -As an example, consider an interface @code{storage} for -writing objects to disk and getting them back, and a class -@code{foo} that implements it. The code would look like this: +If you have more than one part to debug it is useful to know where the +program has stopped at the moment. You can do this with the +@code{BREAK" string"} command. This behaves like @code{BREAK:} except that +the string is typed out when the ``breakpoint'' is reached. -@cindex @code{interface} usage -@cindex @code{end-interface} usage -@cindex @code{implementation} usage -@example -interface - selector write ( file object -- ) - selector read1 ( file object -- ) -end-interface storage +doc-dbg +doc-BREAK: +doc-BREAK" -bar class - storage implementation -... overrides write -... overrides read -...
-end-class foo -@end example +@c ------------------------------------------------------------- +@node Assembler and Code Words, Threading Words, Programming Tools, Words +@section Assembler and Code Words +@cindex assembler +@cindex code words -@noindent -(I would add a word @code{read} @var{( file -- object )} that uses -@code{read1} internally, but that's beyond the point illustrated -here.) +Gforth provides some words for defining primitives (words written in +machine code), and for defining the the machine-code equivalent of +@code{DOES>}-based defining words. However, the machine-independent +nature of Gforth poses a few problems: First of all, Gforth runs on +several architectures, so it can provide no standard assembler. What's +worse is that the register allocation not only depends on the processor, +but also on the @code{gcc} version and options used. -Note that you cannot use @code{protected} in an interface; and -of course you cannot define fields. +The words that Gforth offers encapsulate some system dependences (e.g., the +header structure), so a system-independent assembler may be used in +Gforth. If you do not have an assembler, you can compile machine code +directly with @code{,} and @code{c,}. -In the Neon model, all selectors are available for all classes; -therefore it does not need interfaces. The price you pay in this model -is slower late binding, and therefore, added complexity to avoid late -binding. +doc-assembler +doc-code +doc-end-code +doc-;code +doc-flush-icache -@node Objects Implementation, Objects Glossary, Object Interfaces, Objects -@subsubsection @file{objects.fs} Implementation -@cindex @file{objects.fs} implementation +If @code{flush-icache} does not work correctly, @code{code} words +etc. will not work (reliably), either. -@cindex @code{object-map} discussion -An object is a piece of memory, like one of the data structures -described with @code{struct...end-struct}. 
It has a field -@code{object-map} that points to the method map for the object's -class. +@code{flush-icache} is always present. The other words are rarely used +and reside in @code{code.fs}, which is usually not loaded. You can load +it with @code{require code.fs}. -@cindex method map -@cindex virtual function table -The @emph{method map}@footnote{This is Self terminology; in C++ -terminology: virtual function table.} is an array that contains the -execution tokens (@var{xt}s) of the methods for the object's class. Each -selector contains an offset into a method map. +@cindex registers of the inner interpreter +In the assembly code you will want to refer to the inner interpreter's +registers (e.g., the data stack pointer) and you may want to use other +registers for temporary storage. Unfortunately, the register allocation +is installation-dependent. -@cindex @code{selector} implementation, class -@code{selector} is a defining word that uses -@code{CREATE} and @code{DOES>}. The body of the -selector contains the offset; the @code{does>} action for a -class selector is, basically: +The easiest solution is to use explicit register declarations +(@pxref{Explicit Reg Vars, , Variables in Specified Registers, gcc.info, +GNU C Manual}) for all of the inner interpreter's registers: You have to +compile Gforth with @code{-DFORCE_REG} (configure option +@code{--enable-force-reg}) and the appropriate declarations must be +present in the @code{machine.h} file (see @code{mips.h} for an example; +you can find a full list of all declarable register symbols with +@code{grep register engine.c}). If you give explicit registers to all +variables that are declared at the beginning of @code{engine()}, you +should be able to use the other caller-saved registers for temporary +storage. 
Alternatively, you can use the @code{gcc} option +@code{-ffixed-REG} (@pxref{Code Gen Options, , Options for Code +Generation Conventions, gcc.info, GNU C Manual}) to reserve a register +(however, this restriction on register allocation may slow Gforth +significantly). -@example -( object addr ) @@ over object-map @@ + @@ execute -@end example +If this solution is not viable (e.g., because @code{gcc} does not allow +you to explicitly declare all the registers you need), you have to find +out by looking at the code where the inner interpreter's registers +reside and which registers can be used for temporary storage. You can +get an assembly listing of the engine's code with @code{make engine.s}. -Since @code{object-map} is the first field of the object, it -does not generate any code. As you can see, calling a selector has a -small, constant cost. +In any case, it is good practice to abstract your assembly code from the +actual register allocation. E.g., if the data stack pointer resides in +register @code{$17}, create an alias for this register called @code{sp}, +and use that in your assembly code. -@cindex @code{current-interface} discussion -@cindex class implementation and representation -A class is basically a @code{struct} combined with a method -map. During the class definition the alignment and size of the class -are passed on the stack, just as with @code{struct}s, so -@code{field} can also be used for defining class -fields. However, passing more items on the stack would be -inconvenient, so @code{class} builds a data structure in memory, -which is accessed through the variable -@code{current-interface}. After its definition is complete, the -class is represented on the stack by a pointer (e.g., as parameter for -a child class definition). +@cindex code words, portable +Another option for implementing normal and defining words efficiently +is to add the desired functionality to the source of Gforth. 
For normal +words you just have to edit @file{primitives} (@pxref{Automatic +Generation}). Defining words (equivalent to @code{;CODE} words, for fast +defined words) may require changes in @file{engine.c}, @file{kernel.fs}, +@file{prims2x.fs}, and possibly @file{cross.fs}. -A new class starts off with the alignment and size of its parent, -and a copy of the parent's method map. Defining new fields extends the -size and alignment; likewise, defining new selectors extends the -method map. @code{overrides} just stores a new @var{xt} in the method -map at the offset given by the selector. -@cindex class binding, implementation -Class binding just gets the @var{xt} at the offset given by the selector -from the class's method map and @code{compile,}s (in the case of -@code{[bind]}) it. +@c ------------------------------------------------------------- +@node Threading Words, Locals, Assembler and Code Words, Words +@section Threading Words +@cindex threading words -@cindex @code{this} implementation -@cindex @code{catch} and @code{this} -@cindex @code{this} and @code{catch} -I implemented @code{this} as a @code{value}. At the -start of an @code{m:...;m} method the old @code{this} is -stored to the return stack and restored at the end; and the object on -the TOS is stored @code{TO this}. This technique has one -disadvantage: If the user does not leave the method via -@code{;m}, but via @code{throw} or @code{exit}, -@code{this} is not restored (and @code{exit} may -crash). To deal with the @code{throw} problem, I have redefined -@code{catch} to save and restore @code{this}; the same -should be done with any word that can catch an exception. As for -@code{exit}, I simply forbid it (as a replacement, there is -@code{exitm}). +@cindex code address +These words provide access to code addresses and other threading stuff +in Gforth (and, possibly, other interpretive Forths). 
It more or less +abstracts away the differences between direct and indirect threading +(and, for direct threading, the machine dependences). However, at +present this wordset is still incomplete. It is also pretty low-level; +some day it will hopefully be made unnecessary by an internals wordset +that abstracts implementation details away completely. -@cindex @code{inst-var} implementation -@code{inst-var} is just the same as @code{field}, with -a different @code{does>} action: -@example -@@ this + -@end example -Similar for @code{inst-value}. +doc-threading-method +doc->code-address +doc->does-code +doc-code-address! +doc-does-code! +doc-does-handler! +doc-/does-handler -@cindex class scoping implementation -Each class also has a word list that contains the words defined with -@code{inst-var} and @code{inst-value}, and its protected -words. It also has a pointer to its parent. @code{class} pushes -the word lists of the class and all its ancestors onto the search order stack, -and @code{end-class} drops them. +The code addresses produced by various defining words are produced by +the following words: -@cindex interface implementation -An interface is like a class without fields, parent and protected -words; i.e., it just has a method map. If a class implements an -interface, its method map contains a pointer to the method map of the -interface. The positive offsets in the map are reserved for class -methods, therefore interface map pointers have negative -offsets. Interfaces have offsets that are unique throughout the -system, unlike class selectors, whose offsets are only unique for the -classes where the selector is available (invokable). +doc-docol: +doc-docon: +doc-dovar: +doc-douser: +doc-dodefer: +doc-dofield: -This structure means that interface selectors have to perform one -indirection more than class selectors to find their method. 
Their body -contains the interface map pointer offset in the class method map, and -the method offset in the interface method map. The -@code{does>} action for an interface selector is, basically: +You can recognize words defined by a @code{CREATE}...@code{DOES>} word +with @code{>does-code}. If the word was defined in that way, the value +returned is non-zero and identifies the @code{DOES>} used by the +defining word. +@comment TODO should that be ``identifies the xt of the DOES> ??'' -@example -( object selector-body ) -2dup selector-interface @@ ( object selector-body object interface-offset ) -swap object-map @@ + @@ ( object selector-body map ) -swap selector-offset @@ + @@ execute -@end example +@c ------------------------------------------------------------- +@node Locals, Structures, Threading Words, Words +@section Locals +@cindex locals -where @code{object-map} and @code{selector-offset} are -first fields and generate no code. +Local variables can make Forth programming more enjoyable and Forth +programs easier to read. Unfortunately, the locals of ANS Forth are +laden with restrictions. Therefore, we provide not only the ANS Forth +locals wordset, but also our own, more powerful locals wordset (we +implemented the ANS Forth locals wordset through our locals wordset). -As a concrete example, consider the following code: +The ideas in this section have also been published in the paper +@cite{Automatic Scoping of Local Variables} by M. Anton Ertl, presented +at EuroForth '94; it is available at +@*@url{http://www.complang.tuwien.ac.at/papers/ertl94l.ps.gz}. 
-@example -interface - selector if1sel1 - selector if1sel2 -end-interface if1 +@menu +* Gforth locals:: +* ANS Forth locals:: +@end menu -object class - if1 implementation - selector cl1sel1 - cell% inst-var cl1iv1 +@node Gforth locals, ANS Forth locals, Locals, Locals +@subsection Gforth locals +@cindex Gforth locals +@cindex locals, Gforth style -' m1 overrides construct -' m2 overrides if1sel1 -' m3 overrides if1sel2 -' m4 overrides cl1sel2 -end-class cl1 +Locals can be defined with -create obj1 object dict-new drop -create obj2 cl1 dict-new drop +@example +@{ local1 local2 ... -- comment @} +@end example +or +@example +@{ local1 local2 ... @} @end example -The data structure created by this code (including the data structure -for @code{object}) is shown in the figure, assuming a cell size of 4. -@comment nac TODO add this diagram.. +E.g., +@example +: max @{ n1 n2 -- n3 @} + n1 n2 > if + n1 + else + n2 + endif ; +@end example -@node Objects Glossary, , Objects Implementation, Objects -@subsubsection @file{objects.fs} Glossary -@cindex @file{objects.fs} Glossary +The similarity of locals definitions with stack comments is intended. A +locals definition often replaces the stack comment of a word. The order +of the locals corresponds to the order in a stack comment and everything +after the @code{--} is really a comment. -doc---objects-bind -doc---objects- -doc---objects-bind' -doc---objects-[bind] -doc---objects-class -doc---objects-class->map -doc---objects-class-inst-size -doc---objects-class-override! 
-doc---objects-construct -doc---objects-current' -doc---objects-[current] -doc---objects-current-interface -doc---objects-dict-new -doc---objects-drop-order -doc---objects-end-class -doc---objects-end-class-noname -doc---objects-end-interface -doc---objects-end-interface-noname -doc---objects-exitm -doc---objects-heap-new -doc---objects-implementation -doc---objects-init-object -doc---objects-inst-value -doc---objects-inst-var -doc---objects-interface -doc---objects-;m -doc---objects-m: -doc---objects-method -doc---objects-object -doc---objects-overrides -doc---objects-[parent] -doc---objects-print -doc---objects-protected -doc---objects-public -doc---objects-push-order -doc---objects-selector -doc---objects-this -doc---objects- -doc---objects-[to-inst] -doc---objects-to-this -doc---objects-xt-new +This similarity has one disadvantage: It is too easy to confuse locals +declarations with stack comments, causing bugs and making them hard to +find. However, this problem can be avoided by appropriate coding +conventions: Do not use both notations in the same program. If you do, +they should be distinguished using additional means, e.g. by position. 
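+
+As a minimal sketch of the ordering rule described above (the word
+@code{difference} is a hypothetical name chosen for illustration, not a
+Gforth word), the leftmost local receives the deepest stack item, just
+as in a stack comment:
+
+@example
+: difference @{ n1 n2 -- n3 @}
+  \ n1 is taken from below the top of stack, n2 from the top
+  n1 n2 - ;
+
+10 3 difference .  \ prints 7
+@end example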
-@c ------------------------------------------------------------- -@node OOF, Mini-OOF, Objects, Object-oriented Forth -@subsection The @file{oof.fs} model -@cindex oof -@cindex object-oriented programming +@cindex types of locals +@cindex locals types +The name of the local may be preceded by a type specifier, e.g., +@code{F:} for a floating point value: -@cindex @file{objects.fs} -@cindex @file{oof.fs} +@example +: CX* @{ F: Ar F: Ai F: Br F: Bi -- Cr Ci @} +\ complex multiplication + Ar Br f* Ai Bi f* f- + Ar Bi f* Ai Br f* f+ ; +@end example + +@cindex flavours of locals +@cindex locals flavours +@cindex value-flavoured locals +@cindex variable-flavoured locals +Gforth currently supports cells (@code{W:}, @code{W^}), doubles +(@code{D:}, @code{D^}), floats (@code{F:}, @code{F^}) and characters +(@code{C:}, @code{C^}) in two flavours: a value-flavoured local (defined +with @code{W:}, @code{D:} etc.) produces its value and can be changed +with @code{TO}. A variable-flavoured local (defined with @code{W^} etc.) +produces its address (which becomes invalid when the variable's scope is +left). E.g., the standard word @code{emit} can be defined in terms of +@code{type} like this: -This section describes the @file{oof.fs} packet. +@example +: emit @{ C^ char* -- @} + char* 1 type ; +@end example -The packet described in this section has been used in bigFORTH since 1991, and -used for two large applications: a chromatographic system used to -create new medicaments, and a graphic user interface library (MINOS). +@cindex default type of locals +@cindex locals, default type +A local without type specifier is a @code{W:} local. Both flavours of +locals are initialized with values from the data or FP stack. -You can find a description (in German) of @file{oof.fs} in @cite{Object -oriented bigFORTH} by Bernd Paysan, published in @cite{Vierte Dimension} -10(2), 1994. +Currently there is no way to define locals with user-defined data +structures, but we are working on it. 
+ +Gforth allows defining locals everywhere in a colon definition. This +poses the following questions: @menu -* Properties of the OOF model:: -* Basic OOF Usage:: -* The OOF base class:: -* Class Declaration:: -* Class Implementation:: +* Where are locals visible by name?:: +* How long do locals live?:: +* Programming Style:: +* Implementation:: @end menu -@node Properties of the OOF model, Basic OOF Usage, OOF, OOF -@subsubsection Properties of the @file{oof.fs} model -@cindex @file{oof.fs} properties +@node Where are locals visible by name?, How long do locals live?, Gforth locals, Gforth locals +@subsubsection Where are locals visible by name? +@cindex locals visibility +@cindex visibility of locals +@cindex scope of locals -@itemize @bullet -@item -This model combines object oriented programming with information -hiding. It helps you writing large application, where scoping is -necessary, because it provides class-oriented scoping. +Basically, the answer is that locals are visible where you would expect +it in block-structured languages, and sometimes a little longer. If you +want to restrict the scope of a local, enclose its definition in +@code{SCOPE}...@code{ENDSCOPE}. -@item -Named objects, object pointers, and object arrays can be created, -selector invocation uses the "object selector" syntax. Selector invocation -to objects and/or selectors on the stack is a bit less convenient, but -possible. +doc-scope +doc-endscope -@item -Selector invocation and instance variable usage of the active object is -straightforward, since both make use of the active object. +These words behave like control structure words, so you can use them +with @code{CS-PICK} and @code{CS-ROLL} to restrict the scope in +arbitrary ways. -@item -Late binding is efficient and easy to use. 
+If you want a more exact answer to the visibility question, here's the +basic principle: A local is visible in all places that can only be +reached through the definition of the local@footnote{In compiler +construction terminology, all places dominated by the definition of the +local.}. In other words, it is not visible in places that can be reached +without going through the definition of the local. E.g., locals defined +in @code{IF}...@code{ENDIF} are visible until the @code{ENDIF}, locals +defined in @code{BEGIN}...@code{UNTIL} are visible after the +@code{UNTIL} (until, e.g., a subsequent @code{ENDSCOPE}). -@item -State-smart objects parse selectors. However, extensibility is provided -using a (parsing) selector @code{postpone} and a selector @code{'}. +The reasoning behind this solution is: We want to have the locals +visible as long as it is meaningful. The user can always make the +visibility shorter by using explicit scoping. In a place that can +only be reached through the definition of a local, the meaning of a +local name is clear. In other places it is not: How is the local +initialized at the control flow path that does not contain the +definition? Which local is meant, if the same name is defined twice in +two independent control flow paths? -@item -An implementation in ANS Forth is available. +This should be enough detail for nearly all users, so you can skip the +rest of this section. If you really must know all the gory details and +options, read on. -@end itemize +In order to implement this rule, the compiler has to know which places +are unreachable. It knows this automatically after @code{AHEAD}, +@code{AGAIN}, @code{EXIT} and @code{LEAVE}; in other cases (e.g., after +most @code{THROW}s), you can use the word @code{UNREACHABLE} to tell the +compiler that the control flow never reaches that place. 
If +@code{UNREACHABLE} is not used where it could, the only consequence is +that the visibility of some locals is more limited than the rule above +says. If @code{UNREACHABLE} is used where it should not (i.e., if you +lie to the compiler), buggy code will be produced. +doc-unreachable -@node Basic OOF Usage, The OOF base class, Properties of the OOF model, OOF -@subsubsection Basic @file{oof.fs} Usage -@cindex @file{oof.fs} usage +Another problem with this rule is that at @code{BEGIN}, the compiler +does not know which locals will be visible on the incoming +back-edge. All problems discussed in the following are due to this +ignorance of the compiler (we discuss the problems using @code{BEGIN} +loops as examples; the discussion also applies to @code{?DO} and other +loops). Perhaps the most insidious example is: +@example +AHEAD +BEGIN + x +[ 1 CS-ROLL ] THEN + @{ x @} + ... +UNTIL +@end example -This section uses the same example as for @code{objects} (@pxref{Basic Objects Usage}). +This should be legal according to the visibility rule. The use of +@code{x} can only be reached through the definition; but that appears +textually below the use. -You can define a class for graphical objects like this: +From this example it is clear that the visibility rules cannot be fully +implemented without major headaches. Our implementation treats common +cases as advertised and the exceptions are treated in a safe way: The +compiler makes a reasonable guess about the locals visible after a +@code{BEGIN}; if it is too pessimistic, the +user will get a spurious error about the local not being defined; if the +compiler is too optimistic, it will notice this later and issue a +warning. In the case above the compiler would complain about @code{x} +being undefined at its use. You can see from the obscure examples in +this section that it takes quite unusual control structures to get the +compiler into trouble, and even then it will often do fine. 
-@cindex @code{class} usage -@cindex @code{class;} usage -@cindex @code{method} usage +If the @code{BEGIN} is reachable from above, the most optimistic guess +is that all locals visible before the @code{BEGIN} will also be +visible after the @code{BEGIN}. This guess is valid for all loops that +are entered only through the @code{BEGIN}, in particular, for normal +@code{BEGIN}...@code{WHILE}...@code{REPEAT} and +@code{BEGIN}...@code{UNTIL} loops and it is implemented in our +compiler. When the branch to the @code{BEGIN} is finally generated by +@code{AGAIN} or @code{UNTIL}, the compiler checks the guess and +warns the user if it was too optimistic: @example -object class graphical \ "object" is the parent class - method draw ( x y graphical -- ) -class; +IF + @{ x @} +BEGIN + \ x ? +[ 1 cs-roll ] THEN + ... +UNTIL @end example -This code defines a class @code{graphical} with an -operation @code{draw}. We can perform the operation -@code{draw} on any @code{graphical} object, e.g.: - +Here, @code{x} lives only until the @code{BEGIN}, but the compiler +optimistically assumes that it lives until the @code{THEN}. It notices +this difference when it compiles the @code{UNTIL} and issues a +warning. The user can avoid the warning, and make sure that @code{x} +is not used in the wrong area by using explicit scoping: @example -100 100 t-rex draw +IF + SCOPE + @{ x @} + ENDSCOPE +BEGIN +[ 1 cs-roll ] THEN + ... +UNTIL @end example -@noindent -where @code{t-rex} is an object or object pointer, created with e.g. -@code{graphical : t-rex}. +Since the guess is optimistic, there will be no spurious error messages +about undefined locals. -@cindex abstract class -How do we create a graphical object? With the present definitions, -we cannot create a useful graphical object. 
The class -@code{graphical} describes graphical objects in general, but not -any concrete graphical object type (C++ users would call it an -@emph{abstract class}); e.g., there is no method for the selector -@code{draw} in the class @code{graphical}. +If the @code{BEGIN} is not reachable from above (e.g., after +@code{AHEAD} or @code{EXIT}), the compiler cannot even make an +optimistic guess, as the locals visible after the @code{BEGIN} may be +defined later. Therefore, the compiler assumes that no locals are +visible after the @code{BEGIN}. However, the user can use +@code{ASSUME-LIVE} to make the compiler assume that the same locals are +visible at the BEGIN as at the point where the top control-flow stack +item was created. -For concrete graphical objects, we define child classes of the -class @code{graphical}, e.g.: +doc-assume-live +E.g., @example -graphical class circle \ "graphical" is the parent class - cell var circle-radius -how: - : draw ( x y -- ) - circle-radius @@ draw-circle ; - - : init ( n-radius -- ( - circle-radius ! ; -class; +@{ x @} +AHEAD +ASSUME-LIVE +BEGIN + x +[ 1 CS-ROLL ] THEN + ... +UNTIL @end example -Here we define a class @code{circle} as a child of @code{graphical}, -with a field @code{circle-radius}; it defines new methods for the -selectors @code{draw} and @code{init} (@code{init} is defined in -@code{object}, the parent class of @code{graphical}). - -Now we can create a circle in the dictionary with +Other cases where the locals are defined before the @code{BEGIN} can be +handled by inserting an appropriate @code{CS-ROLL} before the +@code{ASSUME-LIVE} (and changing the control-flow stack manipulation +behind the @code{ASSUME-LIVE}). +Cases where locals are defined after the @code{BEGIN} (but should be +visible immediately after the @code{BEGIN}) can only be handled by +rearranging the loop. E.g., the ``most insidious'' example above can be +arranged into: @example -50 circle : my-circle +BEGIN + @{ x @} + ... 
0= +WHILE + x +REPEAT @end example -@noindent -@code{:} invokes @code{init}, thus initializing the field -@code{circle-radius} with 50. We can draw this new circle at (100,100) -with: +@node How long do locals live?, Programming Style, Where are locals visible by name?, Gforth locals +@subsubsection How long do locals live? +@cindex locals lifetime +@cindex lifetime of locals -@example -100 100 my-circle draw -@end example +The right answer for the lifetime question would be: A local lives at +least as long as it can be accessed. For a value-flavoured local this +means: until the end of its visibility. However, a variable-flavoured +local could be accessed through its address far beyond its visibility +scope. Ultimately, this would mean that such locals would have to be +garbage collected. Since this entails un-Forth-like implementation +complexities, I adopted the same cowardly solution as some other +languages (e.g., C): The local lives only as long as it is visible; +afterwards its address is invalid (and programs that access it +afterwards are erroneous). -@cindex selector invocation, restrictions -@cindex class definition, restrictions -Note: You can only invoke a selector if the receiving object belongs to -the class where the selector was defined or one of its descendents; -e.g., you can invoke @code{draw} only for objects belonging to -@code{graphical} or its descendents (e.g., @code{circle}). The scoping -mechanism will check if you try to invoke a selector that is not -defined in this class hierarchy, so you'll get an error at compilation -time. +@node Programming Style, Implementation, How long do locals live?, Gforth locals +@subsubsection Programming Style +@cindex locals programming style +@cindex programming style, locals +The freedom to define locals anywhere has the potential to change +programming styles dramatically. In particular, the need to use the +return stack for intermediate storage vanishes. 
Moreover, all stack +manipulations (except @code{PICK}s and @code{ROLL}s with run-time +determined arguments) can be eliminated: If the stack items are in the +wrong order, just write a locals definition for all of them; then +write the items in the order you want. -@node The OOF base class, Class Declaration, Basic OOF Usage, OOF -@subsubsection The @file{oof.fs} base class -@cindex @file{oof.fs} base class +This seems a little far-fetched and eliminating stack manipulations is +unlikely to become a conscious programming objective. Still, the number +of stack manipulations will be reduced dramatically if local variables +are used liberally (e.g., compare @code{max} in @ref{Gforth locals} with +a traditional implementation of @code{max}). -When you define a class, you have to specify a parent class. So how do -you start defining classes? There is one class available from the start: -@code{object}. You have to use it as ancestor for all classes. It is the -only class that has no parent. Classes are also objects, except that -they don't have instance variables; class manipulation such as -inheritance or changing definitions of a class is handled through -selectors of the class @code{object}. +This shows one potential benefit of locals: making Forth programs more +readable. Of course, this benefit will only be realized if the +programmers continue to honour the principle of factoring instead of +using the added latitude to make the words longer. -@code{object} provides a number of selectors: +@cindex single-assignment style for locals +Using @code{TO} can and should be avoided. Without @code{TO}, +every value-flavoured local has only a single assignment and many +advantages of functional languages apply to Forth. I.e., programs are +easier to analyse, to optimize and to read: It is clear from the +definition what the local stands for, it does not turn into something +different later. 
-@itemize @bullet -@item -@code{class} for subclassing, @code{definitions} to add definitions -later on, and @code{class?} to get type informations (is the class a -subclass of the class passed on the stack?). -doc---object-class -doc---object-definitions -doc---object-class? +E.g., a definition using @code{TO} might look like this: +@example +: strcmp @{ addr1 u1 addr2 u2 -- n @} + u1 u2 min 0 + ?do + addr1 c@@ addr2 c@@ - + ?dup-if + unloop exit + then + addr1 char+ TO addr1 + addr2 char+ TO addr2 + loop + u1 u2 - ; +@end example +Here, @code{TO} is used to update @code{addr1} and @code{addr2} at +every loop iteration. @code{strcmp} is a typical example of the +readability problems of using @code{TO}. When you start reading +@code{strcmp}, you think that @code{addr1} refers to the start of the +string. Only near the end of the loop you realize that it is something +else. -@item -@code{init} and @code{dispose} as constructor and destructor of the -object. @code{init} is invocated after the object's memory is allocated, -while @code{dispose} also handles deallocation. Thus if you redefine -@code{dispose}, you have to call the parent's dispose with @code{super -dispose}, too. -doc---object-init -doc---object-dispose +This can be avoided by defining two locals at the start of the loop that +are initialized with the right value for the current iteration. +@example +: strcmp @{ addr1 u1 addr2 u2 -- n @} + addr1 addr2 + u1 u2 min 0 + ?do @{ s1 s2 @} + s1 c@@ s2 c@@ - + ?dup-if + unloop exit + then + s1 char+ s2 char+ + loop + 2drop + u1 u2 - ; +@end example +Here it is clear from the start that @code{s1} has a different value +in every loop iteration. -@item -@code{new}, @code{new[]}, @code{:}, @code{ptr}, @code{asptr}, and -@code{[]} to create named and unnamed objects and object arrays or -object pointers. 
-doc---object-new -doc---object-new[] -doc---object-: -doc---object-ptr -doc---object-asptr -doc---object-[] +@node Implementation, , Programming Style, Gforth locals +@subsubsection Implementation +@cindex locals implementation +@cindex implementation of locals -@item -@code{::} and @code{super} for explicit scoping. You should use explicit -scoping only for super classes or classes with the same set of instance -variables. Explicitly-scoped selectors use early binding. -doc---object-:: -doc---object-super +@cindex locals stack +Gforth uses an extra locals stack. The most compelling reason for +this is that the return stack is not float-aligned; using an extra stack +also eliminates the problems and restrictions of using the return stack +as locals stack. Like the other stacks, the locals stack grows toward +lower addresses. A few primitives allow an efficient implementation: -@item -@code{self} to get the address of the object -doc---object-self +doc-@local# +doc-f@local# +doc-laddr# +doc-lp+!# +doc-lp! +doc->l +doc-f>l -@item -@code{bind}, @code{bound}, @code{link}, and @code{is} to assign object -pointers and instance defers. -doc---object-bind -doc---object-bound -doc---object-link -doc---object-is +In addition to these primitives, some specializations of these +primitives for commonly occurring inline arguments are provided for +efficiency reasons, e.g., @code{@@local0} as specialization of +@code{@@local#} for the inline argument 0. The following compiling words +compile the right specialized version, or the general version, as +appropriate: -@item -@code{'} to obtain selector tokens, @code{send} to invocate selectors -form the stack, and @code{postpone} to generate selector invocation code. -doc---object-' -doc---object-postpone +doc-compile-@local +doc-compile-f@local +doc-compile-lp+! -@item -@code{with} and @code{endwith} to select the active object from the -stack, and enable its scope. 
Using @code{with} and @code{endwith} -also allows you to create code using selector @code{postpone} without being -trapped by the state-smart objects. -doc---object-with -doc---object-endwith +Combinations of conditional branches and @code{lp+!#} like +@code{?branch-lp+!#} (the locals pointer is only changed if the branch +is taken) are provided for efficiency and correctness in loops. -@end itemize +A special area in the dictionary space is reserved for keeping the +local variable names. @code{@{} switches the dictionary pointer to this +area and @code{@}} switches it back and generates the locals +initializing code. @code{W:} etc.@ are normal defining words. This +special area is cleared at the start of every colon definition. -@node Class Declaration, Class Implementation, The OOF base class, OOF -@subsubsection Class Declaration -@cindex class declaration +@cindex word list for defining locals +A special feature of Gforth's dictionary is used to implement the +definition of locals without type specifiers: every word list (aka +vocabulary) has its own methods for searching +etc. (@pxref{Word Lists}). For the present purpose we defined a word list +with a special search method: When it is searched for a word, it +actually creates that word using @code{W:}. @code{@{} changes the search +order to first search the word list containing @code{@}}, @code{W:} etc., +and then the word list for defining locals without type specifiers. -@itemize @bullet -@item -Instance variables -doc---oof-var +The lifetime rules support a stack discipline within a colon +definition: The lifetime of a local is either nested with other locals' +lifetimes or it does not overlap them.
-@item -Object pointers -doc---oof-ptr -doc---oof-asptr - -@item -Instance defers -doc---oof-defer - -@item -Method selectors -doc---oof-early -doc---oof-method - -@item -Class-wide variables -doc---oof-static +At @code{BEGIN}, @code{IF}, and @code{AHEAD} no code for locals stack +pointer manipulation is generated. Between control structure words +locals definitions can push locals onto the locals stack. @code{AGAIN} +is the simplest of the other three control flow words. It has to +restore the locals stack depth of the corresponding @code{BEGIN} +before branching. The code looks like this: +@format +@code{lp+!#} current-locals-size @minus{} dest-locals-size +@code{branch} +@end format -@item -End declaration -doc---oof-how: -doc---oof-class; +@code{UNTIL} is a little more complicated: If it branches back, it +must adjust the stack just like @code{AGAIN}. But if it falls through, +the locals stack must not be changed. The compiler generates the +following code: +@format +@code{?branch-lp+!#} current-locals-size @minus{} dest-locals-size +@end format +The locals stack pointer is only adjusted if the branch is taken. -@end itemize +@code{THEN} can produce somewhat inefficient code: +@format +@code{lp+!#} current-locals-size @minus{} orig-locals-size +: +@code{lp+!#} orig-locals-size @minus{} new-locals-size +@end format +The second @code{lp+!#} adjusts the locals stack pointer from the +level at the @var{orig} point to the level after the @code{THEN}. The +first @code{lp+!#} adjusts the locals stack pointer from the current +level to the level at the orig point, so the complete effect is an +adjustment from the current level to the right level after the +@code{THEN}. 
-@c ------------------------------------------------------------- -@node Class Implementation, , Class Declaration, OOF -@subsubsection Class Implementation -@cindex class implementation +@cindex locals information on the control-flow stack +@cindex control-flow stack items, locals information +In a conventional Forth implementation a dest control-flow stack entry +is just the target address and an orig entry is just the address to be +patched. Our locals implementation adds a word list to every orig or dest +item. It is the list of locals visible (or assumed visible) at the point +described by the entry. Our implementation also adds a tag to identify +the kind of entry, in particular to differentiate between live and dead +(reachable and unreachable) orig entries. -@c ------------------------------------------------------------- -@node Mini-OOF, Comparison with other object models, OOF, Object-oriented Forth -@subsection The @file{mini-oof.fs} model -@cindex mini-oof +A few unusual operations have to be performed on locals word lists: -Gforth's third object oriented Forth package is a 12-liner. It uses a -mixture of the @file{object.fs} and the @file{oof.fs} syntax, -and reduces to the bare minimum of features. This is based on a posting -of Bernd Paysan in comp.arch. +doc-common-list +doc-sub-list? +doc-list-size -@menu -* Basic Mini-OOF Usage:: -* Mini-OOF Example:: -* Mini-OOF Implementation:: -@end menu +Several features of our locals word list implementation make these +operations easy to implement: The locals word lists are organised as +linked lists; the tails of these lists are shared, if the lists +contain some of the same locals; and the address of a name is greater +than the address of the names behind it in the list. 
-@c ------------------------------------------------------------- -@node Basic Mini-OOF Usage, Mini-OOF Example, , Mini-OOF -@subsubsection Basic @file{mini-oof.fs} Usage -@cindex mini-oof usage +Another important implementation detail is the variable +@code{dead-code}. It is used by @code{BEGIN} and @code{THEN} to +determine if they can be reached directly or only through the branch +that they resolve. @code{dead-code} is set by @code{UNREACHABLE}, +@code{AHEAD}, @code{EXIT} etc., and cleared at the start of a colon +definition, by @code{BEGIN} and usually by @code{THEN}. -There is a base class (@code{class}, which allocates one cell -for the object pointer) plus seven other words: to define a method, a -variable, a class; to end a class, to resolve binding, to allocate an -object and to compile a class method. -@comment TODO better description of the last one +Counted loops are similar to other loops in most respects, but +@code{LEAVE} requires special attention: It performs basically the same +service as @code{AHEAD}, but it does not create a control-flow stack +entry. Therefore the information has to be stored elsewhere; +traditionally, the information was stored in the target fields of the +branches created by the @code{LEAVE}s, by organizing these fields into a +linked list. Unfortunately, this clever trick does not provide enough +space for storing our extended control flow information. Therefore, we +introduce another stack, the leave stack. It contains the control-flow +stack entries for all unresolved @code{LEAVE}s. -doc-object -doc-method -doc-var -doc-class -doc-end-class -doc-defines -doc-new -doc-:: +Local names are kept until the end of the colon definition, even if +they are no longer visible in any control-flow path. In a few cases +this may lead to increased space needs for the locals name area, but +usually less than reclaiming this space would cost in code size. 
-@c ------------------------------------------------------------- -@node Mini-OOF Example, Mini-OOF Implementation, Basic Mini-OOF Usage, Mini-OOF -@subsubsection Mini-OOF Example -@cindex mini-oof example +@node ANS Forth locals, , Gforth locals, Locals +@subsection ANS Forth locals +@cindex locals, ANS Forth style -A short example shows how to use this package. -@comment nac TODO could flesh this out with some comments from the Forthwrite article +The ANS Forth locals wordset does not define a syntax for locals, but +words that make it possible to define various syntaxes. One of the +possible syntaxes is a subset of the syntax we used in the Gforth locals +wordset, i.e.: @example -object class - method init - method draw -end-class graphical +@{ local1 local2 ... -- comment @} @end example - -This code defines a class @code{graphical} with an -operation @code{draw}. We can perform the operation -@code{draw} on any @code{graphical} object, e.g.: - +@noindent +or @example -100 100 t-rex draw +@{ local1 local2 ... @} @end example -where @code{t-rex} is an object or object pointer, created with e.g. -@code{graphical new Constant t-rex}. - -For concrete graphical objects, we define child classes of the -class @code{graphical}, e.g.: +The order of the locals corresponds to the order in a stack comment. The +restrictions are: -@example -graphical class - cell var circle-radius -end-class circle \ "graphical" is the parent class +@itemize @bullet +@item +Locals can only be cell-sized values (no type specifiers are allowed). +@item +Locals can be defined only outside control structures. +@item +Locals can interfere with explicit usage of the return stack. For the +exact (and long) rules, see the standard. If you don't use return stack +accessing words in a definition using locals, you will be all right. The +purpose of this rule is to make locals implementation on the return +stack easier. +@item +The whole definition must be in one line. 
+@end itemize -:noname ( x y -- ) - circle-radius @@ draw-circle ; circle defines draw -:noname ( r -- ) - circle-radius ! ; circle defines init -@end example +Locals defined in this way behave like @code{VALUE}s (@pxref{Simple +Defining Words}). I.e., they are initialized from the stack. Using their +name produces their value. Their value can be changed using @code{TO}. -There is no implicit init method, so we have to define one. The creation -code of the object now has to call init explicitely. +Since this syntax is supported by Gforth directly, you need not do +anything to use it. If you want to port a program using this syntax to +another ANS Forth system, use @file{compat/anslocal.fs} to implement the +syntax on the other system. -@example -circle new Constant my-circle -50 my-circle init -@end example +Note that a syntax shown in the standard, section A.13 looks +similar, but is quite different in having the order of locals +reversed. Beware! -It is also possible to add a function to create named objects with -automatic call of @code{init}, given that all objects have @code{init} -on the same place: +The ANS Forth locals wordset itself consists of a word: -@example -: new: ( .. o "name" -- ) - new dup Constant init ; -80 circle new: large-circle -@end example +doc-(local) -We can draw this new circle at (100,100) with: +The ANS Forth locals extension wordset defines a syntax using @code{locals|}, but it is so +awful that we strongly recommend not to use it. We have implemented this +syntax to make porting to Gforth easy, but do not document it here. The +problem with this syntax is that the locals are defined in an order +reversed with respect to the standard stack comment notation, making +programs harder to read, and easier to misread and miswrite. The only +merit of this syntax is that it is easy to implement using the ANS Forth +locals wordset.
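+
+As a small illustration of the brace syntax subset described in this
+section (the word name @code{diff} is made up for this sketch), the
+following definition uses only untyped, cell-sized locals defined
+outside control structures:
+
+@example
+: diff @{ a b -- n @}
+  a b - ;
+
+10 3 diff .   \ a is 10 and b is 3, as in a stack comment
+@end example
+
+This should print 7, because the locals take their values in stack
+comment order.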
-@example -100 100 my-circle draw -@end example -@node Mini-OOF Implementation, , Mini-OOF Example, Mini-OOF -@subsubsection @file{mini-oof.fs} Implementation +@c ---------------------------------------------------------- +@node Structures, Object-oriented Forth, Locals, Words +@section Structures +@cindex structures +@cindex records -Object-oriented systems with late binding typically use a -"vtable"-approach: the first variable in each object is a pointer to a -table, which contains the methods as function pointers. The vtable -may also contain other information. +This section presents the structure package that comes with Gforth. A +version of the package implemented in ANS Forth is available in +@file{compat/struct.fs}. This package was inspired by a posting on +comp.lang.forth in 1989 (unfortunately I don't remember, by whom; +possibly John Hayes). A version of this section has been published in +???. Marcel Hendrix provided helpful comments. -So first, let's declare methods: +@menu +* Why explicit structure support?:: +* Structure Usage:: +* Structure Naming Convention:: +* Structure Implementation:: +* Structure Glossary:: +@end menu -@example -: method ( m v -- m' v ) Create over , swap cell+ swap - DOES> ( ... o -- ... ) @ over @ + @ execute ; -@end example +@node Why explicit structure support?, Structure Usage, Structures, Structures +@subsection Why explicit structure support? -During method declaration, the number of methods and instance -variables is on the stack (in address units). @code{method} creates -one method and increments the method number. To execute a method, it -takes the object, fetches the vtable pointer, adds the offset, and -executes the @var{xt} stored there. Each method takes the object it is -invoked from as top of stack parameter. The method itself should -consume that object. 
+@cindex address arithmetic for structures +@cindex structures using address arithmetic +If we want to use a structure containing several fields, we could simply +reserve memory for it, and access the fields using address arithmetic +(@pxref{Address arithmetic}). As an example, consider a structure with +the following fields -Now, we also have to declare instance variables +@table @code +@item a +is a float +@item b +is a cell +@item c +is a float +@end table -@example -: var ( m v size -- m v' ) Create over , + - DOES> ( o -- addr ) @ + ; -@end example +Given the (float-aligned) base address of the structure we get the +address of the field -As before, a word is created with the current offset. Instance -variables can have different sizes (cells, floats, doubles, chars), so -all we do is take the size and add it to the offset. If your machine -has alignment restrictions, put the proper @code{aligned} or -@code{faligned} before the variable, to adjust the variable -offset. That's why it is on the top of stack. +@table @code +@item a +without doing anything further. +@item b +with @code{float+} +@item c +with @code{float+ cell+ faligned} +@end table -We need a starting point (the base object) and some syntactic sugar: +It is easy to see that this can become quite tiring. -@example -Create object 1 cells , 2 cells , -: class ( class -- class methods vars ) dup 2@ ; -@end example +Moreover, it is not very readable, because seeing a +@code{cell+} tells us neither which kind of structure is +accessed nor what field is accessed; we have to somehow infer the kind +of structure, and then look up in the documentation, which field of +that structure corresponds to that offset. -For inheritance, the vtable of the parent object has to be -copied when a new, derived class is declared. This gives all the -methods of the parent class, which can be overridden, though. 
+Finally, this kind of address arithmetic also causes maintenance +troubles: If you add or delete a field somewhere in the middle of the +structure, you have to find and change all computations for the fields +afterwards. + +So, instead of using @code{cell+} and friends directly, how +about storing the offsets in constants: @example -: end-class ( class methods vars -- ) - Create here >r , dup , 2 cells ?DO ['] noop , 1 cells +LOOP - cell+ dup cell+ r> rot @ 2 cells /string move ; +0 constant a-offset +0 float+ constant b-offset +0 float+ cell+ faligned constant c-offset @end example -The first line creates the vtable, initialized with -@code{noop}s. The second line is the inheritance mechanism, it -copies the xts from the parent vtable. - -We still have no way to define new methods, let's do that now: +Now we can get the address of field @code{x} with @code{x-offset ++}. This is much better in all respects. Of course, you still +have to change all later offset definitions if you add a field. You can +fix this by declaring the offsets in the following way: @example -: defines ( xt class -- ) ' >body @ + ! ; +0 constant a-offset +a-offset float+ constant b-offset +b-offset cell+ faligned constant c-offset @end example -To allocate a new object, we need a word, too: +Since we always use the offsets with @code{+}, we could use a defining +word @code{cfield} that includes the @code{+} in the action of the +defined word: @example -: new ( class -- o ) here over @ allot swap over ! ; +: cfield ( n "name" -- ) + create , +does> ( name execution: addr1 -- addr2 ) + @@ + ; + +0 cfield a +0 a float+ cfield b +0 b cell+ faligned cfield c @end example -Sometimes derived classes want to access the method of the -parent object. There are two ways to achieve this with Mini-OOF: -first, you could use named words, and second, you could look up the -vtable of the parent object. +Instead of @code{x-offset +}, we now simply write @code{x}.
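+
+A usage sketch for the @code{cfield} words defined above (the name
+@code{rec} is hypothetical, and we assume that float-aligned dictionary
+memory is acceptable for the record):
+
+@example
+falign here  0 c float+ allot  constant rec  \ room for a, b and c
+1.5e rec a f!
+42   rec b !
+2.5e rec c f!
+rec b @@ .    \ prints 42
+@end example
+
+The size of the record is computed as the offset of the last field
+plus its size (@code{0 c float+}).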
+ +The structure field words now can be used quite nicely. However, +their definition is still a bit cumbersome: We have to repeat the +name, the information about size and alignment is distributed before +and after the field definitions etc. The structure package presented +here addresses these problems. + +@node Structure Usage, Structure Naming Convention, Why explicit structure support?, Structures +@subsection Structure Usage +@cindex structure usage +@cindex @code{field} usage +@cindex @code{struct} usage +@cindex @code{end-struct} usage +You can define a structure for a (data-less) linked list with: @example -: :: ( class "name" -- ) ' >body @ + @ compile, ; +struct + cell% field list-next +end-struct list% @end example - -Nothing can be more confusing than a good example, so here is -one. First let's declare a text object (called -@code{button}), that stores text and position: +With the address of the list node on the stack, you can compute the +address of the field that contains the address of the next node with +@code{list-next}. E.g., you can determine the length of a list +with: @example -object class - cell var text - cell var len - cell var x - cell var y - method init - method draw -end-class button +: list-length ( list -- n ) +\ "list" is a pointer to the first element of a linked list +\ "n" is the length of the list + 0 BEGIN ( list1 n1 ) + over + WHILE ( list1 n1 ) + 1+ swap list-next @@ swap + REPEAT + nip ; @end example -@noindent -Now, implement the two methods, @code{draw} and @code{init}: +You can reserve memory for a list node in the dictionary with +@code{list% %allot}, which leaves the address of the list node on the +stack. For the equivalent allocation on the heap you can use @code{list% +%alloc} (or, for an @code{allocate}-like stack effect (i.e., with ior), +use @code{list% %allocate}). You can get the size of a list +node with @code{list% %size} and its alignment with @code{list% +%alignment}.
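+
+As a short sketch that combines these words (the names @code{node1} and
+@code{node2} are made up for this illustration), a two-element list can
+be built in the dictionary and measured with @code{list-length}:
+
+@example
+list% %allot constant node1
+list% %allot constant node2
+0     node2 list-next !   \ node2 is the last node
+node2 node1 list-next !   \ node1 points to node2
+node1 list-length .       \ prints 2
+@end example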
+Note that in ANS Forth the body of a @code{create}d word is +@code{aligned} but not necessarily @code{faligned}; +therefore, if you do a: @example -:noname ( o -- ) - >r r@ x @ r@ y @ at-xy r@ text @ r> len @ type ; - button defines draw -:noname ( addr u o -- ) - >r 0 r@ x ! 0 r@ y ! r@ len ! r> text ! ; - button defines init +create @emph{name} foo% %allot @end example @noindent -To demonstrate inheritance, we define a class @code{bold-button}, with no -new data and no new methods. +then the memory allotted for @code{foo%} is +guaranteed to start at the body of @code{@emph{name}} only if +@code{foo%} contains only character, cell and double fields. +@cindex structures containing structures +You can include a structure @code{foo%} as a field of +another structure, like this: @example -button class -end-class bold-button - -: bold 27 emit ." [1m" ; -: normal 27 emit ." [0m" ; +struct +... + foo% field ... +... +end-struct ... +@end example -@noindent -The class @code{bold-button} has a different draw method to -@code{button}, but the new method is defined in terms of the draw method -for @code{button}: +@cindex structure extension +@cindex extended records +Instead of starting with an empty structure, you can extend an +existing structure. E.g., a plain linked list without data, as defined +above, is hardly useful; you can extend it to a linked list of integers, +like this:@footnote{This feature is also known as @emph{extended +records}. It is the main innovation in the Oberon language; in other +words, adding this feature to Modula-2 led Wirth to create a new +language, write a new compiler etc. Adding this feature to Forth just +required a few lines of code.} -:noname bold [ button :: draw ] normal ; bold-button defines draw +@example +list% + cell% field intlist-int +end-struct intlist% @end example -@noindent -Finally, create two objects and apply methods: +@code{intlist%} is a structure with two fields: +@code{list-next} and @code{intlist-int}.
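+
+For illustration, here is a sketch that allocates one node of this
+extended structure and accesses both fields (it assumes the definitions
+of @code{list%} and @code{intlist%} above; the name @code{inode} is
+made up for this example):
+
+@example
+intlist% %allot constant inode
+0 inode list-next !      \ inherited field: no successor
+42 inode intlist-int !   \ field added by the extension
+inode intlist-int @@ .   \ fetch and print the stored integer
+@end example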
+ +@cindex structures containing arrays +You can specify an array type containing @emph{n} elements of +type @code{foo%} like this: @example -button new Constant foo -s" thin foo" foo init -page -foo draw -bold-button new Constant bar -s" fat bar" bar init -1 bar y ! -bar draw +foo% @emph{n} * @end example +You can use this array type in any place where you can use a normal +type, e.g., when defining a @code{field}, or with +@code{%allot}. -@node Comparison with other object models, , Mini-OOF, Object-oriented Forth -@subsubsection Comparison with other object models -@cindex comparison of object models -@cindex object models, comparison +@cindex first field optimization +The first field is at the base address of a structure and the word +for this field (e.g., @code{list-next}) actually does not change +the address on the stack. You may be tempted to omit it in the +interest of run-time and space efficiency. This is not necessary, +because the structure package optimizes this case and compiling such +words does not generate any code. So, in the interest of readability +and maintainability you should include the word for the field when +accessing the field. -Many object-oriented Forth extensions have been proposed (@cite{A survey -of object-oriented Forths} (SIGPLAN Notices, April 1996) by Bradford -J. Rodriguez and W. F. S. Poehlman lists 17). This section discusses the -relation of the object models described here to two well-known and two -closely-related (by the use of method maps) models.
+@node Structure Naming Convention, Structure Implementation, Structure Usage, Structures +@subsection Structure Naming Convention +@cindex structure naming convention -@cindex Neon model -The most popular model currently seems to be the Neon model (see -@cite{Object-oriented programming in ANS Forth} (Forth Dimensions, March -1997) by Andrew McKewan) but this model has a number of limitations -@footnote{A longer version of this critique can be -found in @cite{On Standardizing Object-Oriented Forth Extensions} (Forth -Dimensions, May 1997) by Anton Ertl.}: +The field names that come to (my) mind are often quite generic, and, +if used, would cause frequent name clashes. E.g., many structures +probably contain a @code{counter} field. The structure names +that come to (my) mind are often also the logical choice for the names +of words that create such a structure. + +Therefore, I have adopted the following naming conventions: @itemize @bullet +@cindex field naming convention @item -It uses a @code{@emph{selector -object}} syntax, which makes it unnatural to pass objects on the -stack. +The names of fields are of the form +@code{@emph{struct}-@emph{field}}, where +@code{@emph{struct}} is the basic name of the structure, and +@code{@emph{field}} is the basic name of the field. You can +think of field words as converting the (address of the) +structure into the (address of the) field. +@cindex structure naming convention @item -It requires that the selector parses the input stream (at -compile time); this leads to reduced extensibility and to bugs that are+ -hard to find. +The names of structures are of the form +@code{@emph{struct}%}, where +@code{@emph{struct}} is the basic name of the structure. +@end itemize -@item -It allows using every selector to every object; -this eliminates the need for classes, but makes it harder to create -efficient implementations. 
-@end itemize +This naming convention does not work that well for fields of extended +structures; e.g., the integer list structure has a field +@code{intlist-int}, but has @code{list-next}, not +@code{intlist-next}. -@cindex Pountain's object-oriented model -Another well-known publication is @cite{Object-Oriented Forth} (Academic -Press, London, 1987) by Dick Pountain. However, it is not really about -object-oriented programming, because it hardly deals with late -binding. Instead, it focuses on features like information hiding and -overloading that are characteristic of modular languages like Ada (83). +@node Structure Implementation, Structure Glossary, Structure Naming Convention, Structures +@subsection Structure Implementation +@cindex structure implementation +@cindex implementation of structures -@cindex Zsoter's object-oriented model -In @cite{Does late binding have to be slow?} (Forth Dimensions 18(1) 1996, pages 31-35) -Andras Zsoter describes a model that makes heavy use of an active object -(like @code{this} in @file{objects.fs}): The active object is not only -used for accessing all fields, but also specifies the receiving object -of every selector invocation; you have to change the active object -explicitly with @code{@{ ... @}}, whereas in @file{objects.fs} it -changes more or less implicitly at @code{m: ... ;m}. Such a change at -the method entry point is unnecessary with the Zsoter's model, because -the receiving object is the active object already. On the other hand, the explicit -change is absolutely necessary in that model, because otherwise no one -could ever change the active object. An ANS Forth implementation of this -model is available at @url{http://www.forth.org/fig/oopf.html}. +The central idea in the implementation is to pass the data about the +structure being built on the stack, not in some global +variable. Everything else falls into place naturally once this design +decision is made. 
-@cindex @file{oof.fs}, differences to other models -The @file{oof.fs} model combines information hiding and overloading -resolution (by keeping names in various word lists) with object-oriented -programming. It sets the active object implicitly on method entry, but -also allows explicit changing (with @code{>o...o>} or with -@code{with...endwith}). It uses parsing and state-smart objects and -classes for resolving overloading and for early binding: the object or -class parses the selector and determines the method from this. If the -selector is not parsed by an object or class, it performs a call to the -selector for the active object (late binding), like Zsoter's model. -Fields are always accessed through the active object. The big -disadvantage of this model is the parsing and the state-smartness, which -reduces extensibility and increases the opportunities for subtle bugs; -essentially, you are only safe if you never tick or @code{postpone} an -object or class (Bernd disagrees, but I (Anton) am not convinced). +The type description on the stack is of the form @emph{align +size}. Keeping the size on the top-of-stack makes dealing with arrays +very simple. -@cindex @file{mini-oof.fs}, differences to other models -The @file{mini-oof.fs} model is quite similar to a very stripped-down version of -the @file{objects.fs} model, but syntactically it is a mixture of the @file{objects.fs} and -@file{oof.fs} models. +@code{field} is a defining word that uses @code{Create} +and @code{DOES>}. The body of the field contains the offset +of the field, and the normal @code{DOES>} action is simply: +@example +@ + +@end example +@noindent +i.e., add the offset to the address, giving the stack effect +@var{addr1 -- addr2} for a field. 
-@c ------------------------------------------------------------- -@node Tokens for Words, Word Lists, Object-oriented Forth, Words -@section Tokens for Words -@cindex tokens for words +@cindex first field optimization, implementation +This simple structure is slightly complicated by the optimization +for fields with offset 0, which requires a different +@code{DOES>}-part (because we cannot rely on there being +something on the stack if such a field is invoked during +compilation). Therefore, we put the different @code{DOES>}-parts +in separate words, and decide which one to invoke based on the +offset. For a zero offset, the field is basically a noop; it is +immediate, and therefore no code is generated when it is compiled. -This chapter describes the creation and use of tokens that represent -words on the stack (and in data space). +@node Structure Glossary, , Structure Implementation, Structures +@subsection Structure Glossary +@cindex structure glossary -Named words have interpretation and compilation semantics. Unnamed words -just have execution semantics. +doc-%align +doc-%alignment +doc-%alloc +doc-%allocate +doc-%allot +doc-cell% +doc-char% +doc-dfloat% +doc-double% +doc-end-struct +doc-field +doc-float% +doc-naligned +doc-sfloat% +doc-%size +doc-struct -@comment TODO ?normally interpretation semantics are the execution semantics. -@comment this should all be covered in earlier ss +@c ------------------------------------------------------------- +@node Object-oriented Forth, Passing Commands to the OS, Structures, Words +@section Object-oriented Forth -@cindex execution token -An @dfn{execution token} represents the execution semantics of an -unnamed word. An execution token occupies one cell. As explained in -@ref{Supplying names}, the execution token of the last word -defined can be produced with @code{lastxt}. 
+Gforth comes with three packages for object-oriented programming: +@file{objects.fs}, @file{oof.fs}, and @file{mini-oof.fs}; none of them +is preloaded, so you have to @code{include} them before use. The most +important differences between these packages (and others) are discussed +in @ref{Comparison with other object models}. All packages are written +in ANS Forth and can be used with any other ANS Forth. -You can perform the semantics represented by an execution token with: -doc-execute -You can compile the word with: -doc-compile, +@menu +* Why object-oriented programming?:: +* Object-Oriented Terminology:: +* Objects:: +* OOF:: +* Mini-OOF:: +* Comparison with other object models:: +@end menu -@cindex code field address -@cindex CFA -In Gforth, the abstract data type @emph{execution token} is implemented -as CFA (code field address). -@comment TODO note that the standard does not say what it represents.. -@comment and you cannot necessarily compile it in all Forths (eg native -@comment compilers?). -The interpretation semantics of a named word are also represented by an -execution token. You can get it with +@node Why object-oriented programming?, Object-Oriented Terminology, , Object-oriented Forth +@subsubsection Why object-oriented programming? +@cindex object-oriented programming motivation +@cindex motivation for object-oriented programming -doc-['] -doc-' +Often we have to deal with several data structures (@emph{objects}), +that have to be treated similarly in some respects, but differently in +others. Graphical objects are the textbook example: circles, triangles, +dinosaurs, icons, and others, and we may want to add more during program +development. We want to apply some operations to any graphical object, +e.g., @code{draw} for displaying it on the screen. However, @code{draw} +has to do something different for every kind of object. +@comment TODO add some other operations eg perimeter, area +@comment and tie in to concrete examples later.. 
-For literals, you use @code{'} in interpreted code and @code{[']} in -compiled code. Gforth's @code{'} and @code{[']} behave somewhat unusual -by complaining about compile-only words. To get an execution token for a -compiling word @var{X}, use @code{COMP' @var{X} drop} or @code{[COMP'] -@var{X} drop}. +We could implement @code{draw} as a big @code{CASE} +control structure that executes the appropriate code depending on the +kind of object to be drawn. This would not be very elegant, and, +moreover, we would have to change @code{draw} every time we add +a new kind of graphical object (say, a spaceship). -@cindex compilation token -The compilation semantics are represented by a @dfn{compilation token} -consisting of two cells: @var{w xt}. The top cell @var{xt} is an -execution token. The compilation semantics represented by the -compilation token can be performed with @code{execute}, which consumes -the whole compilation token, with an additional stack effect determined -by the represented compilation semantics. +What we would rather do is: When defining spaceships, we would tell +the system: ``Here's how you @code{draw} a spaceship; you figure +out the rest''. -doc-[comp'] -doc-comp' +This is the problem that all systems solve that (rightfully) call +themselves object-oriented; the object-oriented packages presented here +solve this problem (and not much else). +@comment TODO ?list properties of oo systems.. oo vs o-based? -You can compile the compilation semantics with @code{postpone,}. I.e., -@code{COMP' @var{word} POSTPONE,} is equivalent to @code{POSTPONE -@var{word}}. +@node Object-Oriented Terminology, Objects, Why object-oriented programming?, Object-oriented Forth +@subsubsection Object-Oriented Terminology +@cindex object-oriented terminology +@cindex terminology for object-oriented programming -doc-postpone, +This section is mainly for reference, so you don't have to understand +all of it right away. The terminology is mainly Smalltalk-inspired.
In +short: -At present, the @var{w} part of a compilation token is an execution -token, and the @var{xt} part represents either @code{execute} or -@code{compile,}. However, don't rely on that knowledge, unless necessary; -we may introduce unusual compilation tokens in the future (e.g., -compilation tokens representing the compilation semantics of literals). +@table @emph +@cindex class +@item class +a data structure definition with some extras. -@cindex name token -@cindex name field address -@cindex NFA -Named words are also represented by the @dfn{name token}. The abstract -data type @emph{name token} is implemented as NFA (name field address). +@cindex object +@item object +an instance of the data structure described by the class definition. -doc-find-name -doc-name>int -doc-name?int -doc-name>comp -doc-name>string +@cindex instance variables +@item instance variables +fields of the data structure. -@node Word Lists, Environmental Queries, Tokens for Words, Words -@section Word Lists -@cindex word lists -@cindex name dictionary +@cindex selector +@cindex method selector +@cindex virtual function +@item selector +(or @emph{method selector}) a word (e.g., +@code{draw}) that performs an operation on a variety of data +structures (classes). A selector describes @emph{what} operation to +perform. In C++ terminology: a (pure) virtual function. -@cindex wid -All definitions other than those created by @code{:noname} have an entry -in the name dictionary. The name dictionary is fragmented into a number -of parts, called @var{word lists}. A word list is identified by a -cell-sized word list identifier (@var{wid}) in much the same way as a -file is identified by a file handle. The numerical value of the wid has -no (portable) meaning, and might change from session to session. +@cindex method +@item method +the concrete definition that performs the operation +described by the selector for a specific class. 
A method specifies +@emph{how} the operation is performed for a specific class. -@cindex compilation word list -At any one time, a single word list is defined as the word list to which -all new definitions will be added -- this is called the @var{compilation -word list}. When Gforth is started, the compilation word list is the -word list called @code{FORTH-WORDLIST}. +@cindex selector invocation +@cindex message send +@cindex invoking a selector +@item selector invocation +a call of a selector. One argument of the call (the TOS (top-of-stack)) +is used for determining which method is used. In Smalltalk terminology: +a message (consisting of the selector and the other arguments) is sent +to the object. -@cindex search order stack -Forth maintains a stack of word lists, representing the @var{search -order}. When the name dictionary is searched (for example, when -attempting to find a word's execution token during compilation), only -those word lists that are currently in the search order are -searched. The most recently-defined word in the word list at the top of -the word list stack is searched first, and the search proceeds until -either the word is located or the oldest definition in the word list at -the bottom of the stack is reached. Definitions of the word may exist in -more than one word lists; the search order determines which version will -be found. +@cindex receiving object +@item receiving object +the object used for determining the method executed by a selector +invocation. In the @file{objects.fs} model, it is the object that is on +the TOS when the selector is invoked. (@emph{Receiving} comes from +the Smalltalk @emph{message} terminology.) -The ANS Forth Standard "Search order" word set is intended to provide a -set of low-level tools that allow various different schemes to be -implemented. Gforth provides @code{vocabulary}, a traditional Forth -word. @file{compat/vocabulary.fs} provides an implementation in ANS -Standard Forth. 
+@cindex child class +@cindex parent class +@cindex inheritance +@item child class +a class that has (@emph{inherits}) all properties (instance variables, +selectors, methods) from a @emph{parent class}. In Smalltalk +terminology: The subclass inherits from the superclass. In C++ +terminology: The derived class inherits from the base class. -TODO: locals section refers to here, saying that every word list (aka -vocabulary) has its own methods for searching etc. Need to document that. +@end table -doc-forth-wordlist -doc-definitions -doc-get-current -doc-set-current +@c If you wonder about the message sending terminology, it comes from +@c a time when each object had it's own task and objects communicated via +@c message passing; eventually the Smalltalk developers realized that +@c they can do most things through simple (indirect) calls. They kept the +@c terminology. -@comment TODO when a defn (like set-order) is instanced twice, the second instance gets documented. -@comment In general that might be fine, but in this example (search.fs) the second instance is an -@comment alias, so it would not naturally have documentation -doc-get-order -doc-set-order -doc-wordlist -doc-also -doc-forth -doc-only -doc-order -doc-previous +@node Objects, OOF, Object-Oriented Terminology, Object-oriented Forth +@subsection The @file{objects.fs} model +@cindex objects +@cindex object-oriented programming -doc-find -doc-search-wordlist +@cindex @file{objects.fs} +@cindex @file{oof.fs} -doc-words -doc-vlist +This section describes the @file{objects.fs} package. This material also has been published in @cite{Yet Another Forth Objects Package} by Anton Ertl and appeared in Forth Dimensions 19(2), pages 37--43 (@url{http://www.complang.tuwien.ac.at/forth/objects/objects.html}). +@c McKewan's and Zsoter's packages + +This section assumes that you have read @ref{Structures}. 
+ +The techniques on which this model is based have been used to implement +the parser generator, Gray, and have also been used in Gforth for +implementing the various flavours of word lists (hashed or not, +case-sensitive or not, special-purpose word lists for locals etc.). -doc-mappedwordlist -doc-root -doc-vocabulary -doc-seal -doc-vocs -doc-current -doc-context @menu -* Why use word lists?:: -* Word list examples:: +* Properties of the Objects model:: +* Basic Objects Usage:: +* The Objects base class:: +* Creating objects:: +* Object-Oriented Programming Style:: +* Class Binding:: +* Method conveniences:: +* Classes and Scoping:: +* Object Interfaces:: +* Objects Implementation:: +* Objects Glossary:: @end menu -@node Why use word lists?, Word list examples, Word Lists, Word Lists -@subsection Why use word lists? -@cindex word lists - why use them? +Marcel Hendrix provided helpful comments on this section. Andras Zsoter +and Bernd Paysan helped me with the related works section. -There are several reasons for using multiple word lists: +@node Properties of the Objects model, Basic Objects Usage, Objects, Objects +@subsubsection Properties of the @file{objects.fs} model +@cindex @file{objects.fs} properties @itemize @bullet @item -To improve compilation speed by reducing the number of name dictionary -entries that must be searched. This is achieved by creating a new -word list that contains all of the definitions that are used in the -definition of a Forth system but which would not usually be used by -programs running on that system. That word list would be on the search -list when the Forth system was compiled but would be removed from the -search list for normal operation. This can be a useful technique for -low-performance systems (for example, 8-bit processors in embedded -systems) but is unlikely to be necessary in high-performance desktop -systems. +It is straightforward to pass objects on the stack. 
Passing +selectors on the stack is a little less convenient, but possible. + @item -To prevent a set of words from being used outside the context in which -they are valid. Two classic examples of this are an integrated editor -(all of the edit commands are defined in a separate word list; the -search order is set to the editor word list when the editor is invoked; -the old search order is restored when the editor is terminated) and an -integrated assembler (the op-codes for the machine are defined in a -separate word list which is used when a @code{CODE} word is defined). +Objects are just data structures in memory, and are referenced by their +address. You can create words for objects with normal defining words +like @code{constant}. Likewise, there is no difference between instance +variables that contain objects and those that contain other data. + @item -To prevent a name-space clash between multiple definitions with the same -name. For example, when building a cross-compiler you might have a word -@code{IF} that generates conditional code for your target system. By -placing this definition in a different word list you can control whether -the host system's @code{IF} or the target system's @code{IF} get used in -any particular context by controlling the order of the word lists on the -search order stack. -@end itemize +Late binding is efficient and easy to use. -@node Word list examples, ,Why use word lists?, Word Lists -@subsection Word list examples -@cindex word lists - examples +@item +It avoids parsing, and thus avoids problems with state-smartness +and reduced extensibility; for convenience there are a few parsing +words, but they have non-parsing counterparts. There are also a few +defining words that parse. This is hard to avoid, because all standard +defining words parse (except @code{:noname}); however, such +words are not as bad as many other parsing words, because they are not +state-smart. 
-Here is an example of creating and using a new wordlist using ANS -Standard words: +@item +It does not try to incorporate everything. It does a few things and does +them well (IMO). In particular, this model was not designed to support +information hiding (although it has features that may help); you can use +a separate package for achieving this. -@example -wordlist constant my-new-words-wordlist -: my-new-words get-order nip my-new-words-wordlist swap set-order ; +@item +It is layered; you don't have to learn and use all features to use this +model. Only a few features are necessary (@xref{Basic Objects Usage}, +@xref{The Objects base class}, @xref{Creating objects}.), the others +are optional and independent of each other. -\ add it to the search order -also my-new-words +@item +An implementation in ANS Forth is available. -\ alternatively, add it to the search order and make it -\ the compilation word list -also my-new-words definitions -\ type "order" to see the problem -@end example +@end itemize -The problem with this example is that @code{order} has no way to -associate the name @code{my-new-words} with the wid of the word list (in -Gforth, @code{order} and @code{vocs} will display @code{???} for a wid -that has no associated name). There is no Standard way of associating a -name with a wid. 
-In Gforth, this example can be re-coded using @code{vocabulary}, which -associates a name with a wid: +@node Basic Objects Usage, The Objects base class, Properties of the Objects model, Objects +@subsubsection Basic @file{objects.fs} Usage +@cindex basic objects usage +@cindex objects, basic usage + +You can define a class for graphical objects like this: +@cindex @code{class} usage +@cindex @code{end-class} usage +@cindex @code{selector} usage @example -vocabulary my-new-words +object class \ "object" is the parent class + selector draw ( x y graphical -- ) +end-class graphical +@end example -\ add it to the search order -my-new-words +This code defines a class @code{graphical} with an +operation @code{draw}. We can perform the operation +@code{draw} on any @code{graphical} object, e.g.: -\ alternatively, add it to the search order and make it -\ the compilation word list -my-new-words definitions -\ type "order" to see that the problem is solved +@example +100 100 t-rex draw @end example +@noindent +where @code{t-rex} is a word (say, a constant) that produces a +graphical object. -@node Environmental Queries, Files, Word Lists, Words -@section Environmental Queries -@cindex environmental queries -@comment TODO more index entries +@comment nac TODO add a 2nd operation eg perimeter.. and use for +@comment a concrete example -The ANS Standard introduced the idea of "environmental queries" as a way -for a program running on a system to determine certain characteristics of the system. -The Standard specifies a number of strings that might be recognised by a system. +@cindex abstract class +How do we create a graphical object? With the present definitions, +we cannot create a useful graphical object. The class +@code{graphical} describes graphical objects in general, but not +any concrete graphical object type (C++ users would call it an +@emph{abstract class}); e.g., there is no method for the selector +@code{draw} in the class @code{graphical}. 
-The Standard requires that the name space used for environmental queries -be distinct from the name space used for definitions. +For concrete graphical objects, we define child classes of the +class @code{graphical}, e.g.: -Typically, environmental queries are supported by creating a set of -definitions in a word set that is @var{only} used during environmental -queries; that is what Gforth does. There is no Standard way of adding -definitions to the set of recognised environmental queries, but any -implementation that supports the loading of optional word sets must have -some mechanism for doing this (after loading the word set, the -associated environmental query string must return @code{true}). In -Gforth, the word set used to honour environmental queries can be -manipulated just like any other word set. +@cindex @code{overrides} usage +@cindex @code{field} usage in class definition +@example +graphical class \ "graphical" is the parent class + cell% field circle-radius -doc-environment? -doc-environment-wordlist +:noname ( x y circle -- ) + circle-radius @@ draw-circle ; +overrides draw -doc-gforth -doc-os-class +:noname ( n-radius circle -- ) + circle-radius ! ; +overrides construct -Note that, whilst the documentation for (eg) @code{gforth} shows it -returning two items on the stack, querying it using @code{environment?} -will return an additional item; the @code{true} flag that shows that the -string was recognised. +end-class circle +@end example -TODO Document the standard strings or note where they are documented herein +Here we define a class @code{circle} as a child of @code{graphical}, +with field @code{circle-radius} (which behaves just like a field +(@pxref{Structures})); it defines (using @code{overrides}) new methods +for the selectors @code{draw} and @code{construct} (@code{construct} is +defined in @code{object}, the parent class of @code{graphical}).
-Here are some examples of using environmental queries: +Now we can create a circle on the heap (i.e., +@code{allocate}d memory) with: +@cindex @code{heap-new} usage @example -s" address-unit-bits" environment? 0= -[IF] - cr .( environmental attribute address-units-bits unknown... ) cr -[THEN] - -s" block" environment? [IF] DROP include block.fs [THEN] - -s" gforth" environment? [IF] 2DROP include compat/vocabulary.fs [THEN] +50 circle heap-new constant my-circle +@end example -s" gforth" environment? [IF] .( Gforth version ) TYPE [ELSE] .( Not Gforth..) [THEN] +@noindent +@code{heap-new} invokes @code{construct}, thus +initializing the field @code{circle-radius} with 50. We can draw +this new circle at (100,100) with: +@example +100 100 my-circle draw @end example +@cindex selector invocation, restrictions +@cindex class definition, restrictions +Note: You can only invoke a selector if the object on the TOS +(the receiving object) belongs to the class where the selector was +defined or one of its descendants; e.g., you can invoke +@code{draw} only for objects belonging to @code{graphical} +or its descendants (e.g., @code{circle}). Immediately before +@code{end-class}, the search order has to be the same as +immediately after @code{class}. -Here is an example of adding a definition to the environment word list: +@node The Objects base class, Creating objects, Basic Objects Usage, Objects +@subsubsection The @file{objects.fs} base class +@cindex @code{object} class -@example -get-current environment-wordlist set-current -true constant block -true constant block-ext -set-current -@end example +When you define a class, you have to specify a parent class. So how do +you start defining classes? There is one class available from the start: +@code{object}. It is the ancestor of all classes and so is the +only class that has no parent. It has two selectors: @code{construct} +and @code{print}.
-You can see what definitions are in the environment word list like this: +@node Creating objects, Object-Oriented Programming Style, The Objects base class, Objects +@subsubsection Creating objects +@cindex creating objects +@cindex object creation +@cindex object allocation options -@example -get-order 1+ environment-wordlist swap set-order words previous -@end example +@cindex @code{heap-new} discussion +@cindex @code{dict-new} discussion +@cindex @code{construct} discussion +You can create and initialize an object of a class on the heap with +@code{heap-new} ( ... class -- object ) and in the dictionary +(allocation with @code{allot}) with @code{dict-new} ( +... class -- object ). Both words invoke @code{construct}, which +consumes the stack items indicated by "..." above. +@cindex @code{init-object} discussion +@cindex @code{class-inst-size} discussion +If you want to allocate memory for an object yourself, you can get its +alignment and size with @code{class-inst-size 2@@} ( class -- +align size ). Once you have memory for an object, you can initialize +it with @code{init-object} ( ... class object -- ); +@code{construct} does only a part of the necessary work. +@node Object-Oriented Programming Style, Class Binding, Creating objects, Objects +@subsubsection Object-Oriented Programming Style +@cindex object-oriented programming style -@node Files, Including Files, Environmental Queries, Words -@section Files +This section is not exhaustive. -This chapter describes how to operate on files from Forth. +@cindex stack effects of selectors +@cindex selectors and stack effects +In general, it is a good idea to ensure that all methods for the +same selector have the same stack effect: when you invoke a selector, +you often have no idea which method will be invoked, so, unless all +methods have the same stack effect, you will not know the stack effect +of the selector invocation. -Files are opened/created by name and type. 
The following types are -recognised: +One exception to this rule is methods for the selector +@code{construct}. We know which method is invoked, because we +specify the class to be constructed at the same place. Actually, I +defined @code{construct} as a selector only to give the users a +convenient way to specify initialization. The way it is used, a +mechanism different from selector invocation would be more natural +(but probably would take more code and more space to explain). -doc-r/o -doc-r/w -doc-w/o -doc-bin +@node Class Binding, Method conveniences, Object-Oriented Programming Style, Objects +@subsubsection Class Binding +@cindex class binding +@cindex early binding -When a file is opened/created, it returns a file identifier, -@var{wfileid} that is used for all other file commands. All file -commands also return a status value, @var{wior}, that is 0 for a -successful operation and an implementation-defined non-zero value in the -case of an error. +@cindex late binding +Normal selector invocations determine the method at run-time depending +on the class of the receiving object. This run-time selection is called +@var{late binding}. -doc-open-file -doc-create-file +Sometimes it's preferable to invoke a different method. For example, +you might want to use the simple method for @code{print}ing +@code{object}s instead of the possibly long-winded @code{print} method +of the receiver class. 
You can achieve this by replacing the invocation +of @code{print} with: -doc-close-file -doc-delete-file -doc-rename-file -doc-read-file -doc-read-line -doc-write-file -doc-write-line -doc-emit-file -doc-flush-file +@cindex @code{[bind]} usage +@example +[bind] object print +@end example -doc-file-status -doc-file-position -doc-reposition-file -doc-file-size -doc-resize-file +@noindent +in compiled code or: -@node Including Files, Blocks, Files, Words -@section Including Files -@cindex including files +@cindex @code{bind} usage +@example +bind object print +@end example -@menu -* Words for Including:: -* Search Path:: -* Forth Search Paths:: -* General Search Paths:: -@end menu +@cindex class binding, alternative to +@noindent +in interpreted code. Alternatively, you can define the method with a +name (e.g., @code{print-object}), and then invoke it through the +name. Class binding is just a (often more convenient) way to achieve +the same effect; it avoids name clutter and allows you to invoke +methods directly without naming them first. -@node Words for Including, Search Path, Including Files, Including Files -@subsection Words for Including +@cindex superclass binding +@cindex parent class binding +A frequent use of class binding is this: When we define a method +for a selector, we often want the method to do what the selector does +in the parent class, and a little more. There is a special word for +this purpose: @code{[parent]}; @code{[parent] +@emph{selector}} is equivalent to @code{[bind] @emph{parent +selector}}, where @code{@emph{parent}} is the parent +class of the current class. E.g., a method definition might look like: -doc-include-file -doc-included -doc-include +@cindex @code{[parent]} usage +@example +:noname + dup [parent] foo \ do parent's foo on the receiving object + ... 
\ do some more +; overrides foo +@end example -Usually you want to include a file only if it is not included already -(by, say, another source file): -@comment TODO describe what happens on error. Describes how the require -@comment stuff works and describe how to clear/reset the history (eg -@comment for debug). Might want to include that in the MARKER example. +@cindex class binding as optimization +In @cite{Object-oriented programming in ANS Forth} (Forth Dimensions, +March 1997), Andrew McKewan presents class binding as an optimization +technique. I recommend not using it for this purpose unless you are in +an emergency. Late binding is pretty fast with this model anyway, so the +benefit of using class binding is small; the cost of using class binding +where it is not appropriate is reduced maintainability. -doc-required -doc-require -doc-needs +While we are at programming style questions: You should bind +selectors only to ancestor classes of the receiving object. E.g., say, +you know that the receiving object is of class @code{foo} or its +descendents; then you should bind only to @code{foo} and its +ancestors. -A definition in ANS Standard Forth for @code{required} is provided in -@file{compat/required.fs}. +@node Method conveniences, Classes and Scoping, Class Binding, Objects +@subsubsection Method conveniences +@cindex method conveniences -@cindex stack effect of included files -@cindex including files, stack effect -I recommend that you write your source files such that interpreting them -does not change the stack. This allows using these files with -@code{required} and friends without complications. E.g., +In a method you usually access the receiving object pretty often. If +you define the method as a plain colon definition (e.g., with +@code{:noname}), you may have to do a lot of stack +gymnastics. To avoid this, you can define the method with @code{m: +... ;m}. 
E.g., you could define the method for +@code{draw}ing a @code{circle} with +@cindex @code{this} usage +@cindex @code{m:} usage +@cindex @code{;m} usage @example -1 require foo.fs drop +m: ( x y circle -- ) + ( x y ) this circle-radius @@ draw-circle ;m @end example -@node Search Path, Forth Search Paths, Words for Including, Including Files -@subsection Search Path -@cindex path for @code{included} -@cindex file search path -@cindex include search path -@cindex search path for files - -@comment what uses these search paths.. just inc;lude and friends? -If you specify an absolute filename (i.e., a filename starting with -@file{/} or @file{~}, or with @file{:} in the second position (as in -@samp{C:...})) for @code{included} and friends, that file is included -just as you would expect. +@cindex @code{exit} in @code{m: ... ;m} +@cindex @code{exitm} discussion +@cindex @code{catch} in @code{m: ... ;m} +When this method is executed, the receiver object is removed from the +stack; you can access it with @code{this} (admittedly, in this +example the use of @code{m: ... ;m} offers no advantage). Note +that I specify the stack effect for the whole method (i.e. including +the receiver object), not just for the code between @code{m:} +and @code{;m}. You cannot use @code{exit} in +@code{m:...;m}; instead, use +@code{exitm}.@footnote{Moreover, for any word that calls +@code{catch} and was defined before loading +@code{objects.fs}, you have to redefine it like I redefined +@code{catch}: @code{: catch this >r catch r> to-this ;}} -For relative filenames, Gforth uses a search path similar to Forth's -search order (@pxref{Word Lists}). It tries to find the given filename in -the directories present in the path, and includes the first one it -finds. +@cindex @code{inst-var} usage +You will frequently use sequences of the form @code{this +@emph{field}} (in the example above: @code{this +circle-radius}). 
If you use the field only in this way, you can +define it with @code{inst-var} and eliminate the +@code{this} before the field name. E.g., the @code{circle} +class above could also be defined with: -If the search path contains the directory @file{.} (as it should), this -refers to the directory that the present file was @code{included} -from. This allows files to include other files relative to their own -position (irrespective of the current working directory or the absolute -position). This feature is essential for libraries consisting of -several files, where a file may include other files from the library. -It corresponds to @code{#include "..."} in C. If the current input -source is not a file, @file{.} refers to the directory of the innermost -file being included, or, if there is no file being included, to the -current working directory. +@example +graphical class + cell% inst-var radius -Use @file{~+} to refer to the current working directory (as in the -@code{bash}). +m: ( x y circle -- ) + radius @@ draw-circle ;m +overrides draw -If the filename starts with @file{./}, the search path is not searched -(just as with absolute filenames), and the @file{.} has the same meaning -as described above. +m: ( n-radius circle -- ) + radius ! ;m +overrides construct -@node Forth Search Paths, General Search Paths, Search Path, Including Files -@subsection Forth Search Paths -@cindex search path control - forth +end-class circle +@end example -The search path is initialized when you start Gforth (@pxref{Invoking -Gforth}). You can display it with +@code{radius} can only be used in @code{circle} and its +descendent classes and inside @code{m:...;m}. -doc-.fpath +@cindex @code{inst-value} usage +You can also define fields with @code{inst-value}, which is +to @code{inst-var} what @code{value} is to +@code{variable}. You can change the value of such a field with +@code{[to-inst]}. 
E.g., we could also define the class +@code{circle} like this: -You can change it later with the following words: +@example +graphical class + inst-value radius -doc-fpath+ -doc-fpath= - -Using fpath and require would look like: +m: ( x y circle -- ) + radius draw-circle ;m +overrides draw -@example -fpath= /usr/lib/forth/|./ +m: ( n-radius circle -- ) + [to-inst] radius ;m +overrides construct -require timer.fs +end-class circle @end example -If you have the need to look for a file in the Forth search path, you could -use this Gforth feature in your application: -doc-open-fpath-file +@node Classes and Scoping, Object Interfaces, Method conveniences, Objects +@subsubsection Classes and Scoping +@cindex classes and scoping +@cindex scoping and classes -@node General Search Paths, , Forth Search Paths, Including Files -@subsection General Search Paths -@cindex search path control - for user applications +Inheritance is frequent, unlike structure extension. This exacerbates +the problem with the field name convention (@pxref{Structure Naming +Convention}): One always has to remember in which class the field was +originally defined; changing a part of the class structure would require +changes for renaming in otherwise unaffected code. -Your application may need to search files in sevaral directories, like -@code{included} does. For this purpose you can define and use your own -search paths. Create a search path like this: +@cindex @code{inst-var} visibility +@cindex @code{inst-value} visibility +To solve this problem, I added a scoping mechanism (which was not in my +original charter): A field defined with @code{inst-var} (or +@code{inst-value}) is visible only in the class where it is defined and in +the descendent classes of this class. Using such fields only makes +sense in @code{m:}-defined methods in these classes anyway. 
-@example -\ Make a buffer for the path: -create mypath 100 chars , \ maximum length (is checked) - 0 , \ real len - 100 chars allot \ space for path -@end example +This scoping mechanism allows us to use the unadorned field name, +because name clashes with unrelated words become much less likely. -You have the same functions for the forth search path in a generic version -for different paths. +@cindex @code{protected} discussion +@cindex @code{private} discussion +Once we have this mechanism, we can also use it for controlling the +visibility of other words: All words defined after +@code{protected} are visible only in the current class and its +descendents. @code{public} restores the compilation +(i.e. @code{current}) word list that was in effect before. If you +have several @code{protected}s without an intervening +@code{public} or @code{set-current}, @code{public} +will restore the compilation word list in effect before the first of +these @code{protected}s. -Gforth also provides generic equivalents of the Forth search path words: +@node Object Interfaces, Objects Implementation, Classes and Scoping, Objects +@subsubsection Object Interfaces +@cindex object interfaces +@cindex interfaces for objects -doc-.path -doc-path+ -doc-path= -doc-open-path-file +In this model you can only call selectors defined in the class of the +receiving objects or in one of its ancestors. If you call a selector +with a receiving object that is not in one of these classes, the +result is undefined; if you are lucky, the program crashes +immediately. +@cindex selectors common to hardly-related classes +Now consider the case when you want to have a selector (or several) +available in two classes: You would have to add the selector to a +common ancestor class, in the worst case to @code{object}. You +may not want to do this, e.g., because someone else is responsible for +this ancestor class. 
-@node Blocks, Other I/O, Including Files, Words -@section Blocks +The solution for this problem is interfaces. An interface is a +collection of selectors. If a class implements an interface, the +selectors become available to the class and its descendents. A class +can implement an unlimited number of interfaces. For the problem +discussed above, we would define an interface for the selector(s), and +both classes would implement the interface. -This chapter describes how to use block files within Gforth. +As an example, consider an interface @code{storage} for +writing objects to disk and getting them back, and a class +@code{foo} that implements it. The code would look like this: -Block files are traditionally means of data and source storage in -Forth. They have been very important in resource-starved computers -without OS in the past. Gforth doesn't encourage to use blocks as -source, and provides blocks only for backward compatibility. The ANS -standard requires blocks to be available when files are. +@cindex @code{interface} usage +@cindex @code{end-interface} usage +@cindex @code{implementation} usage +@example +interface + selector write ( file object -- ) + selector read1 ( file object -- ) +end-interface storage -@comment TODO what about errors on open-blocks? -doc-open-blocks -doc-use -doc-scr -doc-blk -doc-get-block-fid -doc-block-position -doc-update -doc-save-buffers -doc-save-buffer -doc-empty-buffers -doc-empty-buffer -doc-flush -doc-get-buffer -doc---block-block -doc-buffer -doc-updated? -doc-list -doc-load -doc-thru -doc-+load -doc-+thru -doc---block---> -doc-block-included +bar class + storage implementation -@node Other I/O, Programming Tools, Blocks, Words -@section Other I/O -@comment TODO more index entries +... overrides write +... overrides read1 +... 
+end-class foo +@end example -@menu -* Simple numeric output:: Predefined formats -* Formatted numeric output:: Formatted (pictured) output -* String Formats:: How Forth stores strings in memory -* Displaying characters and strings:: Other stuff -* Input:: Input -@end menu +@noindent +(I would add a word @code{read} @var{( file -- object )} that uses +@code{read1} internally, but that's beyond the point illustrated +here.) -@node Simple numeric output, Formatted numeric output, Other I/O, Other I/O -@subsection Simple numeric output -@cindex Simple numeric output -@comment TODO more index entries +Note that you cannot use @code{protected} in an interface; and +of course you cannot define fields. -The simplest output functions are those that display numbers from the -data or floating-point stacks. Floating-point output is always displayed -using base 10. Numbers displayed from the data stack use the value stored -in @code{base}. +In the Neon model, all selectors are available for all classes; +therefore it does not need interfaces. The price you pay in this model +is slower late binding, and therefore, added complexity to avoid late +binding. -doc-. -doc-dec. -doc-hex. -doc-u. -doc-.r -doc-u.r -doc-d. -doc-ud. -doc-d.r -doc-ud.r -doc-f. -doc-fe. -doc-fs. +@node Objects Implementation, Objects Glossary, Object Interfaces, Objects +@subsubsection @file{objects.fs} Implementation +@cindex @file{objects.fs} implementation -Examples of printing the number 1234.5678E23 in the different floating-point output -formats are shown below: +@cindex @code{object-map} discussion +An object is a piece of memory, like one of the data structures +described with @code{struct...end-struct}. It has a field +@code{object-map} that points to the method map for the object's +class. 
+ +@cindex method map +@cindex virtual function table +The @emph{method map}@footnote{This is Self terminology; in C++ +terminology: virtual function table.} is an array that contains the +execution tokens (@var{xt}s) of the methods for the object's class. Each +selector contains an offset into a method map. + +@cindex @code{selector} implementation, class +@code{selector} is a defining word that uses +@code{CREATE} and @code{DOES>}. The body of the +selector contains the offset; the @code{does>} action for a +class selector is, basically: @example -f. 123456779999999000000000000. -fe. 123.456779999999E24 -fs. 1.23456779999999E26 +( object addr ) @@ over object-map @@ + @@ execute @end example +Since @code{object-map} is the first field of the object, it +does not generate any code. As you can see, calling a selector has a +small, constant cost. -@node Formatted numeric output, String Formats, Simple numeric output, Other I/O -@subsection Formatted numeric output -@cindex Formatted numeric output -@cindex pictured numeric output -@comment TODO more index entries +@cindex @code{current-interface} discussion +@cindex class implementation and representation +A class is basically a @code{struct} combined with a method +map. During the class definition the alignment and size of the class +are passed on the stack, just as with @code{struct}s, so +@code{field} can also be used for defining class +fields. However, passing more items on the stack would be +inconvenient, so @code{class} builds a data structure in memory, +which is accessed through the variable +@code{current-interface}. After its definition is complete, the +class is represented on the stack by a pointer (e.g., as parameter for +a child class definition). -Forth traditionally uses a technique called @var{pictured numeric -output} for formatted printing of integers. 
In this technique, -digits are extracted from the number (using the current output radix -defined by @code{base}), converted to ASCII codes and appended to a -string that is built in a scratch-pad area of memory -(@pxref{core-idef,Implementation-defined options}). During the extraction -sequence, other arbitrary characters can be appended to the string. The -completed string is specified by an address and length and can -be manipulated (@code{TYPE}ed, copied, modified) under program control. +A new class starts off with the alignment and size of its parent, +and a copy of the parent's method map. Defining new fields extends the +size and alignment; likewise, defining new selectors extends the +method map. @code{overrides} just stores a new @var{xt} in the method +map at the offset given by the selector. -All of the words described in the previous section for simple numeric -output are implemented in Gforth using pictured numeric output. +@cindex class binding, implementation +Class binding just gets the @var{xt} at the offset given by the selector +from the class's method map and @code{compile,}s (in the case of +@code{[bind]}) it. -Three important things to remember about Pictured Numeric Output: +@cindex @code{this} implementation +@cindex @code{catch} and @code{this} +@cindex @code{this} and @code{catch} +I implemented @code{this} as a @code{value}. At the +start of an @code{m:...;m} method the old @code{this} is +stored to the return stack and restored at the end; and the object on +the TOS is stored @code{TO this}. This technique has one +disadvantage: If the user does not leave the method via +@code{;m}, but via @code{throw} or @code{exit}, +@code{this} is not restored (and @code{exit} may +crash). To deal with the @code{throw} problem, I have redefined +@code{catch} to save and restore @code{this}; the same +should be done with any word that can catch an exception. As for +@code{exit}, I simply forbid it (as a replacement, there is +@code{exitm}). 
-@itemize @bullet -@item -It always operates on double-precision numbers; to display a single-precision number, -convert it first (@pxref{Double precision} for ways of doing this). -@item -It always treats the double-precision number as though it were unsigned. Refer to -the examples below for ways of printing signed numbers. -@item -The string is built up from right to left; least significant digit first. -@end itemize +@cindex @code{inst-var} implementation +@code{inst-var} is just the same as @code{field}, with +a different @code{DOES>} action: +@example +@@ this + +@end example +Similar for @code{inst-value}. -doc-<# -doc-# -doc-#s -doc-hold -doc-sign -doc-#> +@cindex class scoping implementation +Each class also has a word list that contains the words defined with +@code{inst-var} and @code{inst-value}, and its protected +words. It also has a pointer to its parent. @code{class} pushes +the word lists of the class and all its ancestors onto the search order stack, +and @code{end-class} drops them. -doc-represent +@cindex interface implementation +An interface is like a class without fields, parent and protected +words; i.e., it just has a method map. If a class implements an +interface, its method map contains a pointer to the method map of the +interface. The positive offsets in the map are reserved for class +methods, therefore interface map pointers have negative +offsets. Interfaces have offsets that are unique throughout the +system, unlike class selectors, whose offsets are only unique for the +classes where the selector is available (invokable). -Here are some examples of using pictured numeric output: +This structure means that interface selectors have to perform one +indirection more than class selectors to find their method. Their body +contains the interface map pointer offset in the class method map, and +the method offset in the interface method map. The +@code{does>} action for an interface selector is, basically: @example -: my-u. 
( u -- ) - \ Simplest use of pns.. behaves like Standard u. - 0 \ convert to unsigned double - <# \ start conversion - #s \ convert all digits - #> \ complete conversion - TYPE SPACE ; \ display, with trailing space +( object selector-body ) +2dup selector-interface @@ ( object selector-body object interface-offset ) +swap object-map @@ + @@ ( object selector-body map ) +swap selector-offset @@ + @@ execute +@end example -: cents-only ( u -- ) - 0 \ convert to unsigned double - <# \ start conversion - # # \ convert two least-significant digits - #> \ complete conversion, discard other digits - TYPE SPACE ; \ display, with trailing space +where @code{object-map} and @code{selector-offset} are +first fields and generate no code. -: dollars-and-cents ( u -- ) - 0 \ convert to unsigned double - <# \ start conversion - # # \ convert two least-significant digits - [char] . hold \ insert decimal point - #s \ convert remaining digits - [char] $ hold \ append currency symbol - #> \ complete conversion - TYPE SPACE ; \ display, with trailing space +As a concrete example, consider the following code: -: my-. ( n -- ) - \ handling negatives.. behaves like Standard . - s>d \ convert to signed double - swap over dabs \ leave sign byte followed by unsigned double - <# \ start conversion - #s \ convert all digits - rot sign \ get at sign byte, append "-" if needed - #> \ complete conversion - TYPE SPACE ; \ display, with trailing space +@example +interface + selector if1sel1 + selector if1sel2 +end-interface if1 -: account. 
( n -- ) - \ accountants don't like minus signs, they use braces - \ for negative numbers - s>d \ convert to signed double - swap over dabs \ leave sign byte followed by unsigned double - <# \ start conversion - 2 pick \ get copy of sign byte - 0< IF [char] ) hold THEN \ right-most character of output - #s \ convert all digits - rot \ get at sign byte - 0< IF [char] ( hold THEN - #> \ complete conversion - TYPE SPACE ; \ display, with trailing space -@end example +object class + if1 implementation + selector cl1sel1 + cell% inst-var cl1iv1 -Here are some examples of using these words: +' m1 overrides construct +' m2 overrides if1sel1 +' m3 overrides if1sel2 +' m4 overrides cl1sel2 +end-class cl1 -@example -1 my-u. 1 -hex -1 my-u. decimal FFFFFFFF -1 cents-only 01 -1234 cents-only 34 -2 dollars-and-cents $0.02 -1234 dollars-and-cents $12.34 -123 my-. 123 --123 my. -123 -123 account. 123 --456 account. (456) +create obj1 object dict-new drop +create obj2 cl1 dict-new drop @end example +The data structure created by this code (including the data structure +for @code{object}) is shown in the figure, assuming a cell size of 4. +@comment nac TODO add this diagram.. + +@node Objects Glossary, , Objects Implementation, Objects +@subsubsection @file{objects.fs} Glossary +@cindex @file{objects.fs} Glossary + +doc---objects-bind +doc---objects- +doc---objects-bind' +doc---objects-[bind] +doc---objects-class +doc---objects-class->map +doc---objects-class-inst-size +doc---objects-class-override! 
+doc---objects-construct +doc---objects-current' +doc---objects-[current] +doc---objects-current-interface +doc---objects-dict-new +doc---objects-drop-order +doc---objects-end-class +doc---objects-end-class-noname +doc---objects-end-interface +doc---objects-end-interface-noname +doc---objects-exitm +doc---objects-heap-new +doc---objects-implementation +doc---objects-init-object +doc---objects-inst-value +doc---objects-inst-var +doc---objects-interface +doc---objects-;m +doc---objects-m: +doc---objects-method +doc---objects-object +doc---objects-overrides +doc---objects-[parent] +doc---objects-print +doc---objects-protected +doc---objects-public +doc---objects-push-order +doc---objects-selector +doc---objects-this +doc---objects- +doc---objects-[to-inst] +doc---objects-to-this +doc---objects-xt-new + +@c ------------------------------------------------------------- +@node OOF, Mini-OOF, Objects, Object-oriented Forth +@subsection The @file{oof.fs} model +@cindex oof +@cindex object-oriented programming + +@cindex @file{objects.fs} +@cindex @file{oof.fs} + +This section describes the @file{oof.fs} package. + +The package described in this section has been used in bigFORTH since 1991, and +used for two large applications: a chromatographic system used to +create new medicaments, and a graphic user interface library (MINOS). + +You can find a description (in German) of @file{oof.fs} in @cite{Object +oriented bigFORTH} by Bernd Paysan, published in @cite{Vierte Dimension} +10(2), 1994. + +@menu +* Properties of the OOF model:: +* Basic OOF Usage:: +* The OOF base class:: +* Class Declaration:: +* Class Implementation:: +@end menu + +@node Properties of the OOF model, Basic OOF Usage, OOF, OOF +@subsubsection Properties of the @file{oof.fs} model +@cindex @file{oof.fs} properties + +@itemize @bullet +@item +This model combines object oriented programming with information +hiding. 
It helps you write large applications, where scoping is +necessary, because it provides class-oriented scoping. -@node String Formats, Displaying characters and strings, Formatted numeric output, Other I/O -@subsection String Formats -@cindex string formats +@item +Named objects, object pointers, and object arrays can be created; +selector invocation uses the ``object selector'' syntax. Selector invocation +to objects and/or selectors on the stack is a bit less convenient, but +possible. -@comment TODO more index entries +@item +Selector invocation and instance variable usage of the active object is +straightforward, since both make use of the active object. -Forth commonly uses two different methods for representing a string: +@item +Late binding is efficient and easy to use. -@itemize @bullet @item -@cindex address of counted string -As a @var{counted string}, represented by a c-addr. The char addressed -by c-addr contains a character-count, n, of the string and the string -occupies the subsequent n char addresses in memory. +State-smart objects parse selectors. However, extensibility is provided +using a (parsing) selector @code{postpone} and a selector @code{'}. + @item -As cell pair on the stack; c-addr u, where u is the length of the string -in characters, and c-addr is the address of the first byte of the string. +An implementation in ANS Forth is available. + @end itemize -The ANS Forth Standard encourages the use of the second format when -representing strings on the stack, whilst conceeding that the counted -string format remains useful as a way of storing strings in memory. -doc-count +@node Basic OOF Usage, The OOF base class, Properties of the OOF model, OOF +@subsubsection Basic @file{oof.fs} Usage +@cindex @file{oof.fs} usage -@xref{Memory Blocks} for words that move, copy and search -for strings. @xref{Displaying characters and strings,} for words that -display characters and strings. 
+This section uses the same example as for @code{objects} (@pxref{Basic Objects Usage}). +You can define a class for graphical objects like this: -@node Displaying characters and strings, Input, String Formats, Other I/O -@subsection Displaying characters and strings -@cindex displaying characters and strings -@cindex compiling characters and strings -@cindex cursor control +@cindex @code{class} usage +@cindex @code{class;} usage +@cindex @code{method} usage +@example +object class graphical \ "object" is the parent class + method draw ( x y graphical -- ) +class; +@end example -@comment TODO more index entries +This code defines a class @code{graphical} with an +operation @code{draw}. We can perform the operation +@code{draw} on any @code{graphical} object, e.g.: -This section starts with a glossary of Forth words and ends with a set -of examples. +@example +100 100 t-rex draw +@end example -doc-bl -doc-space -doc-spaces -doc-emit -doc-toupper -doc-." -doc-.( -doc-type -doc-cr -doc-at-xy -doc-page -doc-s" -doc-c" -doc-char -doc-[char] -doc-sliteral +@noindent +where @code{t-rex} is an object or object pointer, created with e.g. +@code{graphical : t-rex}. -As an example, consider the following text, stored in a file @file{test.fs}: +@cindex abstract class +How do we create a graphical object? With the present definitions, +we cannot create a useful graphical object. The class +@code{graphical} describes graphical objects in general, but not +any concrete graphical object type (C++ users would call it an +@emph{abstract class}); e.g., there is no method for the selector +@code{draw} in the class @code{graphical}. + +For concrete graphical objects, we define child classes of the +class @code{graphical}, e.g.: @example -.( text-1) -: my-word - ." text-2" cr - .( text-3) -; +graphical class circle \ "graphical" is the parent class + cell var circle-radius +how: + : draw ( x y -- ) + circle-radius @@ draw-circle ; -." text-4" + : init ( n-radius -- ) + circle-radius ! 
; +class; +@end example -: my-char - [char] ALPHABET emit - char emit -; +Here we define a class @code{circle} as a child of @code{graphical}, +with a field @code{circle-radius}; it defines new methods for the +selectors @code{draw} and @code{init} (@code{init} is defined in +@code{object}, the parent class of @code{graphical}). + +Now we can create a circle in the dictionary with: + +@example +50 circle : my-circle @end example -When you load this code into Gforth, the following output is generated: +@noindent +@code{:} invokes @code{init}, thus initializing the field +@code{circle-radius} with 50. We can draw this new circle at (100,100) +with: @example -@kbd{include test.fs} text-1text-3text-4 ok +100 100 my-circle draw @end example +@cindex selector invocation, restrictions +@cindex class definition, restrictions +Note: You can only invoke a selector if the receiving object belongs to +the class where the selector was defined or one of its descendents; +e.g., you can invoke @code{draw} only for objects belonging to +@code{graphical} or its descendents (e.g., @code{circle}). The scoping +mechanism will check if you try to invoke a selector that is not +defined in this class hierarchy, so you'll get an error at compilation +time. + + +@node The OOF base class, Class Declaration, Basic OOF Usage, OOF +@subsubsection The @file{oof.fs} base class +@cindex @file{oof.fs} base class + +When you define a class, you have to specify a parent class. So how do +you start defining classes? There is one class available from the start: +@code{object}. You have to use it as ancestor for all classes. It is the +only class that has no parent. Classes are also objects, except that +they don't have instance variables; class manipulation such as +inheritance or changing definitions of a class is handled through +selectors of the class @code{object}. 
+ +@code{object} provides a number of selectors: + @itemize @bullet @item -Messages @code{text-1} and @code{text-3} are displayed because @code{.(} -is an immediate word; it behaves in the same way whether it is used inside -or outside a colon definition. -@item -Message @code{text-4} is displayed because of Gforth's added interpretation -semantics for @code{."}. +@code{class} for subclassing, @code{definitions} to add definitions +later on, and @code{class?} to get type information (is the class a +subclass of the class passed on the stack?). +doc---object-class +doc---object-definitions +doc---object-class? + @item -Message @code{text-2} is @var{not} displayed, because the text interpreter -performs the compilation semantics for @code{."} within the definition of -@code{my-word}. -@end itemize +@code{init} and @code{dispose} as constructor and destructor of the +object. @code{init} is invoked after the object's memory is allocated, +while @code{dispose} also handles deallocation. Thus if you redefine +@code{dispose}, you have to call the parent's dispose with @code{super +dispose}, too. +doc---object-init +doc---object-dispose -Here are some examples of executing @code{my-word} and @code{my-char}: +@item +@code{new}, @code{new[]}, @code{:}, @code{ptr}, @code{asptr}, and +@code{[]} to create named and unnamed objects and object arrays or +object pointers. +doc---object-new +doc---object-new[] +doc---object-: +doc---object-ptr +doc---object-asptr +doc---object-[] -@example -my-word text-2 - ok -@kbd{my-char fred} Af ok -@kbd{my-char jim} Aj ok -@end example +@item +@code{::} and @code{super} for explicit scoping. You should use explicit +scoping only for super classes or classes with the same set of instance +variables. Explicitly-scoped selectors use early binding. +doc---object-:: +doc---object-super -@itemize @bullet @item -Message @code{text-2} is displayed because of the run-time behaviour of -@code{."}. 
+@code{self} to get the address of the object +doc---object-self + @item -@code{[char]} compiles the "A" from "ALPHABET" and puts its display code -on the stack at run-time. @code{emit} always displays the character -when @code{my-char} is executed. +@code{bind}, @code{bound}, @code{link}, and @code{is} to assign object +pointers and instance defers. +doc---object-bind +doc---object-bound +doc---object-link +doc---object-is + @item -@code{char} parses a string at run-time and the second @code{emit} displays -the first character of the string. +@code{'} to obtain selector tokens, @code{send} to invoke selectors +from the stack, and @code{postpone} to generate selector invocation code. +doc---object-' +doc---object-postpone + @item -If you type @code{see my-char} you can see that @code{[char]} discarded -the text "LPHABET" and only compiled the display code for "A" into the -definition of @code{my-char}. +@code{with} and @code{endwith} to select the active object from the +stack, and enable its scope. Using @code{with} and @code{endwith} +also allows you to create code using selector @code{postpone} without being +trapped by the state-smart objects. +doc---object-with +doc---object-endwith + @end itemize +@node Class Declaration, Class Implementation, The OOF base class, OOF +@subsubsection Class Declaration +@cindex class declaration + +@itemize @bullet +@item +Instance variables +doc---oof-var +@item +Object pointers +doc---oof-ptr +doc---oof-asptr -@node Input, , Displaying characters and strings, Other I/O -@subsection Input -@cindex Input -@comment TODO more index entries +@item +Instance defers +doc---oof-defer -Blah on traditional and recommended string formats. +@item +Method selectors +doc---oof-early +doc---oof-method -doc--trailing -doc-/string -doc-convert -doc->number -doc->float -doc-accept -doc-query -doc-expect -doc-evaluate -doc-key -doc-key? 
+@item +Class-wide variables +doc---oof-static -TODO reference the block move stuff elsewhere +@item +End declaration +doc---oof-how: +doc---oof-class; -TODO convert and >number might be better in the numeric input section. +@end itemize -TODO maybe some of these shouldn't be here but should be in a "parsing" section +@c ------------------------------------------------------------- +@node Class Implementation, , Class Declaration, OOF +@subsubsection Class Implementation +@cindex class implementation +@c ------------------------------------------------------------- +@node Mini-OOF, Comparison with other object models, OOF, Object-oriented Forth +@subsection The @file{mini-oof.fs} model +@cindex mini-oof -@node Programming Tools, Assembler and Code Words, Other I/O, Words -@section Programming Tools -@cindex programming tools +Gforth's third object oriented Forth package is a 12-liner. It uses a +mixture of the @file{object.fs} and the @file{oof.fs} syntax, +and reduces to the bare minimum of features. This is based on a posting +of Bernd Paysan in comp.arch. @menu -* Debugging:: Simple and quick. -* Assertions:: Making your programs self-checking. -* Singlestep Debugger:: Executing your program word by word. +* Basic Mini-OOF Usage:: +* Mini-OOF Example:: +* Mini-OOF Implementation:: @end menu -@node Debugging, Assertions, Programming Tools, Programming Tools -@subsection Debugging -@cindex debugging - -Languages with a slow edit/compile/link/test development loop tend to -require sophisticated tracing/stepping debuggers to facilate -productive debugging. 
+@c ------------------------------------------------------------- +@node Basic Mini-OOF Usage, Mini-OOF Example, , Mini-OOF +@subsubsection Basic @file{mini-oof.fs} Usage +@cindex mini-oof usage -A much better (faster) way in fast-compiling languages is to add -printing code at well-selected places, let the program run, look at -the output, see where things went wrong, add more printing code, etc., -until the bug is found. +There is a base class (@code{class}, which allocates one cell +for the object pointer) plus seven other words: to define a method, a +variable, a class; to end a class, to resolve binding, to allocate an +object and to compile a class method. +@comment TODO better description of the last one -The simple debugging aids provided in @file{debugs.fs} -are meant to support this style of debugging. In addition, there are -words for non-destructively inspecting the stack and memory: +doc-object +doc-method +doc-var +doc-class +doc-end-class +doc-defines +doc-new +doc-:: -doc-.s -doc-f.s -There is a word @code{.r} but it does @var{not} display the return -stack! It is used for formatted numeric output. +@c ------------------------------------------------------------- +@node Mini-OOF Example, Mini-OOF Implementation, Basic Mini-OOF Usage, Mini-OOF +@subsubsection Mini-OOF Example +@cindex mini-oof example -doc-depth -doc-fdepth -doc-clearstack -doc-? -doc-dump +A short example shows how to use this package. This example, in slightly +extended form, is supplied as @file{moof-exm.fs} +@comment nac TODO could flesh this out with some comments from the Forthwrite article -The word @code{~~} prints debugging information (by default the source -location and the stack contents). It is easy to insert. If you use Emacs -it is also easy to remove (@kbd{C-x ~} in the Emacs Forth mode to -query-replace them with nothing). The deferred words -@code{printdebugdata} and @code{printdebugline} control the output of -@code{~~}. 
The default source location output format works well with -Emacs' compilation mode, so you can step through the program at the -source level using @kbd{C-x `} (the advantage over a stepping debugger -is that you can step in any direction and you know where the crash has -happened or where the strange data has occurred). +@example +object class + method init + method draw +end-class graphical +@end example -Note that the default actions clobber the contents of the pictured -numeric output string, so you should not use @code{~~}, e.g., between -@code{<#} and @code{#>}. +This code defines a class @code{graphical} with an +operation @code{draw}. We can perform the operation +@code{draw} on any @code{graphical} object, e.g.: -doc-~~ -doc-printdebugdata -doc-printdebugline +@example +100 100 t-rex draw +@end example -doc-see -doc-marker +where @code{t-rex} is an object or object pointer, created with e.g. +@code{graphical new Constant t-rex}. -Here's an example of using @code{marker} at the start of a source file -that you are debugging; it ensures that you only ever have one copy of -the file's definitions compiled at any time: +For concrete graphical objects, we define child classes of the +class @code{graphical}, e.g.: @example -[IFDEF] my-code - my-code -[ENDIF] - -marker my-code +graphical class + cell var circle-radius +end-class circle \ "graphical" is the parent class -\ .. definitions start here -\ . -\ . -\ end +:noname ( x y -- ) + circle-radius @@ draw-circle ; circle defines draw +:noname ( r -- ) + circle-radius ! ; circle defines init @end example - - -@node Assertions, Singlestep Debugger, Debugging, Programming Tools -@subsection Assertions -@cindex assertions - -It is a good idea to make your programs self-checking, in particular, if -you use an assumption (e.g., that a certain field of a data structure is -never zero) that may become wrong during maintenance. Gforth supports -assertions for this purpose. 
They are used like this: +There is no implicit init method, so we have to define one. The creation +code of the object now has to call init explicitely. @example -assert( @var{flag} ) +circle new Constant my-circle +50 my-circle init @end example -The code between @code{assert(} and @code{)} should compute a flag, that -should be true if everything is alright and false otherwise. It should -not change anything else on the stack. The overall stack effect of the -assertion is @code{( -- )}. E.g. +It is also possible to add a function to create named objects with +automatic call of @code{init}, given that all objects have @code{init} +on the same place: @example -assert( 1 1 + 2 = ) \ what we learn in school -assert( dup 0<> ) \ assert that the top of stack is not zero -assert( false ) \ this code should not be reached +: new: ( .. o "name" -- ) + new dup Constant init ; +80 circle new: large-circle @end example -The need for assertions is different at different times. During -debugging, we want more checking, in production we sometimes care more -for speed. Therefore, assertions can be turned off, i.e., the assertion -becomes a comment. Depending on the importance of an assertion and the -time it takes to check it, you may want to turn off some assertions and -keep others turned on. Gforth provides several levels of assertions for -this purpose: +We can draw this new circle at (100,100) with: -doc-assert0( -doc-assert1( -doc-assert2( -doc-assert3( -doc-assert( -doc-) +@example +100 100 my-circle draw +@end example -@code{Assert(} is the same as @code{assert1(}. The variable -@code{assert-level} specifies the highest assertions that are turned -on. I.e., at the default @code{assert-level} of one, @code{assert0(} and -@code{assert1(} assertions perform checking, while @code{assert2(} and -@code{assert3(} assertions are treated as comments. - -Note that the @code{assert-level} is evaluated at compile-time, not at -run-time. 
I.e., you cannot turn assertions on or off at run-time, you -have to set the @code{assert-level} appropriately before compiling a -piece of code. You can compile several pieces of code at several -@code{assert-level}s (e.g., a trusted library at level 1 and newly -written code at level 3). +@node Mini-OOF Implementation, , Mini-OOF Example, Mini-OOF +@subsubsection @file{mini-oof.fs} Implementation -doc-assert-level +Object-oriented systems with late binding typically use a +``vtable''-approach: the first variable in each object is a pointer to a +table, which contains the methods as function pointers. The vtable +may also contain other information. -If an assertion fails, a message compatible with Emacs' compilation mode -is produced and the execution is aborted (currently with @code{ABORT"}. -If there is interest, we will introduce a special throw code. But if you -intend to @code{catch} a specific condition, using @code{throw} is -probably more appropriate than an assertion). +So first, let's declare methods: -Definitions in ANS Standard Forth for these assertion words are provided -in @file{compat/assert.fs}. +@example +: method ( m v -- m' v ) Create over , swap cell+ swap + DOES> ( ... o -- ... ) @ over @ + @ execute ; +@end example +During method declaration, the number of methods and instance +variables is on the stack (in address units). @code{method} creates +one method and increments the method number. To execute a method, it +takes the object, fetches the vtable pointer, adds the offset, and +executes the @var{xt} stored there. Each method takes the object it is +invoked from as top of stack parameter. The method itself should +consume that object. 
-@node Singlestep Debugger, , Assertions, Programming Tools -@subsection Singlestep Debugger -@cindex singlestep Debugger -@cindex debugging Singlestep -@cindex @code{dbg} -@cindex @code{BREAK:} -@cindex @code{BREAK"} +Now, we also have to declare instance variables -When a new word is created there's often the need to check whether it behaves -correctly or not. You can do this by typing @code{dbg badword}. +@example +: var ( m v size -- m v' ) Create over , + + DOES> ( o -- addr ) @ + ; +@end example -doc-dbg +As before, a word is created with the current offset. Instance +variables can have different sizes (cells, floats, doubles, chars), so +all we do is take the size and add it to the offset. If your machine +has alignment restrictions, put the proper @code{aligned} or +@code{faligned} before the variable, to adjust the variable +offset. That's why it is on the top of stack. -This might look like: +We need a starting point (the base object) and some syntactic sugar: @example -: badword 0 DO i . LOOP ; ok -2 dbg badword -: badword -Scanning code... +Create object 1 cells , 2 cells , +: class ( class -- class methods vars ) dup 2@ ; +@end example -Nesting debugger ready! +For inheritance, the vtable of the parent object has to be +copied when a new, derived class is declared. This gives all the +methods of the parent class, which can be overridden, though. -400D4738 8049BC4 0 -> [ 2 ] 00002 00000 -400D4740 8049F68 DO -> [ 0 ] -400D4744 804A0C8 i -> [ 1 ] 00000 -400D4748 400C5E60 . -> 0 [ 0 ] -400D474C 8049D0C LOOP -> [ 0 ] -400D4744 804A0C8 i -> [ 1 ] 00001 -400D4748 400C5E60 . -> 1 [ 0 ] -400D474C 8049D0C LOOP -> [ 0 ] -400D4758 804B384 ; -> ok +@example +: end-class ( class methods vars -- ) + Create here >r , dup , 2 cells ?DO ['] noop , 1 cells +LOOP + cell+ dup cell+ r> rot @ 2 cells /string move ; @end example -Each line displayed is one step. You always have to hit return to -execute the next word that is displayed. 
If you don't want to execute -the next word in a whole, you have to type @kbd{n} for @code{nest}. Here is -an overview what keys are available: +The first line creates the vtable, initialized with +@code{noop}s. The second line is the inheritance mechanism, it +copies the xts from the parent vtable. -@table @i +We still have no way to define new methods, let's do that now: -@item -Next; Execute the next word. +@example +: defines ( xt class -- ) ' >body @ + ! ; +@end example -@item n -Nest; Single step through next word. +To allocate a new object, we need a word, too: -@item u -Unnest; Stop debugging and execute rest of word. If we got to this word -with nest, continue debugging with the calling word. +@example +: new ( class -- o ) here over @ allot swap over ! ; +@end example -@item d -Done; Stop debugging and execute rest. +Sometimes derived classes want to access the method of the +parent object. There are two ways to achieve this with Mini-OOF: +first, you could use named words, and second, you could look up the +vtable of the parent object. -@item s -Stopp; Abort immediately. +@example +: :: ( class "name" -- ) ' >body @ + @ compile, ; +@end example -@end table -Debugging large application with this mechanism is very difficult, because -you have to nest very deep into the program before the interesting part -begins. This takes a lot of time. +Nothing can be more confusing than a good example, so here is +one. First let's declare a text object (called +@code{button}), that stores text and position: -To do it more directly put a @code{BREAK:} command into your source code. -When program execution reaches @code{BREAK:} the single step debugger is -invoked and you have all the features described above. +@example +object class + cell var text + cell var len + cell var x + cell var y + method init + method draw +end-class button +@end example -If you have more than one part to debug it is useful to know where the -program has stopped at the moment. 
You can do this by the -@code{BREAK" string"} command. This behaves like @code{BREAK:} except that -string is typed out when the ``breakpoint'' is reached. +@noindent +Now, implement the two methods, @code{draw} and @code{init}: -@node Assembler and Code Words, Threading Words, Programming Tools, Words -@section Assembler and Code Words -@cindex assembler -@cindex code words +@example +:noname ( o -- ) + >r r@ x @ r@ y @ at-xy r@ text @ r> len @ type ; + button defines draw +:noname ( addr u o -- ) + >r 0 r@ x ! 0 r@ y ! r@ len ! r> text ! ; + button defines init +@end example -Gforth provides some words for defining primitives (words written in -machine code), and for defining the the machine-code equivalent of -@code{DOES>}-based defining words. However, the machine-independent -nature of Gforth poses a few problems: First of all, Gforth runs on -several architectures, so it can provide no standard assembler. What's -worse is that the register allocation not only depends on the processor, -but also on the @code{gcc} version and options used. +@noindent +To demonstrate inheritance, we define a class @code{bold-button}, with no +new data and no new methods: -The words that Gforth offers encapsulate some system dependences (e.g., the -header structure), so a system-independent assembler may be used in -Gforth. If you do not have an assembler, you can compile machine code -directly with @code{,} and @code{c,}. +@example +button class +end-class bold-button -doc-assembler -doc-code -doc-end-code -doc-;code -doc-flush-icache +: bold 27 emit ." [1m" ; +: normal 27 emit ." [0m" ; +@end example -If @code{flush-icache} does not work correctly, @code{code} words -etc. will not work (reliably), either. +@noindent +The class @code{bold-button} has a different draw method to +@code{button}, but the new method is defined in terms of the draw method +for @code{button}: -These words are rarely used. 
Therefore they reside in @code{code.fs}, -which is usually not loaded (except @code{flush-icache}, which is always -present). You can load them with @code{require code.fs}. +@example +:noname bold [ button :: draw ] normal ; bold-button defines draw +@end example -@cindex registers of the inner interpreter -In the assembly code you will want to refer to the inner interpreter's -registers (e.g., the data stack pointer) and you may want to use other -registers for temporary storage. Unfortunately, the register allocation -is installation-dependent. +@noindent +Finally, create two objects and apply methods: -The easiest solution is to use explicit register declarations -(@pxref{Explicit Reg Vars, , Variables in Specified Registers, gcc.info, -GNU C Manual}) for all of the inner interpreter's registers: You have to -compile Gforth with @code{-DFORCE_REG} (configure option -@code{--enable-force-reg}) and the appropriate declarations must be -present in the @code{machine.h} file (see @code{mips.h} for an example; -you can find a full list of all declarable register symbols with -@code{grep register engine.c}). If you give explicit registers to all -variables that are declared at the beginning of @code{engine()}, you -should be able to use the other caller-saved registers for temporary -storage. Alternatively, you can use the @code{gcc} option -@code{-ffixed-REG} (@pxref{Code Gen Options, , Options for Code -Generation Conventions, gcc.info, GNU C Manual}) to reserve a register -(however, this restriction on register allocation may slow Gforth -significantly). +@example +button new Constant foo +s" thin foo" foo init +page +foo draw +bold-button new Constant bar +s" fat bar" bar init +1 bar y ! 
+bar draw +@end example -If this solution is not viable (e.g., because @code{gcc} does not allow -you to explicitly declare all the registers you need), you have to find -out by looking at the code where the inner interpreter's registers -reside and which registers can be used for temporary storage. You can -get an assembly listing of the engine's code with @code{make engine.s}. -In any case, it is good practice to abstract your assembly code from the -actual register allocation. E.g., if the data stack pointer resides in -register @code{$17}, create an alias for this register called @code{sp}, -and use that in your assembly code. +@node Comparison with other object models, , Mini-OOF, Object-oriented Forth +@subsubsection Comparison with other object models +@cindex comparison of object models +@cindex object models, comparison -@cindex code words, portable -Another option for implementing normal and defining words efficiently -is: adding the wanted functionality to the source of Gforth. For normal -words you just have to edit @file{primitives} (@pxref{Automatic -Generation}), defining words (equivalent to @code{;CODE} words, for fast -defined words) may require changes in @file{engine.c}, @file{kernel.fs}, -@file{prims2x.fs}, and possibly @file{cross.fs}. +Many object-oriented Forth extensions have been proposed (@cite{A survey +of object-oriented Forths} (SIGPLAN Notices, April 1996) by Bradford +J. Rodriguez and W. F. S. Poehlman lists 17). This section discusses the +relation of the object models described here to two well-known and two +closely-related (by the use of method maps) models. 
+@cindex Neon model +The most popular model currently seems to be the Neon model (see +@cite{Object-oriented programming in ANS Forth} (Forth Dimensions, March +1997) by Andrew McKewan) but this model has a number of limitations +@footnote{A longer version of this critique can be +found in @cite{On Standardizing Object-Oriented Forth Extensions} (Forth +Dimensions, May 1997) by Anton Ertl.}: -@node Threading Words, Passing Commands to the OS, Assembler and Code Words, Words -@section Threading Words -@cindex threading words +@itemize @bullet +@item +It uses a @code{@emph{selector +object}} syntax, which makes it unnatural to pass objects on the +stack. -@cindex code address -These words provide access to code addresses and other threading stuff -in Gforth (and, possibly, other interpretive Forths). It more or less -abstracts away the differences between direct and indirect threading -(and, for direct threading, the machine dependences). However, at -present this wordset is still incomplete. It is also pretty low-level; -some day it will hopefully be made unnecessary by an internals wordset -that abstracts implementation details away completely. +@item +It requires that the selector parses the input stream (at +compile time); this leads to reduced extensibility and to bugs that are +hard to find. -doc-threading-method -doc->code-address -doc->does-code -doc-code-address! -doc-does-code! -doc-does-handler! -doc-/does-handler +@item +It allows using every selector on every object; +this eliminates the need for classes, but makes it harder to create +efficient implementations. +@end itemize -The code addresses produced by various defining words are produced by -the following words: +@cindex Pountain's object-oriented model +Another well-known publication is @cite{Object-Oriented Forth} (Academic +Press, London, 1987) by Dick Pountain. However, it is not really about +object-oriented programming, because it hardly deals with late +binding. 
Instead, it focuses on features like information hiding and +overloading that are characteristic of modular languages like Ada (83). -doc-docol: -doc-docon: -doc-dovar: -doc-douser: -doc-dodefer: -doc-dofield: +@cindex Zsoter's object-oriented model +In @cite{Does late binding have to be slow?} (Forth Dimensions 18(1) 1996, pages 31-35) +Andras Zsoter describes a model that makes heavy use of an active object +(like @code{this} in @file{objects.fs}): The active object is not only +used for accessing all fields, but also specifies the receiving object +of every selector invocation; you have to change the active object +explicitly with @code{@{ ... @}}, whereas in @file{objects.fs} it +changes more or less implicitly at @code{m: ... ;m}. Such a change at +the method entry point is unnecessary with Zsoter's model, because +the receiving object is the active object already. On the other hand, the explicit +change is absolutely necessary in that model, because otherwise no one +could ever change the active object. An ANS Forth implementation of this +model is available at @url{http://www.forth.org/fig/oopf.html}. -You can recognize words defined by a @code{CREATE}...@code{DOES>} word -with @code{>DOES-CODE}. If the word was defined in that way, the value -returned is different from 0 and identifies the @code{DOES>} used by the -defining word. -@comment TODO should that be "identifies the xt of the DOES> ?? +@cindex @file{oof.fs}, differences to other models +The @file{oof.fs} model combines information hiding and overloading +resolution (by keeping names in various word lists) with object-oriented +programming. It sets the active object implicitly on method entry, but +also allows explicit changing (with @code{>o...o>} or with +@code{with...endwith}). It uses parsing and state-smart objects and +classes for resolving overloading and for early binding: the object or +class parses the selector and determines the method from this. 
If the +selector is not parsed by an object or class, it performs a call to the +selector for the active object (late binding), like Zsoter's model. +Fields are always accessed through the active object. The big +disadvantage of this model is the parsing and the state-smartness, which +reduces extensibility and increases the opportunities for subtle bugs; +essentially, you are only safe if you never tick or @code{postpone} an +object or class (Bernd disagrees, but I (Anton) am not convinced). + +@cindex @file{mini-oof.fs}, differences to other models +The @file{mini-oof.fs} model is quite similar to a very stripped-down version of +the @file{objects.fs} model, but syntactically it is a mixture of the @file{objects.fs} and +@file{oof.fs} models. -@node Passing Commands to the OS, Miscellaneous Words, Threading Words, Words +@c ------------------------------------------------------------- +@node Passing Commands to the OS, Miscellaneous Words, Object-oriented Forth, Words @section Passing Commands to the Operating System @cindex operating system - passing commands @cindex shell commands @@ -7102,12 +7275,12 @@ doc-system doc-$? doc-getenv - +@c ------------------------------------------------------------- @node Miscellaneous Words, , Passing Commands to the OS, Words @section Miscellaneous Words @cindex miscellaneous words -These section lists the ANS Standard Forth words that are not documented +This section lists the ANS Forth words that are not documented elsewhere in this manual. Ultimately, they all need proper homes. 
doc-, @@ -7126,10 +7299,10 @@ doc-word doc-[compile] doc-refill -These ANS Standard Forth words are not currently implemented in Gforth +These ANS Forth words are not currently implemented in Gforth (see TODO section on dependencies) -The following ANS Standard Forth words are not currently supported by Gforth +The following ANS Forth words are not currently supported by Gforth (@pxref{ANS conformance}) @code{EDITOR} @@ -7385,9 +7558,9 @@ installation-dependent. Currently a char @item character-set extensions and matching of names: @cindex character-set extensions and matching of names -@cindex case sensitivity for name lookup -@cindex name lookup, case sensitivity -@cindex locale and case sensitivity +@cindex case-sensitivity for name lookup +@cindex name lookup, case-sensitivity +@cindex locale and case-sensitivity Any character except the ASCII NUL character can be used in a name. Matching is case-insensitive (except in @code{TABLE}s). The matching is performed using the C function @code{strncasecmp}, whose @@ -7413,9 +7586,9 @@ like @code{PARSE} otherwise. @code{(NAME interpreter (aka text interpreter) by default, treats all white-space characters as delimiters. -@item format of the control flow stack: -@cindex control flow stack, format -The data stack is used as control flow stack. The size of a control flow +@item format of the control-flow stack: +@cindex control-flow stack, format +The data stack is used as control-flow stack. The size of a control-flow stack item in cells is given by the constant @code{cs-item-size}. 
At the time of this writing, an item consists of a (pointer to a) locals list (third), an address in the code (second), and a tag for identifying the @@ -7443,7 +7616,7 @@ The error string is stored into the vari @item input line terminator: @cindex input line terminator @cindex line terminator on input -@cindex newline charcter on input +@cindex newline character on input For interactive input, @kbd{C-m} (CR) and @kbd{C-j} (LF) terminate lines. One of these characters is typically produced when you type the @kbd{Enter} or @kbd{Return} key. @@ -7548,7 +7721,7 @@ The remainder of dictionary space. @code @item system case-sensitivity characteristics: @cindex case-sensitivity characteristics -Dictionary searches are case insensitive (except in +Dictionary searches are case-insensitive (except in @code{TABLE}s). However, as explained above under @i{character-set extensions}, the matching for non-ASCII characters is determined by the locale you are using. In the default @code{C} locale all non-ASCII @@ -7594,13 +7767,13 @@ No. @item a name is neither a word nor a number: @cindex name not found -@cindex Undefined word +@cindex undefined word @code{-13 throw} (Undefined word). Actually, @code{-13 bounce}, which preserves the data and FP stack, so you don't lose more work than necessary. @item a definition name exceeds the maximum length allowed: -@cindex Word name too long +@cindex word name too long @code{-19 throw} (Word name too long) @item addressing a region not inside the various data spaces of the forth system: @@ -7611,7 +7784,7 @@ the operating system. On decent systems: address). @item argument type incompatible with parameter: -@cindex Argument type mismatch +@cindex argument type mismatch This is usually not caught. Some words perform checks, e.g., the control flow words, and issue a @code{ABORT"} or @code{-12 THROW} (Argument type mismatch). 
@@ -7626,7 +7799,6 @@ get an execution token for @code{compile @item dividing by zero: @cindex dividing by zero @cindex floating point unidentified fault, integer division -@cindex divide by zero On better platforms, this produces a @code{-10 throw} (Division by zero); on other systems, this typically results in a @code{-55 throw} (Floating-point unidentified fault). @@ -7634,7 +7806,7 @@ zero); on other systems, this typically @item insufficient data stack or return stack space: @cindex insufficient data stack or return stack space @cindex stack overflow -@cindex Address alignment exception, stack overflow +@cindex address alignment exception, stack overflow @cindex Invalid memory address, stack overflow Depending on the operating system, the installation, and the invocation of Gforth, this is either checked by the memory management hardware, or @@ -7729,7 +7901,7 @@ Compiles a recursive call to the definin @item argument input source different than current input source for @code{RESTORE-INPUT}: @cindex argument input source different than current input source for @code{RESTORE-INPUT} -@cindex Argument type mismatch, @code{RESTORE-INPUT} +@cindex argument type mismatch, @code{RESTORE-INPUT} @cindex @code{RESTORE-INPUT}, Argument type mismatch @code{-12 THROW}. Note that, once an input file is closed (e.g., because the end of the file was reached), its source-id may be @@ -7748,7 +7920,7 @@ memory access faults or execution of ill @item data space read/write with incorrect alignment: @cindex data space read/write with incorrect alignment @cindex alignment faults -@cindex Address alignment exception +@cindex address alignment exception Processor-dependent. Typically results in a @code{-23 throw} (Address alignment exception). 
Under Linux-Intel on a 486 or later processor with alignment turned on, incorrect alignment results in a @code{-9 throw} @@ -7781,7 +7953,7 @@ defined by @code{CONSTANT}; in the latte @item name not found (@code{'}, @code{POSTPONE}, @code{[']}, @code{[COMPILE]}): @cindex name not found (@code{'}, @code{POSTPONE}, @code{[']}, @code{[COMPILE]}) -@cindex Undefined word, @code{'}, @code{POSTPONE}, @code{[']}, @code{[COMPILE]} +@cindex undefined word, @code{'}, @code{POSTPONE}, @code{[']}, @code{[COMPILE]} @code{-13 throw} (Undefined word) @item parameters are not of the same type (@code{DO}, @code{?DO}, @code{WITHIN}): @@ -7795,7 +7967,7 @@ Assume @code{: X POSTPONE TO ; IMMEDIATE compilation semantics of @code{TO}. @item String longer than a counted string returned by @code{WORD}: -@cindex String longer than a counted string returned by @code{WORD} +@cindex string longer than a counted string returned by @code{WORD} @cindex @code{WORD}, string overflow Not checked. The string will be ok, but the count will, of course, contain only the least significant bits of the length. @@ -8507,9 +8679,9 @@ as well as possible. @cindex @code{FORGET}, deleting the compilation word list Not implemented (yet). 
-@item fewer than @var{u}+1 items on the control flow stack (@code{CS-PICK}, @code{CS-ROLL}): -@cindex @code{CS-PICK}, fewer than @var{u}+1 items on the control flow stack -@cindex @code{CS-ROLL}, fewer than @var{u}+1 items on the control flow stack +@item fewer than @var{u}+1 items on the control-flow stack (@code{CS-PICK}, @code{CS-ROLL}): +@cindex @code{CS-PICK}, fewer than @var{u}+1 items on the control flow-stack +@cindex @code{CS-ROLL}, fewer than @var{u}+1 items on the control flow-stack @cindex control-flow stack underflow This typically results in an @code{abort"} with a descriptive error message (may change into a @code{-22 throw} (Control structure mismatch) @@ -8596,12 +8768,12 @@ are applied to the latest defined word ( @item search order empty (@code{previous}): @cindex @code{previous}, search order empty -@cindex Vocstack empty, @code{previous} +@cindex vocstack empty, @code{previous} @code{abort" Vocstack empty"}. @item too many word lists in search order (@code{also}): @cindex @code{also}, too many word lists in search order -@cindex Vocstack full, @code{also} +@cindex vocstack full, @code{also} @code{abort" Vocstack full"}. @end table @@ -8664,6 +8836,7 @@ Signals? Accessing the Stacks +@c ****************************************************************** @node Emacs and Gforth, Image Files, Integrating Gforth, Top @chapter Emacs and Gforth @cindex Emacs and Gforth @@ -8678,13 +8851,22 @@ Accessing the Stacks @cindex Forth mode in Emacs Gforth comes with @file{gforth.el}, an improved version of @file{forth.el} by Goran Rydqvist (included in the TILE package). The -improvements are a better (but still not perfect) handling of -indentation. I have also added comment paragraph filling (@kbd{M-q}), -commenting (@kbd{C-x \}) and uncommenting (@kbd{C-u C-x \}) regions and -removing debugging tracers (@kbd{C-x ~}, @pxref{Debugging}). I left the -stuff I do not use alone, even though some of it only makes sense for -TILE. 
To get a description of these features, enter Forth mode and type -@kbd{C-h m}. +improvements are: + +@itemize @bullet +@item +A better (but still not perfect) handling of indentation. +@item +Comment paragraph filling (@kbd{M-q}) +@item +Commenting (@kbd{C-x \}) and uncommenting (@kbd{C-u C-x \}) of regions +@item +Removal of debugging tracers (@kbd{C-x ~}, @pxref{Debugging}). +@end itemize + +I left the stuff I do not use alone, even though some of it only makes +sense for TILE. To get a description of these features, enter Forth mode +and type @kbd{C-h m}. @cindex source location of error or debugging output in Emacs @cindex error output, finding the source location in Emacs @@ -8700,8 +8882,8 @@ message is only a few keystrokes away (@ @cindex @file{TAGS} file @cindex @file{etags.fs} @cindex viewing the source of a word in Emacs -Also, if you @code{include} @file{etags.fs}, a new @file{TAGS} file -(@pxref{Tags, , Tags Tables, emacs, Emacs Manual}) will be produced that +Also, if you @code{include} @file{etags.fs}, a new @file{TAGS} file will +be produced (@pxref{Tags, , Tags Tables, emacs, Emacs Manual}) that contains the definitions of all words defined afterwards. You can then find the source for a word using @kbd{M-.}. Note that emacs can use several tags files at the same time (e.g., one for the Gforth sources @@ -8719,10 +8901,11 @@ file: (setq auto-mode-alist (cons '("\\.fs\\'" . forth-mode) auto-mode-alist)) @end example +@c ****************************************************************** @node Image Files, Engine, Emacs and Gforth, Top @chapter Image Files -@cindex image files -@cindex @code{.fi} files +@cindex image file +@cindex @file{.fi} files @cindex precompiled Forth code @cindex dictionary in persistent form @cindex persistent form of dictionary @@ -8774,24 +8957,25 @@ Our Forth system consists not only of pr definitions written in Forth. 
Since the Forth compiler itself belongs to those definitions, it is not possible to start the system with the primitives and the Forth source alone. Therefore we provide the Forth -code as an image file in nearly executable form. At the start of the -system a C routine loads the image file into memory, optionally -relocates the addresses, then sets up the memory (stacks etc.) according -to information in the image file, and starts executing Forth code. +code as an image file in nearly executable form. When Gforth starts up, +a C routine loads the image file into memory, optionally relocates the +addresses, then sets up the memory (stacks etc.) according to +information in the image file, and (finally) starts executing Forth +code. The image file variants represent different compromises between the goals of making it easy to generate image files and making them portable. @cindex relocation at run-time -Win32Forth 3.4 and Mitch Bradleys @code{cforth} use relocation at +Win32Forth 3.4 and Mitch Bradley's @code{cforth} use relocation at run-time. This avoids many of the complications discussed below (image files are data relocatable without further ado), but costs performance (one addition per memory access). @cindex relocation at load-time -By contrast, our loader performs relocation at image load time. The -loader also has to replace tokens standing for primitive calls with the +By contrast, the Gforth loader performs relocation at image load time. The +loader also has to replace tokens that represent primitive calls with the appropriate code-field addresses (or code addresses in the case of direct threading). @@ -8809,7 +8993,7 @@ caused by the design of the image file l @item There is only one segment; in particular, this means, that an image file cannot represent @code{ALLOCATE}d memory chunks (and pointers to -them). And the contents of the stacks are not represented, either. +them). The contents of the stacks are not represented, either. 
@item The only kinds of relocation supported are: adding the same offset to @@ -8855,7 +9039,7 @@ a place where it is stored in a non-mang @node Non-Relocatable Image Files, Data-Relocatable Image Files, Image File Background, Image Files @section Non-Relocatable Image Files @cindex non-relocatable image files -@cindex image files, non-relocatable +@cindex image file, non-relocatable These files are simple memory dumps of the dictionary. They are specific to the executable (i.e., @file{gforth} file) they were created @@ -8873,7 +9057,7 @@ doc-savesystem @node Data-Relocatable Image Files, Fully Relocatable Image Files, Non-Relocatable Image Files, Image Files @section Data-Relocatable Image Files @cindex data-relocatable image files -@cindex image files, data-relocatable +@cindex image file, data-relocatable These files contain relocatable data addresses, but fixed code addresses (instead of tokens). They are specific to the executable (i.e., @@ -8886,7 +9070,7 @@ Relocatable Image Files}). @node Fully Relocatable Image Files, Stack and Dictionary Sizes, Data-Relocatable Image Files, Image Files @section Fully Relocatable Image Files @cindex fully relocatable image files -@cindex image files, fully relocatable +@cindex image file, fully relocatable @cindex @file{kern*.fi}, relocatability @cindex @file{gforth.fi}, relocatability @@ -9021,13 +9205,32 @@ gforth -i @var{image} @end example @cindex executable image file -@cindex image files, executable +@cindex image file, executable If your operating system supports starting scripts with a line of the form @code{#! ...}, you just have to type the image file name to start Gforth with this image file (note that the file extension @code{.fi} is just a convention). I.e., to run Gforth with the image file @var{image}, you can just type @var{image} instead of @code{gforth -i @var{image}}. +For example, if you place this text in a file: + +@example +#! /usr/local/bin/gforth + +." 
Hello, world" CR +bye + +@end example + +@noindent +And then make the file executable (chmod +x in Unix), you can run it +directly from the command line. The sequence @code{#!} is used in two +ways; firstly, it is recognised as a ``magic sequence'' by the operating +system, secondly it is treated as a comment character by Gforth. Because +of the second usage, a space is required between @code{#!} and the path +to the executable. +@comment TODO describe the #! magic with reference to the Power Tools book. + doc-#! @node Modifying the Startup Sequence, , Running Image Files, Image Files @@ -9037,13 +9240,9 @@ doc-#! @cindex initialization sequence of image file You can add your own initialization to the startup sequence through the -deferred word - -doc-'cold - -@code{'cold} is invoked just before the image-specific command line -processing (by default, loading files and evaluating (@code{-e}) strings) -starts. +deferred word @code{'cold}. @code{'cold} is invoked just before the +image-specific command line processing (by default, loading files and +evaluating (@code{-e}) strings) starts. A sequence for adding your initialization usually looks like this: @@ -9055,7 +9254,7 @@ A sequence for adding your initializatio @end example @cindex turnkey image files -@cindex image files, turnkey applications +@cindex image file, turnkey applications You can make a turnkey image by letting @code{'cold} execute a word (your turnkey application) that never returns; instead, it exits Gforth via @code{bye} or @code{throw}. @@ -9063,16 +9262,18 @@ via @code{bye} or @code{throw}. @cindex command-line arguments, access @cindex arguments on the command line, access You can access the (image-specific) command-line arguments through the -variables @code{argc} and @code{argv}. @code{arg} provides conventient +variables @code{argc} and @code{argv}. @code{arg} provides convenient access to @code{argv}. 
+If @code{'cold} exits normally, Gforth processes the command-line +arguments as files to be loaded and strings to be evaluated. Therefore, +@code{'cold} should remove the arguments it has used in this case. + +doc-'cold doc-argc doc-argv doc-arg -If @code{'cold} exits normally, Gforth processes the command-line -arguments as files to be loaded and strings to be evaluated. Therefore, -@code{'cold} should remove the arguments it has used in this case. @c ****************************************************************** @node Engine, Binding to System Library, Image Files, Top @@ -9080,7 +9281,7 @@ arguments as files to be loaded and stri @cindex engine @cindex virtual machine -Reading this section is not necessary for programming with Gforth. It +Reading this chapter is not necessary for programming with Gforth. It may be helpful for finding your way in the Gforth sources. The ideas in this section have also been published in the papers @@ -9100,11 +9301,12 @@ Ertl, presented at EuroForth '93; the la @section Portability @cindex engine portability -One of the main goals of the effort is availability across a wide range -of personal machines. fig-Forth, and, to a lesser extent, F83, achieved -this goal by manually coding the engine in assembly language for several -then-popular processors. This approach is very labor-intensive and the -results are short-lived due to progress in computer architecture. +An important goal of the Gforth Project is availability across a wide +range of personal machines. fig-Forth, and, to a lesser extent, F83, +achieved this goal by manually coding the engine in assembly language +for several then-popular processors. This approach is very +labor-intensive and the results are short-lived due to progress in +computer architecture. @cindex C, using C for the engine Others have avoided this problem by coding in C, e.g., Mitch Bradley @@ -9169,10 +9371,10 @@ makes it possible to take the address of @code{goto *@var{address}}. 
I.e., @code{goto *&&x} is the same as @code{goto x}. -@cindex NEXT, indirect threaded +@cindex @code{NEXT}, indirect threaded @cindex indirect threaded inner interpreter @cindex inner interpreter, indirect threaded -With this feature an indirect threaded NEXT looks like: +With this feature an indirect threaded @code{NEXT} looks like: @example cfa = *ip++; ca = *cfa; @@ -9186,7 +9388,7 @@ executed; The @code{ca} (code address) f executable code, e.g., a primitive or the colon definition handler @code{docol}. -@cindex NEXT, direct threaded +@cindex @code{NEXT}, direct threaded @cindex direct threaded inner interpreter @cindex inner interpreter, direct threaded Direct threading is even simpler: @@ -9196,7 +9398,7 @@ goto *ca; @end example Of course we have packaged the whole thing neatly in macros called -@code{NEXT} and @code{NEXT1} (the part of NEXT after fetching the cfa). +@code{NEXT} and @code{NEXT1} (the part of @code{NEXT} after fetching the cfa). @menu * Scheduling:: @@ -9221,7 +9423,7 @@ sp++; sp[0]=n; NEXT; @end example -the NEXT comes strictly after the other code, i.e., there is nearly no +the @code{NEXT} comes strictly after the other code, i.e., there is nearly no scheduling. After a little thought the problem becomes clear: The compiler cannot know that @code{sp} and @code{ip} point to different addresses (and the version of @code{gcc} we used would not know it even @@ -9229,7 +9431,7 @@ if it was possible), so it could not mov store to the TOS. Indeed the pointers could be the same, if code on or very near the top of stack were executed. In the interest of speed we chose to forbid this probably unused ``feature'' and helped the compiler -in scheduling: NEXT is divided into the loading part (@code{NEXT_P1}) +in scheduling: @code{NEXT} is divided into the loading part (@code{NEXT_P1}) and the goto part (@code{NEXT_P2}). @code{+} now looks like: @example n=sp[0]+sp[1]; @@ -9274,17 +9476,17 @@ supported on all machines. 
@subsection DOES> @cindex @code{DOES>} implementation -@cindex dodoes routine -@cindex DOES-code +@cindex @code{dodoes} routine +@cindex @code{DOES>}-code One of the most complex parts of a Forth engine is @code{dodoes}, i.e., the chunk of code executed by every word defined by a @code{CREATE}...@code{DOES>} pair. The main problem here is: How to find the Forth code to be executed, i.e. the code after the -@code{DOES>} (the DOES-code)? There are two solutions: +@code{DOES>} (the @code{DOES>}-code)? There are two solutions: In fig-Forth the code field points directly to the @code{dodoes} and the -DOES-code address is stored in the cell after the code address (i.e. at -@code{@var{cfa} cell+}). It may seem that this solution is illegal in +@code{DOES>}code address is stored in the cell after the code address (i.e. at +@code{@var{CFA} cell+}). It may seem that this solution is illegal in the Forth-79 and all later standards, because in fig-Forth this address lies in the body (which is illegal in these standards). However, by making the code field larger for all words this solution becomes legal @@ -9296,15 +9498,15 @@ to avoid having different image files fo systems (direct threaded systems require two-cell code fields on many machines). -@cindex DOES-handler +@cindex @code{DOES>}-handler The other approach is that the code field points or jumps to the cell -after @code{DOES}. In this variant there is a jump to @code{dodoes} at -this address (the DOES-handler). @code{dodoes} can then get the -DOES-code address by computing the code address, i.e., the address of +after @code{DOES>}. In this variant there is a jump to @code{dodoes} at +this address (the @code{DOES>}-handler). @code{dodoes} can then get the +@code{DOES>}-code address by computing the code address, i.e., the address of the jump to dodoes, and add the length of that jump field. 
A variant of this is to have a call to @code{dodoes} after the @code{DOES>}; then the return address (which can be found in the return register on RISCs) is -the DOES-code address. Since the two cells available in the code field +the @code{DOES>}-code address. Since the two cells available in the code field are used up by the jump to the code address in direct threading on many architectures, we use this approach for direct threading on these architectures. We did not want to add another cell to the code field. @@ -9388,8 +9590,8 @@ well and produces optimal code for @code HP RISC machines: Defining the @code{n}s does not produce any code, and using them as intermediate storage also adds no cost. -There are also other optimizations, that are not illustrated by this -example: Assignments between simple variables are usually for free (copy +There are also other optimizations that are not illustrated by this +example: assignments between simple variables are usually for free (copy propagation). If one of the stack items is not used by the primitive (e.g. in @code{drop}), the compiler eliminates the load from the stack (dead code elimination). On the other hand, there are some things that @@ -9400,7 +9602,7 @@ a stack item to the place where it just While programming a primitive is usually easy, there are a few cases where the programmer has to take the actions of the generator into account, most notably @code{?dup}, but also words that do not (always) -fall through to NEXT. +fall through to @code{NEXT}. @node TOS Optimization, Produced code, Automatic Generation, Primitives @subsection TOS Optimization @@ -9530,19 +9732,19 @@ matmul 1.00 1.47 1.35 1.46 0.74 fib 1.00 1.52 1.34 1.22 0.86 1.74 2.99 4.30 @end example -You may find the good performance of Gforth compared with the systems -written in assembly language quite surprising. 
One important reason for -the disappointing performance of these systems is probably that they are -not written optimally for the 486 (e.g., they use the @code{lods} -instruction). In addition, Win32Forth uses a comfortable, but costly -method for relocating the Forth image: like @code{cforth}, it computes -the actual addresses at run time, resulting in two address computations -per NEXT (@pxref{Image File Background}). - -Only Eforth with the peephole optimizer performs comparable to -Gforth. The speedups achieved with peephole optimization of threaded -code are quite remarkable. Adding a peephole optimizer to Gforth should -cause similar speedups. +You may be quite surprised by the good performance of Gforth when +compared with systems written in assembly language. One important reason +for the disappointing performance of these other systems is probably +that they are not written optimally for the 486 (e.g., they use the +@code{lods} instruction). In addition, Win32Forth uses a comfortable, +but costly method for relocating the Forth image: like @code{cforth}, it +computes the actual addresses at run time, resulting in two address +computations per @code{NEXT} (@pxref{Image File Background}). + +Only Eforth with the peephole optimizer has a performance that is +comparable to Gforth. The speedups achieved with peephole optimization +of threaded code are quite remarkable. Adding a peephole optimizer to +Gforth should cause similar speedups. The speedup of Gforth over PFE, ThisForth and TILE can be easily explained with the self-imposed restriction of the latter systems to @@ -9552,15 +9754,15 @@ Vars, , Defining Global Register Variabl Moreover, current C compilers have a hard time optimizing other aspects of the ThisForth and the TILE source. -Note that the performance of Gforth on 386 architecture processors -varies widely with the version of @code{gcc} used. 
E.g., @code{gcc-2.5.8} -failed to allocate any of the virtual machine registers into real -machine registers by itself and would not work correctly with explicit -register declarations, giving a 1.3 times slower engine (on a 486DX2/66 -running the Sieve) than the one measured above. +The performance of Gforth on 386 architecture processors varies widely +with the version of @code{gcc} used. E.g., @code{gcc-2.5.8} failed to +allocate any of the virtual machine registers into real machine +registers by itself and would not work correctly with explicit register +declarations, giving a 1.3 times slower engine (on a 486DX2/66 running +the Sieve) than the one measured above. -Note also that there have been several releases of Win32Forth since the -release presented here, so the results presented here may have little +Note that there have been several releases of Win32Forth since the +release presented here, so the results presented above may have little predictive value for the performance of Win32Forth today. @cindex @file{Benchres} @@ -9575,6 +9777,7 @@ newer version of these measurements at @url{http://www.complang.tuwien.ac.at/forth/performance.html}. You can find numbers for Gforth on various machines in @file{Benchres}. +@c ****************************************************************** @node Binding to System Library, Cross Compiler, Engine, Top @chapter Binding to System Library @@ -9652,7 +9855,7 @@ was developed across the Internet, and i physically for the first 4 years of development. @section Pedigree -@cindex Pedigree of Gforth +@cindex pedigree of Gforth Gforth descends from bigFORTH (1993) and fig-Forth. Gforth and PFE (by Dirk Zoller) will cross-fertilize each other. Of course, a significant @@ -9702,7 +9905,7 @@ information about Forth there. 
@node Internet resources, Books, Forth-related information, Forth-related information @section Internet resources -@cindex Internet resources +@cindex internet resources @cindex comp.lang.forth @cindex frequently asked questions @@ -9738,7 +9941,7 @@ Research (JFAR) and a searchable Forth b @node Books, The Forth Interest Group, Internet resources, Forth-related information @section Books -@cindex Books +@cindex books on Forth As the Standard is relatively new, there are not many books out yet. It is not recommended to learn Forth by using Gforth and a book that is not @@ -9749,7 +9952,7 @@ should be ok, because ANS Forth is prima @cindex standard document for ANS Forth @cindex ANS Forth document The definite reference if you want to write ANS Forth programs is, of -course, the ANS Forth Standard. It is available in printed form from the +course, the ANS Forth document. It is available in printed form from the National Standards Institute Sales Department (Tel.: USA (212) 642-4900; Fax.: USA (212) 302-1286) as document @cite{X3.215-1994} for about $200. You can also get it from Global Engineering Documents (Tel.: USA @@ -9763,8 +9966,8 @@ format); this HTML version also includes Interpretation (RFIs). Some pointers to these versions can be found through @*@url{http://www.complang.tuwien.ac.at/projects/forth.html}. -@cindex introductory book -@cindex book, introductory +@cindex introductory book on Forth +@cindex book on Forth, introductory @cindex Woehr, Jack: @cite{Forth: The New Model} @cindex @cite{Forth: The new model} (book) @cite{Forth: The New Model} by Jack Woehr (Prentice-Hall, 1993) is an @@ -9795,11 +9998,12 @@ hardly more useful than a pre-ANS book. @cindex Forth interest group (FIG) The Forth Interest Group (FIG) is a world-wide, non-profit, -member-supported organisation. It publishes a regular magazine and -offers other benefits of membership. 
You can contact the FIG through -their office email address: @email{office@@forth.org} or by visiting -their web site at @url{http://www.forth.org/}. This web site also -includes links to FIG chapters in other countries and American cities +member-supported organisation. It publishes a regular magazine, +@var{FORTH Dimensions}, and offers other benefits of membership. You can +contact the FIG through their office email address: +@email{office@@forth.org} or by visiting their web site at +@url{http://www.forth.org/}. This web site also includes links to FIG +chapters in other countries and American cities (@url{http://www.forth.org/chapters.html}). @node Conferences, , The Forth Interest Group, Forth-related information @@ -9807,7 +10011,8 @@ includes links to FIG chapters in other @cindex Conferences There are several regular conferences related to Forth. They are all -well-publicised in FIG magazine and on the comp.lang.forth news group: +well-publicised in @var{FORTH Dimensions} and on the comp.lang.forth +news group: @itemize @bullet @item @@ -9824,17 +10029,18 @@ EuroForth -- this European conference ta @node Word Index, Concept Index, Forth-related information, Top @unnumbered Word Index -This index is as incomplete as the manual. Each word is listed with -stack effect and wordset. +This index is a list of Forth words that have ``glossary'' entries +within this manual. Each word is listed with its stack effect and +wordset. @printindex fn @node Concept Index, , Word Index, Top @unnumbered Concept and Word Index -This index is as incomplete as the manual. Not all entries listed are -present verbatim in the text. Only the names are listed for the words -here. +Not all entries listed in this index are present verbatim in the +text. This index also duplicates, in abbreviated form, all of the words +listed in the Word Index (only the names are listed for the words here). @printindex cp