NTF/LXF is a Forth system developed by Peter Fälth. It tries to be ANS Forth compatible when possible. It includes all wordsets of the Forth-2012 standard. The aim of the system is to research implementation techniques that provide compact code and high execution speed. It is a hobby project that has been developed over a 20-year period. It started life as a porting project, porting a public-domain OS/2 Forth by Rick VanNorman to Windows NT. It provided a great start and structure to build on. Today all parts except the assembler and disassembler are completely rewritten. The disassembler comes from win32forth. I am very grateful to Rick, Tom and Andrew for their work.
ntf runs under Windows XP, 7 and 10. lxf runs under Linux.
The distribution is available as a zip file or a tar.gz file. The contents are the same in both archives.
Unzip the distribution file to a directory of your choice. ntf.exe runs the system.
Untar the distribution file to a directory of your choice. Make lxf and lxd executable (chmod 755 lxf). ./lxf runs the system. lxf uses only Linux system calls. lxd can also use standard library calls and is dependent on libc. lxf and lxd expect the console/terminal to be in UTF-8 mode to work correctly.
The system can be seen as a virtual machine running at compile-time generating native code. The virtual machine has 2 stacks (data and return stack). Stack operations do not generate code, they just change pointers on the virtual stacks. Operations that need to generate code, like +, look where the stack items are located and generate code using the locations. The result, often in a register, becomes the new top of stack. When a block boundary is reached, code is generated to restore the real stacks to a known state. This happens when a call to other code is made or a control-flow statement is reached.
Operations that can be resolved at compile-time are resolved then. Adding 2 known numbers will be done during compilation and the result pushed on the virtual stack.
There are over 450 words with this type of special compilation semantics.
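The idea can be modelled in a few lines of standard Forth. This is a toy sketch only, not the VCOMPILER implementation: the names vstack, vdepth, vpush, vpop, toy-swap and toy+ are invented here, and the virtual items are plain compile-time constants instead of register or memory locations.

```forth
\ Toy model only: NOT the VCOMPILER implementation.
\ Items on the "virtual stack" are compile-time constants.
create vstack 16 cells allot      \ room for 16 virtual items
variable vdepth   0 vdepth !      \ number of items currently held

: vpush ( n -- )  vstack vdepth @ cells + !   1 vdepth +! ;
: vpop  ( -- n )  -1 vdepth +!   vstack vdepth @ cells + @ ;

\ A stack operation only rearranges the bookkeeping; nothing is emitted.
: toy-swap ( -- )  vpop vpop  swap vpush vpush ;

\ Both operands are known, so the sum is computed now and only the
\ result stays on the virtual stack.
: toy+ ( -- )  vpop vpop + vpush ;

3 vpush 4 vpush toy+   \ the virtual stack now holds the single item 7
```

In the real system the items can also be registers or memory cells, and an operation like + emits machine code when its operands are not both known.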
The virtual machine has its own assembler located in the VCOMPILER vocabulary. There are about 150 words and support words (unfortunately not yet documented).
An example can show how it works. Let's define MYSWAP.
First we define the compile-time action:
:p MYSWAP 1 0 v-swap ;p
The :p ;p pair defines the compile-time action of the word. If we execute it from the command line we will get an error saying it is a compile-only word. But we can use it in compilation:
: SWAP2 myswap ;
see swap2
A48BA4 407192 9 C80000 5 normal SWAP2
407192 8B4500 mov eax , [ebp]
407195 895D00 mov [ebp] , ebx
407198 8BD8 mov ebx , eax
40719A C3 ret near
ok
We can also give MYSWAP an interpretation action:
:r MYSWAP myswap ;r
We can also define swap3 and swap4 as
: swap3 2>r r> r> ;
: swap4 locals| a b | a b ;
Both will produce the same code as above. In all these examples the code is generated by ; when it executes 1 v-exit. v-exit ( n -- ) resolves the stacks to a known state with the n top data items in registers. There is also a v-init ( n -- ) that sets up the stacks and is invoked by : .
The 450 words defined in this way make up the base system. The rest of the system is written using them. There are just a few code definitions in the Linux version (initialization code).
Every time a call is made the stacks must be resolved. Short words therefore benefit from being inlined: the call is removed and more efficient code can be generated. This can be achieved with macros. There are 2 ways to define macros:
:m mymac dup c@ swap 1+ swap ;
or
macro mymac " dup c@ swap 1+ swap"
Macros can also be defined in place of does>, for example:
: array create ;macro swap cells + ;
Be careful to use macros only under the search order they were intended for.
All words have 2 xts, one for compilation and one for interpretation. MYSWAP from above can illustrate this:
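For comparison, here is the array defining word written with standard does> instead of ;macro. It is a sketch in plain Forth-2012; array-does, squares and fill-squares are names chosen here for illustration. From the description above, the ;macro form is expected to give the same stack behaviour while letting swap cells + be compiled inline at each use.

```forth
\ Standard does> version, for comparison with the ;macro definition above.
\ array-does is a hypothetical name chosen to avoid clashing with array.
: array-does ( "name" -- )  create  does> ( n -- addr ) swap cells + ;

array-does squares  4 cells allot   \ a four-cell array
: fill-squares ( -- )  4 0 do  i i *  i squares !  loop ;
fill-squares
3 squares @ .   \ prints 9
```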
see myswap
A48B8C 40719B 9 407176 28 primMYSWAP
40719B 8B4500 mov eax , [ebp]
40719E 895D00 mov [ebp] , ebx
4071A1 8BD8 mov ebx , eax
4071A3 C3 ret near
ok
seec myswap
A48B8C 40719B 9 407176 28 primMYSWAP
407176 E80FFC8700 call SETCOND
40717B C745F801000000 mov dword [ebp-8h] , # 1h
407182 895DFC mov [ebp-4h] , ebx
407185 BB00000000 mov ebx , # 0h
40718A 8D6DF8 lea ebp , [ebp-8h]
40718D E97DB58700 jmp V-SWAP
ok
STATE is only used twice in the system. Both times in the interpret function. One use is to determine which xt to execute and what parameters to give it. The other one is to handle a number; either push it to the real stack (interpretation) or the virtual stack (compilation).
.mem will show the memory use:
.mem
Area       used space  free space
Code            25014     3378466
Data             5872     3139852
Name            35820     2323472
Code-sys        86309      165443
Data-sys         9280      252864
User            32136       33400
ok
The separation into these memory areas was originally done to be able to drop headers when turnkeying a program. It has also turned out to be a speed boost on modern processors. Name, Code-sys and Data-sys will be dropped in a turnkey system. Generally only the Data area is accessible to the user.
The Windows version is only dependent on KERNEL32.dll. The Linux version (lxf) uses only system calls and has no dependencies. The lxd version can call external libraries and is dependent on libc.
Windows system calls are set up in 2 steps: first define the DLL to use, then define the system call.
DLLs are defined by s" DLLNAME" library: libname
an example:
s" ws2_32.dll" library: ws2
Executing ws2 will load the DLL and return its entry point.
A function is defined by s" FunctionName" syscall: libname funcname
an example:
s" connect" syscall: ws2 (connect)
Executing (connect) will resolve and execute the function; if the library has not been loaded yet, it is loaded first. Please be aware that the names of Windows system calls are case sensitive. Parameters are pushed from right to left and removed by the system. A return code is always returned.
To call an ordinary library function (not a system call) the number of parameters needs to be given so that they can be removed afterwards.
an example:
s" msvcrt.dll" library: libc
1 s" puts" libcall: libc puts
z" Hello World" puts
will print Hello World and leave 0 on the stack.
On Linux the kernel is called directly by issuing interrupt $80. Parameters are passed in the registers. To help there are 7 words to handle the calls: syscall0 to syscall6. The number shows how many parameters they take. The syscall number is on top of the stack. Consult unistd.h in the Linux source to find the syscall numbers. The best way to find out what parameters a syscall takes is to study the Linux source code. The order on the stack, from the top down, is ebx, ecx, edx, esi, edi, ebp.
an example
: type swap 1 4 syscall3 drop ;
will define type on Linux.
4 is the write syscall, 1 is the stdout handle, swap places the parameters in the right order, and drop removes the return code.
Library calls are available if lxd is used. The syntax is the same as for Windows library calls.
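A call without parameters follows the same pattern. The sketch below assumes lxf on 32-bit x86 Linux, where getpid has syscall number 20 in the i386 table; the word name getpid is chosen here and is not stated to be part of the system. The return code is kept because it is the process id.

```forth
\ Sketch for lxf on 32-bit x86 Linux; getpid is a name chosen here.
\ 20 is __NR_getpid in the i386 syscall table.
: getpid ( -- pid )  20 syscall0 ;
getpid .    \ prints the process id of the running Forth system
```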
MAP-FILE ( addr len fam -- addr' len' ior ) \ map a file in memory
S" myfile.txt" R/W MAP-FILE
will map the file in memory and return the start addr and len. Changes made to the memory are written to the file. Using R/O instead will not allow writing to the memory.
UNMAP-FILE ( addr' len' -- ior ) \ unmap a file
will unmap the file and release the memory used. These words are used in the block words as well as include and compile-file.
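The two words can be combined as in the sketch below, which types a file through a read-only mapping. It relies only on the stack effects given above; show-file is a hypothetical name, and using throw to handle a non-zero ior is an assumption about how errors should be treated here.

```forth
\ Sketch: type a file's contents through a read-only mapping.
\ show-file is a hypothetical name; throw turns a non-zero ior into an error.
: show-file ( addr len -- )
   r/o map-file throw      \ -- addr' len'
   2dup type               \ print the mapped bytes
   unmap-file throw ;      \ release the mapping
```

It would be used as s" myfile.txt" show-file .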
COMPILE-FILE filename \ include file and print statistics
compile-file will include a file and also check that the stack didn't change and print how much memory was used.
MODULE \ start a new module definition
PRIVATE \ make the following words private
PUBLIC \ make the following words public
END-MODULE \ end module and remove all references to private words
These tools are used to remove words that have no meaning outside the module.
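A small module might look like the sketch below. The names counter and next-id are invented for illustration, and the layout (private words first, then public ones) is an assumption based on the order in which the four words are listed above.

```forth
\ Sketch: a module with one private and one public word.
\ counter and next-id are hypothetical names.
module
private
variable counter   0 counter !      \ only needed inside the module
public
: next-id ( -- n )  counter @  1 counter +! ;
end-module
\ next-id remains usable here; counter can no longer be found by name.
```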
The Block wordset is fully implemented using a file. The file can be changed by:
using name or s" name" use-block-file
There is a built-in block editor invoked by n edit.
You can send mail to peter.m.falth at gmail.com . I am interested in bugs and suggestions for improvement.