Assembler/Disassembler Reference

LOADing the source code

The default block file forth.blk, which is included in the strongForth package, contains the source code of an assembler and a disassembler. The assembler's source code is contained in blocks 100 to 129, while the source code of the disassembler is in blocks 130 to 163. The disassembler uses a few words from the Programming-Tools word set. Before using the assembler, type:

100 129 THRU \ Assembler

To make the disassembler available, type:

130 163 THRU \ Disassembler

Note that CODE, which initiates a code definition, is part of the strongForth core. There is a rather subtle reason for this. CODE is also the name of a data type, namely the data type of code space addresses. Since the defining word CODE takes no input parameters, while the data type CODE has both input and output parameters, the first one could prevent finding the second one in the dictionary. Therefore, FIND has to come across the data type CODE before it comes across the defining word CODE. Of course, this has nothing to do with operator overloading. It is just an unfortunate coincidence that both words have the same name.

First step

Let's begin with a very simple code definition:

CODE 2+ ( INTEGER -- 1ST )
 OK
AX POP, AX INC, AX INC, AX PUSH, NEXT, END-CODE
 OK
4 2+ .
6 OK

As usual, defining a new strongForth word requires supplying a stack diagram. Unlike conventional assemblers, Forth assemblers use postfix notation, which means the operands precede the instruction words, as in

AX POP,

A comma is appended to the names of all instruction words in order to mark the end of an assembly instruction, and to indicate that something is appended to the code space. NEXT, is not a single assembly instruction, but a macro consisting of three assembly instructions. It should always be compiled as the last instruction of a code definition, because it performs the semantics of the inner interpreter, which fetches the next token and jumps to the corresponding runtime code.

Finally, END-CODE terminates the code definition, just as ; terminates a colon definition. Note that a code definition is compiled while staying in interpretation state. Since CODE does not switch to compilation state, all assembly instructions are immediately executed, although they are not immediate words. This explains the OK prompt between CODE and END-CODE.

Now, let's try the disassembler:

DISASSEMBLE 2+
1AA9: AX POP,
1AAA: AX INC,
1AAB: AX INC,
1AAC: AX PUSH,
1AAD: ES: LODSW,
1AAF: BX AX MOV,
1AB1: ES: [BX] JMP,
 OK

We can easily recognise the first four assembly instructions. But what about the last three? That's the code generated by NEXT, or, in other words, these three assembly instructions constitute the inner interpreter. We'll get back to the inner interpreter later.

Second step

Our second example is slightly more complicated than the first one, because it uses ;CODE to start a sequence of assembly instructions:

: :N+ ( INTEGER -- )
CREATE , ;CODE ( INTEGER -- 1ST )
 OK
AX POP, AX ES: 2 [BX]+ ADD, AX PUSH, NEXT, END-CODE
 OK
CONST-SPACE
 OK
3 :N+ 3+
 OK
4 3+ .
7  OK

:N+ is a defining word that creates words that add a constant value to any integer. It stores the constant value in the data field of the new definition. ;CODE is followed by the stack diagram of the definitions created by :N+. These definitions simply expect an item of data type INTEGER on the data stack and return an item of the same data type as output parameter.

AX POP,

pops this integer from the data stack into AX. The next assembler instruction adds the constant value stored in the definition's data field to the value in AX:

AX ES: 2 [BX]+ ADD,

According to the usual 8086 assembler syntax, the destination operand (AX) comes first. The source operand consists of three parts. First, ES: tells the processor that this operand is located in the extra segment, i.e., in the constant data space. [BX]+ is the addressing mode, BX indirect with displacement, with 2 as the actual displacement. This means that the address of the operand is calculated by adding 2 to the content of the BX register. As will be shown in the next section, the inner interpreter jumps to a definition's runtime code with BX pointing to the code field. The data field starts immediately after the code field, which is 2 bytes (= 1 cell) long. And the data field contains the constant value that is to be added to the input parameter.

The remaining instructions are the same as in the first example: Push the result onto the data stack, execute the inner interpreter and terminate :N+.
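
The address arithmetic behind ES: 2 [BX]+ can be modelled outside Forth. The following Python sketch is not part of strongForth; the memory layout, the code field address 0x0100 and all helper names are assumptions chosen purely for illustration. It shows a code field followed by a one-cell data field, and the 16-bit addition performed by the runtime code of a word created by :N+.

```python
CELL = 2  # one cell is 2 bytes on the 16-bit 8086


def read_word(mem, addr):
    """Read a little-endian 16-bit word, as the 8086 does."""
    return mem[addr] | (mem[addr + 1] << 8)


def write_word(mem, addr, value):
    """Store a 16-bit word in little-endian byte order."""
    value &= 0xFFFF
    mem[addr] = value & 0xFF
    mem[addr + 1] = value >> 8


def n_plus_runtime(const_space, bx, operand):
    """Model of: AX POP,  AX ES: 2 [BX]+ ADD,  AX PUSH,"""
    ax = operand                                  # AX POP,
    ax = (ax + read_word(const_space, bx + CELL)) & 0xFFFF  # add the data field
    return ax                                     # AX PUSH,


const_space = bytearray(0x200)       # stand-in for the constant data space (ES)
code_field = 0x0100                  # hypothetical code field address of 3+
write_word(const_space, code_field, 0x1234)     # runtime code address (arbitrary)
write_word(const_space, code_field + CELL, 3)   # data field: the constant 3

result = n_plus_runtime(const_space, code_field, 4)
print(result)
```

With BX pointing at the code field, the operand address is BX + 2, which is exactly the first cell of the data field.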

Excursion: The Inner Interpreter

The inner interpreter is responsible for interpreting compiled strongForth code. Some Forth systems do not require an inner interpreter, because they compile machine code to be executed directly by the processor. However, strongForth compiles indirect threaded code and thus needs an inner interpreter. A strongForth colon definition is compiled into a sequence of tokens, where each token is a pointer to the code field of a definition. A definition's code field is a memory cell that contains the address of the definition's runtime code. In a code definition, the runtime code is the machine code corresponding to the assembler instructions supplied between CODE and END-CODE. In a colon definition, the runtime code is always the same. It performs a kind of subroutine call that pushes the instruction pointer of the inner interpreter onto the return stack, and reloads the instruction pointer with the address of the data field of the colon definition. Since a colon definition's data field contains a new sequence of tokens, the inner interpreter just continues until it encounters the token of the Forth word (EXIT). (EXIT) is a code definition that pops the instruction pointer off the return stack, so that the inner interpreter continues where it left the original sequence of tokens. We'll have a closer look at the runtime code of colon definitions and of (EXIT) at the end of this section.

On common 16- and 32-bit processors, the inner interpreter is rather small. In strongForth, it fits into 3 assembly instructions with a total length of 7 bytes. Therefore, strongForth simply appends a copy of the inner interpreter to the end of each code definition, instead of compiling a jump instruction to a single occurrence of the inner interpreter. That's exactly what NEXT, does:

: NEXT, ( -- )
  ES: LODSW, BX AX MOV, ES: [BX] JMP, ; OK
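
The 7-byte figure can be checked against the standard 8086 encodings. The Python sketch below is only an illustration, not strongForth code. The byte values are standard 8086 opcodes, but the particular encoding of BX AX MOV, is an assumption: the 8086 offers two equivalent two-byte forms (8B D8 and 89 C3), and the text does not say which one strongForth emits. Only the instruction lengths are confirmed by the disassembly of 2+ shown earlier.

```python
# Each entry: (strongForth notation, assumed machine code bytes)
NEXT_BYTES = [
    ("ES: LODSW,",    bytes([0x26, 0xAD])),        # ES prefix + LODSW
    ("BX AX MOV,",    bytes([0x8B, 0xD8])),        # MOV BX, AX (one of two forms)
    ("ES: [BX] JMP,", bytes([0x26, 0xFF, 0x27])),  # ES prefix + JMP near [BX]
]

# Offset of each instruction relative to the start of NEXT,
offsets = []
position = 0
for _, encoding in NEXT_BYTES:
    offsets.append(position)
    position += len(encoding)

total_length = position
print(offsets, total_length)
```

The offsets 0, 2 and 4 agree with the addresses 1AAD, 1AAF and 1AB1 in the disassembly of 2+.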

To understand what's happening in the inner interpreter, we have to investigate strongForth's register usage. The 8086 processor has 14 16-bit registers:

AX, BX, CX, DX   general purpose registers
SP               data stack pointer
BP               return stack pointer
SI               inner interpreter instruction pointer
DI               reserved
IP               8086 instruction pointer
F                8086 flag register
CS               code space segment register
DS               data space segment register
SS               data space segment register
ES               constant data space segment register

The four general purpose registers have no dedicated functionality in strongForth. They can be used freely without being saved or restored between words. Therefore, we can just use AX and BX in the inner interpreter without worrying about what these registers contained before.

SP and BP are strongForth's data and return stack pointers, respectively. SI contains the instruction pointer of the inner interpreter. DI is reserved for future usage by strongForth. IP and F are the 8086 instruction pointer and flag register, so they are not available for dedicated usage by strongForth.

The last four registers are segment registers. StrongForth uses separate memory segments for code, data and name spaces, and the data space is additionally split into a RAM area called data space (in a narrow sense) and a ROM area called constant data space. DS and SS both point to the data space, because data and return stack are located in the RAM area together with all system and user variables. The data fields of colon definitions are always located in the constant data space, which can be accessed by using the ES segment register.

Looking again at the definition of NEXT, we can see that the first assembly instruction fetches the next token from the constant data space, at the address held in the inner interpreter's instruction pointer (SI). LODSW, automatically increments SI to point to the next token in the current sequence. Next, this token is copied from the AX to the BX register, because AX cannot be used as an index register. BX now contains a pointer to the code field of the definition. Finally, ES: [BX] JMP, accesses the code field and performs an indirect jump to the runtime code of the current token. Note that the code field is also located in constant data space, while the jump destination always is in code space.

Now it should be easy to understand what the runtime code of colon definitions and of (EXIT) do. Since all colon definitions share the same runtime code, we can simply disassemble an arbitrary colon definition, for example SPACES:

DISASSEMBLE SPACES
0649: BP DEC,
064A: BP DEC,
064B: +00 [BP]+ SI MOV,
064E: BX INC,
064F: BX INC,
0650: SI BX MOV,
0652: ES: LODSW,
0654: BX AX MOV,
0656: ES: [BX] JMP,
 OK
DISASSEMBLE (EXIT)
1586: SI +00 [BP]+ MOV,
1589: BP INC,
158A: BP INC,
158B: ES: LODSW,
158D: BX AX MOV,
158F: ES: [BX] JMP,
 OK

The runtime code of colon definitions first decrements the return stack pointer in order to allocate space for one cell, and then stores the current inner interpreter instruction pointer in this cell. This is actually a push-to-return-stack operation. BX still points to the code field of the colon definition. After incrementing it by the size of one cell, it points to the colon definition's data field. This address becomes the new instruction pointer. Finally, the inner interpreter (NEXT,) is executed.

(EXIT) is even simpler. Its runtime code pops the inner interpreter instruction pointer from the return stack and then executes the inner interpreter.
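
The interplay of NEXT, the colon runtime code and (EXIT) can be tried out in a small model. The following Python sketch is an illustration under stated assumptions, not strongForth's implementation: the class Machine, its methods, the start addresses and the halt word used to stop the loop are all hypothetical. Python functions stand in for machine code, and a dictionary stands in for the constant data space.

```python
CELL = 2  # bytes per cell on the 16-bit 8086


def enter(m):
    """Runtime code shared by all colon definitions."""
    m.rstack.append(m.si)     # BP DEC, BP DEC, 0 [BP]+ SI MOV,
    m.si = m.bx + CELL        # BX INC, BX INC, SI BX MOV,


def exit_(m):
    """Runtime code of (EXIT)."""
    m.si = m.rstack.pop()     # SI 0 [BP]+ MOV, BP INC, BP INC,


def stop(m):
    """Hypothetical halt word so that the model's loop can terminate."""
    m.halted = True


class Machine:
    def __init__(self):
        self.const = {}       # constant data space (ES): address -> cell
        self.code = {}        # code space (CS): address -> Python function
        self.stack = []       # data stack (SP)
        self.rstack = []      # return stack (BP)
        self.si = 0           # inner interpreter instruction pointer
        self.bx = 0           # points to the current code field
        self.halted = False
        self._const_here = 0x0100   # arbitrary start addresses
        self._code_here = 0x1000
        self.docol = self._add_code(enter)

    def _add_code(self, fn):
        addr = self._code_here
        self.code[addr] = fn
        self._code_here += 1
        return addr

    def _comma(self, cell):
        addr = self._const_here
        self.const[addr] = cell
        self._const_here += CELL
        return addr

    def code_word(self, fn):
        """Create a code definition; its token is the code field address."""
        return self._comma(self._add_code(fn))

    def colon_word(self, tokens):
        """Create a colon definition: code field, then the token sequence."""
        cfa = self._comma(self.docol)
        for token in tokens:
            self._comma(token)
        return cfa

    def execute(self, token):
        halt = self.code_word(stop)
        self.si = self._comma(token)   # two-cell thread: token, halt
        self._comma(halt)
        self.halted = False
        while not self.halted:         # each pass is one NEXT,
            ax = self.const[self.si]   # ES: LODSW,
            self.si += CELL
            self.bx = ax               # BX AX MOV,
            self.code[self.const[self.bx]](self)   # ES: [BX] JMP,


m = Machine()
two_plus = m.code_word(lambda mm: mm.stack.append(mm.stack.pop() + 2))
exit_token = m.code_word(exit_)
four_plus = m.colon_word([two_plus, two_plus, exit_token])
eight_plus = m.colon_word([four_plus, four_plus, exit_token])

m.stack.append(0)
m.execute(eight_plus)
print(m.stack)
```

The nested call of four_plus from eight_plus pushes and pops one return address per level, so the return stack is empty again when execution stops.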

Addressing modes

So far, you've already seen a small number of addressing modes. For example, AX is the name of an 8086 general-purpose register, but in the context of the strongForth assembler it denotes an addressing mode. AX is a so-called register direct addressing mode, which simply determines that the operand is in the AX register. Another example is [BX]+. This addressing mode determines that the operand is in the memory cell whose address is calculated by adding a constant displacement to the content of the BX register. The constant displacement is an input operand to [BX]+. Here is a list of all addressing modes explicitly defined in the strongForth assembler:

Word operands

AX register direct
CX
DX
BX
SP
BP
SI
DI
WORD[] memory direct
[BX] indirect
[BP]
[SI]
[DI]
[BX+SI] base with index
[BX+DI]
[BP+SI]
[BP+DI]
n [BX]+ indirect with displacement
n [BP]+
n [SI]+
n [DI]+
n [BX+SI]+ base with index and displacement
n [BX+DI]+
n [BP+SI]+
n [BP+DI]+
ES segment register direct
CS
SS
DS

Byte operands

AL register direct
CL
DL
BL
AH
CH
DH
BH
BYTE[] memory direct
BYTE[BX] indirect
BYTE[BP]
BYTE[SI]
BYTE[DI]
BYTE[BX+SI] base with index
BYTE[BX+DI]
BYTE[BP+SI]
BYTE[BP+DI]
n BYTE[BX]+ indirect with displacement
n BYTE[BP]+
n BYTE[SI]+
n BYTE[DI]+
n BYTE[BX+SI]+ base with index and displacement
n BYTE[BX+DI]+
n BYTE[BP+SI]+
n BYTE[BP+DI]+

As can be seen, byte and word operands are clearly distinguished. All addressing modes not listed here are implicit to specific assembly instructions. This is also true for immediate addressing, because immediate operands do not require a special word to indicate the addressing mode. For example,

AX 23 MOV,

moves the immediate value 23 into the AX register. MOV, is actually an overloaded word, where one version takes an addressing mode and an immediate operand as input parameters, whereas the second version requires two addressing modes.

[BP] is an addressing mode not defined for the 8086. It is actually emulated by 0 [BP]+. Whenever an addressing mode with displacement (n) is used, the assembler decides whether a 16-bit displacement is required or a signed 8-bit displacement is sufficient.
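
The choice between the two displacement sizes follows the 8086 rule that an 8-bit displacement is sign-extended to 16 bits. The Python sketch below illustrates that rule; the function name and the accepted input range are assumptions, and strongForth's actual decision logic is not shown in this document.

```python
def displacement_bytes(n):
    """Return the displacement bytes for n: one byte if n fits into a
    signed 8-bit value, otherwise two bytes in little-endian order."""
    if not -0x8000 <= n <= 0xFFFF:
        raise ValueError("displacement does not fit into 16 bits")
    if n > 0x7FFF:
        n -= 0x10000          # interpret as 16-bit two's complement
    if -128 <= n <= 127:
        return bytes([n & 0xFF])
    return bytes([n & 0xFF, (n >> 8) & 0xFF])


for value in (0, 2, 127, 128, -128, -129):
    print(value, displacement_bytes(value).hex())
```

A displacement of 0, as used for the emulated [BP], takes a single byte, which is consistent with the 3-byte length of the +00 [BP]+ instructions in the disassembly listings of the previous section.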

Note that BL is an addressing mode as well (register direct):

BL ( -- MODE )

This definition makes BL from the Core word set invisible, because both definitions have no input parameters that could give FIND a hint which of them to choose. As long as the Search-Order word set is not implemented in strongForth, renaming the assembler's version of BL is the only safe way to avoid problems. Renaming BL from the Core word set is obviously not a good idea.

Instruction words

The assembler provides only those instructions that are included in the 8086 instruction set. The additional 80186 and 80286 instructions are not implemented, but they can easily be added if desired.

Most assembly instructions expect one or two operands on the stack, which can be immediate values, branch destination addresses, registers or memory locations. Registers and memory addresses are always specified by addressing modes as explained in the previous section.

Note that in all instructions requiring two operands, like ADD, and MOV,, the destination operand comes before the source operand. A list of all strongForth assembly instructions is given below.

Instruction list

    AAA, ASCII Adjust for Addition
    AAD, ASCII Adjust for Division
    AAM, ASCII Adjust for Multiply
    AAS, ASCII Adjust for Subtraction
ex n ADC, ADd with Carry immediate source to destination byte or word
eb rb ADC, ADd with Carry source byte register to destination byte
ew rw ADC, ADd with Carry source word register to destination word
rb eb ADC, ADd with Carry source byte to destination byte register
rw ew ADC, ADd with Carry source word to destination word register
ex n ADD, ADD immediate source to destination byte or word
eb rb ADD, ADD source byte register to destination byte
ew rw ADD, ADD source word register to destination word
rb eb ADD, ADD source byte to destination byte register
rw ew ADD, ADD source word to destination word register
ex n AND, AND immediate source to destination byte or word
eb rb AND, AND source byte register to destination byte
ew rw AND, AND source word register to destination word
rb eb AND, AND source byte to destination byte register
rw ew AND, AND source word to destination word register
  c CALL, CALL subroutine at absolute address c
  ew CALL, CALL subroutine at address stored in source word
u c CALLF, CALL Far subroutine at absolute address c in memory segment u
  mw CALLF, CALL Far subroutine at address/segment stored in source double word
    CBW, Convert Byte to Word
    CLC, CLear Carry flag
    CLD, CLear Direction flag
    CLI, CLear Interrupt-enable flag
    CMC, CoMplement Carry flag
ex n CMP, CoMPare immediate source to destination byte or word
eb rb CMP, CoMPare source byte register to destination byte
ew rw CMP, CoMPare source word register to destination word
rb eb CMP, CoMPare source byte to destination byte register
rw ew CMP, CoMPare source word to destination word register
    CMPSB, CoMPare String of Bytes
    CMPSW, CoMPare String of Words
    CS: Code Segment prefix
    CWD, Convert Word to Double word
    DAA, Decimal Adjust for Addition
    DAS, Decimal Adjust for Subtraction
  n DB, Define immediate Byte
  d DD, Define immediate Double word
ex   DEC, DECrement destination byte or word by 1
  ex DIV, DIVide AX by source byte or AX/DX by source word (unsigned)
    DS: Data Segment prefix
  n DW, Define immediate Word
    ES: Extra Segment prefix
    HLT, enter HaLT state
  ex IDIV, Integer DIVide AX by source byte or AX/DX by source word (signed)
  ex IMUL, Integer MULtiply AL by source byte or AX by source word (signed)
ar pb IN, INput byte or word from immediate source port into AL or AX
ar DX IN, INput byte or word from source port in DX into AL or AX
ex   INC, INCrement destination byte or word by 1
  ub INT, INTerrupt vector number ub
    INTO, INTerrupt on Overflow vector number 4
    IRET, Interrupt RETurn
  cb JA, Jump on Above to absolute address cb
  cb JAE, Jump on Above or Equal to absolute address cb
  cb JB, Jump on Below to absolute address cb
  cb JBE, Jump on Below or Equal to absolute address cb
  cb JC, Jump on Carry to absolute address cb
  cb JCXZ, Jump if CX Zero to absolute address cb
  cb JE, Jump on Equal to absolute address cb
  cb JG, Jump on Greater than to absolute address cb
  cb JGE, Jump on Greater than or Equal to absolute address cb
  cb JL, Jump on Less than to absolute address cb
  cb JLE, Jump on Less than or Equal to absolute address cb
  c JMP, JuMP unconditionally to absolute address c
  ew JMP, JuMP unconditionally to absolute address stored in source word
u c JMPF, JuMP unconditionally to absolute address c in memory segment u
  mw JMPF, JuMP unconditionally to address/segment stored in source double word
  cb JNA, Jump on Not Above to absolute address cb
  cb JNAE, Jump on Not Above or Equal to absolute address cb
  cb JNB, Jump on Not Below to absolute address cb
  cb JNBE, Jump on Not Below or Equal to absolute address cb
  cb JNC, Jump on Not Carry to absolute address cb
  cb JNE, Jump on Not Equal to absolute address cb
  cb JNG, Jump on Not Greater than to absolute address cb
  cb JNGE, Jump on Not Greater than or Equal to absolute address cb
  cb JNL, Jump on Not Less than to absolute address cb
  cb JNLE, Jump on Not Less than or Equal to absolute address cb
  cb JNO, Jump on Not Overflow to absolute address cb
  cb JNP, Jump on Not Parity to absolute address cb
  cb JNS, Jump on Not Sign to absolute address cb
  cb JNZ, Jump on Not Zero to absolute address cb
  cb JO, Jump on Overflow to absolute address cb
  cb JP, Jump on Parity to absolute address cb
  cb JPE, Jump on Parity Even to absolute address cb
  cb JPO, Jump on Parity Odd to absolute address cb
  cb JS, Jump on Sign to absolute address cb
  cb JZ, Jump on Zero to absolute address cb
    LAHF, Load register AH from Flags
rw ew LDS, Load pointer using DS and destination register from source double word
rw ew LEA, Load Effective Address of source into destination register
rw ew LES, Load pointer using ES and destination register from source double word
    LOCK LOCK the bus prefix
    LODSB, LOaD String of Bytes
    LODSW, LOaD String of Words
  cb LOOP, LOOP to absolute address cb
  cb LOOPE, LOOP while Equal to absolute address cb
  cb LOOPNE, LOOP while Not Equal to absolute address cb
  cb LOOPNZ, LOOP while Not Zero to absolute address cb
  cb LOOPZ, LOOP while Zero to absolute address cb
ex n MOV, MOVe immediate source to destination byte or word
eb rb MOV, MOVe source byte register to destination byte
ew rw MOV, MOVe source word register to destination word
rb eb MOV, MOVe source byte to destination byte register
rw ew MOV, MOVe source word to destination word register
ew sr MOV, MOVe source segment register to destination word
sx ew MOV, MOVe source word to destination segment register
    MOVSB, MOVe String of Bytes
    MOVSW, MOVe String of Words
  ex MUL, MULtiply AL by source byte or AX by source word (unsigned)
ex   NEG, NEGate destination byte or word
    NOP, No OPeration
ex   NOT, logical NOT destination byte or word
ex n OR, OR immediate source to destination byte or word
eb rb OR, OR source byte register to destination byte
ew rw OR, OR source word register to destination word
rb eb OR, OR source byte to destination byte register
rw ew OR, OR source word to destination word register
pb ar OUT, OUTput byte or word from AL or AX to immediate destination port
DX ar OUT, OUTput byte or word from AL or AX to destination port in DX
sx   POP, POP word from stack into destination segment register
ew   POP, POP word from stack into destination
    POPF, POP flags from stack
  sr PUSH, PUSH source segment register to stack
  ew PUSH, PUSH source word to stack
    PUSHF, PUSH flags to stack
ex 1 RCL, Rotate destination byte or word through Carry Left by 1 bit
ex CL RCL, Rotate destination byte or word through Carry Left by CL bits
ex 1 RCR, Rotate destination byte or word through Carry Right by 1 bit
ex CL RCR, Rotate destination byte or word through Carry Right by CL bits
    REP REPeat prefix
    REPE REPeat while Equal prefix
    REPNE REPeat while Not Equal prefix
    REPNZ REPeat while Not Zero prefix
    REPZ REPeat while Zero prefix
    RET, RETurn from subroutine
  u RET, RETurn from subroutine and add u to SP
    RETF, RETurn from Far subroutine
  u RETF, RETurn from Far subroutine and add u to SP
ex 1 ROL, Rotate destination byte or word Left by 1 bit
ex CL ROL, Rotate destination byte or word Left by CL bits
ex 1 ROR, Rotate destination byte or word Right by 1 bit
ex CL ROR, Rotate destination byte or word Right by CL bits
    SAHF, Store register AH into Flags
ex 1 SAL, Shift destination byte or word Arithmetic Left by 1 bit
ex CL SAL, Shift destination byte or word Arithmetic Left by CL bits
ex 1 SAR, Shift destination byte or word Arithmetic Right by 1 bit
ex CL SAR, Shift destination byte or word Arithmetic Right by CL bits
ex n SBB, SuBtract with Borrow immediate source from destination byte or word
eb rb SBB, SuBtract with Borrow source byte register from destination byte
ew rw SBB, SuBtract with Borrow source word register from destination word
rb eb SBB, SuBtract with Borrow source byte from destination byte register
rw ew SBB, SuBtract with Borrow source word from destination word register
    SCASB, SCAn String of Bytes
    SCASW, SCAn String of Words
ex 1 SHL, SHift destination byte or word logical Left by 1 bit
ex CL SHL, SHift destination byte or word logical Left by CL bits
ex 1 SHR, SHift destination byte or word logical Right by 1 bit
ex CL SHR, SHift destination byte or word logical Right by CL bits
    SS: Stack Segment prefix
    STC, SeT Carry flag
    STD, SeT Direction flag
    STI, SeT Interrupt-enable flag
    STOSB, STOre String of Bytes
    STOSW, STOre String of Words
ex n SUB, SUBtract immediate source from destination byte or word
eb rb SUB, SUBtract source byte register from destination byte
ew rw SUB, SUBtract source word register from destination word
rb eb SUB, SUBtract source byte from destination byte register
rw ew SUB, SUBtract source word from destination word register
ex n TEST, TEST immediate source with destination byte or word
eb rb TEST, TEST source byte register with destination byte
ew rw TEST, TEST source word register with destination word
rb eb TEST, TEST source byte with destination byte register
rw ew TEST, TEST source word with destination word register
    WAIT, enter WAIT state
eb rb XCHG, eXCHanGe source byte register with destination byte
ew rw XCHG, eXCHanGe source word register with destination word
rb eb XCHG, eXCHanGe source byte with destination byte register
rw ew XCHG, eXCHanGe source word with destination word register
    XLATB, TransLATe Byte
ex n XOR, eXclusive OR immediate source to destination byte or word
eb rb XOR, eXclusive OR source byte register to destination byte
ew rw XOR, eXclusive OR source word register to destination word
rb eb XOR, eXclusive OR source byte to destination byte register
rw ew XOR, eXclusive OR source word to destination word register

The symbols used in the destination and source operand columns have the following meaning:

a  ::= any ADDRESS
ar ::= AX | AL
c  ::= any CODE
cb ::= any CODE; CODE-HERE - 127 < cb < CODE-HERE + 130
d  ::= any DOUBLE
eb ::= rb | mb
ew ::= rw | mw
ex ::= eb | ew
mb ::= BYTE[BX+SI] | BYTE[BX+DI] | BYTE[BP+SI] | BYTE[BP+DI] |
       BYTE[SI] | BYTE[DI] | BYTE[BP] | BYTE[BX] |
       n BYTE[BX+SI]+ | n BYTE[BX+DI]+ | n BYTE[BP+SI]+ | n BYTE[BP+DI]+ |
       n BYTE[SI]+ | n BYTE[DI]+ | n BYTE[BP]+ | n BYTE[BX]+ |
       a BYTE[]
mw ::= [BX+SI] | [BX+DI] | [BP+SI] | [BP+DI] |
       [SI] | [DI] | [BP] | [BX] |
       n [BX+SI]+ | n [BX+DI]+ | n [BP+SI]+ | n [BP+DI]+ |
       n [SI]+ | n [DI]+ | n [BP]+ | n [BX]+ |
       a WORD[]
n  ::= any SINGLE
pb ::= any PORT; 0 <= pb <= 255
rb ::= AL | CL | DL | BL | AH | CH | DH | BH
rw ::= AX | CX | DX | BX | SP | BP | SI | DI
sr ::= CS | DS | ES | SS
sx ::= DS | ES | SS
u  ::= any UNSIGNED
ub ::= any UNSIGNED; 0 <= ub <= 255
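
The range given for cb follows from the 8086 short jump format: the instruction is 2 bytes long, and its signed 8-bit offset is relative to the address of the following instruction. The Python sketch below illustrates this calculation; the function name rel8 is an assumption and does not correspond to a word of the strongForth assembler.

```python
def rel8(code_here, target):
    """Return the offset byte of a 2-byte short jump assembled at
    code_here that branches to the absolute address target."""
    offset = target - (code_here + 2)   # relative to the next instruction
    if not -128 <= offset <= 127:
        raise ValueError("target out of range for a short jump")
    return offset & 0xFF


# Jumps taken from the disassembly listings in this document
print(hex(rel8(0x1B19, 0x1B20)))   # 1B19: 1B20 JBE,
print(hex(rel8(0x1B6C, 0x1B67)))   # 1B6C: 1B67 JMP,
```

The lowest reachable target is CODE-HERE - 126 and the highest is CODE-HERE + 129, which matches the condition CODE-HERE - 127 < cb < CODE-HERE + 130.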

The instruction list includes three instructions for compiling data into the code space: DB, (bytes), DW, (words) and DD, (double words). These words are mostly used by the assembler itself. They are defined as follows:

: DB, ( SINGLE -- ) SPACE@ CODE-SPACE SWAP C, SPACE! ;
: DW, ( SINGLE -- ) SPACE@ CODE-SPACE SWAP  , SPACE! ;
: DD, ( DOUBLE -- ) SPACE@ CODE-SPACE SWAP  , SPACE! ;

Instructions whose name does not end with a comma are prefixes that have to be used in combination with another instruction. One of those, ES:, is actually used within NEXT,. The syntax for prefixes in relation to the assembly instruction they are applied to is rather simple: prefixes have to be executed immediately before the assembly instruction, but they may be mixed with addressing modes and immediate values if convenient. For example, these two phrases generate the same code:

ES: AX [DI] MOV,
AX ES: [DI] MOV,
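
One plausible explanation for this freedom is that a prefix word appends its byte to the code space at once, while addressing modes only leave operands on the stack until the instruction word consumes them. The Python sketch below models that idea for exactly these two phrases. It is an assumption about the mechanism, not strongForth's implementation, and it handles only the four words that occur here. The byte values are the standard 8086 encodings of the ES prefix and of MOV AX, [DI].

```python
def assemble(phrase):
    """Toy postfix assembler for the words ES:, AX, [DI] and MOV, only."""
    code = bytearray()
    operands = []                       # stands in for the data stack
    for word in phrase.split():
        if word == "ES:":
            code.append(0x26)           # prefix byte is emitted immediately
        elif word == "AX":
            operands.append(("reg", 0))     # register number of AX
        elif word == "[DI]":
            operands.append(("mem", 5))     # r/m field value for [DI]
        elif word == "MOV,":
            source = operands.pop()
            destination = operands.pop()
            if destination[0] != "reg" or source[0] != "mem":
                raise ValueError("only register <- memory is modelled")
            # opcode 8B: MOV r16, r/m16; ModRM = mod 00, reg, r/m
            code += bytes([0x8B, (destination[1] << 3) | source[1]])
        else:
            raise ValueError("unknown word: " + word)
    return bytes(code)


first = assemble("ES: AX [DI] MOV,")
second = assemble("AX ES: [DI] MOV,")
print(first.hex(), second.hex())
```

In this model the prefix byte always precedes the opcode, regardless of where ES: appears among the operands.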

Conditionals

The strongForth assembler supports structured programming by using the following instructions instead of conditional and unconditional jump instructions and explicit labels:

IFcc,    ( -- ORIGIN-CODE )
ELSE,    ( ORIGIN-CODE -- 1ST )
THEN,    ( ORIGIN-CODE -- )
AHEAD,   ( -- ORIGIN-CODE )
BEGIN,   ( -- DESTINATION-CODE )
UNTILcc, ( DESTINATION-CODE -- )
AGAIN,   ( DESTINATION-CODE -- )
WHILEcc, ( DESTINATION-CODE -- ORIGIN-CODE 1ST )
REPEAT,  ( ORIGIN-CODE DESTINATION-CODE -- )

IFcc, UNTILcc, and WHILEcc, each stand for a whole group of instructions, with cc being a condition derived from the bits of the 8086 flags register:

A above
AE above or equal
B below
BE below or equal
C carry
E equal
G greater than
GE greater than or equal
L less than
LE less than or equal
NA not above
NAE not above or equal
NB not below
NBE not below or equal
NC not carry
NCXZ not CX zero
NE not equal
NG not greater than
NGE not greater than or equal
NL not less than
NLE not less than or equal
NO not overflow
NP not parity
NS not sign
NZ not zero
O overflow
P parity
PE parity even
PO parity odd
S sign
Z zero

The above instructions are the assembler's equivalent to the words IF, ELSE, THEN, AHEAD, BEGIN, UNTIL, WHILE and REPEAT from the Core word set and the word AGAIN from the Core extension word set. The data types ORIGIN-CODE and DESTINATION-CODE are used in a similar way as ORIGIN and DESTINATION. This means that structures like

... IFcc, ... THEN, ...
... IFcc, ... ELSE, ... THEN, ...
... AHEAD, ... THEN, ...
... BEGIN, ... UNTILcc, ...
... BEGIN, ... AGAIN, ...
... BEGIN, ... WHILEcc, ... REPEAT, ...

can be inserted into the assembly code. These structures can also be nested if desired. They will be translated into appropriate conditional and unconditional branch instructions. Let's try an example:

CODE > ( UNSIGNED 1ST -- FLAG )
 OK
BX POP, AX POP, AX BX CMP,
 OK
IFA, AX TRUE MOV,
 OK
ELSE, AX AX XOR,
 OK
THEN, AX PUSH,
 OK
NEXT, END-CODE
 OK
DISASSEMBLE >
1B15: BX POP,
1B16: AX POP,
1B17: AX BX CMP,
1B19: 1B20 JBE,
1B1B: AX FFFF MOV,
1B1E: 1B22 JMP,
1B20: AX AX XOR,
1B22: AX PUSH,
1B23: ES: LODSW,
1B25: BX AX MOV,
1B27: ES: [BX] JMP,
 OK

As expected, IFA, generates a conditional jump for the inverse condition. Above turns to Below or Equal, because the if branch should be skipped whenever the condition is not true. ELSE, resolves the conditional jump generated by IFA,, and compiles an unconditional jump that skips the else branch. Finally, THEN, resolves the unconditional jump compiled by ELSE,.
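
The resolving of forward jumps can be modelled with a few lines of code. The Python sketch below is a simplified model and not the strongForth assembler: the class Asm and its methods are hypothetical, an origin is represented by the address of the jump's offset byte, and the encodings chosen for AX BX CMP, and AX AX XOR, are one of two equivalent 8086 forms each. Only the instruction lengths and jump targets are confirmed by the disassembly above.

```python
class Asm:
    def __init__(self, origin):
        self.origin = origin            # address of the first code byte
        self.code = bytearray()

    def here(self):
        return self.origin + len(self.code)

    def emit(self, *values):
        self.code.extend(values)

    def _jump(self, opcode):
        """Compile a short jump with an unresolved offset byte."""
        self.emit(opcode, 0x00)
        return self.here() - 1          # address of the offset byte

    def _resolve(self, orig):
        """Patch the offset byte at orig so the jump lands at here()."""
        offset = self.here() - (orig + 1)
        if not -128 <= offset <= 127:
            raise ValueError("branch too long")
        self.code[orig - self.origin] = offset & 0xFF

    def if_a(self):
        return self._jump(0x76)         # JBE, the inverse of Above

    def else_(self, orig):
        new_orig = self._jump(0xEB)     # unconditional short JMP
        self._resolve(orig)             # IF branch ends here
        return new_orig

    def then(self, orig):
        self._resolve(orig)

    def target(self, jump_addr):
        """Absolute destination of the short jump located at jump_addr."""
        rel = self.code[jump_addr + 1 - self.origin]
        if rel >= 0x80:
            rel -= 0x100
        return jump_addr + 2 + rel


a = Asm(0x1B15)
a.emit(0x5B)                 # BX POP,
a.emit(0x58)                 # AX POP,
a.emit(0x3B, 0xC3)           # AX BX CMP,
orig = a.if_a()              # IFA,
a.emit(0xB8, 0xFF, 0xFF)     # AX TRUE MOV,
orig = a.else_(orig)         # ELSE,
a.emit(0x33, 0xC0)           # AX AX XOR,
a.then(orig)                 # THEN,
a.emit(0x50)                 # AX PUSH,

print(hex(a.target(0x1B19)), hex(a.target(0x1B1E)))
```

The two jump targets computed by the model are the same addresses that DISASSEMBLE reports for JBE, and JMP,.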

A second example demonstrates a typical loop structure. BITS takes an item of data type SINGLE from the stack and returns the number of bits required to represent its value. BITS actually returns the bit number of the highest 1 bit in SINGLE, plus 1.

CODE BITS ( SINGLE -- UNSIGNED )
 OK
AX POP, CX CX XOR, AX AX TEST,
 OK
BEGIN,
 OK
WHILENZ, CX INC, AX 1 SHR,
 OK
REPEAT, CX PUSH,
 OK
NEXT, END-CODE
 OK
DISASSEMBLE BITS
1B62: AX POP,
1B63: CX CX XOR,
1B65: AX AX TEST,
1B67: 1B6E JZ,
1B69: CX INC,
1B6A: AX 1 SHR,
1B6C: 1B67 JMP,
1B6E: CX PUSH,
1B6F: ES: LODSW,
1B71: BX AX MOV,
1B73: ES: [BX] JMP,
 OK

WHILENZ, compiles a conditional branch for the inverse condition, and REPEAT, compiles an unconditional branch and resolves the branch addresses of the complete loop structure. It is not necessary to define any labels.
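
For comparison, the same loop can be written in a high-level language. The Python function below is a sketch that mirrors the register usage of BITS; it assumes that SINGLE is a 16-bit cell and therefore masks its argument accordingly.

```python
def bits(single):
    """Number of bits needed to represent a 16-bit value."""
    ax = single & 0xFFFF   # AX POP,
    cx = 0                 # CX CX XOR,
    while ax != 0:         # zero flag from AX AX TEST, later from SHR,
        cx += 1            # CX INC,
        ax >>= 1           # AX 1 SHR,
    return cx              # CX PUSH,


for value in (0, 1, 4, 255, 256, 0xFFFF):
    print(value, bits(value))
```

Note that the zero flag tested by the conditional jump is set by TEST, on the first pass only. On all later passes it reflects the result of the preceding SHR,.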

Labels

The location counter of the assembler is identical to strongForth's code space pointer as it is delivered by CODE-HERE. This word is part of strongForth's Core word set. Since most other assemblers use a special symbol for the location counter, it might be useful to define an alias:

' CODE-HERE ALIAS $ ( -- CODE ) END-CODE

Although the existence of conditional instructions greatly reduces the need to explicitly define labels or to access the location counter in general, it can in certain situations be necessary to define labels. A label can be defined with

CODE-HERE CONSTANT name

where name is the name of the label. However, this phrase looks a little bit awkward when embedded into assembly code. As an alternative, the strongForth assembler provides the word LABEL, which allows labels to be defined as follows:

LABEL name

This is the definition of LABEL:

: LABEL ( -- )
  CODE-HERE [DT] CODE DTP@ ! CONSTANT ;

The phrase [DT] CODE DTP@ ! is required, because CONSTANT expects the data type of the constant at the first unused address of the data type heap. This data type is automatically available only if CONSTANT is interpreted. The data type is simply left there by the interpreter. But if CONSTANT is executed as part of a compiled word, you have to make sure yourself that it finds the correct data type at the location the data type heap pointer points to.

Disassembler

The usage of the disassembler has already been demonstrated several times in the previous sections. DISASSEMBLE is normally the only word of the disassembler package that will be used. It disassembles machine code beginning at the machine code address of a word until the instruction [BX] JMP, is found. This is the last instruction of the inner interpreter, which typically is the last action of a code definition. If this does not work, because the code definition does not execute the inner interpreter at the end, or it has more than one exit point, or [BX] JMP, is used for other purposes as well, an overloaded version of DISASSEMBLE can be used. This version expects an explicit line count as input parameter:

4 DISASSEMBLE BITS
1B62: AX POP,
1B63: CX CX XOR,
1B65: AX AX TEST,
1B67: 1B6E JZ,
 OK
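
The two ways of ending a listing can be summarised in a short model. The Python sketch below does not decode machine code. It works on the already decoded listing of BITS from the previous section, and the function name and its parameters are assumptions made for this illustration.

```python
BITS_LISTING = [
    (0x1B62, "AX POP,"),
    (0x1B63, "CX CX XOR,"),
    (0x1B65, "AX AX TEST,"),
    (0x1B67, "1B6E JZ,"),
    (0x1B69, "CX INC,"),
    (0x1B6A, "AX 1 SHR,"),
    (0x1B6C, "1B67 JMP,"),
    (0x1B6E, "CX PUSH,"),
    (0x1B6F, "ES: LODSW,"),
    (0x1B71, "BX AX MOV,"),
    (0x1B73, "ES: [BX] JMP,"),
]


def disassemble(listing, count=None):
    """Return listing lines up to the first [BX] JMP, or, if a line
    count is given, exactly that many lines (if available)."""
    lines = []
    for address, text in listing:
        if count is not None and len(lines) >= count:
            break
        lines.append(f"{address:04X}: {text}")
        if count is None and text.endswith("[BX] JMP,"):
            break
    return lines


print("\n".join(disassemble(BITS_LISTING, 4)))
```

With an explicit count, the search for [BX] JMP, is switched off entirely, so the listing may also run past the end of a definition.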

Dr. Stephan Becher - December 23rd, 2005