\ Lesson 9 - Compiler Words \ The Forth Course \ by Richard E. Haskell \ Dept. of Computer Science and Engineering \ Oakland University, Rochester, MI 48309 comment: Lesson 9 COMPILER WORDS 9.1 COMPILING VS. INTERPRETING 9-2 9.2 COMPILE AND [COMPILE] 9-5 9.3 LITERALS 9-6 9.4 CONDITIONAL COMPILER WORDS 9-8 BEGIN...WHILE...REPEAT 9-9 IF...ELSE...THEN 9-10 BEGIN...AGAIN 9-11 BEGIN...UNTIL 9-12 DO...LOOP 9-13 9.5 EXERCISES 9-14 9.1 COMPILING VS. INTERPRETING Compiler words are IMMEDIATE words. This means that they are executed immediately when they are encountered in a colon definition rather than being compiled into the list segment. Immediate words have the precedence bit in the name field set (see Lesson 3, Section 3.12). F-PC can be in one of two possible states: compiling or interpreting. It is in the compiling state during the compilation of a colon definition. That is, between the time the word "colon" (:) is executed and the word "semi-colon" (;) is executed. The system variable STATE contains the following possible two values: TRUE -- if compiling FALSE -- if interpreting To test what state exists at different times, consider the following two definitions: comment; : 1state? ( -- ) STATE @ IF ." Compiling" ELSE ." Interpreting" THEN CR ; : 1test ( -- ) 1state? ; comment: After FLOADing lesson9 type 1state? and 1test Note that "interpreting" is printed in each case. Why? It is because you are in the interpreting mode when you type both 1state? and 1test. How can you get "Compiling" to be printed? It is necessary to have 1state? execute during the time that 1test is being compiled. That is, we must make 1state? an immediate word. We do this by adding the word IMMEDIATE to the definition following the semi-colon. Let's change the name to 2state? and define comment; : 2state? ( -- ) STATE @ IF ." Compiling" ELSE ." Interpreting" THEN CR ; IMMEDIATE comment: Now type in the following colon definition: : 2test 2state? ; Note that when you type this definition, the word "Compiling" is printed as soon as you press <Enter>. That is, the word 2state? is executed immediately and does not wait for you to later type 2test. Now type 2test Note that nothing is printed on the screen. This is because 2state? was not compiled in the dictionary. It was just executed immediately. Immediate words do not get compiled in the dictionary unless forced to do so. You can force an immediate word to be compiled, rather than be executed immediately, by preceding the word by [COMPILE]. Thus, in the following definition of 3test the word 2state? is compiled and not executed immediately. comment; : 3test ( -- ) [COMPILE] 2state? ; comment: What do you think will be printed when you type 3test? Try it. It is also possible to turn the compiler off and on within a colon definition by using the two words [ and ]. The word [ is an immediate word that turns the compiling state off; that is, it returns to the interpreting mode. The definition of [ is : [ ( -- ) STATE OFF ; IMMEDIATE The word ] turns the compile mode on and then enters the compiling loop. The compiling loop consists of the following: DO Get next word in input stream; If it is an immediate word, execute it Else compile it; If the word is not in the dictionary, convert it to a number and compile it; UNTIL end of input stream. An a final example, type in the following example: : 4test [ 1state? ] ; Note that when you press <Enter> the word "interpreting" is printed. Why? 9.2 COMPILE AND [COMPILE] We have seen that [COMPILE] will compile the following immediate word into the list segment. Its definition is : [COMPILE] ( -- ) ' X, ; IMMEDIATE The word "tick" (') will put the CFA of the next (immediate) word on the stack and the word X, compiles the integer on the stack into the next available location in the list dictionary. Note that [COMPILE] itself is an immediate word so that it is executed at compile time of the word that contains it. Sometimes you want to compile a word at run time. The word COMPILE will do this. For example, the definition of "semi-colon" is basically the following. : ; ( -- ) COMPILE UNNEST \ compile the UNNEST routine REVEAL \ make the colon word available [COMPILE] [ \ go to interpreting mode ; IMMEDIATE \ do ; immediately Note that ; is an immediate word so that it will be executed when it is encountered in a colon definition. It will COMPILE the CFA of the UNNEST routine in the list dictionary of the colon word. After making the colon word available to dictionary searches by the word REVEAL it switches to the interpreting mode by executing [ which had been compiled in the definition of ; (even though [ is an immediate word) by [COMPILE]. The definition of [COMPILE], which will compile the CFA of the following non-immediate word when the word containing COMPILE executes, is the following : COMPILE ( -- ) 2R@ SWAP \ get ES:SI of next CFA in list seg R> 2+ >R \ inc SI past next word in list seg @L \ get CFA on next word in list seg ,X ; \ & compile it at run time 9.3 LITERALS Consider the following colon definition: comment; : four+ ( n -- n+4 ) 4 + ; comment: This will be compiled into the dictionary as follows: four+ ________ | CFA | CODE | <------| |------| _________ PFA | LSO | -------- +XSEG -------> | (LIT) | ES:0 |------| |-------| Code Segment ?CS: | 4 | IP = ES:SI |-------| | + | |-------| |UNNEST | |-------| List Segment XSEG The word (LIT) is a code word defined as follows: CODE (LIT) ( -- n ) LODSW ES: \ get next word at ES:SI, SI=SI+2 1PUSH \ push it on stack END-CODE Thus the word (LIT) will push the number 4 on the stack and the instruction pointer ES:SI will then be pointing to the CFA of +. If you have a number on the stack and you want to compile it as a literal in the list dictionary you can use the word LITERAL. This word is defined as follows: : LITERAL ( n -- ) COMPILE (LIT) \ compile (LIT) X, \ plus the value n ; IMMEDIATE \ immediately A useful application of the word LITERAL is when you compute some constant value in a definition. For example, suppose that the value 5 is computed as 2 + 3 (perhaps for clarity in some definition). You might then define the word five+ as follows: comment; : five+ ( n -- n+5 ) [ 3 2 + ] LITERAL + ; comment: Although this produces the same answer as : five+ 3 2 + + ; the advantange is that [ 3 2 + ] LITERAL will compile the literal 5 in the list dictionary at compile time so that at run time only 5 + is executed. On the other hand, 3 2 + + will compile both a literal 3 and a literal 2 in the list dictionary and will execute two plus operations at run time. Therefore, the use of [ 3 2 + ] LITERAL produces more efficient code that executes faster. 9.4 CONDITIONAL COMPILER WORDS The two conditional compiler words BRANCH and ?BRANCH are used to define the various conditional branching instructions in F-PC. The word BRANCH is a code word that is defined as follows: |-------| CODE BRANCH ( -- ) |BRANCH | LABEL BRAN1 |-------| MOV ES: SI, 0[SI] |----| addr | IP = ES:SI NEXT | |-------| END-CODE | | | | |-------| | | | | |-------| |--->| | ES:addr |-------| | | |-------| List Segment XSEG BRANCH is compiled into the list dictionary followed by the offset address that is to be unconditionally branched to. The word ?BRANCH will cause a branch to the address following ?BRANCH if the flag on top of the stack is FALSE. It is defined as follows: |-------| CODE ?BRANCH ( f -- ) |?BRANCH| POP AX |-------| OR AX, AX |----| addr | IP = ES:SI JE BRAN1 | |-------| ADD SI, # 2 | | | NEXT | |-------| END-CODE | | | | |-------| |--->| | ES:addr |-------| | | |-------| List Segment XSEG BEGIN...WHILE...REPEAT As an example of a BEGIN...WHILE...REPEAT loop recall the definition of the word "factorial" given in Lesson 4: : factorial ( n -- n! ) 1 2 ROT \ x i n BEGIN \ x i n 2DUP <= \ x i n f WHILE \ x i n -ROT TUCK \ n i x i * SWAP \ n x i 1+ ROT \ x i n REPEAT \ x i n 2DROP ; \ x This definition will be stored in the list dictionary as follows: _________ | (LIT) | |-------| | 1 | XHERE ( -- seg offset ) |-------| | (LIT) | |-------| | 2 | STACK |-------| | ROT | |-------| xhere1 ---> | 2DUP | <--- : BEGIN XHERE NIP ; IMMEDIATE |-------| | <= | |-------| |?BRANCH| <--- : WHILE COMPILE ?BRANCH xhere1 |-------| XHERE NIP 0 X, xhere2 ---> | 0 | <---| SWAP ; IMMEDIATE |-------| | | -ROT | | |-------| | | TUCK | | xhere3 |-------| | | * | | |-------| | | SWAP | | |-------| | | 1+ | | |-------| | | ROT | | |-------| | | BRANCH| | <--- : REPEAT COMPILE BRANCH X, |-------| | XHERE -ROT | xhere1| |-------------- SWAP !L ; |-------| IMMEDIATE xhere3 ---> | 2DROP | seg |-------| xhere2 |UNNEST | |-------| List Segment XSEG The word BEGIN leaves the offset address xhere1 on the stack. The word WHILE compiles ?BRANCH followed by a 0 at xhere2. This value of 0 will later be replaced with the address xhere3 of the word 2DROP. WHILE also leaves the value of xhere2 on the stack under xhere1. The word REPEAT compiles BRANCH and then commas in the address xhere1. It then puts the address xhere3 on the stack and then stores it at address seg:xhere2. IF...ELSE...THEN Consider the following colon definition: : test ( f -- f ) IF TRUE ELSE FALSE THEN ; This will be stored in the list dictionary as follows: _________ |?BRANCH| <--- : IF COMPILE ?BRANCH |-------| XHERE NIP 0 X, xhere1 ---> | 0 | <---| ; IMMEDIATE |-------| | | TRUE | |xhere3 |-------| | | BRANCH| | <--- : ELSE COMPILE BRANCH |-------| | XHERE NIP 0 X, xhere2 ---> | 0 |<--| | SWAP XHERE |-------| | |------------ -ROT SWAP !L xhere3 ---> | FALSE | |xhere4 ; IMMEDIATE |-------| | xhere4 ---> |UNNEST | | <--- : THEN XHERE |-------| |------------ -ROT SWAP !L ; IMMEDIATE List Segment XSEG The word IF compiles ?BRANCH followed by a 0 at xhere1. This value of 0 will later be replaced with the address xhere3 of the word FALSE. IF also leaves the value of xhere1 on the stack. The word ELSE compiles BRANCH followed by a 0 at xhere2. This value of 0 will later be replaced with the address xhere4 of the word UNNEST. ELSE also leaves the value of xhere2 on the stack after putting the address xhere3 on the stack and then storing it at address seg:xhere1. The word THEN puts the address xhere4 on the stack and then stores it at address seg:xhere2. BEGIN...AGAIN An example of a using BEGIN...AGAIN was given in the pop-up menu examples in Lesson 8. The typical form was : main ( -- ) minit BEGIN KEY do.key AGAIN ; This will be stored in the list dictionary as follows: _________ | minit | |-------| xhere1 ---> | KEY | <--- : BEGIN XHERE NIP ; IMMEDIATE |-------| | do.key| |-------| | BRANCH| <--- : AGAIN COMPILE BRANCH X, |-------| ; IMMEDIATE | xhere1| |-------| |UNNEST | |-------| List Segment XSEG The word BEGIN leaves the offset address xhere1 on the stack. The word AGAIN compiles BRANCH and then commas in the address xhere1. BEGIN...UNTIL The following example of a using BEGIN...UNTIL was given in Lesson 4. : dowrite ( -- ) BEGIN KEY DUP EMIT 13 = UNTIL ; This will be stored in the list dictionary as follows: _________ xhere1 ---> | KEY | <--- : BEGIN XHERE NIP ; IMMEDIATE |-------| | DUP | |-------| | EMIT | |-------| | (LIT) | |-------| | 13 | |-------| | = | |-------| |?BRANCH| <--- : UNTIL COMPILE ?BRANCH X, |-------| ; IMMEDIATE | xhere1| |-------| |UNNEST | |-------| List Segment XSEG The word BEGIN leaves the offset address xhere1 on the stack. The word UNTIL compiles ?BRANCH and then commas in the address xhere1. Note that the only difference between BEGIN...AGAIN and BEGIN...UNTIL is that in UNTIL ?BRANCH replaces the BRANCH in AGAIN. DO...LOOP A DO loop will produce the following structure in the list dictionary: __________ | (DO) | <--- : DO COMPILE (DO) |--------| XHERE NIP 0 X, xhere1 ---> | 0 | <---| ; IMMEDIATE |--------| | xhere1 + 2 ---> | |<--| |xhere2 |--------| | | | | | | |--------| | | | | | | |--------| | | | (LOOP) | | | <--- : LOOP COMPILE (LOOP) |--------| | | DUP 2+ X, |xhere1+2|---| | XHERE |--------| |----------- -ROT SWAP !L xhere2 ---> | | ; IMMEDIATE |--------| List Segment XSEG The word DO compiles (DO) followed by a 0 at xhere1. This value of 0 will later be replaced with the address xhere2 of the first word following the DO loop. DO also leaves the value of xhere1 on the stack. The word LOOP compiles (LOOP) and then commas in the address xhere1+2. LOOP then puts the address xhere2 on the stack and then stores it at address seg:xhere1. The run-time word (DO) ( limit index -- ) sets up the return stack as follows: Return Stack ___________________________ | index - (limit + 8000H) | |-------------------------| | limit + 8000H | |-------------------------| | xhere2 | |-------------------------| The run-time word (LOOP) adds 1 to the top of the return stack and jumps to xhere1+2 if the overflow flag is not set. If the overflow flag is set (i.e. the top of the stack crosses the 8000H boundary when index = limit) (LOOP) pops the three items from the return stack and moves the instruction pointer ES:SI to xhere2. Having the value of xhere2 as the third item on the return stack is used by LEAVE to find the address to leave to. Adding the value of 8000H to the top two values on the return stack when executing (DO) allows the DO loop to work properly when the limit is larger than 8000H. For example, suppose the limit were FFFFH and the initial index were 0. The initial value on top of the return stack would be -7FFFH. As 1 is added to this value, the overflow flag will not be set until the top of the stack becomes equal to 8000H, that is, after FFFFH loops. 9.5 EXERCISES 9.1 Use the words SEE and LDUMP to investigate the dictionary structure of the following three test words: comment; : a.test ( f -- ) IF ." True" ELSE ." False" THEN ; : b.test ( -- ) 5 0 DO I . LOOP ; : c.test ( -- ) 4 BEGIN DUP . 1- DUP 0= UNTIL DROP ; comment: For each word draw the dictionary structure indicating the names and actual values of all entries in the list dictionary. Indicate on these drawings the effects of the words IF, ELSE, THEN, DO, LOOP, BEGIN and UNTIL. Also explain how the word ." works in a.test and how the numbers 5, 0 and 4 are stored in b.test and c.test. comment;