Lesson 9

\       Lesson 9 - Compiler Words
\       The Forth Course
\       by Richard E. Haskell
\          Dept. of Computer Science and Engineering
\          Oakland University, Rochester, MI 48309

comment:



                                Lesson 9

                             COMPILER WORDS


                9.1  COMPILING VS. INTERPRETING                 9-2

                9.2  COMPILE AND [COMPILE]                      9-5

                9.3  LITERALS                                   9-6

                9.4  CONDITIONAL COMPILER WORDS                 9-8

                     BEGIN...WHILE...REPEAT                     9-9

                     IF...ELSE...THEN                           9-10

                     BEGIN...AGAIN                              9-11

                     BEGIN...UNTIL                              9-12

                     DO...LOOP                                  9-13

                9.5  EXERCISES                                  9-14





















9.1  COMPILING VS. INTERPRETING

        Compiler words are IMMEDIATE words.  This means that they are
        executed immediately when they are encountered in a colon
        definition rather than being compiled into the list segment.
        Immediate words have the precedence bit in the name field set
        (see Lesson 3, Section 3.12).

        F-PC can be in one of two possible states: compiling or interpreting.
        It is in the compiling state during the compilation of a colon
        definition.  That is, between the time the word "colon" (:) is
        executed and the word "semi-colon" (;) is executed.
        The system variable STATE contains the following possible two values:

                TRUE  --  if compiling
                FALSE --  if interpreting

        To test what state exists at different times, consider the following
        two definitions:
comment;

        : 1state?       ( -- )
                        STATE @
                        IF
                           ." Compiling"
                        ELSE
                           ." Interpreting"
                        THEN
                        CR ;

        : 1test         ( -- )
                        1state? ;

comment:
        After FLOADing lesson9 type

                1state?
        and
                1test

        Note that "interpreting" is printed in each case.  Why?

        It is because you are in the interpreting mode when you type both
        1state? and 1test.











        How can you get "Compiling" to be printed?
        It is necessary to have 1state? execute during the time that
        1test is being compiled.  That is, we must make 1state? an
        immediate word.  We do this by adding the word IMMEDIATE to the
        definition following the semi-colon.  Let's change the name to
        2state? and define
comment;

        : 2state?       ( -- )
                        STATE @
                        IF
                           ." Compiling"
                        ELSE
                           ." Interpreting"
                        THEN
                        CR ; IMMEDIATE

comment:
        Now type in the following colon definition:

        : 2test   2state? ;

        Note that when you type this definition, the word "Compiling"
        is printed as soon as you press <Enter>.  That is, the word
        2state? is executed immediately and does not wait for you to
        later type 2test.  Now type

                2test

        Note that nothing is printed on the screen.  This is because
        2state? was not compiled in the dictionary.  It was just executed
        immediately.  Immediate words do not get compiled in the dictionary
        unless forced to do so.  You can force an immediate word to be
        compiled, rather than be executed immediately, by preceding the
        word by [COMPILE].  Thus, in the following definition of 3test
        the word 2state? is compiled and not executed immediately.
comment;

        : 3test         ( -- )
                        [COMPILE] 2state? ;

comment:
        What do you think will be printed when you type 3test?  Try it.












        It is also possible to turn the compiler off and on within a colon
        definition by using the two words [ and ].
        The word [ is an immediate word that turns the compiling state off;
        that is, it returns to the interpreting mode.  The definition of [ is

        : [     ( -- )
                STATE OFF ; IMMEDIATE

        The word ] turns the compile mode on and then enters the compiling
        loop.  The compiling loop consists of the following:

        DO      Get next word in input stream;
                If it is an immediate word, execute it
                Else compile it;
                If the word is not in the dictionary,
                  convert it to a number and compile it;
        UNTIL end of input stream.

        An a final example, type in the following example:

        : 4test  [ 1state? ] ;

        Note that when you press <Enter> the word "interpreting" is printed.
        Why?































9.2  COMPILE AND [COMPILE]

        We have seen that [COMPILE] will compile the following immediate
        word into the list segment.  Its definition is

        : [COMPILE]     ( -- )
                        ' X, ; IMMEDIATE

        The word "tick" (') will put the CFA of the next (immediate) word
        on the stack and the word X, compiles the integer on the stack
        into the next available location in the list dictionary.  Note that
        [COMPILE] itself is an immediate word so that it is executed at
        compile time of the word that contains it.

        Sometimes you want to compile a word at run time.  The word COMPILE
        will do this.  For example, the definition of "semi-colon" is
        basically the following.

        : ;     ( -- )
                COMPILE UNNEST          \ compile the UNNEST routine
                REVEAL                  \ make the colon word available
                [COMPILE] [             \ go to interpreting mode
                ; IMMEDIATE             \ do ; immediately

        Note that ; is an immediate word so that it will be executed when
        it is encountered in a colon definition.  It will COMPILE the CFA
        of the UNNEST routine in the list dictionary of the colon word.
        After making the colon word available to dictionary searches by the
        word REVEAL it switches to the interpreting mode by executing [
        which had been compiled in the definition of ; (even though [ is
        an immediate word) by [COMPILE].

        The definition of [COMPILE], which will compile the CFA of the
        following non-immediate word when the word containing COMPILE
        executes, is the following

        : COMPILE       ( -- )
                2R@ SWAP                \ get ES:SI of next CFA in list seg
                R> 2+ >R                \ inc SI past next word in list seg
                @L                      \ get CFA on next word in list seg
                ,X ;                    \ & compile it at run time















9.3  LITERALS

         Consider the following colon definition:
comment;

        : four+         ( n -- n+4 )
                        4 + ;
comment:
        This will be compiled into the dictionary as follows:

                                 four+
                      ________        |
                  CFA | CODE | <------|
                      |------|                          _________
                  PFA | LSO  | -------- +XSEG ------->  | (LIT) | ES:0
                      |------|                          |-------|
                  Code Segment ?CS:                     |   4   | IP = ES:SI
                                                        |-------|
                                                        |   +   |
                                                        |-------|
                                                        |UNNEST |
                                                        |-------|
                                                    List Segment XSEG

        The word (LIT) is a code word defined as follows:

        CODE (LIT)      ( -- n )
                        LODSW ES:       \ get next word at ES:SI, SI=SI+2
                        1PUSH           \ push it on stack
                        END-CODE

        Thus the word (LIT) will push the number 4 on the stack and the
        instruction pointer ES:SI will then be pointing to the CFA of +.






















        If you have a number on the stack and you want to compile it as
        a literal in the list dictionary you can use the word LITERAL.
        This word is defined as follows:

        : LITERAL       ( n -- )
                        COMPILE (LIT)   \ compile (LIT)
                        X,              \ plus the value n
                        ; IMMEDIATE     \ immediately

        A useful application of the word LITERAL is when you compute
        some constant value in a definition.  For example, suppose that
        the value 5 is computed as 2 + 3 (perhaps for clarity in some
        definition).  You might then define the word five+ as follows:
comment;

        : five+         ( n -- n+5 )
                        [ 3 2 + ] LITERAL + ;

comment:
        Although this produces the same answer as

        : five+         3 2 + + ;

        the advantange is that [ 3 2 + ] LITERAL will compile the literal
        5 in the list dictionary at compile time so that at run time only
        5 + is executed.  On the other hand, 3 2 + + will compile both a
        literal 3 and a literal 2 in the list dictionary and will execute
        two plus operations at run time.  Therefore, the use of
        [ 3 2 + ] LITERAL produces more efficient code that executes faster.

























9.4  CONDITIONAL COMPILER WORDS

        The two conditional compiler words BRANCH and ?BRANCH are used
        to define the various conditional branching instructions in F-PC.
        The word BRANCH is a code word that is defined as follows:

                                                        |-------|
        CODE BRANCH     ( -- )                          |BRANCH |
                LABEL BRAN1                             |-------|
                        MOV ES: SI, 0[SI]          |----| addr  | IP = ES:SI
                        NEXT                       |    |-------|
                        END-CODE                   |    |       |
                                                   |    |-------|
                                                   |    |       |
                                                   |    |-------|
                                                   |--->|       | ES:addr
                                                        |-------|
                                                        |       |
                                                        |-------|
                                                    List Segment XSEG

        BRANCH is compiled into the list dictionary followed by the
        offset address that is to be unconditionally branched to.

        The word ?BRANCH will cause a branch to the address following
        ?BRANCH if the flag on top of the stack is FALSE.  It is defined
        as follows:

                                                        |-------|
        CODE ?BRANCH     ( f -- )                       |?BRANCH|
                        POP AX                          |-------|
                        OR  AX, AX                 |----| addr  | IP = ES:SI
                        JE  BRAN1                  |    |-------|
                        ADD SI, # 2                |    |       |
                        NEXT                       |    |-------|
                        END-CODE                   |    |       |
                                                   |    |-------|
                                                   |--->|       | ES:addr
                                                        |-------|
                                                        |       |
                                                        |-------|
                                                    List Segment XSEG













BEGIN...WHILE...REPEAT

        As an example of a BEGIN...WHILE...REPEAT loop recall the
        definition of the word "factorial" given in Lesson 4:
        : factorial     ( n -- n! )
                        1 2 ROT                 \ x i n
                        BEGIN                   \ x i n
                           2DUP <=              \ x i n f
                        WHILE                   \ x i n
                           -ROT TUCK            \ n i x i
                           * SWAP               \ n x i
                           1+ ROT               \ x i n
                        REPEAT                  \ x i n
                        2DROP ;                 \ x
        This definition will be stored in the list dictionary as follows:
                          _________
                          | (LIT) |
                          |-------|
                          |   1   |         XHERE  ( -- seg offset )
                          |-------|
                          | (LIT) |
                          |-------|
                          |   2   |
             STACK        |-------|
                          |  ROT  |
                          |-------|
            xhere1 --->   | 2DUP  | <--- : BEGIN  XHERE NIP ; IMMEDIATE
                          |-------|
                          |  <=   |
                          |-------|
                          |?BRANCH| <--- : WHILE  COMPILE ?BRANCH
            xhere1        |-------|               XHERE NIP 0 X,
            xhere2 --->   |   0   | <---|         SWAP ; IMMEDIATE
                          |-------|     |
                          | -ROT  |     |
                          |-------|     |
                          | TUCK  |     | xhere3
                          |-------|     |
                          |   *   |     |
                          |-------|     |
                          | SWAP  |     |
                          |-------|     |
                          |  1+   |     |
                          |-------|     |
                          |  ROT  |     |
                          |-------|     |
                          | BRANCH|     | <--- : REPEAT  COMPILE BRANCH X,
                          |-------|     |                XHERE -ROT
                          | xhere1|     |--------------  SWAP !L ;
                          |-------|                      IMMEDIATE
            xhere3 --->   | 2DROP |
            seg           |-------|
            xhere2        |UNNEST |
                          |-------|
                       List Segment XSEG

        The word BEGIN leaves the offset address xhere1 on the stack.
        The word WHILE compiles ?BRANCH followed by a 0 at xhere2.
        This value of 0 will later be replaced with the address xhere3
        of the word 2DROP.  WHILE also leaves the value of xhere2 on the
        stack under xhere1.
        The word REPEAT compiles BRANCH and then commas in the address
        xhere1.  It then puts the address xhere3 on the stack and then
        stores it at address seg:xhere2.

IF...ELSE...THEN

        Consider the following colon definition:

        : test          ( f -- f )
                        IF
                           TRUE
                        ELSE
                           FALSE
                        THEN ;

        This will be stored in the list dictionary as follows:

                          _________
                          |?BRANCH| <--- : IF   COMPILE ?BRANCH
                          |-------|             XHERE NIP 0 X,
            xhere1 --->   |   0   | <---|       ; IMMEDIATE
                          |-------|     |
                          | TRUE  |     |xhere3
                          |-------|     |
                          | BRANCH|     | <--- : ELSE  COMPILE BRANCH
                          |-------|     |              XHERE NIP 0 X,
            xhere2 --->   |   0   |<--| |              SWAP XHERE
                          |-------|   | |------------  -ROT SWAP !L
            xhere3 --->   | FALSE |   |xhere4          ; IMMEDIATE
                          |-------|   |
            xhere4 --->   |UNNEST |   | <--- : THEN  XHERE
                          |-------|   |------------  -ROT SWAP !L
                                                     ; IMMEDIATE
                       List Segment XSEG

        The word IF compiles ?BRANCH followed by a 0 at xhere1.
        This value of 0 will later be replaced with the address xhere3
        of the word FALSE.  IF also leaves the value of xhere1 on the
        stack.

        The word ELSE compiles BRANCH followed by a 0 at xhere2.
        This value of 0 will later be replaced with the address xhere4
        of the word UNNEST.  ELSE also leaves the value of xhere2 on the
        stack after putting the address xhere3 on the stack and then
        storing it at address seg:xhere1.

        The word THEN puts the address xhere4 on the stack and then
        stores it at address seg:xhere2.

BEGIN...AGAIN

        An example of a using BEGIN...AGAIN was given in the pop-up menu
        examples in Lesson 8.  The typical form was

        : main          ( -- )
                        minit
                        BEGIN
                           KEY do.key
                        AGAIN ;

        This will be stored in the list dictionary as follows:

                          _________
                          | minit |
                          |-------|
            xhere1 --->   |  KEY  | <--- : BEGIN  XHERE NIP ; IMMEDIATE
                          |-------|
                          | do.key|
                          |-------|
                          | BRANCH| <--- : AGAIN   COMPILE BRANCH X,
                          |-------|                ; IMMEDIATE
                          | xhere1|
                          |-------|
                          |UNNEST |
                          |-------|
                       List Segment XSEG

        The word BEGIN leaves the offset address xhere1 on the stack.
        The word AGAIN compiles BRANCH and then commas in the address
        xhere1.

























BEGIN...UNTIL

        The following example of a using BEGIN...UNTIL was given in
        Lesson 4.

        : dowrite       ( -- )
                        BEGIN
                           KEY
                           DUP EMIT
                           13 =
                        UNTIL ;

        This will be stored in the list dictionary as follows:

                          _________
            xhere1 --->   |  KEY  | <--- : BEGIN  XHERE NIP ; IMMEDIATE
                          |-------|
                          |  DUP  |
                          |-------|
                          | EMIT  |
                          |-------|
                          | (LIT) |
                          |-------|
                          |  13   |
                          |-------|
                          |   =   |
                          |-------|
                          |?BRANCH| <--- : UNTIL   COMPILE ?BRANCH X,
                          |-------|                ; IMMEDIATE
                          | xhere1|
                          |-------|
                          |UNNEST |
                          |-------|
                       List Segment XSEG

        The word BEGIN leaves the offset address xhere1 on the stack.
        The word UNTIL compiles ?BRANCH and then commas in the address
        xhere1.  Note that the only difference between BEGIN...AGAIN and
        BEGIN...UNTIL is that in UNTIL ?BRANCH replaces the BRANCH in
        AGAIN.














DO...LOOP

        A DO loop will produce the following structure in the list
        dictionary:

                          __________
                          |  (DO)  | <--- : DO   COMPILE (DO)
                          |--------|             XHERE NIP 0 X,
              xhere1 ---> |   0    | <---|       ; IMMEDIATE
                          |--------|     |
          xhere1 + 2 ---> |        |<--| |xhere2
                          |--------|   | |
                          |        |   | |
                          |--------|   | |
                          |        |   | |
                          |--------|   | |
                          | (LOOP) |   | | <--- : LOOP  COMPILE (LOOP)
                          |--------|   | |              DUP 2+ X,
                          |xhere1+2|---| |              XHERE
                          |--------|     |-----------   -ROT SWAP !L
              xhere2 ---> |        |                    ; IMMEDIATE
                          |--------|

                       List Segment XSEG

        The word DO compiles (DO) followed by a 0 at xhere1.
        This value of 0 will later be replaced with the address xhere2
        of the first word following the DO loop.  DO also leaves the
        value of xhere1 on the stack.

        The word LOOP compiles (LOOP) and then commas in the address
        xhere1+2.  LOOP then puts the address xhere2 on the stack and
        then stores it at address seg:xhere1.

        The run-time word
                (DO)  ( limit index -- )
        sets up the return stack as follows:

                          Return Stack
                   ___________________________
                   | index - (limit + 8000H) |
                   |-------------------------|
                   |     limit + 8000H       |
                   |-------------------------|
                   |         xhere2          |
                   |-------------------------|

        The run-time word (LOOP) adds 1 to the top of the return stack
        and jumps to xhere1+2 if the overflow flag is not set.  If the
        overflow flag is set (i.e. the top of the stack crosses the
        8000H boundary when index = limit) (LOOP) pops the three items
        from the return stack and moves the instruction pointer ES:SI
        to xhere2.



        Having the value of xhere2 as the third item on the return stack
        is used by LEAVE to find the address to leave to.  Adding the
        value of 8000H to the top two values on the return stack when
        executing (DO) allows the DO loop to work properly when the limit
        is larger than 8000H.  For example, suppose the limit were FFFFH
        and the initial index were 0.  The initial value on top of the
        return stack would be -7FFFH.  As 1 is added to this value, the
        overflow flag will not be set until the top of the stack becomes
        equal to 8000H, that is, after FFFFH loops.


9.5  EXERCISES

   9.1  Use the words SEE and LDUMP to investigate the dictionary
        structure of the following three test words:
comment;

        : a.test        ( f -- )
                        IF
                           ." True"
                        ELSE
                           ." False"
                        THEN ;

        : b.test        ( -- )
                        5 0 DO
                           I .
                        LOOP ;

        : c.test        ( -- )
                        4
                        BEGIN
                           DUP .
                           1- DUP 0=
                        UNTIL
                        DROP ;

comment:
        For each word draw the dictionary structure indicating the names
        and actual values of all entries in the list dictionary.  Indicate
        on these drawings the effects of the words IF, ELSE, THEN, DO,
        LOOP, BEGIN and UNTIL.  Also explain how the word ." works in
        a.test and how the numbers 5, 0 and 4 are stored in b.test and
        c.test.
comment;