papierkorb:yet_another_interpreter_organization
Differences
The differences between two versions are shown here.
| papierkorb:yet_another_interpreter_organization [2025-08-10 23:13] – ↷ Page moved from projects:yet_another_interpreter_organization to papierkorb:yet_another_interpreter_organization mka | papierkorb:yet_another_interpreter_organization [unknown date] (current) – deleted - external edit (unknown date) 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | < | ||
| - | \ From Sun Forth doc file by Mitch Bradley | ||
| - | Sep 20 21:55 1984 interpreter.doc Page 1 | ||
| - | |||
| - | Yet Another Interpreter Organization | ||
| - | |||
| - | There has been a mild controversy in the Forth community about how to | ||
| - | implement the text interpreter. | ||
| - | distinction between compiling and interpreting should be coded. | ||
| - | three distinct solutions have been advocated over the years. | ||
| - | fourth one, and claim that it is the best solution yet. | ||
| - | |||
| - | [ Additional extract from a Unixnews item from Mitch: | ||
| - | |||
| - | "I have modified the compiler so that it doesn' | ||
| - | undefined word; instead, it prints an error message and compiles a | ||
| - | reference to a word LOSE, whose run time action is to complain and abort. | ||
| - | This scheme allows the compiler to find all the undefined words in a | ||
| - | file, without making such errors propagate through all words that | ||
| - | use the undefined word. | ||
| - | |||
| - | This is especially wonderful in an environment where you can run the | ||
| - | editor and the forth process simultaneously in separate windows, | ||
| - | such as inside Gosling' | ||
| - | |||
| - | |||
| - | FIG-Forth Solution | ||
| - | |||
| - | FIG-Forth used a variable STATE whose value was 0 when interpreting and | ||
| - | (hex) C0 when compiling. | ||
| - | INTERPRET which tested STATE to determine whether to compile or to | ||
| - | interpret. | ||
| - | |||
| - | : INTERPRET ( -- ) | ||
| - | BEGIN -FIND | ||
| - | | ||
| - | IF CFA , ELSE CFA EXECUTE | ||
| - | | ||
| - | IF DROP [COMPILE] LITERAL | ||
| - | ELSE [COMPILE] DLITERAL | ||
| - | THEN | ||
| - | | ||
| - | AGAIN | ||
| - | ; | ||
| - | |||
| - | The "STATE @ <" phrase is pretty clever (or disgusting, however you wish to | ||
| - | look at it). Since the value stored in STATE is (hex) C0 when compiling, | ||
| - | and since the length byte of a defined word (which is left on the stack by | ||
| - | -FIND) is in the range (hex) 80-BF for a non-immediate word and in the (hex) | ||
| - | C0-FF for an immediate word, the "STATE @ <" test manages to return TRUE | ||
| - | only if the STATE is compiling and the word is not immediate. | ||
| - | not salient to our discussion, but is included here to prevent confusion. | ||
| - | |||
| - | STATE is explicitly tested once inside this loop, but if you look at the | ||
| - | code for the word LITERAL, it too tests STATE to decide whether to compile | ||
| - | the number or not. | ||
| - | |||
| - | To switch between compiling and interpreting, | ||
| - | and ]. [ is immediate and simply stores 0 into STATE. | ||
| - | and stores (hex) C0 into STATE. | ||
| - | (colon), which is defined something like: | ||
| - | |||
| - | : : | ||
| - | <some irrelevant stuff> | ||
| - | ] ;CODE | ||
| - | <some assembly language stuff> | ||
| - | END-CODE | ||
| - | |||
| - | The important point here is that when : executes to define a new word, the ] | ||
| - | just sets the STATE to compiling, then the ;CODE proceeds to execute. | ||
| - | purpose of ;CODE is to patch the code field of the word defined by : so that | ||
| - | it does the appropriate thing for a high-level forth word). | ||
| - | word INTERPRET doesn' | ||
| - | finishes. | ||
| - | |||
| - | So we see that [ and ] are pretty innocuous; they just change the value of a | ||
| - | variable. | ||
| - | |||
| - | Poly-FORTH Solution | ||
| - | |||
| - | Forth, Inc. decided that it would be better to have two separate loops for | ||
| - | the two separate functions of compiling and interpreting. | ||
| - | loop was called ], so ] actually executed the compile loop directly, rather | ||
| - | than just setting a variable. | ||
| - | |||
| - | If you look at the previous definition of :, and now pretend that instead of | ||
| - | just setting a variable, ] actually executes the compiler loop, you will see | ||
| - | that the ;CODE following it doesn' | ||
| - | compiling is finished. | ||
| - | the use of ] inside programmer-defined words sometimes caused unexpected | ||
| - | behavior because stuff after the ] would get executed after a bunch of stuff | ||
| - | had been compiled. | ||
| - | |||
| - | The other subtlety relates to how the loops are terminated. | ||
| - | INTERPRET loop shown above never terminates! | ||
| - | does terminate, and the mechanism is pretty kludgey. | ||
| - | there is a null character at the end of every line of text in the input | ||
| - | stream, and at the end of every BLOCK of text from mass storage. | ||
| - | interpreter picks up this null character just like a normal word. The | ||
| - | dictionary contains an entry which matches this "null word" | ||
| - | code is executed, and it plays around with the return stack in such a way | ||
| - | that the INTERPRET loop is exited without its ever knowing about it. | ||
| - | |||
| - | The problem with the dual-loop interpreter/ | ||
| - | line of input from the input stream kicks the system out of whichever loop | ||
| - | it was in. If the user is attempting to compile a multi-line colon | ||
| - | definition from the input stream, he must start each line after the first | ||
| - | with an explicit ], because once the compiler loop is exited at the end of | ||
| - | the first line, the system doesn' | ||
| - | |||
| - | One key thing to remember is that the compiler loop (which was named ]) is | ||
| - | executed from within the interpreter loop. | ||
| - | |||
| - | Coroutines (Patton/ | ||
| - | |||
| - | At FORML 83, Bob Berkey presented a paper about using coroutines for the | ||
| - | interpreter loop and the compiler loop, instead of having the compiler loop | ||
| - | run inside the interpreter loop. This means that executing ] kicks out the | ||
| - | interpreter loop and runs the compiler loop instead; similarly, executing [ | ||
| - | kicks out the compiler loop and runs the interpreter loop instead. | ||
| - | subroutine versions of these loops are present in his scheme, named COMPILER | ||
| - | and INTERPRETER. | ||
| - | |||
| - | Bob feels that this scheme is more symmetrical than the Poly-FORTH approach, | ||
| - | and that it eliminates some of the counter-intuitive behavior. | ||
| - | |||
| - | This scheme still requires that multi-line colon definitions compiled from | ||
| - | the keyboard have a ] at the beginning of each line after the first. | ||
| - | |||
| - | What is Wrong with all this | ||
| - | |||
| - | These different schemes do not at all address what I consider to be the | ||
| - | fundamental problems with the interpreter/ | ||
| - | |||
| - | Fundamental Problem #1: | ||
| - | |||
| - | The compiler/ | ||
| - | can't tell it to just compile one word; once you start it, off it goes, and | ||
| - | it won't stop until it gets to the end of the line or screen. | ||
| - | |||
| - | Fundamental Problem #2: | ||
| - | |||
| - | The reading of the next word from the input stream is buried inside this | ||
| - | loop. This means that you can't hand a string representing a word to the | ||
| - | interpreter/ | ||
| - | |||
| - | Fundamental Problem #3: | ||
| - | |||
| - | The behavior of the interpreter/ | ||
| - | behavior is hard-wired into one or two relatively large words. | ||
| - | this behavior can be extremely useful for a number of applications, | ||
| - | example meta-compiling. | ||
| - | |||
| - | Fundamental Problem #4: | ||
| - | |||
| - | If the interpreter/ | ||
| - | not defined and it's not a number), it aborts. | ||
| - | not done directly from within the loop, but inside NUMBER. | ||
| - | limits the usefulness of NUMBER because if the string that NUMBER gets is | ||
| - | not recognizable as a number, it will abort on you. (The 83 standard punts | ||
| - | this issue by not specifying NUMBER, except as an uncontrolled reference | ||
| - | word). | ||
| - | |||
| - | Solution: | ||
| - | |||
| - | As I see it, there are several distinct things that are going on inside the | ||
| - | interpreter/ | ||
| - | into words which each do one thing solves all these problems. | ||
| - | |||
| - | The outermost thing is the loop. The loop's job is to repetitively get the | ||
| - | next word from the input stream and do something with it. The loop should | ||
| - | terminate when the input stream is exhausted. | ||
| - | |||
| - | : NEW-INTERPRET | ||
| - | BEGIN BL WORD ( str ) | ||
| - | MORE? ( str f ) ( flag true if input stream not exhausted ) | ||
| - | WHILE | ||
| - | " | ||
| - | REPEAT | ||
| - | DROP | ||
| - | ; | ||
| - | |||
| - | The next level down is the "do something with it" | ||
| - | separate word so that it may be called by other words which would like to | ||
| - | compile/ | ||
| - | because it takes a string representing a single word and compiles (or | ||
| - | interprets) it. " | ||
| - | dealing with. There are 3 choices. | ||
| - | it is a literal (i.e. a number), or it is neither. | ||
| - | |||
| - | : " | ||
| - | FIND ( str 0 | cfa -1 | ||
| - | DUP | ||
| - | IF | ||
| - | ELSE DROP ( str ) | ||
| - | | ||
| - | | ||
| - | | ||
| - | THEN | ||
| - | THEN | ||
| - | ; | ||
| - | |||
| - | Finally, at the lowest layer, there is the code which does the appropriate | ||
| - | thing for each of these three possibilities. | ||
| - | the words DO-DEFINED, DO-LITERAL, and DO-UNDEFINED. | ||
| - | lowest layer that the system cares at all whether it is compiling or | ||
| - | interpreting. | ||
| - | speed. | ||
| - | the loop. | ||
| - | |||
| - | Clearly, my scheme has to do something to distinguish between compiling and | ||
| - | interpreting. | ||
| - | DO-DEFINED, DO-LITERAL, and DO-UNDEFINED. | ||
| - | of course. | ||
| - | |||
| - | A more interesting alternative is to make each of DO-DEFINED, DO-LITERAL, | ||
| - | and DO-UNDEFINED a deferred word. (Deferred words are sometimes called | ||
| - | execution vectors. | ||
| - | of a word to execute, except that the @ EXECUTE is done automatically) | ||
| - | |||
| - | If these words are deferred, then they can be changed when the system goes | ||
| - | from compiling to interpreting, | ||
| - | |||
| - | DEFER LITERAL? | ||
| - | DEFER DO-DEFINED | ||
| - | DEFER DO-LITERAL | ||
| - | DEFER DO-UNDEFINED | ||
| - | |||
| - | : (LITERAL? | ||
| - | >R R@ NUMBER? | ||
| - | IF R> DROP TRUE | ||
| - | ELSE DROP R> FALSE | ||
| - | THEN | ||
| - | ; | ||
| - | ' (LITERAL? IS LITERAL? | ||
| - | : INTERPRET-DO-DEFINED | ||
| - | DROP EXECUTE | ||
| - | ; | ||
| - | : COMPILE-DO-DEFINED | ||
| - | 0> IF | ||
| - | ELSE , ( if not immediate ) | ||
| - | THEN | ||
| - | ; | ||
| - | : INTERPRET-DO-LITERAL ( d -- d | n ) | ||
| - | DOUBLE? 0= IF DROP THEN | ||
| - | ; | ||
| - | : COMPILE-DO-LITERAL ( d -- ) | ||
| - | DOUBLE? IF [COMPILE] DLITERAL ELSE [COMPILE] LITERAL THEN | ||
| - | ; | ||
| - | : INTERPRET-DO-UNDEFINED ( str -- ) | ||
| - | COUNT TYPE ." | ||
| - | QUIT | ||
| - | ; | ||
| - | : COMPILE-DO-UNDEFINED | ||
| - | COUNT TYPE ." | ||
| - | COMPILE LOSE | ||
| - | ; | ||
| - | |||
| - | Then [ and ] would be defined as follows: | ||
| - | |||
| - | : [ | ||
| - | ['] INTERPRET-DO-DEFINED | ||
| - | ['] INTERPRET-DO-LITERAL | ||
| - | ['] INTERPRET-DO-UNDEFINED IS DO-UNDEFINED | ||
| - | STATE OFF | ||
| - | ; IMMEDIATE | ||
| - | |||
| - | : ] | ||
| - | ['] COMPILE-DO-DEFINED | ||
| - | ['] COMPILE-DO-LITERAL | ||
| - | ['] COMPILE-DO-UNDEFINED | ||
| - | STATE ON | ||
| - | ; | ||
| - | |||
| - | (IS is the word which sets the word to execute for a deferred word. | ||
| - | |||
| - | Executing a deferred word need not be slow. Deferred words are so useful | ||
| - | that they should be coded in assembler for speed. | ||
| - | only very slightly slower than normal colon definitions. | ||
| - | |||
| - | So what? | ||
| - | |||
| - | This may seem to be more complicated than the schemes it replaces. | ||
| - | certainly does have more words. | ||
| - | individually easy to understand, and each word does a very specific job, in | ||
| - | contrast to the old style, which bundles up a lot of different things in one | ||
| - | big word. The more explicit factoring gives you a great deal of control | ||
| - | over the interpreter. | ||
| - | |||
| - | Here are some interesting things you can do with this new scheme: | ||
| - | |||
| - | One of my favorite words, TH (for Temporary Hex): | ||
| - | |||
| - | : TH ( --word | ||
| - | BASE @ >R HEX | ||
| - | BL WORD " | ||
| - | R> BASE ! | ||
| - | ; IMMEDIATE | ||
| - | |||
| - | This word temporarily sets the base to hexadecimal, | ||
| - | a word, and restores the base. It works for numbers or defined words, | ||
| - | either interpreting or compiling. | ||
| - | |||
| - | For example: | ||
| - | |||
| - | DECIMAL | ||
| - | TH 10 . (system prints--> | ||
| - | 10 TH . (system prints--> | ||
| - | : STRIP-PARITY ( char -- char-without-parity ) | ||
| - | TH 7F AND | ||
| - | ; | ||
| - | |||
| - | Liberal use of this word markedly reduces the need to switch bases, | ||
| - | especially in source code, and thus reduces the chance of errors. | ||
| - | |||
| - | Here's a common word that is trivial to implement with this kind of | ||
| - | interpreter: | ||
| - | |||
| - | : ASCII ( --name | ||
| - | BL WORD 1+ C@ ( char ) | ||
| - | -1 DPL ! \ make sure it's not handled as a double number | ||
| - | DO-LITERAL | ||
| - | ; | ||
| - | |||
| - | Here's a word which allows you to make a new name for an old word. It is a | ||
| - | smart word, in that when the new word is compiled, the old word will | ||
| - | actually be compiled instead, eliminating any performance penalty. | ||
| - | Furthermore, | ||
| - | see, the vectored " | ||
| - | |||
| - | : ALIAS ( -- ) ( Input stream: | ||
| - | CREATE | ||
| - | BL WORD FIND ( cfa -1 | cfa 1 | str false ) | ||
| - | DUP IF | ||
| - | , , IMMEDIATE | ||
| - | ELSE | ||
| - | DROP ." Can't find " | ||
| - | THEN | ||
| - | DOES> | ||
| - | | ||
| - | ; | ||
| - | ( Examples ) | ||
| - | ALIAS D@ 2@ | ||
| - | HERE D@ ( actually executes 2@ ) | ||
| - | : FOO HERE D@ ; ( actually compiles 2@ ) | ||
| - | ALIAS FOREVER | ||
| - | : LOOP-ALWAYS BEGIN FOREVER ; ( actually executes AGAIN, which is immediate ) | ||
| - | |||
| - | Finally, a really neat way to write keyword-driven translators. | ||
| - | have some kind of a file that contains a bunch of text. Interspersed | ||
| - | throughout the text are keywords that you would like to recognize, and the | ||
| - | program should do something special when it sees a keyword. | ||
| - | aren't keywords, it just writes them out unchanged. | ||
| - | keywords are " | ||
| - | |||
| - | VOCABULARY KEYWORDS DEFINITIONS | ||
| - | : .PARAGRAPH | ||
| - | ( whatever you want to happen when you see paragraph ) | ||
| - | ; | ||
| - | : .SECTION | ||
| - | ( whatever you want to happen when you see section ) | ||
| - | ; | ||
| - | : KEYWORDS-DO-UNDEFINED ( STR -- ) | ||
| - | COUNT TYPE | ||
| - | ; | ||
| - | : .END | ||
| - | ONLY FORTH | ||
| - | ['] (LITERAL? | ||
| - | ['] INTERPRET-DO-UNDEFINED IS DO-UNDEFINED | ||
| - | ; | ||
| - | ONLY FORTH ALSO KEYWORDS | ||
| - | : PROCESS-KEYWORDS | ||
| - | ['] FALSE IS LITERAL? | ||
| - | ['] KEYWORDS-DO-UNDEFINED | ||
| - | ONLY KEYWORDS | ||
| - | ; | ||
| - | |||
| - | I have used this technique very successfully to extract specific information | ||
| - | from data base files produced by a CAD system. | ||
| - | unrecognized words, I actually just ignored them in this application, | ||
| - | the technique is the same in either case. | ||
| - | |||
| - | Mitch Bradley | ||
| - | Sun Microsystems, | ||
| - | 2550 Garcia Ave. | ||
| - | Mountain View, CA 94043 | ||
| - | (Work) 415/ | ||
| - | (Home) 415/ | ||
| - | |||
| - | |||
| - | </ | ||
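
The three-layer factoring described in the article (an outer loop, a per-word dispatcher, and deferred handlers) can be tried out on a current Forth-2012 system such as Gforth. The following is a minimal sketch under assumptions, not the author's code: PARSE-NAME and SEARCH-WORDLIST stand in for BL WORD and FIND, only unsigned single-cell numbers are recognized, and the names SIMPLE-NUMBER?, HANDLE-WORD, RUN-LINE, ECHO-DO-UNDEFINED and TEMP-HEX are hypothetical. Only interpret-time handlers are shown; compile-time handlers would presumably be installed through the same deferred words.

```forth
\ Sketch only (Forth-2012, e.g. Gforth). Names marked "hypothetical"
\ in the text above are illustrations, not part of the article.

DEFER DO-DEFINED     ( i*x xt flag -- j*x )
DEFER DO-LITERAL     ( n -- n | )
DEFER DO-UNDEFINED   ( c-addr u -- )

\ Assumption: unsigned single-cell numbers in the current BASE only.
: SIMPLE-NUMBER? ( c-addr u -- n true | false )
   DUP 0= IF  2DROP FALSE EXIT  THEN
   0 0 2SWAP >NUMBER                     ( ud c-addr' u' )
   NIP 0= IF  DROP TRUE  ELSE  2DROP FALSE  THEN ;

\ Middle layer: classify one word and pass it to the matching handler.
: HANDLE-WORD ( i*x c-addr u -- j*x )
   2DUP FORTH-WORDLIST SEARCH-WORDLIST   ( c-addr u 0 | c-addr u xt flag )
   ?DUP IF  2SWAP 2DROP DO-DEFINED EXIT  THEN
   2DUP SIMPLE-NUMBER? IF  NIP NIP DO-LITERAL EXIT  THEN
   DO-UNDEFINED ;

\ Outer layer: feed the rest of the current input line to HANDLE-WORD.
: RUN-LINE ( i*x "rest of line" -- j*x )
   BEGIN  PARSE-NAME DUP  WHILE  HANDLE-WORD  REPEAT  2DROP ;

\ Lowest layer: interpret-time handlers.
: INTERPRET-DO-DEFINED   ( i*x xt flag -- j*x )  DROP EXECUTE ;
: INTERPRET-DO-LITERAL   ( n -- n ) ;              \ leave it on the stack
: INTERPRET-DO-UNDEFINED ( c-addr u -- )  TYPE ."  ?" CR ;
: ECHO-DO-UNDEFINED      ( c-addr u -- )  TYPE SPACE ;

' INTERPRET-DO-DEFINED   IS DO-DEFINED
' INTERPRET-DO-LITERAL   IS DO-LITERAL
' INTERPRET-DO-UNDEFINED IS DO-UNDEFINED

DECIMAL
RUN-LINE 2 3 + . XYZZY       \ prints 5, then complains about XYZZY

\ Re-vector one handler: unknown words are now echoed, not rejected.
' ECHO-DO-UNDEFINED IS DO-UNDEFINED
RUN-LINE XYZZY 2 3 + . PLUGH

\ The article's TH, restricted here to interpret time: hand exactly
\ one word to the dispatcher while the base is temporarily hex.
: TEMP-HEX ( i*x "word" -- j*x )
   BASE @ >R HEX  PARSE-NAME HANDLE-WORD  R> BASE ! ;
TEMP-HEX 10 .                \ prints 16
```

Because HANDLE-WORD receives its word as a string, TEMP-HEX can process exactly one word without starting a loop, and changing DO-UNDEFINED alone turns the same loop into the echoing behaviour that the keyword translator relies on.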
papierkorb/yet_another_interpreter_organization.1754860422.txt.gz · Last modified: 2025-08-10 23:13 by mka