Benutzer-Werkzeuge

Webseiten-Werkzeuge


papierkorb:yet_another_interpreter_organization

Unterschiede

Hier werden die Unterschiede zwischen zwei Versionen angezeigt.

Link zu dieser Vergleichsansicht

papierkorb:yet_another_interpreter_organization [2025-08-10 23:13] – ↷ Seite von projects:yet_another_interpreter_organization nach papierkorb:yet_another_interpreter_organization verschoben mkapapierkorb:yet_another_interpreter_organization [Unbekanntes Datum] (aktuell) – gelöscht - Externe Bearbeitung (Unbekanntes Datum) 127.0.0.1
Zeile 1: Zeile 1:
-<code> 
-\ From Sun Forth doc file by Mitch Bradley 
  
-Sep 20 21:55 1984  interpreter.doc Page 1 
- 
-        Yet Another Interpreter Organization 
- 
-There has been a mild controversy in the Forth community about how to 
-implement the text interpreter.  The particular problem is how the 
-distinction between compiling and interpreting should be coded.  At least 
-three distinct solutions have been advocated over the years.  I propose a 
-fourth one, and claim that it is the best solution yet. 
- 
-[ Additional extract from a Unixnews item from Mitch: 
- 
-"I have modified the compiler so that it doesn't stop when it hits an 
-undefined word; instead, it prints an error message and compiles a 
-reference to a word LOSE, whose run time action is to complain and abort. 
-This scheme allows the compiler to find all the undefined words in a 
-file, without making such errors propagate through all words that 
-use the undefined word. 
- 
-This is especially wonderful in an environment where you can run the 
-editor and the forth process simultaneously in separate windows, 
-such as inside Gosling's Emacs or in the SunWindows environment." ] 
- 
- 
-FIG-Forth Solution 
- 
-FIG-Forth used a variable STATE whose value was 0 when interpreting and 
-(hex) C0 when compiling.  The interpreter was coded as a single word 
-INTERPRET which tested STATE to determine whether to compile or to 
-interpret.  Here is the code: 
- 
-    : INTERPRET ( -- ) 
-      BEGIN  -FIND 
-         IF   STATE @  < 
-              IF  CFA , ELSE  CFA EXECUTE  THEN 
-         ELSE   HERE NUMBER  DPL @  1+ 
-              IF   DROP [COMPILE] LITERAL 
-              ELSE      [COMPILE] DLITERAL 
-              THEN 
-         THEN  ?STACK 
-      AGAIN 
-    ; 
- 
-The "STATE @ <" phrase is pretty clever (or disgusting, however you wish to 
-look at it).  Since the value stored in STATE is (hex) C0 when compiling, 
-and since the length byte of a defined word (which is left on the stack by 
--FIND) is in the range (hex) 80-BF for a non-immediate word and in the (hex) 
-C0-FF for an immediate word, the "STATE @ <" test manages to return TRUE 
-only if the STATE is compiling and the word is not immediate.  This fact is 
-not salient to our discussion, but is included here to prevent confusion. 
- 
-STATE is explicitly tested once inside this loop, but if you look at the 
-code for the word LITERAL, it too tests STATE to decide whether to compile 
-the number or not. 
- 
-To switch between compilating and interpreting, Fig-FORTH uses two words [ 
-and ].  [ is immediate and simply stores 0 into STATE.  ] is not immediate 
-and stores (hex) C0 into STATE.  Compilation is typically started with : 
-(colon), which is defined something like: 
- 
-: : 
-  <some irrelevant stuff> 
-  ] ;CODE  
-  <some assembly language stuff> 
-END-CODE 
- 
-The important point here is that when : executes to define a new word, the ] 
-just sets the STATE to compiling, then the ;CODE proceeds to execute.  (The 
-purpose of ;CODE is to patch the code field of the word defined by : so that 
-it does the appropriate thing for a high-level forth word).  The interpret 
-word INTERPRET doesn't notice that STATE is now compiling until the ;CODE 
-finishes. 
- 
-So we see that [ and ] are pretty innocuous; they just change the value of a 
-variable. 
- 
- 
- 
- 
- 
-Sep 20 21:55 1984  interpreter.doc Page 2 
- 
-Poly-FORTH Solution 
- 
-Forth, Inc. decided that it would be better to have two separate loops for 
-the two separate functions of compiling and interpreting.  The compiling 
-loop was called ], so ] actually executed the compile loop directly, rather 
-than just setting a variable.  This has two subtle side effects. 
- 
-If you loop at the previous definition of :, and now pretend that instead of 
-just setting a variable, ] actually executes the compiler loop, you will see 
-that the ;CODE following it doesn't actually get executed until AFTER the 
-compiling is finished.  This in itself doesn't cause a problem for :, but 
-the use of ] inside programmer-defined words sometimes caused unexpected 
-behavior because stuff after the ] would get executed after a bunch of stuff 
-had been compiled. 
- 
-The other subtlety relates to how the loops are terminated.  Note that the 
-INTERPRET loop shown above never terminates!  We all know that it really 
-does terminate, and the mechanism is pretty kludgey.  What happens is that 
-there is a null character at the end of every line of text in the input 
-stream, and at the end of every BLOCK of text from mass storage.  The text 
-interpreter picks up this null character just like a normal word.  The 
-dictionary contains an entry which matches this "null word" The associated 
-code is executed, and it plays around with the return stack in such a way 
-that the INTERPRET loop is exited without its ever knowing about it. 
- 
-The problem with the dual-loop interpreter/compiler is that the end of each 
-line of input from the input stream kicks out system out of whichever loop 
-it was in.  If the user is attempting to compile a multi-line colon 
-definition from the input stream, he must start each line after the first 
-with an explicit ], because once the compiler loop is exited at the end of 
-the first line, the system doesn't remember that it was compiling. 
- 
-One key thing to remember is that the compiler loop (which was named [) is 
-executed from within the interpreter loop. 
- 
-Coroutines (Patton/Berkey) 
- 
-At FORML 83, Bob Berkey presented a paper about using coroutines for the 
-interpreter loop and the compiler loop, instead of having the compiler loop 
-run inside the interpreter loop.  This means that executing ] kicks out the 
-interpreter loop and runs the compiler loop instead; similarly, executing [ 
-kicks out the compiler loop and runs the interpreter loop instead.  The 
-subroutine versions of these loops are present in his scheme, named COMPILER 
-and INTERPRETER. 
- 
-Bob feels that this scheme is more symmetrical than the Poly-FORTH approach, 
-and that it eliminates some of the counter-intuitive behavior. 
- 
-This scheme still requires that multi-line colon definitions compiled from 
-the keyboard have a ] at the beginning of each line after the first. 
- 
-What is Wrong with all this 
- 
-These different schemes do not at all address what I consider to be the 
-fundamental problems with the interpreter/compiler. 
- 
-Fundamental Problem #1: 
- 
-The compiler/interpreter has a built-in infinite loop.  This means that you 
-can't tell it to just compile one word; once you start it, off it goes, and 
-it won't stop until it gets to the end of the line or screen. 
- 
- 
- 
-Sep 20 21:55 1984  interpreter.doc Page 3 
- 
-Fundamental Problem #2: 
- 
-The reading of the next word from the input stream is buried inside this 
-loop.  This means that you can't hand a string representing a word to the 
-interpreter/compiler and have it interpret or compile it for you. 
- 
-Fundamental Problem #3: 
- 
-The behavior of the interpreter/compiler is hard to change because all the 
-behavior is hard-wired into one or two relatively large words.  Changing 
-this behavior can be extremely useful for a number of applications, for 
-example meta-compiling. 
- 
-Fundamental Problem #4: 
- 
-If the interpreter/compiler can't figure out what to do with a word (it's 
-not defined and it's not a number), it aborts.  Worse yet, the aborting is 
-not done directly from within the loop, but inside NUMBER.  This severly 
-limits the usefulness of NUMBER because if the string that NUMBER gets is 
-not recognizable as a number, it will abort on you.  (The 83 standard punts 
-this issue by not specifying NUMBER, except as an uncontrolled refernece 
-word). 
- 
-Solution: 
- 
-As I see it, there are several distinct things that are going on inside the 
-interpreter/compiler.  A proper factorization of the interpreter/compiler 
-into words which each do one thing solves all these problems. 
- 
-The outermost thing is the loop.  The loop's job is to repetitively get the 
-next word from the input stream and do something with it.  The loop should 
-terminate when the input stream is exhausted. 
- 
-: NEW-INTERPRET  (S -- ) 
-  BEGIN   BL WORD   ( str ) 
-          MORE?     ( str f )  ( flag true if input stream not exhausted ) 
-  WHILE 
-          "COMPILE 
-  REPEAT 
-  DROP 
-; 
- 
-The next level down is the "do something with it" This ought to be a 
-separate word so that it may be called by other words which would like to 
-compile/interpret a single word.  This layer is here called "COMPILE, 
-because it takes a string representing a single word and compiles (or 
-interprets) it.  "COMPILE's main job is to decide what kind of word it is 
-dealing with.  There are 3 choices.  Either the word is already defined, or 
-it is a literal (i.e. a number), or it is neither. 
- 
-: "COMPILE ( str -- ?? ) 
-  FIND     ( str 0  |   cfa -1     cfa 1 ) 
-  DUP 
-  IF   DO-DEFINED    ( ??  ) 
-  ELSE DROP          ( str ) 
-       LITERAL?      ( str false | ?? true ) 
-       IF    DO-LITERAL        ( ?? ) 
-       ELSE  DO-UNDEFINED      ( ?? ) 
-       THEN 
-  THEN 
-; 
- 
- 
- 
-Sep 20 21:55 1984  interpreter.doc Page 4 
- 
-Finally, at the lowest layer, there is the code which does the appropriate 
-thing for each of these three possibilities.  This level is represented by 
-the words DO-DEFINED, DO-LITERAL, and DO-UNDEFINED.  It is ONLY at this 
-lowest layer that the system cares at all whether it is compiling or 
-interpreting.  One of the benefits claimed for the Poly-FORTH scheme is 
-speed.  This is due to the elimination of tests of the STATE variable within 
-the loop. 
- 
-Clearly, my scheme has to do something to distinguish between compiling and 
-interpreting.  An obvious solution would be to test STATE inside each of 
-DO-DEFINED, DO-LITERAL, and DO-UNDEFINED.  This would slow down the system, 
-of course. 
- 
-A more interesting alternative is to make each of DO-DEFINED, DO-LITERAL, 
-and DO-UNDEFINED a deferred word. (Deferred words are sometimes called 
-execution vectors.  Basically they are like variables which hold the address 
-of a word to execute, except that the @ EXECUTE is done automatically) 
- 
-If these words are deferred, then they can be changed when the system goes 
-from compiling to interpreting, and vice versa. 
- 
-DEFER LITERAL?      ( str -- n true  |  d true  |  str false ) 
-DEFER DO-DEFINED    ( cfa -1 | cfa 1  -- ?? ) 
-DEFER DO-LITERAL    ( literal -- ?? ) 
-DEFER DO-UNDEFINED  ( str -- ) 
- 
-: (LITERAL?  ( str -- str false  | literal true ) 
-  >R R@ NUMBER?   ( l f ) 
-  IF    R> DROP  TRUE 
-  ELSE  DROP R> FALSE 
-  THEN 
-; 
-' (LITERAL? IS LITERAL? 
-: INTERPRET-DO-DEFINED  ( cfa -1 | cfa 1 -- ?? ) 
-  DROP EXECUTE 
-; 
-: COMPILE-DO-DEFINED    ( cfa -1 | cfa 1 -- ) 
-  0> IF   EXECUTE   ( if immediate ) 
-     ELSE ,         ( if not immediate ) 
-     THEN 
-; 
-: INTERPRET-DO-LITERAL ( d -- d | n ) 
-  DOUBLE? 0= IF DROP THEN 
-; 
-: COMPILE-DO-LITERAL ( d -- ) 
-  DOUBLE? IF [COMPILE] DLITERAL ELSE [COMPILE] LITERAL THEN 
-;   
-: INTERPRET-DO-UNDEFINED ( str -- ) 
-    COUNT TYPE ."  ?" CR 
-  QUIT 
-; 
-: COMPILE-DO-UNDEFINED   ( str -- ) 
-    COUNT TYPE ."  ?" CR 
-  COMPILE LOSE 
-; 
- 
-Then [ and ] would be defined as follows: 
- 
-: [ 
-  ['] INTERPRET-DO-DEFINED   IS DO-DEFINED 
-  ['] INTERPRET-DO-LITERAL   IS DO-LITERAL 
-  ['] INTERPRET-DO-UNDEFINED IS DO-UNDEFINED 
- 
- 
-Sep 20 21:55 1984  interpreter.doc Page 5 
- 
-  STATE OFF 
-; IMMEDIATE 
- 
-: ] 
-  ['] COMPILE-DO-DEFINED     IS DO-DEFINED 
-  ['] COMPILE-DO-LITERAL     IS DO-LITERAL 
-  ['] COMPILE-DO-UNDEFINED   IS DO-UNDEFINED 
-  STATE ON 
-; 
- 
-(IS is the word which sets the word to execute for a deferred word. 
- 
-Executing a deferred word need not be slow.  Deferred word are so useful 
-that they should be coded in assembler for speed.  On my system they are 
-only very slightly slower than normal colon definitions. 
- 
-So what? 
- 
-This may seem to be more complicated than the schemes it replaces.  It 
-certainly does have more words.  On the other hand, each word is 
-individually easy to understand, and each word does a very specific job, in 
-contrast to the old style, which bundles up a lot of different things in one 
-big word.  The more explicit factoring gives you a great deal of control 
-over the interpreter. 
- 
-Here are some interesting things you can do with this new scheme: 
- 
-One of my favorite words, TH (for Temporary Hex): 
- 
-: TH  ( --word  ?? ) 
-  BASE @ >R  HEX 
-  BL WORD "COMPILE 
-  R> BASE ! 
-; IMMEDIATE 
- 
-This word temporarily sets the base to hexadecimal, interprets 
-a word, and restores the base.  It works for numbers or defined words, 
-either interpreting or compiling. 
- 
-For example: 
- 
-DECIMAL 
-TH 10 .  (system prints-->  16 
-10 TH .  (system prints-->  A 
-: STRIP-PARITY ( char -- char-without-parity ) 
-  TH 7F AND 
-; 
- 
-Liberal use of this word markedly reduces the need to switch bases, 
-especially in source code, and thus reduces the chance of errors. 
- 
-Here's a common word that is trivial to implement with this kind of 
-interpreter: 
- 
-: ASCII (  --name  char ) 
-  BL WORD 1+ C@ ( char ) 
-  -1 DPL !     \ make sure it's not handled as a double number 
-  DO-LITERAL 
-; 
- 
-Here's a word which allows you to make a new name for an old word.  It is a 
-smart word, in that when the new word is compiled, the old word will 
- 
- 
-Sep 20 21:55 1984  interpreter.doc Page 6 
- 
-actually be compiled instead, eliminating any performance penalty. 
-Furthermore, it even works for old words that are immediate!  As you will 
-see, the vectored "DO-DEFINED" does exactly the thing we want. 
- 
-: ALIAS  ( -- )     ( Input stream:  new-name old-word ) 
-  CREATE 
-    BL WORD FIND    ( cfa -1 | cfa 1  |  str false ) 
-    DUP IF 
-       , , IMMEDIATE 
-    ELSE 
-      DROP  ." Can't find "  COUNT TYPE 
-    THEN 
-  DOES>  2@   ( cfa -1 | cfa 1 ) 
-   DO-DEFINED 
-; 
-( Examples ) 
-ALIAS D@ 2@ 
-HERE D@   ( actually executes 2@ ) 
-: FOO HERE D@ ;  ( actually compiles 2@ ) 
-ALIAS FOREVER  AGAIN 
-: LOOP-ALWAYS BEGIN FOREVER ;  ( actually executes AGAIN, which is immediate ) 
- 
-Finally, a really neat way to write keyword-driven translators.  Suppose you 
-have some kind of a file that contains a bunch of text.  Interspersed 
-throughout the text are keywords that you would like to recognize, and the 
-program should do something special when it sees a keyword.  For things that 
-aren't keywords, it just writes them out unchanged.  Suppose that the 
-keywords are ".PARAGRAPH", ".SECTION", and ".END". 
- 
-VOCABULARY KEYWORDS DEFINITIONS 
-: .PARAGRAPH 
-  ( whatever you want to happen when you see paragraph ) 
-; 
-: .SECTION 
-  ( whatever you want to happen when you see paragraph ) 
-; 
-: KEYWORDS-DO-UNDEFINED ( STR -- ) 
-  COUNT TYPE 
-; 
-: .END 
-  ONLY FORTH 
-  ['] (LITERAL?              IS  LITERAL? 
-  ['] INTERPRET-DO-UNDEFINED IS  DO-UNDEFINED 
-; 
-ONLY FORTH ALSO KEYWORDS 
-: PROCESS-KEYWORDS 
-  ['] FALSE                  IS LITERAL? 
-  ['] KEYWORDS-DO-UNDEFINED  IS DO-UNDEFINED 
-  ONLY KEYWORDS 
-; 
- 
-I have used this technique very successfully to extract specific information 
-from data base files produced by a CAD system.  Instead of outputting 
-unrecognized words, I actually just ingored them in this application, but 
-the technique is the same in either case. 
- 
-        Mitch Bradley 
-        Sun Microsystems, Inc. 
-        2550 Garcia Ave. 
-        Mountain View, CA 94043 
-        (Work) 415/960-3300x314 
-        (Home) 415/961-1302 
- 
- 
-</code> 
papierkorb/yet_another_interpreter_organization.1754860422.txt.gz · Zuletzt geändert: 2025-08-10 23:13 von mka