\ Lesson 6 - Strings
\ The Forth Course
\ by Richard E. Haskell
\ Dept. of Computer Science and Engineering
\ Oakland University, Rochester, MI 48309
comment:
Lesson 6
STRINGS
6.1 STRING INPUT
6.2 ASCII - BINARY CONVERSIONS
6.3 NUMBER OUTPUT CONVERSION
6.4 SCREEN OUTPUT
6.1 STRING INPUT
To get a string from the terminal and put it in a buffer at "addr"
you can use the word
EXPECT ( addr len -- )
This word has limited editing capabilities (you can backspace, for
example) and will continue storing the ASCII codes of the keys you
type until "len" characters are entered or you press <Enter>. The
number of characters entered is stored in the variable SPAN.
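As a minimal sketch of EXPECT and SPAN, the following word reads up
to 20 characters into a buffer and then types back what was entered.
The names name.buf and get.name are made up for this illustration,
and TYPE is described in Section 6.4.

```forth
CREATE name.buf 20 ALLOT        \ hypothetical 20-byte input buffer

: get.name ( -- )
   name.buf 20 EXPECT           \ store up to 20 key codes at name.buf
   CR name.buf SPAN @ TYPE ;    \ type back the SPAN characters entered
```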
The address of the Terminal Input Buffer is stored in the variable
'TIB. The word
TIB ( -- addr )
puts this address on top of the stack.
The word QUERY will get a string from the keyboard and store it in
the Terminal Input Buffer. It could be defined as follows:
: QUERY ( -- )
   TIB 80 EXPECT
   SPAN @ #TIB !
   >IN OFF ;
The variable #TIB contains the number of characters in the TIB.
The variable >IN holds the offset of the next character to be parsed
in the TIB. QUERY resets it to zero with the word OFF, which stores a
zero at the address on the stack.
For example, suppose in response to QUERY you type the following
characters plus <Enter>.
3.1415,2.789
The ASCII codes for each of these characters would be stored in the
Terminal Input Buffer beginning at the address TIB. The value 12
would be stored in the variable #TIB.
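One way to see what QUERY left behind is to type the buffer back out.
This sketch assumes only the words described above plus TYPE from
Section 6.4; the name show.tib is hypothetical. The last line moves
>IN to the end of the buffer so that the text you typed is not
interpreted as Forth when the word returns.

```forth
: show.tib ( -- )
   QUERY                        \ fill the TIB from the keyboard
   CR TIB #TIB @ TYPE           \ type the #TIB characters at TIB
   CR #TIB @ .                  \ print the character count
   #TIB @ >IN ! ;               \ mark the whole line as consumed
```

Typing 3.1415,2.789 in response should display the same text
followed by the count 12.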
Now suppose you want to parse this input stream and extract the two
numbers that are separated by a comma. You can do this with the word
WORD ( char -- addr )
which will parse the input stream for "char" and leave a "counted"
string at "addr" (which is really the address HERE). Thus, if the
program executes the phrase
ASCII , WORD
the words ASCII , will put the ASCII code for a comma (hex 2C) on
the stack and then WORD will result in the following bytes being
stored at HERE.
           |-------|
  HERE --> |   6   |
           |-------|
           |  '3'  |
           |-------|
           |  '.'  |
           |-------|
           |  '1'  |
           |-------|
           |  '4'  |
           |-------|
           |  '1'  |
           |-------|
           |  '5'  |
           |-------|
           | blank |
           |-------|
Note that the first byte of a "counted" string contains the number
of bytes in the string (6 in this case). Also note that the word
WORD will append a blank character (ASCII 20 hex) to the end of the
string.
At this point the variable >IN will be pointing to the character
following the comma (in this case the 2). The next time the phrase
ASCII , WORD
is executed the counted string "2.789" will be stored at HERE. Even
though there is no comma at the end of this string the word WORD will
parse to the end of the string if it does not find the delimiting
character "char".
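The two parsing steps above can be sketched as a single word. The
name two.fields is hypothetical, and COUNT and TYPE are described in
Section 6.4.

```forth
: two.fields ( -- )
   QUERY                         \ e.g. type 3.1415,2.789 <Enter>
   ASCII , WORD  CR COUNT TYPE   \ first field:  3.1415
   ASCII , WORD  CR COUNT TYPE ; \ second field: 2.789
```

Each counted string must be used before the next WORD is executed,
because both are stored at the same address (HERE).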
6.2 ASCII - BINARY CONVERSIONS
Suppose you enter the number 3.1415. We saw in Lesson 5 that if you
do this in the interpretive mode, the value 31415 will be stored as
a double number on the stack and the number of places to the right
of the decimal point (4) will be stored in the variable DPL.
How can you have the same thing done as part of your program? For
example, you may want to ask the user to enter a number and have
that number end up on the stack. The Forth word NUMBER will convert
an ASCII string to a binary number. Its stack picture looks like
this.
NUMBER ( addr -- d )
This word will convert a counted string at "addr" and store the
result as a double number on the stack. The string can represent
a real, signed number with a radix point in the current BASE. The
number of digits after the radix point is stored in the variable DPL.
If there is no radix point, the value of DPL is -1. The number
string must end with a blank. This is just the situation that is
created by WORD.
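As an illustration, the following sketch reads a number from the
keyboard, leaves it on the stack as a double number, and reports the
value that NUMBER stored in DPL. The name enter.fixed is made up for
this example.

```forth
: enter.fixed ( -- d )
   QUERY BL WORD NUMBER          \ d = the digits as a double number
   CR ." digits after the point: "
   DPL @ . ;                     \ e.g. 4 after typing 3.1415
```

After typing 3.1415 the double number 31415 remains on the stack and
can be printed with D. (see Section 6.4).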
If we want to enter a single number (16 bits) from the keyboard,
we can define the following word:
comment;
: enter.number ( -- n )
   QUERY
   BL WORD
   NUMBER DROP ;
comment:
In this definition BL is the ASCII code for a blank (ASCII 20 hex).
Thus, WORD will parse the input string until a blank or the end of
the string is reached. NUMBER will convert this input string to
a double number and DROP will drop the high word and leave the
value of the single number on the stack. Note that its value must
be in the range -32768 to 32767 so that the low word alone holds the
number; the dropped high word is then only a sign extension (0 for
a non-negative value, -1 for a negative one).
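The same pattern can be combined with the comma parsing of Section
6.1. The following sketch, under the hypothetical name add.pair,
reads two single numbers separated by a comma and prints their sum.

```forth
: add.pair ( -- )
   QUERY                         \ e.g. type 12,30 <Enter>
   ASCII , WORD NUMBER DROP      \ first number  ( -- n1 )
   ASCII , WORD NUMBER DROP      \ second number ( -- n1 n2 )
   + CR . ;                      \ print the sum
```

Typing 12,30 in response should display 42.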
6.3 NUMBER OUTPUT CONVERSION
To print out the number 1234 on the screen we must
1) Divide the number by the base.
2) Convert the remainder to ASCII and store
backwards as a number string
Repeat 1) and 2) until the quotient = 0
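Example (base 10):

    Division         Remainder   ASCII code (hex)
    1234/10 = 123        4             34
     123/10 =  12        3             33
      12/10 =   1        2             32
       1/10 =   0        1             31

Because the remainders are stored backwards, the finished number
string reads in the correct order in memory:

    |------|
    |  31  |
    |------|
    |  32  |
    |------|
    |  33  |
    |------|
    |  34  |
    |------|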
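To see why the digits have to be stored backwards, here is a small
sketch that carries out steps 1) and 2) but prints each digit as soon
as it is produced. The name reversed.digits is hypothetical, the base
is fixed at 10, and n is assumed to be a non-negative single number.

```forth
: reversed.digits ( n -- )
   BEGIN
      10 /MOD SWAP               \ quot rem
      ASCII 0 + EMIT             \ print the remainder as a digit
      ?DUP 0=                    \ stop when the quotient is 0
   UNTIL ;
```

1234 reversed.digits displays 4321, the digits in the order the
remainders are produced.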
The following Forth words are used to perform this conversion
and print the result on the screen:
PAD is a temporary buffer 80 bytes above HERE.
: PAD ( -- addr )
   HERE 80 + ;
HLD is a VARIABLE that points to the last character stored in
the number string.
<# starts the number conversion which will store the number string
in the memory bytes below PAD.
: <# ( -- )
   PAD HLD ! ;
HOLD will insert the character "char" in the output string.
: HOLD ( char -- )
   -1 HLD +!
   HLD @ C! ;
The Forth word +! ( n addr -- ) adds n to the value at addr. Thus,
in the definition of HOLD, the value in HLD is decremented by 1
and then the ASCII code "char" is stored in the byte at HLD.
The word # ("sharp") converts the next digit by performing steps
1) and 2) above. The dividend must be a double number.
: # ( d1 -- d2 )
   BASE @ MU/MOD        \ rem d2
   ROT 9 OVER           \ d2 rem 9 rem
   <
   IF                   \ if 9 < rem
      7 +               \ add 7 to rem
   THEN
   ASCII 0 +            \ conv. rem to ASCII
   HOLD ;               \ insert in string
The word #S ("sharp-S") converts the rest of a double number and
leaves a double zero on the stack.
: #S ( d -- 0 0 )
   BEGIN
      #                 \ convert next digit
      2DUP OR 0=        \ continue until
   UNTIL ;              \ quotient = 0
The word #> completes the conversion by dropping the double zero
left by #S and then computing the length of the string by
subtracting the address of the first character (now in HLD) from
the address following the last char (PAD). This string length
is left on the stack above the string address (in HLD).
: #> ( d -- addr len )
   2DROP                \ drop 0 0
   HLD @                \ addr
   PAD OVER             \ addr pad addr
   - ;                  \ addr len
The Forth word SIGN is used to insert a minus-sign (-) in an
output string if the value on top of the stack is negative.
: SIGN ( n -- )
   0<
   IF
      ASCII - HOLD
   THEN ;
These words will be used in the next section to display number
values on the screen.
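Because HOLD can be used between conversion steps, extra characters
can be placed anywhere in the number string. As a sketch, the word
below (the name .dollars is made up for this example) prints an
unsigned number of cents as dollars and cents. It uses TYPE and
SPACE, which are described in Section 6.4.

```forth
: .dollars ( u -- )              \ u = amount in cents
   0 <#                          \ make it a double number, start
   # #                           \ convert the two cents digits
   ASCII . HOLD                  \ insert the decimal point
   #S                            \ convert the dollars digits
   ASCII $ HOLD                  \ insert the leading dollar sign
   #> TYPE SPACE ;
```

12345 .dollars should display $123.45. The characters are given in
reverse order because the string is built from its last character
toward its first.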
6.4 SCREEN OUTPUT
The word TYPE prints a string whose address and length are on the
stack.
: TYPE ( addr len -- )
   0 ?DO                \ addr
      DUP C@            \ addr char
      EMIT 1+           \ next.addr
   LOOP
   DROP ;
F-PC actually uses a somewhat different definition of TYPE that
allows you to TYPE a string that is stored in any segment by
storing the segment address of the string in the variable TYPESEG.
Strings are usually specified in one of three ways:
(1) Counted strings in which the first byte contains
the number of characters in the string. The string
is specified by the address of the count byte ( addr -- ).
(2) The address of the first character of the string and
the length of the string are specified: ( addr len -- ).
(3) An ASCIIZ string is a string specified by the address
of the first character ( addr -- ). The string is
terminated by a nul character (a zero byte).
A counted string (1) can be converted to an address-length string (2)
by using the Forth word COUNT. Thus, the word
COUNT ( addr -- addr+1 len )
takes the address of a counted string (addr) and leaves the address
of the first character of the string (addr+1) and the length of the
string (which it got from the byte at addr).
Since TYPE requires the address and length of the string to be on
the stack, to print a counted string you would use
COUNT TYPE
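To make the stack effect of COUNT concrete, here is one way such a
word could be written. This is only a sketch under the hypothetical
name my.count, not necessarily how F-PC defines COUNT.

```forth
: my.count ( addr -- addr+1 len )
   DUP 1+                        \ addr addr+1
   SWAP C@ ;                     \ addr+1 len
```

With this definition the phrase BL WORD my.count TYPE behaves like
BL WORD COUNT TYPE.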
Example: The following word will echo on the screen whatever you
type, up to the first comma (or the whole line if it contains no
comma).
comment;
: echo ( -- )
   QUERY                \ get a string
   ASCII , WORD
   CR COUNT TYPE ;
comment:
The use of the number output conversion words in Section 6.3 can be
illustrated by showing how the various number output words are
defined.
The word (U.) converts an unsigned single number and leaves the
address and length of the converted string on the stack.
: (U.) ( u -- addr len )
   0 <# #S #> ;
The word U. prints this string on the screen followed by a blank.
: U. ( u -- )
   (U.) TYPE SPACE ;
The word SPACE prints one blank. It is just
: SPACE ( -- )
   BL EMIT ;
where BL is the CONSTANT 32, the ASCII code of a blank.
The Forth word SPACES ( n -- ) prints n spaces.
When printing numbers in columns on the screen it is necessary to
print the number right-justified in a field of width "wid".
This can be done for an unsigned number with the Forth word U.R.
: U.R ( u wid -- )
   >R (U.)              \ addr len
   R>                   \ addr len wid
   OVER - SPACES        \ addr len
   TYPE ;
For example, 8 U.R will print an unsigned number, right-justified
in a field of width 8.
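As a short illustration, the following sketch prints three unsigned
numbers one under the other in a field of width 8. The name column
and the values are arbitrary.

```forth
: column ( -- )
   CR     7 8 U.R
   CR   123 8 U.R
   CR 12345 8 U.R ;
```

Each number is preceded by enough blanks to fill the field, so the
last digits of the three numbers line up in the eighth column.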
To print a signed number we need to insert a minus-sign at the
beginning of the string if the number is negative. The word (.)
will do this.
: (.) ( n -- addr len )
   DUP ABS              \ n u
   0 <# #S              \ n 0 0
   ROT SIGN #> ;
The word dot (.) is then defined as
: . ( n -- )
   (.) TYPE SPACE ;
The word .R can be used to print a signed number, right-justified
in a field of width "wid".
: .R ( n wid -- )
   >R (.)               \ addr len
   R>                   \ addr len wid
   OVER - SPACES        \ addr len
   TYPE ;
Similar words are defined to print unsigned and signed double
numbers.
: (UD.) ( ud -- addr len )
   <# #S #> ;
: UD. ( ud -- )
   (UD.) TYPE SPACE ;
: UD.R ( ud wid -- )
   >R (UD.)             \ addr len
   R>                   \ addr len wid
   OVER - SPACES        \ addr len
   TYPE ;
: (D.) ( d -- addr len )
   TUCK DABS            \ dH ud
   <# #S ROT SIGN #> ;
: D. ( d -- )
   (D.) TYPE SPACE ;
: D.R ( d wid -- )
   >R (D.)              \ addr len
   R>                   \ addr len wid
   OVER - SPACES        \ addr len
   TYPE ;
To clear the screen, use the word
DARK ( -- )
To set the cursor at the x,y coordinate col,row use the word
AT ( col row -- )
For example, the following word Example_6.4 will clear the screen
and print the message "Message starting at col 20, row 10".
comment;
: Example_6.4 ( -- )
   DARK
   20 10 AT
   ." Message starting at col 20, row 10"
   CR ;