String handling
The idea
Character strings are usually associated with dynamic memory and garbage collection. That is overkill for the string handling used in most Forth programs. In particular, we can get by with buffers that are statically allocated using CREATE. It is still useful to lift the manipulation of single characters to the manipulation of strings as a whole.
We define a few words that make string manipulation in Forth a little smoother.
Original idea: Albert Nijhof & Albert van der Horst. Typical uses are:
- Manipulate files
- Start programs
- Add, delete and use folders/directories
- Etc.
Construction of strings
Strings in Forth are of the type address & length. When a string is stored, the length is kept in front of the characters. Two views are possible: the classic view stores the length in a byte, the alternative stores it in a cell.
The classic view gives the so-called counted strings, as shown in the picture:
Pseudo code
    Function: $VARIABLE
        reserve a buffer for the count-byte + 'maxlen' characters
        Alternatively: reserve a buffer for the count-cell + 'maxlen' characters
      Define: ( maxlen "name" -- )
        Save maxlen & buffer-address
      Action: ( -- s )
        Leave address of string variable

    Function: $@   ( s -- c )
        Read counted string from address
    Function: $+!  ( c s -- )
        Extend counted string at address
    Function: $!   ( c s -- )
        Store counted string at address
    Function: $.   ( c -- )
        Print counted string
    Function: $C+! ( char s -- )
        Add one character to counted string at address
The original idea also contains:
$^ $? $/ $\
See the reference in the introduction.
Two tools, idea Albert Nijhof:
    Function: -HEAD ( adr len i -- adr' len' )
        cut first 'i' characters from string
    Function: -TAIL ( adr len i -- adr len' )
        cut last 'i' characters from string

However, these tools fly in the face of the goals mentioned in the introduction: we promised to get rid of characters, to never count characters, and to concern ourselves only with strings.
A better example in this context is:
    Function: -TRAILING ( c -- c' )
        remove trailing blanks from string
    Function: -LEADING  ( c -- c' )
        remove leading blanks from string
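-TRAILING is part of the optional String word set of standard Forth; -LEADING is not, so a minimal sketch follows. It assumes that /STRING from the same word set is available and it treats only the space character (BL) as a blank. DEMO3 is a hypothetical name used for illustration.

```forth
\ Sketch of -LEADING: step past leading spaces, one character at a time.
: -LEADING ( c-addr u -- c-addr' u' )
    begin  dup  while               \ stop when the string is empty
        over c@ bl =  while         \ stop at the first non-space
        1 /string                   \ drop one leading character
    repeat then ;

: DEMO3 ( -- )
    s"    padded   "  -LEADING -TRAILING
    [char] [ emit  type  [char] ] emit ;

DEMO3   \ prints: [padded]
```

Both words only adjust the address and length on the stack; nothing is copied or stored.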
Generic Forth
The idea is that a string variable (s) is in fact a counted string (c) that has been stored. s ( c-addr ) is the address of the stored string; c ( c-addr u ) is a constant string, given as address and length.
    : $VARIABLE     \ Reserve space for a string buffer
        here  swap 1+ allot  align  \ Reserve RAM buffer
        create  ( here) ,           ( +n "name" -- )
        does>  @ ;                  ( -- s )

    : C+!       ( n a -- )      >r  r@ c@ +  r> c! ;    \ Incr. byte with n at a
    : $@        ( s -- c )      count ;                 \ Fetch string
    : $+!       ( c s -- )      >r  tuck  r@ $@ +  swap move  r> c+! ; \ Extend string
    : $!        ( c s -- )      0 over c!  $+! ;        \ Store string
    : $.        ( c -- )        type ;                  \ Print string
    : $C+!      ( char s -- )   dup >r  $@ +  c!  1 r> c+! ; \ Add char to string
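A short usage sketch of the byte-count words, assuming an ANS-style Forth that does not distinguish upper and lower case. The definitions are repeated so that the block loads on its own; NAME$ and DEMO are hypothetical names chosen for illustration.

```forth
\ Definitions repeated from above so this sketch loads on its own.
: $VARIABLE ( +n "name" -- )  here swap 1+ allot align  create ,  does> @ ;
: C+!   ( n a -- )      >r  r@ c@ +  r> c! ;
: $@    ( s -- c )      count ;
: $+!   ( c s -- )      >r  tuck  r@ $@ +  swap move  r> C+! ;
: $!    ( c s -- )      0 over c!  $+! ;
: $.    ( c -- )        type ;
: $C+!  ( char s -- )   dup >r  $@ +  c!  1 r> C+! ;

20 $VARIABLE NAME$              \ hypothetical buffer: count byte + 20 chars

: DEMO ( -- )
    s" Hello" NAME$ $!          \ store a string
    s" , Forth" NAME$ $+!       \ append a string
    [char] ! NAME$ $C+!         \ append one character
    NAME$ $@ $. ;               \ fetch and print

DEMO    \ prints: Hello, Forth!
```

None of these words checks 'maxlen', so the caller has to keep the total length within the reserved size.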
Here is a version where the count is stored in a cell; it is hardly different.
Note that it uses the word @+, which is not part of Generic Forth; you can find an implementation example in the well known words list.
    : $VARIABLE     \ Reserve space for a string buffer
        here  swap CELL+ allot  align   \ Reserve RAM buffer
        create  ( here) ,               ( +n "name" -- )
        does>  @ ;                      ( -- s )

    : $@        ( s -- c )      @+ ;                    \ Fetch string
    : $+!       ( c s -- )      >r  tuck  r@ $@ +  swap move  r> +! ; \ Extend string
    : $!        ( c s -- )      0 over !  $+! ;         \ Store string
    : $.        ( c -- )        type ;                  \ Print string
    : $C+!      ( char s -- )   dup >r  $@ +  c!  1 r> +! ; \ Add char to string
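For systems that lack @+, the sketch below supplies a stand-in and exercises the cell-count variant. It assumes that @+ has the stack effect ( a -- a+cell x ), which is what $@ needs in order to return an address and a length; check the well known words list for the definition your system uses. The leading ALIGN in $VARIABLE is an addition made here so that the count cell is aligned; PATH$ and DEMO2 are hypothetical names.

```forth
\ Assumed stand-in for @+ : fetch the cell at a, leave the address just past it.
: @+    ( a -- a' x )   dup cell+ swap @ ;

\ Cell-count definitions, repeated so this sketch loads on its own.
\ The first ALIGN is an addition, so the count cell starts on an aligned address.
: $VARIABLE ( +n "name" -- )  align here swap cell+ allot align  create ,  does> @ ;
: $@    ( s -- c )      @+ ;
: $+!   ( c s -- )      >r  tuck  r@ $@ +  swap move  r> +! ;
: $!    ( c s -- )      0 over !  $+! ;
: $.    ( c -- )        type ;
: $C+!  ( char s -- )   dup >r  $@ +  c!  1 r> +! ;

300 $VARIABLE PATH$     \ hypothetical buffer; longer than a count byte allows

: DEMO2 ( -- )
    s" /home/user" PATH$ $!     \ store a string
    [char] / PATH$ $C+!         \ append one character
    s" notes.txt" PATH$ $+!     \ append a string
    PATH$ $@ $. ;               \ fetch and print

DEMO2   \ prints: /home/user/notes.txt
```

The only differences from the byte-count version are the cell-sized fetch and store words, which is why strings longer than 255 characters become possible.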
Implementations
Have a look at the subdirectories for implementations for different systems.
- String word sets
- Primitive string word set, Simple string word set e.g. for file and OS interfacing
- Safe primitive string word set, Version with string overflow warning!
- Safe string word set v1, Version with string limiting
- Building strings, A different approach, author Albert Nijhof
- Etc.
Note that Albert Nijhof's string version puts the address of the structure of the $VARIABLE on the stack. The original example puts the address of the string on the stack. Functionally they are equivalent.
Name | Alt-name | Function |
---|---|---|
$@ | GET$ | Read string variable |
$+! | ADD$ | Add string to string variable |
$! | SET$ | Store string in string variable |
$. | TYPE | Type string |
$C+! | INC$ | Add char to string variable |
String tools
Two string tools as implemented by Albert Nijhof.
- -HEAD cuts the first 'i' characters from the given string.
- -TAIL cuts the last 'i' characters from the given string.
    \ Extra: cut i characters from a string, with underflow protection
    : -TAIL ( adr len i -- adr len' )   0 max  over min  - ;
    : -HEAD ( adr len i -- adr' len' )  0 max  over min  tuck - >r + r> ;
    \ -HEAD and -TAIL do not store anything.
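A small usage sketch of both tools. The definitions are repeated so that the block loads on its own; the file name and the word DEMO4 are hypothetical, chosen only for illustration.

```forth
\ Definitions repeated so this sketch loads on its own.
: -TAIL ( adr len i -- adr len' )   0 max  over min  - ;
: -HEAD ( adr len i -- adr' len' )  0 max  over min  tuck - >r + r> ;

: DEMO4 ( -- )
    s" report.txt"  4 -TAIL  type cr    \ drop the 4-character extension
    s" report.txt"  7 -HEAD  type cr    \ drop the 7-character base name and dot
    s" abc"        10 -HEAD  nip . cr ; \ i is clamped to the string length

DEMO4
\ prints:
\ report
\ txt
\ 0
```

Because 'i' is clamped to the range 0..len, a count that is too large yields an empty string and a negative count leaves the string unchanged.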