String handling

The idea

Character strings are usually associated with dynamic memory and garbage collection. That is overkill for the string handling found in most Forth programs. In particular, we can get by with buffers that are statically allocated using CREATE. It is still useful to lift the manipulation of single characters to the manipulation of strings as a whole.

We define a few words that make string manipulation in Forth a little smoother.
Original idea: Albert Nijhof & Albert van der Horst. Examples of use are:

- Manipulate files
- Start programs
- Add, delete and use folders/directories
- Etc.

Construction of strings

On the stack, a string in Forth is of the type address & length. In memory, the length is stored in front of the string. There are two views possible: the length is stored either in a byte or in a cell. The classic view is to store the length in a byte.

This gives the so-called counted string, as shown in the picture:
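The same layout can be made visible in code. The following is a minimal sketch in standard Forth that lays down a counted string by hand; the name `demo` is hypothetical and only serves this illustration.

```forth
\ Lay down a counted string by hand: one count byte, then the characters.
\ DEMO is a hypothetical name used only for this illustration.
create demo   5 c,  char H c,  char e c,  char l c,  char l c,  char o c,

demo c@ .           \ the count byte: prints 5
demo count type     \ COUNT gives address & length: prints Hello
```

COUNT is what turns the stored form (one address) into the address & length form used on the stack.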

Pseudo code

Function: $VARIABLE 
    reserve a buffer for the count-byte + 'maxlen' characters
    Alternatively: reserve a buffer for the count-cell + 'maxlen' characters
    Define: ( maxlen "name" -- )
          Save maxlen & buffer-address
    Action: ( -- s )
          Leave address of string variable

Function: $@   ( s -- c )
  Read counted string from address
Function: $+!  ( c s -- )
  Extend counted string at address
Function: $!   ( c s -- )
  Store counted string at address
Function: $.   ( c -- )
  Print counted string
Function: $C+! ( char s -- )
  Add one character to counted string at address

The original idea also contains:

 $^ $? $/ $\

See the reference in the introduction.

Two tools, idea Albert Nijhof:

Function: -HEAD ( adr len i -- adr' len' ) cut first 'i' characters from string
Function: -TAIL ( adr len i -- adr len' )  cut last  'i' characters from string

However, that flies in the face of the goals mentioned in the introduction. We promised to get rid of characters: never count characters, only concern ourselves with strings.

A better example in this context is:

Function: -TRAILING ( c -- c' ) remove trailing blank space from string.
Function: -LEADING  ( c -- c' ) remove leading  blank space from string.
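-TRAILING is part of the standard String word set, but -LEADING is not. As an illustration only, here is one possible sketch of -LEADING. It assumes that "blank space" means the character BL only (no tabs) and that the String word set word /STRING is available.

```forth
\ Sketch of -LEADING, assuming only the blank character BL counts as blank space.
: -LEADING  ( adr len -- adr' len' )
    begin
        dup if  over c@ bl =  else  false  then   \ non-empty and starts with a blank?
    while
        1 /string                                  \ drop the first character
    repeat ;

s"    some text   " -LEADING -TRAILING type   \ prints some text
```

Neither word stores anything; both only adjust the address and length on the stack.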

Generic Forth

The idea is that a string variable (s) is a string (c) that has been stored with its count in front. In the stack comments, s (c-addr) is the address of the stored string, and c (c-addr u) is the address and length of a constant string.

: $VARIABLE     \ Reserve space for a string buffer
    here  swap 1+ allot  align  \ Reserve RAM buffer
    create  ( here) ,       ( +n "name" -- )
    does>  @ ;              ( -- s )
 
: C+!   ( n a -- )      >r  r@ c@ +  r> c! ;    \ Incr. byte with n at a
: $@    ( s -- c )      count ;                 \ Fetch string
: $+!   ( c s -- )      >r  tuck  r@ $@ +  swap move  r> c+! ; \ Extend string 
: $!    ( c s -- )      0 over c!  $+! ;        \ Store string
: $.    ( c -- )        type ;                  \ Print string
: $C+!  ( char s -- )   dup >r  $@ + c!  1 r> c+! ; \ Add char to string
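A short usage sketch may help to see how these words fit together. The definitions are repeated so that the example loads on its own; the variable name `path` and the directory names are hypothetical. Note that these primitive words do not check against 'maxlen', so the buffer must be chosen large enough.

```forth
\ Definitions repeated from above so the example loads on its own.
: $VARIABLE  ( +n "name" -- )  here  swap 1+ allot  align  create ,  does> @ ;
: C+!   ( n a -- )      >r  r@ c@ +  r> c! ;
: $@    ( s -- c )      count ;
: $+!   ( c s -- )      >r  tuck  r@ $@ +  swap move  r> c+! ;
: $!    ( c s -- )      0 over c!  $+! ;
: $.    ( c -- )        type ;
: $C+!  ( char s -- )   dup >r  $@ + c!  1 r> c+! ;

40 $VARIABLE path           \ hypothetical name; room for 40 characters
s" /home" path $!           \ path now holds /home
s" /forth" path $+!         \ path now holds /home/forth
char / path $C+!            \ path now holds /home/forth/
path $@ $.                  \ prints /home/forth/
```

The program never counts characters itself; the count byte is maintained by $! , $+! and $C+! .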

Here is a version where the count is stored in a cell; it is hardly different. Note that it uses the non-Generic Forth word @+ ; you can find an implementation example in the well-known words list.

: $VARIABLE     \ Reserve space for a string buffer
    here  swap CELL+ allot  align  \ Reserve RAM buffer
    create  ( here) ,       ( +n "name" -- )
    does>  @ ;              ( -- s )
 
: $@    ( s -- c )      @+  ;                  \ Fetch string
: $+!   ( c s -- )      >r  tuck  r@ $@ +  swap move  r> +! ; \ Extend string 
: $!    ( c s -- )      0 over !  $+! ;        \ Store string
: $.    ( c -- )        type ;                 \ Print string
: $C+!  ( char s -- )   dup >r  $@ + c!  1 r> +! ; \ Add char to string
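With a count byte a string can hold at most 255 characters; a count cell lifts that limit. The sketch below shows this with a 300-character string. It assumes that @+ behaves as ( a -- a+cell x ), which is what the definition of $@ above requires; the definition of @+ given here is such an assumption, not necessarily the one from the well-known words list. An extra ALIGN before HERE is added as a precaution so that the count cell is aligned. The names `big` and `fill-big` are hypothetical.

```forth
: @+    ( a -- a+cell x )  dup cell+  swap @ ;     \ assumed behaviour of @+
: $VARIABLE  ( +n "name" -- )
    align here  swap cell+ allot  align           \ extra ALIGN: precaution only
    create ,  does> @ ;
: $@    ( s -- c )      @+ ;
: $+!   ( c s -- )      >r  tuck  r@ $@ +  swap move  r> +! ;
: $!    ( c s -- )      0 over !  $+! ;
: $C+!  ( char s -- )   dup >r  $@ + c!  1 r> +! ;

300 $VARIABLE big                   \ hypothetical name
s" " big $!                         \ start with an empty string
: fill-big  ( -- )  300 0 do  [char] * big $C+!  loop ;
fill-big
big $@ nip .                        \ prints 300
```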

Implementations

Have a look at the subdirectories for implementations for different systems.

String word sets
- Primitive string word set: simple string word set, e.g. for file and OS interfacing
- Safe primitive string word set: version with string overflow warning
- Safe string word set v1: version with string limiting
- Building strings: a different approach, author Albert Nijhof
- Etc.

Note that Albert Nijhof's string version puts the address of the structure of the $VARIABLE on the stack. The original example puts the address of the string on the stack. Functionally they are equivalent.

Name	Alt-name	Function
`$@`	`GET$`	Read string variable
`$+!`	`ADD$`	Add string to string variable
`$!`	`SET$`	Store string in string variable
`$.`	`TYPE`	Type string
`$C+!`	`INC$`	Add char to string variable

String tools

Two string tools as implemented by Albert Nijhof.
- -HEAD cuts the first 'i' characters from the given string.
- -TAIL cuts the last 'i' characters from the given string.

\ Extra: cut i characters from a string, with underflow protection
: -TAIL ( adr len i -- adr len' )   0 max  over min - ;
: -HEAD ( adr len i -- adr' len' )  0 max  over min  tuck - >r + r> ;
\ -HEAD and -TAIL do not store anything.
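A few calls make the behaviour concrete, including the underflow protection. The definitions are repeated so that the example loads on its own; the file name is only an illustration.

```forth
: -TAIL ( adr len i -- adr len' )   0 max  over min - ;
: -HEAD ( adr len i -- adr' len' )  0 max  over min  tuck - >r + r> ;

s" report.txt" 4 -TAIL type     \ prints report
s" report.txt" 7 -HEAD type     \ prints txt
s" abc" 10 -HEAD nip .          \ i is limited to the length: prints 0
s" abc" -5 -TAIL type           \ a negative i is treated as 0: prints abc
```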
