{{pfw:banner.png}}

====== String handling ======

===== The idea =====

Character strings are usually associated with dynamic memory and garbage collection. That is overkill for the string handling found in most Forth programs; in particular, we can get by with buffers that are statically allocated using ''%%CREATE%%''. It is still useful to lift the manipulation of single characters to the manipulation of strings as a whole, so we define a few words that make string manipulation in Forth a little smoother.\\
Original idea: Albert Nijhof & [[https://home.hccnet.nl/a.w.m.van.der.horst/index.html|Albert van der Horst]].

Examples of use:

  * Manipulate files
  * Start programs
  * Add, delete and use folders/directories
  * Etc.

===== Construction of strings =====

Strings in Forth are of the type address & length. When a string is stored, the length is kept in front of the characters. Two layouts are possible: the length is stored either in a byte or in a cell. The classic view is the byte count, the so-called counted string, as shown in the picture:

{{https://user-images.githubusercontent.com/11397265/142727480-4cb13037-c118-4d05-9eec-529aeaf23cad.jpg| string usage example}}

===== Pseudo code =====

  Function: $VARIABLE
    Reserve a buffer for the count-byte + 'maxlen' characters
    Alternatively: reserve a buffer for the count-cell + 'maxlen' characters
    Define: ( maxlen "name" -- )
      Save maxlen & buffer-address
    Action: ( -- s )
      Leave address of string variable
  Function: $@    ( s -- c )
    Read counted string from address
  Function: $+!   ( c s -- )
    Extend counted string at address
  Function: $!    ( c s -- )
    Store counted string at address
  Function: $.    ( c -- )
    Print counted string
  Function: $C+!  ( char s -- )
    Add one character to counted string at address

The original idea also contains ''%%$^%%'', ''%%$?%%'', ''%%$/%%'' and ''%%$\%%''; see the reference in the introduction.

Two tools, an idea by Albert Nijhof:

  Function: -HEAD  ( adr len i -- adr' len' )
    Cut first 'i' characters from string
  Function: -TAIL  ( adr len i -- adr len' )
    Cut last 'i' characters from string

However, that flies in the face of the goals mentioned in the introduction: we promised to get rid of single characters, never to count characters, and to concern ourselves only with strings. A better example in this context is:

  Function: -TRAILING  ( c -- c' )
    Remove trailing blank space from string
  Function: -LEADING   ( c -- c' )
    Remove leading blank space from string

===== Generic Forth =====

The idea is that a string variable (s) is simply a string (c) that has been stored in a buffer. In the stack comments, s (c-addr) is the address of the stored string and c (c-addr u) is the address and length of a string, for example a string constant.

  : $VARIABLE     \ Reserve space for a string buffer
      here  swap 1+ allot  align  \ Reserve RAM buffer: count byte + maxlen chars
      create  ( here) ,           ( +n "name" -- )
      does>  @ ;                  ( -- s )
  : C+!   ( n a -- )      >r  r@ c@ +  r> c! ;    \ Incr. byte with n at a
  : $@    ( s -- c )      count ;                 \ Fetch string
  : $+!   ( c s -- )      >r  tuck  r@ $@ +  swap cmove  r> C+! ; \ Extend string
  : $!    ( c s -- )      0 over c!  $+! ;        \ Store string
  : $.    ( c -- )        type ;                  \ Print string
  : $C+!  ( char s -- )   dup >r  $@ +  c!  1 r> C+! ;    \ Add char to string

Here is a version where the count is stored in a cell; it is hardly different. Note that it uses the word ''%%@+%%'', which is not part of Generic Forth. You can find an implementation example in the [[https://project-forth-works.github.io/well-known-words.txt|well known words]] list.

  : $VARIABLE     \ Reserve space for a string buffer
      here  swap cell+ allot  align   \ Reserve RAM buffer: count cell + maxlen chars
      create  ( here) ,               ( +n "name" -- )
      does>  @ ;                      ( -- s )
  : $@    ( s -- c )      @+ ;                    \ Fetch string
  : $+!   ( c s -- )      >r  tuck  r@ $@ +  swap cmove  r> +! ;  \ Extend string
  : $!    ( c s -- )      0 over !  $+! ;         \ Store string
  : $.    ( c -- )        type ;                  \ Print string
  : $C+!  ( char s -- )   dup >r  $@ +  c!  1 r> +! ;     \ Add char to string

===== Implementations =====

Have a look at the subdirectories for implementations for different systems.

  * String word sets
    * [[https://github.com/project-forth-works/project-forth-works/blob/main/Algorithms/String%20handling/Primitive-string-word-set.f|Primitive string word set]]: simple string word set, e.g. for file and OS interfacing
    * [[https://github.com/project-forth-works/project-forth-works/blob/main/Algorithms/String%20handling/Safe-string-word-set-pr.f|Safe primitive string word set]]: version with string overflow warning
    * [[https://github.com/project-forth-works/project-forth-works/blob/main/Algorithms/String%20handling/Safe-string-word-set.f|Safe string word set v1]]: version with string limiting
    * [[https://github.com/project-forth-works/project-forth-works/blob/main/Algorithms/String%20handling/building-strings-an.f|Building strings]]: a different approach, author Albert Nijhof
  * Etc.

Note that Albert Nijhof's string version puts the address of the structure of the ''%%$VARIABLE%%'' on the stack, whereas the original example puts the address of the string on the stack. Functionally they are equivalent.

^ Name        ^ Alt-name     ^ Function                       ^
| ''%%$@%%''   | ''%%GET$%%''  | Read string variable           |
| ''%%$+!%%''  | ''%%ADD$%%''  | Add string to string variable  |
| ''%%$!%%''   | ''%%SET$%%''  | Store string in string variable |
| ''%%$.%%''   | ''%%TYPE%%''  | Type string                    |
| ''%%$C+!%%'' | ''%%INC$%%''  | Add char to string variable    |

===== String tools =====

Two string tools as implemented by Albert Nijhof:

  * ''%%-HEAD%%'' cuts the first 'i' characters from the given string.
  * ''%%-TAIL%%'' cuts the last 'i' characters from the given string.

  \ Extra: cut i characters from a string, with underflow protection
  : -TAIL ( adr len i -- adr len' )   0 max  over min  - ;
  : -HEAD ( adr len i -- adr' len' )  0 max  over min  tuck - >r  + r> ;
  \ -HEAD and -TAIL do not store anything.
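
The words above can be tried together in a short session. The following is only a sketch: it repeats the byte-count definitions from the Generic Forth section and the two string tools so that it can be loaded on its own. It assumes a standard Forth in which ''%%CREATE%%'' lays down its header in the same data space that ''%%HERE%%'' and ''%%ALLOT%%'' use, and which provides ''%%S"%%'', ''%%[CHAR]%%'' and ''%%CMOVE%%''. The names ''%%PATH%%'' and ''%%DEMO%%'' are hypothetical and exist only for this illustration.

```forth
\ Byte-count string words, repeated from the Generic Forth section
: $VARIABLE  ( +n "name" -- )
    here swap 1+ allot align    \ reserve count byte + maxlen characters
    create ,                    \ remember the buffer address
    does> @ ;                   ( -- s )
: C+!   ( n a -- )      >r r@ c@ + r> c! ;
: $@    ( s -- c )      count ;
: $+!   ( c s -- )      >r tuck r@ $@ + swap cmove r> C+! ;
: $!    ( c s -- )      0 over c! $+! ;
: $.    ( c -- )        type ;
: $C+!  ( char s -- )   dup >r $@ + c! 1 r> C+! ;

\ String tools, repeated from the String tools section
: -TAIL ( adr len i -- adr len' )   0 max over min - ;
: -HEAD ( adr len i -- adr' len' )  0 max over min tuck - >r + r> ;

40 $VARIABLE PATH               \ hypothetical name; room for 40 characters

: DEMO  ( -- )
    s" /home" PATH $!           \ PATH now holds "/home"
    [char] / PATH $C+!          \ append one character: "/home/"
    s" forth.txt" PATH $+!      \ append a string: "/home/forth.txt"
    cr PATH $@ $.               \ prints /home/forth.txt
    cr PATH $@ 4 -TAIL $.       \ prints /home/forth
    cr PATH $@ 6 -HEAD $.       \ prints forth.txt
    cr PATH $@ 99 -HEAD nip .   \ cut is clamped to the length: prints 0
;
DEMO
```

Note that these primitive words do not check 'maxlen', so the caller has to make sure the buffer is large enough. ''%%-HEAD%%'' and ''%%-TAIL%%'' only adjust the address and length on the stack; the contents of ''%%PATH%%'' are left unchanged.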