C: Strings

Understanding and manipulating textual data is fundamental to most programming. C strings are bedrock knowledge, needed not only for programming in C and C++ but also for understanding many other programming languages and environments.

The char datatype is a small integer datatype (8 bits on virtually all modern platforms) that was designed to hold language characters, or at least English and other Western language characters. Each character is represented by a unique integer; for English this mapping is 7-bit ASCII, and for Western European languages in general, ASCII is extended into 8-bit ISO 8859-1. Eight bits can represent 256 different values, which is enough for lowercase and uppercase letters, accented letters, digits, punctuation marks, and other special symbols and codes for most Western languages. For other languages around the world, Unicode is the accepted standard, with several bit encodings such as UTF-8.
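The character-to-integer mapping described above can be sketched as follows. This is a minimal illustration that assumes an ASCII-compatible character set; the helper names char_code and print_codes are hypothetical and are not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: returns the integer code stored in a char.
   Converting through unsigned char keeps codes above 127 non-negative. */
int char_code(char c) {
    return (unsigned char) c;
}

/* Hypothetical helper: prints each character of a string next to its code. */
void print_codes(const char *s) {
    for (; *s != '\0'; s++) {
        printf("'%c' = %d\n", *s, char_code(*s));
    }
}
```

Under ASCII, calling print_codes("Az") would print 'A' = 65 followed by 'z' = 122.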

In a Unix/Linux terminal, you can run the command man ascii to see an ASCII table, man iso-8859-1 to see an ISO 8859-1 table, and man utf-8 to learn more about one Unicode encoding.

C Strings

A C string is a sequence (an array) of non-zero character values that ends with a character value of zero. Never forget this; strings really are that simple. All characters in the string have some non-zero value, and then the string ends with a character value of zero, often called the null character. There are never any exceptions.

Because C strings store a terminating character of zero, the space required to store them is always one greater than the visible number of characters. Never forget this!
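To make the terminator rule concrete, here is a small sketch that finds the end of a string by scanning for the zero value. The functions my_strlen and storage_needed are hypothetical names used only for illustration; real programs should call the library's strlen().

```c
#include <stddef.h>

/* Hypothetical re-implementation of strlen(), for illustration only:
   counts characters until the terminating zero is reached. */
size_t my_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Bytes needed to store a string: visible characters plus the terminator. */
size_t storage_needed(const char *s) {
    return my_strlen(s) + 1;
}
```

For the string "hello", my_strlen() gives 5 and storage_needed() gives 6. Even the empty string "" needs 1 byte, for its terminator.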

In C programming, string constants are formed using double quotes, and the compiler automatically creates space to store the string constant in memory, complete with the null character. So the string "Hello World!" takes up 13 bytes of memory. Character constants in your program are formed using single quotes, and these denote a single character value that fits in one byte when stored in a char; they are not C strings! So 'A' is a single character, while "A" is a C string taking up 2 bytes of memory: the visible character 'A' and the null character '\0'.
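The byte counts above can be checked with sizeof, which for a string constant reports the size of the whole array, terminator included. The function names below are hypothetical. One detail worth knowing: in C a character constant such as 'A' written in an expression has type int, so this sketch stores it in a char variable before measuring it.

```c
#include <stddef.h>

/* sizeof applied to a string constant counts the terminating null. */
size_t hello_bytes(void) {
    return sizeof("Hello World!");   /* 12 visible characters + 1 */
}

size_t a_string_bytes(void) {
    return sizeof("A");              /* 'A' plus '\0' */
}

/* A character stored in a char variable takes exactly one byte. */
size_t a_char_bytes(void) {
    char c = 'A';
    return sizeof(c);
}
```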

Special characters are formed using backslash notation, with a \ in front of a code character. The most common are:

Char    Meaning
'\0'    null character, byte value of 0
'\n'    newline, indicates the end of a text line
'\t'    tab character
'\r'    carriage return; Windows uses \r\n for line endings
'\\'    an actual \ character

These can also be used in string constants.
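As a brief sketch of backslash notation inside string constants, the hypothetical functions below show that each backslash sequence is stored as one character, and that a '\0' written in the middle of a constant ends the string as far as strlen() is concerned.

```c
#include <stddef.h>
#include <string.h>

/* Each backslash sequence occupies a single character in the string. */
size_t escaped_length(void) {
    const char *line = "a\tb\\c\n";   /* a, tab, b, backslash, c, newline */
    return strlen(line);
}

/* A null character inside the constant terminates the string early. */
size_t embedded_null_length(void) {
    return strlen("ab\0cd");          /* counts only 'a' and 'b' */
}
```

Here escaped_length() returns 6 and embedded_null_length() returns 2.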

Pointers, Memory Allocation, and (no) Operators

In C, pointers often point to arrays of elements, and this is especially true of pointers that point to strings. As with all pointers, declaring a pointer does not automatically allocate space for anything that the pointer might point to. So the two lines below are extremely wrong:

char* str;
strcpy(str, "hello world");

because the pointer str has not been set to point to any space yet, so copying a string to it is undefined behavior; if you are lucky, the program crashes immediately with a memory fault. The three lines below are still wrong:

char* str;
str = (char*) malloc(11);
strcpy(str, "hello world");

because space for strings must include space for the terminating null character. The string above needs 12 bytes.
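A corrected version might look like the sketch below. The helper copy_string is a hypothetical name; it computes the size from the source string instead of hard-coding it, and it checks the result of malloc(), which returns NULL when the allocation fails. The (char*) cast on malloc() is optional in C and is omitted here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL if malloc fails.
   The caller is responsible for calling free() on the result. */
char *copy_string(const char *src) {
    size_t len = strlen(src);
    char *copy = malloc(len + 1);   /* +1 for the terminating null */
    if (copy == NULL) {
        return NULL;
    }
    strcpy(copy, src);              /* copies len characters plus '\0' */
    return copy;
}
```

Calling copy_string("hello world") requests 12 bytes: 11 visible characters plus the terminator.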

C does not have built-in operators that compare, assign, or otherwise manipulate strings. Since pointers are themselves valid data, the code below compiles, but it does not compare the strings:

char* str1;
char* str2;
// some code in here that makes str1 and str2 point to strings
if (str1 == str2) {
   // ...
}

The comparison above compares only the pointers themselves, not the strings that the pointers point to. If you want to test whether two strings are equal, you must do:

char* str1;
char* str2;
// some code in here that makes str1 and str2 point to strings
if (!strcmp(str1,str2)) {
   // ...
}

The ! is the not operator, and since strcmp() returns zero if the strings are equal, then with the !, the if condition is true when the strings are equal.
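The difference between the two comparisons can be sketched with two separate arrays that hold the same characters. The helper names strings_equal and same_pointer are hypothetical. Writing strcmp(a, b) == 0 is equivalent to !strcmp(a, b) and is sometimes easier to read.

```c
#include <string.h>

/* Hypothetical helper: 1 when the two strings hold the same characters. */
int strings_equal(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: 1 only when both pointers hold the same address. */
int same_pointer(const char *a, const char *b) {
    return a == b;
}
```

Given char x[] = "abc"; and char y[] = "abc";, strings_equal(x, y) is true while same_pointer(x, y) is false, because x and y are distinct arrays at different addresses.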

Library String Functions

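All of the functions in the table below are declared in the header file string.h, except that strcasecmp() and strncasecmp() are POSIX functions declared in strings.h, and malloc() and free() are declared in stdlib.h. The definitions below are not exactly correct, but are simplified for easier readability for new C programmers; for example, lengths and counts really have the type size_t, and string parameters that are only read are really const.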

Library Function                                    Meaning
int strlen(char* str)                               Returns the length of the string, not including the terminating null character
char *strcpy(char *dest, const char *src)           Copies the string from src to dest, including the null character; dest must point to enough space! Returns dest
int strcmp(char* str1, char* str2)                  Returns 0 if the strings are equal, a negative value if str1 < str2, and a positive value if str1 > str2
char *strncpy(char *dest, const char *src, int n)   Copies at most n characters from src to dest; the null character is included only if src has n-1 characters or fewer; dest must point to enough space!
int strncmp(char* str1, char* str2, int n)          Like strcmp(), but compares at most n characters
int strcasecmp(char* str1, char* str2)              Like strcmp(), but ignores letter case; see also strncasecmp()
int memcmp(void* ptr1, void* ptr2, size_t n)        Compares n bytes of memory; returns 0 if equal, otherwise a negative or positive value according to the first byte that differs
void* malloc(int size)                              Dynamically allocates size bytes of memory and returns a pointer to it, or NULL on failure; in C the result converts to char* without a cast, but C++ requires the (char*) cast
void free(void* ptr)                                Frees allocated memory; ptr must be a pointer value that was returned by malloc(); the entire allocated block is freed
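Two of the functions above are easy to misuse, so here is a hedged sketch of one common way to apply them. The helpers bounded_copy and has_prefix are hypothetical names, not library functions. The first works around the fact that strncpy() does not write a terminator when src has n or more characters; the second uses strncmp() to test whether a string starts with a given prefix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dest, which holds dest_size bytes,
   truncating if necessary and always terminating the result. */
void bounded_copy(char *dest, size_t dest_size, const char *src) {
    if (dest_size == 0) {
        return;                          /* no room even for the terminator */
    }
    strncpy(dest, src, dest_size - 1);   /* may leave dest unterminated */
    dest[dest_size - 1] = '\0';          /* so terminate it explicitly */
}

/* Hypothetical helper: 1 if s begins with prefix, 0 otherwise. */
int has_prefix(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

With char buf[6];, calling bounded_copy(buf, sizeof buf, "hello world") leaves the string "hello" in buf: five characters plus the terminator.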

Other Resources

https://www.cprogramming.com/

https://www.learn-c.org/