Hash Table Implementation

PROJECT: IMPLEMENT A HASH TABLE

So far, we’ve seen arrays (fast access by index, slow search) and linked lists (slow search, flexible size). Now, we will build a data structure that offers the best of both worlds: the HASH TABLE (also known as a Hash Map).

WHAT IS A HASH TABLE? A Hash Table is a data structure that maps KEYS to VALUES. You give it a key (e.g., a person’s name), and it can very quickly find the associated value (e.g., their phone number). The goal is to achieve an average time complexity of O(1) – constant time – for insertion, deletion, and searching. This is incredibly fast and is why hash tables are used everywhere, from databases to compiler symbol tables to the dictionaries/maps in modern languages like Python and JavaScript.

HOW DOES IT WORK?

THE HASH FUNCTION: The magic is in the HASH FUNCTION. This is a function that takes a key (which can be a string, number, etc.) and converts it into an integer index. This index tells us where to store the value in an underlying array. A good hash function distributes keys evenly across the array to minimize collisions.
THE ARRAY (TABLE): The hash table uses a simple array as its backbone. The index generated by the hash function corresponds to a “bucket” or “slot” in this array.
COLLISION HANDLING: What if two different keys hash to the same index? This is called a COLLISION. It’s inevitable. We will handle collisions using a method called SEPARATE CHAINING. This means that each slot in our array won’t hold the value directly. Instead, each slot will be a pointer to the HEAD of a LINKED LIST. If a collision occurs, we just add the new key-value pair as a new node in that linked list.

OUR IMPLEMENTATION PLAN:

Define an Entry struct (a node in our linked list for a key-value pair).
Define a HashTable struct (which holds the main array of entry pointers).
Write a hash_function to convert string keys to array indices.
Implement ht_create to initialize the table.
Implement ht_insert to add or update key-value pairs.
Implement ht_search to retrieve a value by its key.
Implement ht_delete to remove a key-value pair.
Implement ht_free to clean up all allocated memory.

This is a simple hash function. It sums the ASCII values of the characters in the key and then uses the modulo operator to ensure the result is within the bounds of our table size.

A good hash function is crucial for performance, but this is fine for learning.

Full Source

/**
 * @file 28_hash_table_implementation.c
 * @brief Part 4, Lesson 28: Hash Table Implementation
 * @author dunamismax
 * @date 06-15-2025
 *
 * This file is a major project: we will implement a Hash Table, one of the
 * most important and powerful data structures in all of computer science.
 */

/*
 * =====================================================================================
 * |                                   - LESSON START -                                  |
 * =====================================================================================
 *
 * PROJECT: IMPLEMENT A HASH TABLE
 *
 * So far, we've seen arrays (fast access by index, slow search) and linked lists
 * (slow search, flexible size). Now, we will build a data structure that offers
 * the best of both worlds: the HASH TABLE (also known as a Hash Map).
 *
 * WHAT IS A HASH TABLE?
 * A Hash Table is a data structure that maps KEYS to VALUES. You give it a key
 * (e.g., a person's name), and it can very quickly find the associated value
 * (e.g., their phone number). The goal is to achieve an average time complexity
 * of O(1) -- constant time -- for insertion, deletion, and searching. This is
 * incredibly fast and is why hash tables are used everywhere, from databases to
 * compiler symbol tables to the dictionaries/maps in modern languages like
 * Python and JavaScript.
 *
 * HOW DOES IT WORK?
 * 1. THE HASH FUNCTION: The magic is in the HASH FUNCTION. This is a function
 *    that takes a key (which can be a string, number, etc.) and converts it
 *    into an integer index. This index tells us where to store the value in
 *    an underlying array. A good hash function distributes keys evenly across
 *    the array to minimize collisions.
 *
 * 2. THE ARRAY (TABLE): The hash table uses a simple array as its backbone. The
 *    index generated by the hash function corresponds to a "bucket" or "slot"
 *    in this array.
 *
 * 3. COLLISION HANDLING: What if two different keys hash to the same index?
 *    This is called a COLLISION. It's inevitable. We will handle collisions
 *    using a method called SEPARATE CHAINING. This means that each slot in our
 *    array won't hold the value directly. Instead, each slot will be a pointer
 *    to the HEAD of a LINKED LIST. If a collision occurs, we just add the new
 *    key-value pair as a new node in that linked list.
 *
 * OUR IMPLEMENTATION PLAN:
 * 1. Define an `Entry` struct (a node in our linked list for a key-value pair).
 * 2. Define a `HashTable` struct (which holds the main array of entry pointers).
 * 3. Write a `hash_function` to convert string keys to array indices.
 * 4. Implement `ht_create` to initialize the table.
 * 5. Implement `ht_insert` to add or update key-value pairs.
 * 6. Implement `ht_search` to retrieve a value by its key.
 * 7. Implement `ht_delete` to remove a key-value pair.
 * 8. Implement `ht_free` to clean up all allocated memory.
 */

// --- Required Headers ---
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// --- Part 1: Data Structures and Constants ---

#define TABLE_SIZE 10 // A small size for easy demonstration

// A single key-value entry. This is also a node in a linked list.
typedef struct Entry
{
    char *key;
    char *value;
    struct Entry *next;
} Entry;

// The Hash Table itself.
typedef struct HashTable
{
    // The table is an array of pointers to Entry structs.
    // This is an array of linked list heads.
    Entry **entries;
} HashTable;

// --- Part 2: The Hash Function ---

/*
 * This is a simple hash function. It sums the ASCII values of the characters
 * in the key and then uses the modulo operator to ensure the result is
 * within the bounds of our table size.
 *
 * A good hash function is crucial for performance, but this is fine for learning.
 */
unsigned int hash_function(const char *key)
{
    unsigned long int value = 0;
    unsigned int i = 0;
    unsigned int key_len = strlen(key);

    // Sum the ASCII values of the characters
    for (; i < key_len; ++i)
    {
        value = value * 37 + key[i]; // A slightly better algorithm (djb2-ish)
    }

    // Return the value modulo the table size
    return value % TABLE_SIZE;
}

// --- Part 3: Core Hash Table Operations ---

/**
 * @brief Creates and initializes a new hash table.
 * @return A pointer to the newly created HashTable, or NULL on failure.
 */
HashTable *ht_create(void)
{
    // Allocate memory for the HashTable structure itself.
    HashTable *hashtable = malloc(sizeof(HashTable));
    if (hashtable == NULL)
    {
        return NULL;
    }

    // Allocate memory for the array of Entry pointers ("buckets").
    // We use calloc to initialize all pointers in the array to NULL.
    hashtable->entries = calloc(TABLE_SIZE, sizeof(Entry *));
    if (hashtable->entries == NULL)
    {
        free(hashtable); // Clean up partially allocated memory
        return NULL;
    }

    return hashtable;
}

/**
 * @brief Creates a new key-value entry.
 */
Entry *create_entry(const char *key, const char *value)
{
    Entry *entry = malloc(sizeof(Entry));
    entry->key = malloc(strlen(key) + 1);
    entry->value = malloc(strlen(value) + 1);

    strcpy(entry->key, key);
    strcpy(entry->value, value);
    entry->next = NULL;

    return entry;
}

/**
 * @brief Inserts a key-value pair into the hash table. Updates value if key exists.
 */
void ht_insert(HashTable *hashtable, const char *key, const char *value)
{
    unsigned int index = hash_function(key);
    Entry *current_entry = hashtable->entries[index];

    // Traverse the linked list at this index to see if the key already exists.
    while (current_entry != NULL)
    {
        if (strcmp(current_entry->key, key) == 0)
        {
            // Key found, so update the value.
            free(current_entry->value); // Free the old value
            current_entry->value = malloc(strlen(value) + 1);
            strcpy(current_entry->value, value);
            return; // We're done.
        }
        current_entry = current_entry->next;
    }

    // Key not found. Create a new entry and insert it at the head of the list.
    Entry *new_entry = create_entry(key, value);
    new_entry->next = hashtable->entries[index];
    hashtable->entries[index] = new_entry;
}

/**
 * @brief Searches for a key in the hash table.
 * @return The value associated with the key, or NULL if the key is not found.
 */
char *ht_search(HashTable *hashtable, const char *key)
{
    unsigned int index = hash_function(key);
    Entry *entry = hashtable->entries[index];

    // Traverse the linked list at the calculated index.
    while (entry != NULL)
    {
        if (strcmp(entry->key, key) == 0)
        {
            // Key found! Return its value.
            return entry->value;
        }
        entry = entry->next;
    }

    // Key not found.
    return NULL;
}

/**
 * @brief Deletes a key-value pair from the hash table.
 */
void ht_delete(HashTable *hashtable, const char *key)
{
    unsigned int index = hash_function(key);
    Entry *entry = hashtable->entries[index];
    Entry *prev = NULL;

    // Traverse the list to find the entry to delete
    while (entry != NULL && strcmp(entry->key, key) != 0)
    {
        prev = entry;
        entry = entry->next;
    }

    // If entry is NULL, the key wasn't found.
    if (entry == NULL)
    {
        return;
    }

    // Relink the list
    if (prev == NULL)
    {
        // The item to delete is the head of the list
        hashtable->entries[index] = entry->next;
    }
    else
    {
        // The item is in the middle or at the end
        prev->next = entry->next;
    }

    // Free the memory for the deleted entry
    free(entry->key);
    free(entry->value);
    free(entry);
}

/**
 * @brief Frees all memory used by the hash table.
 */
void ht_free(HashTable *hashtable)
{
    for (int i = 0; i < TABLE_SIZE; ++i)
    {
        Entry *entry = hashtable->entries[i];
        while (entry != NULL)
        {
            Entry *temp = entry;
            entry = entry->next;
            free(temp->key);
            free(temp->value);
            free(temp);
        }
    }

    free(hashtable->entries);
    free(hashtable);
}

/**
 * @brief A helper function to print the contents of the hash table.
 */
void ht_print(HashTable *hashtable)
{
    printf("\n--- Hash Table Contents ---\n");
    for (int i = 0; i < TABLE_SIZE; ++i)
    {
        printf("Bucket[%d]: ", i);
        Entry *entry = hashtable->entries[i];
        if (entry == NULL)
        {
            printf("~empty~\n");
            continue;
        }
        while (entry != NULL)
        {
            printf(" -> [\"%s\": \"%s\"]", entry->key, entry->value);
            entry = entry->next;
        }
        printf("\n");
    }
    printf("---------------------------\n");
}

// --- Main Function for Demonstration ---
int main(void)
{
    printf("Creating a new hash table.\n");
    HashTable *ht = ht_create();

    printf("\nInserting key-value pairs...\n");
    ht_insert(ht, "name", "John Doe");
    ht_insert(ht, "age", "30");
    ht_insert(ht, "city", "New York");
    // These two keys will likely cause a collision with TABLE_SIZE 10
    ht_insert(ht, "country", "USA");
    ht_insert(ht, "language", "C");

    ht_print(ht);

    printf("\nSearching for keys...\n");
    char *name = ht_search(ht, "name");
    char *job = ht_search(ht, "job"); // This key doesn't exist

    printf("Value for 'name': %s\n", name ? name : "Not Found");
    printf("Value for 'job': %s\n", job ? job : "Not Found");

    printf("\nDeleting key 'age'...\n");
    ht_delete(ht, "age");
    ht_print(ht);

    printf("\nUpdating key 'city'...\n");
    ht_insert(ht, "city", "Los Angeles"); // This should update the existing entry
    ht_print(ht);

    printf("\nFreeing all hash table memory...\n");
    ht_free(ht);
    printf("Memory freed successfully.\n");

    return 0;
}

/*
 * =====================================================================================
 * |                                    - LESSON END -                                   |
 * =====================================================================================
 *
 * Congratulations on building a complete hash table! You've implemented one of the
 * most fundamental and high-performance data structures. You used dynamic memory,
 * pointers to pointers, structs, linked lists (for collision handling), and
 * string manipulation. This is a massive achievement.
 *
 * You now understand the core principles behind the dictionaries, maps, and objects
 * that power modern, high-level programming languages.
 *
 * HOW TO COMPILE AND RUN THIS CODE:
 *
 * 1. Open a terminal and compile the program:
 *    `gcc -Wall -Wextra -std=c11 -o 28_hash_table_implementation 28_hash_table_implementation.c`
 *
 * 2. Run the executable:
 *    `./28_hash_table_implementation`
 *
 * 3. Observe the output. Pay close attention to how the `ht_print` function shows
 *    the state of the table after each operation, especially how collisions
 *    result in linked lists within a single bucket.
 */

How to Compile and Run

cc -Wall -Wextra -std=c11 -o 28_hash_table_implementation 28_hash_table_implementation.c
./28_hash_table_implementation

Keyboard shortcuts

C From The Ground Up

Hash Table Implementation

Full Source

How to Compile and Run