Introduction
If you need to look up a person’s phone number in an unsorted list, you have to check every single name (Linear Search). If the list is sorted alphabetically, you can find it much faster (Binary Search).
But what if you didn’t have to search at all? What if you could miraculously know exactly which page and which line the number was written on, the moment you heard the person’s name?
That is the magic of a Hash Table (often called a HashMap or Dictionary). It allows you to insert, delete, and look up data in $O(1)$ constant time on average.
How It Works
A Hash Table uses an array under the hood. The challenge is: how do we convert a non-integer key (like a String "Alice") into an integer index for that array?
We do this in two steps:
- Hash Function: Converts the key into a large integer (the hash code).
- Modulo Operator: Shrinks that large integer to fit within the bounds of our array.
Step 1: The Hash Function
A hash function takes an input (the key) and returns an integer. A good hash function must:
- Always return the same integer for the same input (deterministic).
- Distribute values evenly to avoid clumping.
- Be fast to calculate.
For example, a simplistic hash function for strings might sum the ASCII values of the characters: "Cat" $\rightarrow 67 + 97 + 116 = 280$
Step 2: The Array Index
Suppose our array size is 10. We cannot use index 280. We use the modulo operator (%) to find the remainder when divided by the array size: $280 \pmod{10} = 0$.
So, we store the value for "Cat" at array[0].
To retrieve the value later, we hash "Cat" again, get 280, modulo 10 to get 0, and look directly at array[0]. Zero searching required!
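The two steps can be sketched in a few lines of Java. This is a toy illustration only (the names ToyHash, sumHash, and indexFor are made up for this example); real hash functions such as String.hashCode() mix bits far more thoroughly than a character sum.

```java
// Toy sketch: a simplistic string hash (sum of character codes) plus the
// modulo step that maps the hash code to an array index.
public class ToyHash {
    static int sumHash(String key) {
        int sum = 0;
        for (char c : key.toCharArray()) {
            sum += c; // add the character's numeric (ASCII) value
        }
        return sum;
    }

    static int indexFor(String key, int arraySize) {
        return sumHash(key) % arraySize;
    }

    public static void main(String[] args) {
        System.out.println(sumHash("Cat"));      // 280
        System.out.println(indexFor("Cat", 10)); // 0
    }
}
```

Because sumHash is deterministic, hashing "Cat" a second time is guaranteed to lead back to the same index.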
Dealing with Collisions
What happens if we hash a different key, say "Dog" ($68 + 111 + 103 = 282$), and later we hash "God" ($71 + 111 + 100 = 282$)? Both keys produce $282 \pmod{10} = 2$. This is called a collision.
Since a single array slot can only hold one entry, we need a way to resolve this. There are two main techniques:
1. Separate Chaining (Used in Java’s HashMap)
Instead of storing just the value at each array index, we store a Linked List. When a collision occurs, we simply append the new key-value pair to the end of the Linked List at that index.
Index 2: ["Dog" -> "Woof"] -> ["God" -> "Divine"]
When looking up "God", we go to index 2, and then sequentially search the short Linked List for the key "God".
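A minimal separate-chaining sketch, assuming String keys and values. The class ChainedTable and its helpers are hypothetical names for illustration; production implementations like Java's HashMap add resizing, better bit-mixing, and the tree conversion discussed later.

```java
import java.util.LinkedList;

// Each bucket holds a linked list of entries; lookup walks the
// (usually short) list at the hashed index.
public class ChainedTable {
    static class Entry {
        final String key;
        String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    private int indexFor(String key) {
        // mask off the sign bit so negative hash codes still give a valid index
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    void put(String key, String value) {
        LinkedList<Entry> bucket = buckets[indexFor(key)];
        for (Entry e : bucket) {
            if (e.key.equals(key)) { e.value = value; return; } // update in place
        }
        bucket.add(new Entry(key, value)); // new key (or collision): append
    }

    String get(String key) {
        for (Entry e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // key not present
    }
}
```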
2. Open Addressing (Linear Probing)
If the calculated index is already occupied, we simply look at the next index. If that is occupied, we look at the next, and so on until we find an empty slot.
- "Dog" goes to index 2.
- "God" calculates index 2. It’s full. It moves to index 3. It’s empty. "God" goes to index 3.
Load Factor & Resizing
If an array has 10 slots and we insert 100 items using Separate Chaining, the linked lists will become very long ($O(n)$ time to search).
To maintain $O(1)$ performance, Hash Tables track their Load Factor (number of entries $\div$ array size). In Java, the default max load factor is 0.75.
Once the table is 75% full, it resizes:
- It creates a new array, usually twice the size.
- It rehashes every existing key using the new array size (since hash % 10 is different from hash % 20).
- This operation is expensive ($O(n)$), but it happens infrequently enough that the amortised time is still $O(1)$.
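A tiny demonstration of why every key must be rehashed: the same hash code can land in a different bucket once the array size changes. The hash codes here are just illustrative numbers.

```java
// Compare each hash code's bucket before and after doubling the array.
public class RehashDemo {
    public static void main(String[] args) {
        int[] hashes = {280, 282, 293, 1234};
        int oldSize = 10, newSize = 20;
        for (int h : hashes) {
            System.out.println(h + ": old index " + (h % oldSize)
                    + ", new index " + (h % newSize));
        }
    }
}
```

Note that 280 and 282 happen to stay put, but 293 moves from index 3 to 13 and 1234 from 4 to 14, so the table cannot simply copy buckets across.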
Java Implementation Example
In Java, you rarely build a Hash Table from scratch. The java.util.HashMap is one of the most heavily used classes in the entire language.
```java
import java.util.HashMap;
import java.util.Map;

public class HashTableExample {
    public static void main(String[] args) {
        // Create a HashMap with String keys and Integer values
        Map<String, Integer> phoneBook = new HashMap<>();

        // O(1) insertion
        phoneBook.put("Alice", 5551234);
        phoneBook.put("Bob", 5559876);
        phoneBook.put("Charlie", 5555555);

        // O(1) lookup
        int bobsNumber = phoneBook.get("Bob");
        System.out.println("Bob's number: " + bobsNumber); // 5559876

        // O(1) search for a key
        if (phoneBook.containsKey("Alice")) {
            System.out.println("Alice is in the phonebook.");
        }

        // Handle missing keys gracefully
        System.out.println("Dave's number: " + phoneBook.getOrDefault("Dave", 0));

        // O(1) deletion
        phoneBook.remove("Charlie");
    }
}
```
Java HashMap Internal Optimisation
A fascinating detail about Java 8+: if a single bucket in a HashMap accumulates too many collisions (the linked list reaches 8 entries, and the table has at least 64 buckets), Java automatically converts that linked list into a Red-Black Tree. This drops the worst-case lookup time for that heavily collided bucket from $O(n)$ to $O(\log n)$!
Time & Space Complexity
| Operation | Average Case | Worst Case (Many Collisions) |
|---|---|---|
| Insert | $O(1)$ | $O(n)$ |
| Lookup | $O(1)$ | $O(n)$ |
| Delete | $O(1)$ | $O(n)$ |
| Space | $O(n)$ | $O(n)$ |
Note: The worst-case $O(n)$ only happens if your hash function is terrible or an attacker intentionally feeds you keys that all hash to the same bucket (a Hash Denial of Service attack).
When to Use / When NOT to Use
| ✅ Use Hash Tables When | ❌ Avoid Hash Tables When |
|---|---|
| Fast lookups, insertions, and deletions are critical | Data needs to be retrieved in a sorted order (HashMaps are inherently unordered) |
| You need to count frequencies of items (e.g., counting word occurrences) | You need prefix-based lookups on strings (use a Trie) |
| Caching previously computed results (Memoization) | Memory is extremely limited (Hash Tables have memory overhead for the array and linked list nodes) |
Practice Problems
The Hash Map is the “cheat code” for coding interviews. Many $O(n^2)$ array problems become $O(n)$ when you throw a HashMap at them.
- Two Sum – LeetCode #1 — Solve this $O(n^2)$ classic in $O(n)$ using a HashMap to store the numbers you have seen so far.
- Valid Anagram – LeetCode #242 — Use a Hash Map to count character frequencies.
- LRU Cache – LeetCode #146 — A challenging design problem combining a HashMap for $O(1)$ lookups and a Doubly-Linked List for $O(1)$ updates.
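As a taste of the pattern, here is one possible $O(n)$ Two Sum solution (a sketch, not the only approach): scan the array once, and for each number check whether its complement has already been seen in a HashMap.

```java
import java.util.HashMap;
import java.util.Map;

public class TwoSum {
    static int[] twoSum(int[] nums, int target) {
        Map<Integer, Integer> seen = new HashMap<>(); // value -> index
        for (int i = 0; i < nums.length; i++) {
            int complement = target - nums[i];
            if (seen.containsKey(complement)) {
                // the complement was seen earlier: we have our pair
                return new int[] { seen.get(complement), i };
            }
            seen.put(nums[i], i);
        }
        return new int[0]; // no pair found
    }

    public static void main(String[] args) {
        int[] result = twoSum(new int[] {2, 7, 11, 15}, 9);
        System.out.println(result[0] + ", " + result[1]); // 0, 1
    }
}
```

The brute-force version checks every pair ($O(n^2)$); the map replaces the inner loop with a single $O(1)$ lookup.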
What’s Next?
We just mentioned Doubly-Linked Lists. In our next post, we will explore this structure in depth, looking at how elements are chained together by pointers in memory: Linked Lists — Singly, Doubly & Circular.