What Really Happens Inside a Java HashMap
by Eric Hanson, Backend Developer at Clean Systems Consulting
The structure: an array of buckets
A HashMap is an array — Node<K,V>[] table in the OpenJDK source — where each slot is a bucket. When you call put(key, value), the key's hash code is computed, transformed, and used to index into the array. The entry is stored at that index.
table[0]: null
table[1]: Node("order-123", order1) -> null
table[2]: null
table[3]: Node("user-456", user1) -> Node("user-789", user2) -> null // collision
table[4]: null
...
The array index is computed as (n - 1) & hash, where n is the table length (always a power of two) and hash is the processed hash code. The power-of-two table size makes this a fast bitwise AND rather than a modulo operation.
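The index computation can be sketched in a few lines. This is illustrative code, not the actual OpenJDK private methods; `spread` reproduces the hash-mixing step covered in the next section.

```java
public class BucketIndex {
    // The same spreading step HashMap applies before indexing (see below)
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // (n - 1) & hash: valid only because n is always a power of two,
    // where it is equivalent to floorMod(hash, n) but a single AND
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // default table size
        int h = spread("order-123");
        System.out.println(indexFor(h, n));
        System.out.println(indexFor(h, n) == Math.floorMod(h, n)); // prints true
    }
}
```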
The hash function: why it's not just hashCode()
HashMap doesn't use key.hashCode() directly. It applies a spreading function first:
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
(h = key.hashCode()) ^ (h >>> 16) XORs the high 16 bits of the hash code into the low 16 bits. Why? For small tables, the index computation (n - 1) & hash only uses the low bits of the hash. If the hash function concentrates distinguishing information in the high bits, many keys would map to the same bucket even with good hashCode() implementations. The spreading function mixes high and low bits so the full hash contributes to bucket selection.
The practical consequence: a hashCode() that returns values with all distinguishing information in the high bits will cause more collisions than one that distributes well across all bits. For most classes, this isn't a concern — String.hashCode(), Integer.hashCode(), and Long.hashCode() distribute well. For custom hashCode() implementations that only vary in high bits, the spreading function helps but isn't a substitute for a well-designed hash function.
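A small demonstration of the failure mode, using a contrived hash whose distinguishing information sits entirely in the high 16 bits: without spreading, every value lands in bucket 0 of a 16-slot table; with spreading, the high bits reach the index.

```java
public class HighBitsDemo {
    // HashMap's spreading step, applied to a raw hash value
    static int spread(int h) { return h ^ (h >>> 16); }

    public static void main(String[] args) {
        int n = 16;
        for (int i = 1; i <= 4; i++) {
            int rawHash = i << 16;                     // info in high bits only
            int naive = (n - 1) & rawHash;             // always 0: high bits masked off
            int spreadIdx = (n - 1) & spread(rawHash); // varies with i
            System.out.println(naive + " vs " + spreadIdx);
        }
        // prints: 0 vs 1, 0 vs 2, 0 vs 3, 0 vs 4
    }
}
```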
Collision handling: linked list to tree
Multiple keys can hash to the same bucket — a collision. The bucket stores a linked list of entries with the same hash index:
// Both keys hash to bucket 3
map.put("user-456", user1); // bucket 3: ["user-456" -> user1]
map.put("user-789", user2); // bucket 3: ["user-456" -> user1] -> ["user-789" -> user2]
Lookup traverses the linked list comparing keys with equals(). A bucket with n entries has O(n) lookup — the theoretical O(1) assumes low collision rate.
Since Java 8, when a bucket's linked list grows beyond 8 entries — and the table has at least 64 buckets; smaller tables resize instead of treeifying — it converts to a red-black tree, giving O(log n) lookup and insertion instead of O(n). A tree bucket converts back to a linked list when it shrinks to 6 or fewer entries during a resize. The thresholds (8 and 6, with hysteresis between them) balance the per-node overhead of the tree structure against the benefit of O(log n) operations.
This treeification is the defense against hash flooding: an attacker who crafts keys with identical hash codes can degrade a HashMap to O(n) per operation — effectively a denial of service for web applications where the attacker controls request parameters. With treeification, the worst case becomes O(log n) even with all keys in one bucket.
A caveat on String keys: String.hashCode() is specified by the Javadoc and fully deterministic, so colliding String keys can be precomputed offline. Java 7u6 added an optional randomized alternative string hash for exactly this reason, but it was removed in Java 8, where treeification became the defense. Treeification delivers its O(log n) bound most reliably when keys implement Comparable, as String does; for non-Comparable keys with identical hashes, the tree falls back to weaker tie-breaking.
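Colliding String keys are easy to construct because the hash is deterministic: "Aa" and "BB" hash identically, and any concatenation of equal-length, equal-hash blocks collides too, yielding 2^k colliding keys from k blocks.

```java
public class Collisions {
    public static void main(String[] args) {
        // 'A'*31 + 'a' == 'B'*31 + 'B' == 2112
        System.out.println("Aa".hashCode() == "BB".hashCode()); // prints true

        // Concatenations of equal-hash, equal-length blocks also collide:
        // hash(s1 + s2) = hash(s1) * 31^len(s2) + hash(s2)
        System.out.println("AaAa".hashCode() == "BBBB".hashCode()); // prints true
        System.out.println("AaBB".hashCode() == "BBAa".hashCode()); // prints true
    }
}
```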
The equals() contract and key identity
get(key) finds entries by computing the hash to identify the bucket, then walking the bucket comparing keys with equals(). This means hashCode() and equals() must be consistent:
If a.equals(b), then a.hashCode() == b.hashCode().
The converse is not required — equal hash codes do not imply equal objects (that would eliminate collisions). But if two equal objects have different hash codes, get() looks in the wrong bucket and returns null even though the key is in the map.
The failure mode:
import java.util.Objects;

public class Order {
    private final String id;
    private String status; // mutable

    Order(String id, String status) { this.id = id; this.status = status; }

    void setStatus(String status) { this.status = status; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Order other && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, status); // includes mutable field — BUG
    }
}
Order order = new Order("ord-123", "pending");
Map<Order, String> map = new HashMap<>();
map.put(order, "value");
order.setStatus("shipped"); // mutates hashCode
map.get(order); // returns null — hashCode changed, wrong bucket
map.containsKey(order); // false — same reason
hashCode() must not depend on mutable state if the object is used as a HashMap key. A key whose hashCode() changes after insertion is lost in the map — it computes to a different bucket than where it was stored. The entry is still there; hashCode() just no longer finds it.
The rule: hashCode() should depend only on the same fields that equals() uses, and those fields should be immutable (or at minimum, never modified while the object is a key in a hash-based collection).
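Applying the rule to the Order class above yields a version that survives mutation. The class name FixedOrder is mine; the point is that equals() and hashCode() both depend only on the immutable id.

```java
import java.util.HashMap;
import java.util.Map;

public class FixedOrder {
    private final String id;
    private String status; // mutable, but excluded from equals/hashCode

    FixedOrder(String id, String status) { this.id = id; this.status = status; }

    void setStatus(String status) { this.status = status; }

    @Override
    public boolean equals(Object o) {
        return o instanceof FixedOrder other && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return id.hashCode(); // id only, stable for the object's lifetime
    }

    public static void main(String[] args) {
        Map<FixedOrder, String> map = new HashMap<>();
        FixedOrder order = new FixedOrder("ord-123", "pending");
        map.put(order, "value");
        order.setStatus("shipped");         // no longer affects the hash
        System.out.println(map.get(order)); // prints "value"
    }
}
```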
Resize behavior: when capacity doubles
The table starts at 16 buckets (default; allocation is deferred until the first put) and doubles when the load factor threshold is crossed — 75% by default, so crossing the 12-entry threshold (inserting the 13th entry) triggers a resize to 32.
Resize is expensive: a new array of double size is allocated, and every existing entry is rehashed into the new array. Because the table size doubles (always a power of two), the new bucket index for each entry is either the same as the old index or old_index + old_capacity. This means only one bit of the hash determines which of two possible positions each entry moves to — the bit corresponding to the old capacity.
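The split rule can be verified directly. This sketch (method name is illustrative) shows that the new index after doubling is decided by the single hash bit worth oldCap:

```java
public class ResizeSplit {
    // After doubling from oldCap to 2 * oldCap, an entry either keeps its
    // old index or moves up by exactly oldCap, decided by one bit of the hash
    static int newIndexOf(int hash, int oldCap) {
        int oldIndex = (oldCap - 1) & hash;
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int hash = 0b10101; // 21: old index 5, and the bit worth 16 is set
        System.out.println(newIndexOf(hash, 16));   // prints 21 (5 + 16)
        System.out.println(((2 * 16) - 1) & hash);  // prints 21: matches the direct computation
    }
}
```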
The resize cost is O(n) — proportional to the number of entries. For large maps that were not pre-sized, multiple resizes occur as entries accumulate: 16 → 32 → 64 → ... each resize rehashing all entries. Pre-sizing eliminates this:
// Without pre-sizing: resizes at 12, 24, 48, 96 entries (4 resizes for 100 entries)
Map<String, Order> map = new HashMap<>();
// With pre-sizing: no resizes
// capacity = ceil(expectedEntries / loadFactor) = ceil(100 / 0.75) = 134, round up to power of 2 = 256
Map<String, Order> map = new HashMap<>(134); // HashMap rounds up to next power of 2 internally
The formula: initialCapacity = (int) Math.ceil(expectedEntries / 0.75). Pass this to the constructor. The HashMap rounds it up to the next power of two internally.
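The formula as a small helper (method name is mine). Note that Java 19 and later provide HashMap.newHashMap(int), which performs this computation for you.

```java
import java.util.HashMap;
import java.util.Map;

public class PreSize {
    // initialCapacity = ceil(expectedEntries / defaultLoadFactor)
    static int capacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / 0.75);
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(100)); // prints 134
        Map<String, Object> map = new HashMap<>(capacityFor(100));
        // internal table becomes 256 (next power of two >= 134);
        // threshold 256 * 0.75 = 192 > 100, so no resize ever occurs
    }
}
```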
null keys and null values
HashMap permits one null key (stored in bucket 0, since hash(null) == 0) and any number of null values. This is a deliberate design decision that distinguishes HashMap from Hashtable (no nulls) and ConcurrentHashMap (no null keys or values).
map.put(null, "value"); // permitted — stored in bucket 0
map.put("key", null); // permitted — null value
map.get(null); // returns "value"
map.containsKey(null); // true
map.containsValue(null); // true — O(n) scan
ConcurrentHashMap's prohibition on null keys and values is intentional: map.get(key) == null is ambiguous in a concurrent map — it might mean the key is absent, or that null was explicitly stored. In a single-threaded HashMap, you can check containsKey to disambiguate. In a concurrent map, the state might change between the get and the containsKey check, making the disambiguation inherently racy.
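ConcurrentHashMap enforces the prohibition by throwing rather than living with the ambiguity; getOrDefault sidesteps the absent-vs-null question entirely.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ChmNulls {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        try {
            map.put(null, "value");
        } catch (NullPointerException e) {
            System.out.println("null key rejected");   // prints
        }
        try {
            map.put("key", null);
        } catch (NullPointerException e) {
            System.out.println("null value rejected"); // prints
        }
        // No ambiguity: a distinguishable default instead of null
        System.out.println(map.getOrDefault("missing", "fallback")); // prints "fallback"
    }
}
```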
ConcurrentHashMap: not just a synchronized HashMap
ConcurrentHashMap uses a different internal structure from HashMap. The concurrency mechanism is not a single lock on the map — it locks individual buckets during writes, allowing concurrent reads and writes to different buckets simultaneously. Reads are generally lock-free.
The practical difference from Collections.synchronizedMap(new HashMap<>()):
synchronizedMap serializes all access — one thread at a time, for both reads and writes.
ConcurrentHashMap allows concurrent reads, concurrent writes to different buckets, and single-bucket locking for writes to the same bucket.
For maps with high read concurrency and moderate write rate, ConcurrentHashMap is significantly more scalable than synchronizedMap. For maps that are mostly written during initialization and then read concurrently, Collections.unmodifiableMap(new HashMap<>()) after construction is even better — no locking at all.
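The populate-then-publish pattern for read-mostly maps, sketched below: build the HashMap during initialization, then hand out only an unmodifiable view. Safe publication (e.g. assignment to a final field) is still required; after that, reads need no locking because nothing mutates.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ReadMostly {
    public static void main(String[] args) {
        Map<String, Integer> init = new HashMap<>();
        init.put("a", 1);
        init.put("b", 2);

        // Publish only the view; retain no other reference to 'init'
        Map<String, Integer> shared = Collections.unmodifiableMap(init);

        System.out.println(shared.get("a")); // prints 1
        try {
            shared.put("c", 3);
        } catch (UnsupportedOperationException e) {
            System.out.println("writes rejected"); // prints
        }
    }
}
```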
What this means in practice
Pre-size HashMap when you know the approximate entry count. Profile with realistic key distributions if you suspect collision-related degradation — bucket occupancy isn't exposed through the public API, so the symptom is get/put latency out of proportion to map size. Use immutable or effectively-immutable objects as keys. Never modify a key's hashCode()-influencing fields after inserting it into a HashMap. For concurrent access, choose between ConcurrentHashMap, synchronizedMap, and unmodifiableMap based on the read/write ratio and whether the map is mutable after construction.
The O(1) average case is real and usually achieved. The cases where it's not — excessive collisions, repeated resizes, mutable keys — are all avoidable with the above.