What Really Happens Inside a Java HashMap
by Eric Hanson, Backend Developer at Clean Systems Consulting
The structure: an array of buckets
A HashMap is an array — Node<K,V>[] table in the OpenJDK source — where each slot is a bucket. When you call put(key, value), the key's hash code is computed, transformed, and used to index into the array. The entry is stored at that index.
table[0]: null
table[1]: Node("order-123", order1) -> null
table[2]: null
table[3]: Node("user-456", user1) -> Node("user-789", user2) -> null // collision
table[4]: null
...
The array index is computed as (n - 1) & hash, where n is the table length (always a power of two) and hash is the processed hash code. The power-of-two table size makes this a fast bitwise AND rather than a modulo operation.
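The index computation can be sketched in a few lines. This is illustrative code, not the actual OpenJDK private methods; `spread` reproduces the hash-mixing step covered in the next section.

```java
public class BucketIndex {
    // The same spreading step HashMap applies before indexing (see below)
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // (n - 1) & hash: valid only because n is always a power of two,
    // where it is equivalent to floorMod(hash, n) but a single AND
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // default table size
        int h = spread("order-123");
        System.out.println(indexFor(h, n));
        System.out.println(indexFor(h, n) == Math.floorMod(h, n)); // prints true
    }
}
```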
The hash function: why it's not just hashCode()
HashMap doesn't use key.hashCode() directly. It applies a spreading function first:
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
(h = key.hashCode()) ^ (h >>> 16) XORs the high 16 bits of the hash code into the low 16 bits. Why? For small tables, the index computation (n - 1) & hash only uses the low bits of the hash. If the hash function concentrates distinguishing information in the high bits, many keys would map to the same bucket even with good hashCode() implementations. The spreading function mixes high and low bits so the full hash contributes to bucket selection.
The practical consequence: a hashCode() that returns values with all distinguishing information in the high bits will cause more collisions than one that distributes well across all bits. For most classes, this isn't a concern — String.hashCode(), Integer.hashCode(), and Long.hashCode() distribute well. For custom hashCode() implementations that only vary in high bits, the spreading function helps but isn't a substitute for a well-designed hash function.
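A small demonstration of the failure mode, using a contrived hash whose distinguishing information sits entirely in the high 16 bits: without spreading, every value lands in bucket 0 of a 16-slot table; with spreading, the high bits reach the index.

```java
public class HighBitsDemo {
    // HashMap's spreading step, applied to a raw hash value
    static int spread(int h) { return h ^ (h >>> 16); }

    public static void main(String[] args) {
        int n = 16;
        for (int i = 1; i <= 4; i++) {
            int rawHash = i << 16;                     // info in high bits only
            int naive = (n - 1) & rawHash;             // always 0: high bits masked off
            int spreadIdx = (n - 1) & spread(rawHash); // varies with i
            System.out.println(naive + " vs " + spreadIdx);
        }
        // prints: 0 vs 1, 0 vs 2, 0 vs 3, 0 vs 4
    }
}
```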
Collision handling: linked list to tree
Multiple keys can hash to the same bucket — a collision. The bucket stores a linked list of entries with the same hash index:
// Both keys hash to bucket 3
map.put("user-456", user1); // bucket 3: ["user-456" -> user1]
map.put("user-789", user2); // bucket 3: ["user-456" -> user1] -> ["user-789" -> user2]
Lookup traverses the linked list comparing keys with equals(). A bucket with n entries has O(n) lookup — the theoretical O(1) assumes low collision rate.
Since Java 8, when a bucket's linked list grows beyond 8 entries — and the table has at least 64 buckets; smaller tables resize instead of treeifying — it converts to a red-black tree, giving O(log n) lookup and insertion instead of O(n). A tree bucket converts back to a linked list when it shrinks to 6 or fewer entries during a resize. The thresholds (8 and 6, with hysteresis between them) balance the per-node overhead of the tree structure against the benefit of O(log n) operations.
This treeification is the defense against hash flooding: an attacker who crafts keys with identical hash codes can degrade a HashMap to O(n) per operation — effectively a denial of service for web applications where the attacker controls request parameters. With treeification, the worst case becomes O(log n) even with all keys in one bucket.
A caveat on String keys: String.hashCode() is specified by the Javadoc and fully deterministic, so colliding String keys can be precomputed offline. Java 7u6 added an optional randomized alternative string hash for exactly this reason, but it was removed in Java 8, where treeification became the defense. Treeification delivers its O(log n) bound most reliably when keys implement Comparable, as String does; for non-Comparable keys with identical hashes, the tree falls back to weaker tie-breaking.
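Colliding String keys are easy to construct because the hash is deterministic: "Aa" and "BB" hash identically, and any concatenation of equal-length, equal-hash blocks collides too, yielding 2^k colliding keys from k blocks.

```java
public class Collisions {
    public static void main(String[] args) {
        // 'A'*31 + 'a' == 'B'*31 + 'B' == 2112
        System.out.println("Aa".hashCode() == "BB".hashCode()); // prints true

        // Concatenations of equal-hash, equal-length blocks also collide:
        // hash(s1 + s2) = hash(s1) * 31^len(s2) + hash(s2)
        System.out.println("AaAa".hashCode() == "BBBB".hashCode()); // prints true
        System.out.println("AaBB".hashCode() == "BBAa".hashCode()); // prints true
    }
}
```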
The equals() contract and key identity
get(key) finds entries by computing the hash to identify the bucket, then walking the bucket comparing keys with equals(). This means hashCode() and equals() must be consistent:
If a.equals(b), then a.hashCode() == b.hashCode().
The converse is not required — equal hash codes do not imply equal objects (that would eliminate collisions). But if two equal objects have different hash codes, get() looks in the wrong bucket and returns null even though the key is in the map.
The failure mode:
import java.util.Objects;

public class Order {
    private final String id;
    private String status; // mutable

    Order(String id, String status) { this.id = id; this.status = status; }

    void setStatus(String status) { this.status = status; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Order other && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, status); // includes mutable field — BUG
    }
}
Order order = new Order("ord-123", "pending");
Map<Order, String> map = new HashMap<>();
map.put(order, "value");
order.setStatus("shipped"); // mutates hashCode
map.get(order); // returns null — hashCode changed, wrong bucket
map.containsKey(order); // false — same reason
hashCode() must not depend on mutable state if the object is used as a HashMap key. A key whose hashCode() changes after insertion is lost in the map — it computes to a different bucket than where it was stored. The entry is still there; hashCode() just no longer finds it.
The rule: hashCode() should depend only on the same fields that equals() uses, and those fields should be immutable (or at minimum, never modified while the object is a key in a hash-based collection).
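Applying the rule to the Order class above yields a version that survives mutation. The class name FixedOrder is mine; the point is that equals() and hashCode() both depend only on the immutable id.

```java
import java.util.HashMap;
import java.util.Map;

public class FixedOrder {
    private final String id;
    private String status; // mutable, but excluded from equals/hashCode

    FixedOrder(String id, String status) { this.id = id; this.status = status; }

    void setStatus(String status) { this.status = status; }

    @Override
    public boolean equals(Object o) {
        return o instanceof FixedOrder other && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return id.hashCode(); // id only, stable for the object's lifetime
    }

    public static void main(String[] args) {
        Map<FixedOrder, String> map = new HashMap<>();
        FixedOrder order = new FixedOrder("ord-123", "pending");
        map.put(order, "value");
        order.setStatus("shipped");         // no longer affects the hash
        System.out.println(map.get(order)); // prints "value"
    }
}
```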
Resize behavior: when capacity doubles
The table starts at 16 buckets (default; allocation is deferred until the first put) and doubles when the load factor threshold is crossed — 75% by default, so crossing the 12-entry threshold (inserting the 13th entry) triggers a resize to 32.
Resize is expensive: a new array of double size is allocated, and every existing entry is rehashed into the new array. Because the table size doubles (always a power of two), the new bucket index for each entry is either the same as the old index or old_index + old_capacity. This means only one bit of the hash determines which of two possible positions each entry moves to — the bit corresponding to the old capacity.
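The split rule can be verified directly. This sketch (method name is illustrative) shows that the new index after doubling is decided by the single hash bit worth oldCap:

```java
public class ResizeSplit {
    // After doubling from oldCap to 2 * oldCap, an entry either keeps its
    // old index or moves up by exactly oldCap, decided by one bit of the hash
    static int newIndexOf(int hash, int oldCap) {
        int oldIndex = (oldCap - 1) & hash;
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int hash = 0b10101; // 21: old index 5, and the bit worth 16 is set
        System.out.println(newIndexOf(hash, 16));   // prints 21 (5 + 16)
        System.out.println(((2 * 16) - 1) & hash);  // prints 21: matches the direct computation
    }
}
```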
The resize cost is O(n) — proportional to the number of entries. For large maps that were not pre-sized, multiple resizes occur as entries accumulate: 16 → 32 → 64 → ... each resize rehashing all entries. Pre-sizing eliminates this:
// Without pre-sizing: resizes at 12, 24, 48, 96 entries (4 resizes for 100 entries)
Map<String, Order> map = new HashMap<>();
// With pre-sizing: no resizes
// capacity = ceil(expectedEntries / loadFactor) = ceil(100 / 0.75) = 134, round up to power of 2 = 256
Map<String, Order> map = new HashMap<>(134); // HashMap rounds up to next power of 2 internally
The formula: initialCapacity = (int) Math.ceil(expectedEntries / 0.75). Pass this to the constructor. The HashMap rounds it up to the next power of two internally.
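The formula as a small helper (method name is mine). Note that Java 19 and later provide HashMap.newHashMap(int), which performs this computation for you.

```java
import java.util.HashMap;
import java.util.Map;

public class PreSize {
    // initialCapacity = ceil(expectedEntries / defaultLoadFactor)
    static int capacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / 0.75);
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(100)); // prints 134
        Map<String, Object> map = new HashMap<>(capacityFor(100));
        // internal table becomes 256 (next power of two >= 134);
        // threshold 256 * 0.75 = 192 > 100, so no resize ever occurs
    }
}
```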
null keys and null values
HashMap permits one null key (stored in bucket 0, since hash(null) == 0) and any number of null values. This is a deliberate design decision that distinguishes HashMap from Hashtable (no nulls) and ConcurrentHashMap (no null keys or values).
map.put(null, "value"); // permitted — stored in bucket 0
map.put("key", null); // permitted — null value
map.get(null); // returns "value"
map.containsKey(null); // true
map.containsValue(null); // true — O(n) scan
ConcurrentHashMap's prohibition on null keys and values is intentional: map.get(key) == null is ambiguous in a concurrent map — it might mean the key is absent, or that null was explicitly stored. In a single-threaded HashMap, you can check containsKey to disambiguate. In a concurrent map, the state might change between the get and the containsKey check, making the disambiguation inherently racy.
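ConcurrentHashMap enforces the prohibition by throwing rather than living with the ambiguity; getOrDefault sidesteps the absent-vs-null question entirely.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ChmNulls {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        try {
            map.put(null, "value");
        } catch (NullPointerException e) {
            System.out.println("null key rejected");   // prints
        }
        try {
            map.put("key", null);
        } catch (NullPointerException e) {
            System.out.println("null value rejected"); // prints
        }
        // No ambiguity: a distinguishable default instead of null
        System.out.println(map.getOrDefault("missing", "fallback")); // prints "fallback"
    }
}
```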
ConcurrentHashMap: not just a synchronized HashMap
ConcurrentHashMap uses a different internal structure from HashMap. The concurrency mechanism is not a single lock on the map — it locks individual buckets during writes, allowing concurrent reads and writes to different buckets simultaneously. Reads are generally lock-free.
The practical difference from Collections.synchronizedMap(new HashMap<>()):
synchronizedMap serializes all access — one thread at a time, for both reads and writes.
ConcurrentHashMap allows concurrent reads, concurrent writes to different buckets, and single-bucket locking for writes to the same bucket.
For maps with high read concurrency and moderate write rate, ConcurrentHashMap is significantly more scalable than synchronizedMap. For maps that are mostly written during initialization and then read concurrently, Collections.unmodifiableMap(new HashMap<>()) after construction is even better — no locking at all.
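The populate-then-publish pattern for read-mostly maps, sketched below: build the HashMap during initialization, then hand out only an unmodifiable view. Safe publication (e.g. assignment to a final field) is still required; after that, reads need no locking because nothing mutates.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ReadMostly {
    public static void main(String[] args) {
        Map<String, Integer> init = new HashMap<>();
        init.put("a", 1);
        init.put("b", 2);

        // Publish only the view; retain no other reference to 'init'
        Map<String, Integer> shared = Collections.unmodifiableMap(init);

        System.out.println(shared.get("a")); // prints 1
        try {
            shared.put("c", 3);
        } catch (UnsupportedOperationException e) {
            System.out.println("writes rejected"); // prints
        }
    }
}
```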
What this means in practice
Pre-size HashMap when you know the approximate entry count. Profile with realistic key distributions if you suspect collision-related degradation — bucket occupancy isn't exposed through the public API, so the symptom is get/put latency out of proportion to map size. Use immutable or effectively-immutable objects as keys. Never modify a key's hashCode()-influencing fields after inserting it into a HashMap. For concurrent access, choose between ConcurrentHashMap, synchronizedMap, and unmodifiableMap based on the read/write ratio and whether the map is mutable after construction.
The O(1) average case is real and usually achieved. The cases where it's not — excessive collisions, repeated resizes, mutable keys — are all avoidable with the above.