What Really Happens Inside a Java HashMap

by Eric Hanson, Backend Developer at Clean Systems Consulting

The structure: an array of buckets

A HashMap is an array — Node<K,V>[] table in the OpenJDK source — where each slot is a bucket. When you call put(key, value), the key's hash code is computed, transformed, and used to index into the array. The entry is stored at that index.

table[0]:  null
table[1]:  Node("order-123", order1) -> null
table[2]:  null
table[3]:  Node("user-456", user1) -> Node("user-789", user2) -> null  // collision
table[4]:  null
...

The array index is computed as (n - 1) & hash, where n is the table length (always a power of two) and hash is the processed hash code. The power-of-two table size makes this a fast bitwise AND rather than a modulo operation.
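A minimal sanity check of this claim (key and table size are illustrative): for a power-of-two table length, the mask produces exactly the same bucket as a floor modulo would, including for negative hash values.

```java
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16;                               // table length, always a power of two
        int hash = "user-456".hashCode();         // stand-in for the processed hash
        // Masking with n - 1 keeps only the low bits -- the same bucket
        // a (floor) modulo would pick, but a single AND instruction.
        int index = (n - 1) & hash;
        assert index == Math.floorMod(hash, n);
        System.out.println("bucket = " + index);
    }
}
```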

The hash function: why it's not just hashCode()

HashMap doesn't use key.hashCode() directly. It applies a spreading function first:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

(h = key.hashCode()) ^ (h >>> 16) XORs the high 16 bits of the hash code into the low 16 bits. Why? For small tables, the index computation (n - 1) & hash only uses the low bits of the hash. If the hash function concentrates distinguishing information in the high bits, many keys would map to the same bucket even with good hashCode() implementations. The spreading function mixes high and low bits so the full hash contributes to bucket selection.

The practical consequence: a hashCode() that returns values with all distinguishing information in the high bits will cause more collisions than one that distributes well across all bits. For most classes, this isn't a concern — String.hashCode(), Integer.hashCode(), and Long.hashCode() distribute well. For custom hashCode() implementations that only vary in high bits, the spreading function helps but isn't a substitute for a well-designed hash function.
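The effect is easy to see with deliberately pathological hash codes where only the high 16 bits vary (the values below are contrived for illustration): without the spread, every key lands in bucket 0 of a 16-slot table; with it, they distribute.

```java
public class SpreadDemo {
    // Same spreading transform as HashMap.hash(): XOR high bits into low bits
    static int spread(int h) { return h ^ (h >>> 16); }

    public static void main(String[] args) {
        int n = 16;
        for (int i = 0; i < 4; i++) {
            int raw = i << 16;                      // only high bits vary
            int plain  = (n - 1) & raw;             // always 0 -- all collide
            int mixed  = (n - 1) & spread(raw);     // varies with i
            System.out.printf("raw=%08x  plain=%d  spread=%d%n", raw, plain, mixed);
        }
    }
}
```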

Collision handling: linked list to tree

Multiple keys can hash to the same bucket — a collision. The bucket stores a linked list of entries with the same hash index:

// Both keys hash to bucket 3
map.put("user-456", user1); // bucket 3: ["user-456" -> user1]
map.put("user-789", user2); // bucket 3: ["user-456" -> user1] -> ["user-789" -> user2]

Lookup traverses the linked list comparing keys with equals(). A bucket with n entries has O(n) lookup — the theoretical O(1) assumes low collision rate.

Since Java 8, when a bucket's linked list exceeds 8 entries (and the table has at least 64 buckets; smaller tables resize instead), it converts to a red-black tree, giving O(log n) lookup and insertion instead of O(n). When a tree bin shrinks to 6 or fewer entries during a resize, it converts back to a linked list. The gap between the two thresholds (8 and 6) prevents flapping between representations, and the tree's extra per-node overhead is paid only when a bucket is genuinely overloaded.

This treeification is the defense against hash flooding: an attacker who crafts keys with identical hash codes can degrade a HashMap to O(n) per operation — effectively a denial of service for web applications where the attacker controls request parameters. With treeification, the worst case becomes O(log n) even with all keys in one bucket.
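A sketch of the degenerate case, using a hypothetical key type whose hashCode() is deliberately constant so every instance collides into one bucket. Correctness is preserved (equals() still disambiguates within the bin), but each operation costs O(log n) rather than O(1):

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Illustrative worst-case key: constant hashCode, so all instances
    // map to the same bucket. The record's generated equals() still
    // distinguishes instances by id.
    record BadKey(String id) {
        @Override public int hashCode() { return 42; }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(new BadKey("k" + i), i);
        }
        // All 1000 entries share one bucket (a tree bin after
        // treeification), yet lookups remain correct.
        System.out.println(map.get(new BadKey("k500"))); // 500
        System.out.println(map.size());                  // 1000
    }
}
```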

An earlier defense targeted String keys specifically: Java 7 update 6 added an opt-in randomized alternative hash (controlled by the jdk.map.althashing.threshold system property) so flooding payloads could not be precomputed. Java 8 removed it in favor of treeification, which works for keys of any type: a predictable hashCode() can still be forced into a single bucket, but the cost is capped at O(log n).

The equals() contract and key identity

get(key) finds entries by computing the hash to identify the bucket, then walking the bucket comparing keys with equals(). This means hashCode() and equals() must be consistent:

If a.equals(b), then a.hashCode() == b.hashCode().

The converse is not required — equal hash codes do not imply equal objects (that would eliminate collisions). But if two equal objects have different hash codes, get() looks in the wrong bucket and returns null even though the key is in the map.

The failure mode:

import java.util.Objects;

public class Order {
    private final String id;
    private String status; // mutable

    public Order(String id, String status) {
        this.id = id;
        this.status = status;
    }

    public void setStatus(String status) {
        this.status = status;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Order other && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, status); // includes mutable field — BUG
    }
}

Order order = new Order("ord-123", "pending");
Map<Order, String> map = new HashMap<>();
map.put(order, "value");

order.setStatus("shipped"); // mutates hashCode

map.get(order); // returns null — hashCode changed, wrong bucket
map.containsKey(order); // false — same reason

hashCode() must not depend on mutable state if the object is used as a HashMap key. A key whose hashCode() changes after insertion is lost in the map — it computes to a different bucket than where it was stored. The entry is still there; hashCode() just no longer finds it.

The rule: hashCode() should depend only on the same fields that equals() uses, and those fields should be immutable (or at minimum, never modified while the object is a key in a hash-based collection).
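A record is the simplest way to satisfy this rule: its components are final, and equals() and hashCode() are generated from exactly the same fields (the key type and values here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class GoodKeyDemo {
    // Immutable key: equals() and hashCode() are derived from the single
    // final component, so the hash cannot change after insertion.
    record OrderKey(String id) {}

    public static void main(String[] args) {
        Map<OrderKey, String> map = new HashMap<>();
        map.put(new OrderKey("ord-123"), "pending");
        // A fresh but equal instance finds the entry -- lookup is by
        // equals()/hashCode(), not by object identity.
        System.out.println(map.get(new OrderKey("ord-123"))); // pending
    }
}
```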

Resize behavior: when capacity doubles

The table starts at 16 buckets (the default capacity) and doubles when the load-factor threshold is crossed: 75% by default, so inserting the 13th entry into a 16-bucket table triggers a resize to 32.

Resize is expensive: a new array of double size is allocated, and every existing entry is rehashed into the new array. Because the table size doubles (always a power of two), the new bucket index for each entry is either the same as the old index or old_index + old_capacity. This means only one bit of the hash determines which of two possible positions each entry moves to — the bit corresponding to the old capacity.
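The two-destination property follows directly from the power-of-two mask, and can be checked with a few sample hashes (values are arbitrary): the new index differs from the old one exactly when the bit at position oldCap is set in the hash.

```java
public class ResizeBitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash : new int[]{5, 21, 37, 1000003}) {
            int oldIdx = (oldCap - 1) & hash;
            int newIdx = (newCap - 1) & hash;
            // One extra mask bit becomes visible after doubling; it decides
            // between staying put and moving up by oldCap.
            boolean bitSet = (hash & oldCap) != 0;
            assert newIdx == (bitSet ? oldIdx + oldCap : oldIdx);
            System.out.printf("hash=%d: bucket %d -> %d%n", hash, oldIdx, newIdx);
        }
    }
}
```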

The resize cost is O(n) — proportional to the number of entries. For large maps that were not pre-sized, multiple resizes occur as entries accumulate: 16 → 32 → 64 → ... each resize rehashing all entries. Pre-sizing eliminates this:

// Without pre-sizing: resizes after crossing 12, 24, 48, and 96 entries (4 resizes for 100 entries)
Map<String, Order> map = new HashMap<>();

// With pre-sizing: no resizes
// capacity = ceil(expectedEntries / loadFactor) = ceil(100 / 0.75) = 134; HashMap rounds up to the next power of two (256)
Map<String, Order> presized = new HashMap<>(134);

The formula: initialCapacity = (int) Math.ceil(expectedEntries / 0.75). Pass this to the constructor. The HashMap rounds it up to the next power of two internally.
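A minimal sketch of the calculation as a helper (the method name is mine, not part of any API). Note that since Java 19 the JDK provides the same thing as a static factory, HashMap.newHashMap(int numMappings):

```java
import java.util.HashMap;
import java.util.Map;

public class PreSizeDemo {
    // Capacity that avoids any resize for the expected entry count
    // at the default 0.75 load factor (illustrative helper).
    static int capacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / 0.75);
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>(capacityFor(100)); // 134, rounded up to 256 internally
        // Since Java 19, equivalently: Map<String, String> map = HashMap.newHashMap(100);
        for (int i = 0; i < 100; i++) map.put("key-" + i, "v");
        System.out.println(map.size()); // 100, with no intermediate resizes
    }
}
```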

null keys and null values

HashMap permits one null key (stored in bucket 0, since hash(null) == 0) and any number of null values. This is a deliberate design decision that distinguishes HashMap from Hashtable (no nulls) and ConcurrentHashMap (no null keys or values).

map.put(null, "value");       // permitted — stored in bucket 0
map.put("key", null);         // permitted — null value
map.get(null);                // returns "value"
map.containsKey(null);        // true
map.containsValue(null);      // true — O(n) scan

ConcurrentHashMap's prohibition on null keys and values is intentional: map.get(key) == null is ambiguous in a concurrent map — it might mean the key is absent, or that null was explicitly stored. In a single-threaded HashMap, you can check containsKey to disambiguate. In a concurrent map, the state might change between the get and the containsKey check, making the disambiguation inherently racy.
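The ambiguity and its single-threaded workarounds look like this in practice (keys are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class NullDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "null-key-value");
        map.put("key", null);

        // get() alone cannot distinguish "absent" from "mapped to null":
        System.out.println(map.get("key"));     // null -- a stored value
        System.out.println(map.get("missing")); // null -- key is absent

        // containsKey disambiguates (safe here; inherently racy in a concurrent map):
        System.out.println(map.containsKey("key"));     // true
        System.out.println(map.containsKey("missing")); // false

        // getOrDefault sidesteps the ambiguity when a sentinel value works:
        System.out.println(map.getOrDefault("missing", "absent")); // absent
    }
}
```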

ConcurrentHashMap: not just a synchronized HashMap

ConcurrentHashMap uses a different internal structure from HashMap. The concurrency mechanism is not a single lock on the map — it locks individual buckets during writes, allowing concurrent reads and writes to different buckets simultaneously. Reads are generally lock-free.

The practical difference from Collections.synchronizedMap(new HashMap<>()):

  • synchronizedMap serializes all access — one thread at a time, for both reads and writes
  • ConcurrentHashMap allows concurrent reads, concurrent writes to different buckets, and single-bucket locking for writes to the same bucket

For maps with high read concurrency and moderate write rate, ConcurrentHashMap is significantly more scalable than synchronizedMap. For maps that are mostly written during initialization and then read concurrently, Collections.unmodifiableMap(new HashMap<>()) after construction is even better — no locking at all.
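One concrete upside worth showing: ConcurrentHashMap's compute-style methods are atomic per key, something synchronizedMap cannot offer without external locking. A minimal sketch with a shared counter (the key name and thread counts are arbitrary):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                // merge() is atomic per key: no lost updates even when
                // both threads hit the same bucket at the same time.
                counts.merge("hits", 1, Integer::sum);
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counts.get("hits")); // 20000
    }
}
```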

What this means in practice

Pre-size HashMap when you know the approximate entry count. The bucket count and per-bucket occupancy are not exposed through the public API, so collision-related degradation shows up as unexpectedly slow get() and put() calls under profiling with realistic data rather than through direct inspection. Use immutable or effectively-immutable objects as keys, and never modify a key's hashCode()-influencing fields after inserting it into a HashMap. For concurrent access, choose between ConcurrentHashMap, synchronizedMap, and unmodifiableMap based on the read/write ratio and whether the map is mutable after construction.

The O(1) average case is real and usually achieved. The cases where it's not — excessive collisions, repeated resizes, mutable keys — are all avoidable with the above.
