Java Memory Leaks That Don't Show Up in Heap Dumps

by Eric Hanson, Backend Developer at Clean Systems Consulting

The symptom that doesn't match the tools

The process RSS grows steadily over days. You take a heap dump — heap usage looks normal, no obvious leak suspects, dominator tree shows nothing alarming. You restart the process, RSS drops, then climbs again. The heap dump told you nothing because the leak isn't on the heap.

Java processes use memory beyond the heap: Metaspace for class metadata, direct buffers allocated outside the heap, thread stacks, JIT-compiled code cache, and any memory allocated by native code through JNI. None of these appear in a heap dump. Diagnosing them requires different tools and a different mental model.

Metaspace leaks — class loader accumulation

Metaspace holds class metadata. Classes are unloaded only when their class loader is GC'd. In application servers, OSGi containers, plugin architectures, and any system that dynamically loads code, class loaders are created and discarded regularly. If something holds a reference to a class, an object of that class, or the class loader itself, the entire class loader — and all classes it loaded — cannot be unloaded.

The pattern that causes it:

// A framework that creates a new ClassLoader per deployment
URLClassLoader pluginLoader = new URLClassLoader(pluginUrls, parentLoader);
Class<?> pluginClass = pluginLoader.loadClass("com.plugin.Main");
Object plugin = pluginClass.getDeclaredConstructor().newInstance();

// The plugin instance is stored in a static registry
PluginRegistry.register("my-plugin", plugin); // strong reference to plugin instance
// plugin -> pluginClass -> pluginLoader -> all classes loaded by pluginLoader
// PluginRegistry.register holds a strong reference chain — nothing can be unloaded

When the plugin is "undeployed," pluginLoader can only be GC'd if every reference to every object of every class it loaded is released. One lingering reference — a static map entry, a thread-local, an executor's thread that ran a task from the plugin — prevents the entire class loader from unloading.

Detecting Metaspace leaks: watch jvm.memory.used{area="nonheap"} over deployment cycles. It should return to the same baseline after each plugin unload. Steady growth across redeployments indicates class loader leaks.
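When Micrometer isn't wired up, the same number can be read directly from the platform MBeans. A minimal sketch (MetaspaceMonitor is an illustrative name, not a standard class):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MetaspaceMonitor {
    // Returns bytes currently used by the Metaspace pool, or -1 if not found.
    public static long metaspaceUsed() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage().getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("Metaspace used: " + metaspaceUsed() + " bytes");
    }
}
```

Sample this before and after each redeploy; the value should return to roughly the same baseline once the old class loader is collected.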

Eclipse MAT can find class loader leaks: look for the same class loaded by multiple class loaders (the "Duplicate Classes" query), or class loaders with large retained heap. The OQL (Object Query Language) view lets you list them: SELECT * FROM java.lang.ClassLoader.

Fixing them requires auditing every static field and thread-local for references to classes from the plugin's class loader, and ensuring executors don't retain thread-local state from plugin threads.
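A clean undeploy path can be sketched like this. PluginRegistry here is a minimal illustrative stand-in for the static registry in the earlier example, not a real framework class:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PluginRegistry {
    private static final Map<String, Object> PLUGINS = new ConcurrentHashMap<>();

    public static void register(String name, Object plugin) {
        PLUGINS.put(name, plugin);
    }

    // Undeploy: drop the strong reference first, then close the loader's JAR
    // handles. close() does NOT force unloading; the loader is reclaimed only
    // when no references remain to it, its classes, or their instances.
    public static void undeploy(String name, URLClassLoader loader) throws IOException {
        PLUGINS.remove(name);
        loader.close();
    }

    public static int size() {
        return PLUGINS.size();
    }

    public static void main(String[] args) throws IOException {
        URLClassLoader loader =
                new URLClassLoader(new URL[0], PluginRegistry.class.getClassLoader());
        register("my-plugin", new Object());
        undeploy("my-plugin", loader);
        System.out.println("plugins registered: " + size());
    }
}
```

Removing the registry entry handles only the reference chain shown above; static fields, thread-locals, and executor threads still need the audit described here.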

DirectByteBuffer leaks

DirectByteBuffer allocates memory outside the Java heap using malloc. It's used by NIO channels, Netty, gRPC, and any I/O library that wants to avoid copying between the JVM heap and OS buffers. The allocation is invisible to heap dumps: a 1GB direct buffer contributes only a tiny DirectByteBuffer object to heap analysis, while the gigabyte itself lives in native memory.

// Allocates off-heap — does NOT appear in heap dump
ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024 * 100); // 100MB off-heap

Direct buffers are freed when the DirectByteBuffer Java object is GC'd and its Cleaner runs. The problem: GC pressure on the heap is what triggers collection, but the heap object for a direct buffer is tiny — a few hundred bytes. The heap can be nearly empty while gigabytes of direct memory are held by DirectByteBuffer objects that haven't been collected yet.
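This asymmetry is directly observable: allocate a direct buffer and the "direct" buffer pool grows by the full capacity, while heap usage barely moves. A small demonstration (class and method names are illustrative):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectVsHeap {
    static BufferPoolMXBean directPool() {
        return ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class).stream()
                .filter(p -> p.getName().equals("direct"))
                .findFirst()
                .orElseThrow();
    }

    // Allocate `bytes` of direct memory and report how much the "direct"
    // buffer pool grew. The memory itself never appears in a heap dump;
    // only the small DirectByteBuffer object does.
    static long growthAfterAllocating(int bytes) {
        long before = directPool().getMemoryUsed();
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes);
        buf.put(0, (byte) 1); // touch it so it stays reachable during measurement
        return directPool().getMemoryUsed() - before;
    }

    public static void main(String[] args) {
        long grew = growthAfterAllocating(64 * 1024 * 1024); // 64MB off-heap
        System.out.println("direct pool grew by " + grew + " bytes");
    }
}
```

The same BufferPoolMXBean is what backs the Micrometer metric discussed below.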

The direct memory limit is set with -XX:MaxDirectMemorySize (it defaults to the max heap size, -Xmx). When an allocation would exceed it, the JVM triggers System.gc() to try to reclaim unreferenced direct buffers; if that doesn't free enough, the allocation fails with OutOfMemoryError: Direct buffer memory.

Monitoring direct memory:

// Via JMX
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

BufferPoolMXBean directPool = ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)
    .stream()
    .filter(p -> p.getName().equals("direct"))
    .findFirst()
    .orElseThrow();

System.out.println("Direct memory used: " + directPool.getMemoryUsed());
System.out.println("Direct buffer count: " + directPool.getCount());

Or via Micrometer: jvm.buffer.memory.used{id="direct"}.

Growing direct buffer count with stable heap is the signature of a direct buffer leak. The usual cause: ByteBuffer.allocateDirect() in a code path that runs frequently, with the buffers not being released explicitly and the heap not experiencing enough GC pressure to clean them up.

The fix for high-frequency allocation: pool direct buffers. Netty's PooledByteBufAllocator is the production solution for NIO-heavy applications. For application code, a simple pool:

// Illustrative only: the pool is unbounded and should be capped in production
private static final Deque<ByteBuffer> BUFFER_POOL = new ConcurrentLinkedDeque<>();

public static ByteBuffer acquire(int capacity) {
    ByteBuffer buf = BUFFER_POOL.poll();
    if (buf == null) {
        return ByteBuffer.allocateDirect(capacity);
    }
    if (buf.capacity() < capacity) {
        BUFFER_POOL.push(buf); // too small for this request; keep it for a smaller one
        return ByteBuffer.allocateDirect(capacity);
    }
    buf.clear();
    return buf;
}

public static void release(ByteBuffer buf) {
    buf.clear();
    BUFFER_POOL.push(buf);
}

Thread stack leaks

Each Java thread has a stack. Default stack size is 512KB–1MB depending on platform and JVM flags (-Xss to configure). A thread pool with 200 threads holds 100–200MB in thread stacks alone — not on the heap, not in Metaspace.

Thread leaks — threads created but never stopped — are a form of native memory leak:

// Thread created per request, never joined or pooled
public void handleRequest(Request request) {
    Thread worker = new Thread(() -> process(request));
    worker.start();
    // worker is started but never tracked — if process() hangs, this thread leaks
}

Each leaked thread holds its stack in native memory. 1,000 leaked threads is 500MB–1GB of native memory, invisible to heap analysis.

Monitoring: jvm.threads.live and jvm.threads.daemon via Micrometer. A growing thread count that doesn't return to baseline is a thread leak. jstack <pid> dumps all thread states — look for hundreds of threads in WAITING or TIMED_WAITING with the same stack trace, indicating stuck or leaked threads.

Fix: always use bounded thread pools (ThreadPoolExecutor with a fixed max), set thread timeouts, and monitor pool queue depth alongside thread count.
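A bounded pool configured along those lines might look like this. The sizes are illustrative; tune them to your workload:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPool {
    // At most 8 threads and 100 queued tasks; when the queue is full, the
    // submitting thread runs the task itself instead of spawning more threads.
    static final ThreadPoolExecutor POOL = new ThreadPoolExecutor(
            2, 8,                                  // core and max thread counts
            60, TimeUnit.SECONDS,                  // idle threads above core die off
            new ArrayBlockingQueue<>(100),         // bounded queue
            new ThreadPoolExecutor.CallerRunsPolicy());

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 20; i++) {
            final int n = i;
            POOL.execute(() ->
                    System.out.println("task " + n + " on " + Thread.currentThread().getName()));
        }
        POOL.shutdown();
        POOL.awaitTermination(10, TimeUnit.SECONDS);
        // Native stack memory is capped: never more than 8 worker stacks.
        System.out.println("largest pool size: " + POOL.getLargestPoolSize());
    }
}
```

The key property for native memory is the hard max: no matter how many requests arrive, thread stack usage is capped at maxPoolSize times the stack size.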

JNI and native library allocations

Code that calls native libraries through JNI (System.loadLibrary) can allocate memory in the native heap that is entirely invisible to the JVM. This memory has no GC, no heap dump visibility, and no JMX monitoring. The native library is responsible for freeing it.

Leaks in this category come from:

  • JNI code that allocates and doesn't free on error paths
  • Native library bugs
  • Incorrect usage of JNI global references (NewGlobalRef without corresponding DeleteGlobalRef)

JNI global references keep Java objects alive outside the GC's knowledge — they're GC roots invisible to heap analysis. A leaked JNI global reference prevents both the Java object and everything it references from being collected.

Diagnosing native memory leaks requires OS-level tools: valgrind (Linux, high overhead), jemalloc with profiling enabled, or native memory tracking built into the JVM:

-XX:NativeMemoryTracking=detail

Then query:

jcmd <pid> VM.native_memory detail

This breaks down native memory by category: Java heap, class metadata, thread stacks, code cache, GC internals, and "other" (which catches JNI allocations). Compare snapshots over time:

jcmd <pid> VM.native_memory baseline
# ... time passes ...
jcmd <pid> VM.native_memory detail.diff

The diff shows which categories have grown. Growth in "other" with stable heap and Metaspace points to JNI or native library allocations.

Code cache exhaustion

The JIT compiler stores compiled native code in the code cache — a fixed-size native memory region. Default size is 240MB (varies by JVM version and flags). When the code cache fills, the JIT stops compiling new methods. The JVM continues running but falls back to interpreted execution for new code — a sudden, severe throughput drop with no heap or GC anomaly.

Monitoring: Micrometer's jvm.compilation.time is cumulative, so the symptom is the counter flatlining rather than dropping. More directly:

jcmd <pid> Compiler.codecache

Or via JMX: the TotalCompilationTime attribute of the java.lang:type=Compilation MBean stops increasing when the JIT stops.
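Code cache occupancy is also exposed as memory pools, so it can be read in-process. A sketch (the name matching is an assumption that holds on HotSpot, where the pools are named "CodeHeap ..." on JDK 9+ or "Code Cache" on older JVMs):

```java
import java.lang.management.ManagementFactory;

public class CodeCacheCheck {
    // Sum of bytes used across all code-cache memory pools.
    public static long codeCacheUsed() {
        return ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(p -> p.getName().contains("Code"))
                .mapToLong(p -> p.getUsage().getUsed())
                .sum();
    }

    public static void main(String[] args) {
        System.out.println("Code cache used: " + codeCacheUsed() + " bytes");
    }
}
```

Alerting when usage approaches ReservedCodeCacheSize catches the problem before the JIT shuts off.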

Fix: increase the code cache size:

-XX:ReservedCodeCacheSize=512m

For long-running services with many hot paths, 512MB is a more realistic default than 240MB. The code cache is native memory — it doesn't count against -Xmx — so increasing it is low-cost as long as the container has the headroom.

The diagnostic sequence for growing RSS

When heap dumps come back clean but RSS grows:

  1. Check Metaspace. jvm.memory.used{area="nonheap"} — is it stable after startup? Growing Metaspace across redeployments indicates class loader leaks.

  2. Check direct buffers. jvm.buffer.memory.used{id="direct"} and jvm.buffer.count{id="direct"} — growing count with stable heap indicates direct buffer leaks.

  3. Check thread count. jvm.threads.live — growing unboundedly indicates thread leaks.

  4. Enable NMT and diff. -XX:NativeMemoryTracking=detail with periodic jcmd snapshots — growing "other" category indicates JNI or native library leaks.

  5. Check code cache. Compiler.codecache — full cache with JIT stopped explains sudden throughput degradation without memory growth.

The heap is only one compartment of a Java process's memory. RSS is the sum of all of them. Growing RSS with a clean heap dump is not a mystery — it's the other compartments, each with its own diagnostic path.
