Java Memory Leaks That Don't Show Up in Heap Dumps
by Eric Hanson, Backend Developer at Clean Systems Consulting
The symptom that doesn't match the tools
The process RSS grows steadily over days. You take a heap dump — heap usage looks normal, no obvious leak suspects, dominator tree shows nothing alarming. You restart the process, RSS drops, then climbs again. The heap dump told you nothing because the leak isn't on the heap.
Java processes use memory beyond the heap: Metaspace for class metadata, direct buffers allocated outside the heap, thread stacks, JIT-compiled code cache, and any memory allocated by native code through JNI. None of these appear in a heap dump. Diagnosing them requires different tools and a different mental model.
Metaspace leaks — class loader accumulation
Metaspace holds class metadata. Classes are unloaded only when their class loader is GC'd. In application servers, OSGi containers, plugin architectures, and any system that dynamically loads code, class loaders are created and discarded regularly. If something holds a reference to a class, an object of that class, or the class loader itself, the entire class loader — and all classes it loaded — cannot be unloaded.
The pattern that causes it:
// A framework that creates a new ClassLoader per deployment
URLClassLoader pluginLoader = new URLClassLoader(pluginUrls, parentLoader);
Class<?> pluginClass = pluginLoader.loadClass("com.plugin.Main");
Object plugin = pluginClass.getDeclaredConstructor().newInstance();
// The plugin instance is stored in a static registry
PluginRegistry.register("my-plugin", plugin); // strong reference to plugin instance
// plugin -> pluginClass -> pluginLoader -> all classes loaded by pluginLoader
// PluginRegistry.register holds a strong reference chain — nothing can be unloaded
When the plugin is "undeployed," pluginLoader can only be GC'd if every reference to every object of every class it loaded is released. One lingering reference — a static map entry, a thread-local, an executor's thread that ran a task from the plugin — prevents the entire class loader from unloading.
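One concrete version of the lingering reference, as a sketch (class and method names here are illustrative, not from any real framework): a plugin task stores a value in a ThreadLocal on a shared, long-lived pool thread. The thread survives the "undeploy," so the ThreadLocal entry keeps the value alive, the value keeps its class alive, and the class keeps the entire plugin class loader alive.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalPin {
    // Application-wide pool whose threads outlive any single plugin deployment.
    static final ExecutorService SHARED_POOL = Executors.newFixedThreadPool(4);
    static final ThreadLocal<Object> SCRATCH = new ThreadLocal<>();

    static void runPluginTask(Object pluginObject) {
        SHARED_POOL.submit(() -> {
            // pins pluginObject -> its class -> its class loader
            SCRATCH.set(pluginObject);
            // ... task work ...
            // Missing SCRATCH.remove(): the pool thread retains the value after
            // "undeploy", so the plugin's class loader can never be unloaded.
        });
    }
}
```

The fix is mechanical: call `SCRATCH.remove()` in a finally block before the task returns, so the pool thread holds nothing from the plugin once the task completes.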
Detecting Metaspace leaks: watch jvm.memory.used{area="nonheap"} over deployment cycles. It should return to the same baseline after each plugin unload. Steady growth across redeployments indicates class loader leaks.
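When Micrometer isn't wired up, the same number is readable in-process through the standard java.lang.management API. A minimal sketch (the pool name "Metaspace" is HotSpot-specific):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MetaspaceWatch {
    // Reads current Metaspace usage. Sampling this after each plugin unload
    // (ideally after a full GC) should return to the same baseline if class
    // loaders are actually being released.
    public static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage().getUsed();
            }
        }
        return -1; // pool not found (non-HotSpot JVM)
    }

    public static void main(String[] args) {
        System.out.println("Metaspace used: " + metaspaceUsedBytes() + " bytes");
    }
}
```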
MAT can find class loader leaks: look for multiple instances of the same class loaded by different class loaders, or class loaders with large retained heap. The OQL (Object Query Language) view lets you query them directly: SELECT * FROM INSTANCEOF java.lang.ClassLoader (INSTANCEOF includes subclasses such as URLClassLoader, which a plain FROM clause would miss).
Fixing them requires auditing every static field and thread-local for references to classes from the plugin's class loader, and ensuring executors don't retain thread-local state from plugin threads.
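As a sketch of what a leak-free undeploy path looks like (this PluginRegistry is an illustrative stand-in, not a real framework class): drop the registry's strong reference, then close the URLClassLoader so its open JAR handles are released.

```java
import java.io.IOException;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PluginRegistry {
    private static final Map<String, Object> PLUGINS = new ConcurrentHashMap<>();

    static void register(String name, Object plugin) {
        PLUGINS.put(name, plugin);
    }

    static boolean isRegistered(String name) {
        return PLUGINS.containsKey(name);
    }

    static void undeploy(String name, URLClassLoader loader) {
        PLUGINS.remove(name); // drop the strong reference chain
        try {
            loader.close();   // release open JAR file handles (Java 7+)
        } catch (IOException e) {
            // log and continue; the loader is still eligible for GC
        }
        // The loader can now be collected, provided no other reference
        // (static field, ThreadLocal, queued executor task) still reaches it.
    }
}
```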
DirectByteBuffer leaks
DirectByteBuffer allocates memory outside the Java heap using malloc. It's used by NIO channels, Netty, gRPC, and any I/O library that wants to avoid copying between JVM heap and OS buffers. The allocation is invisible to heap dumps — a 1GB direct buffer doesn't appear in heap analysis at all.
// Allocates off-heap — does NOT appear in heap dump
ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024 * 100); // 100MB off-heap
Direct buffers are freed when the DirectByteBuffer Java object is GC'd and its Cleaner runs. The problem: GC pressure on the heap is what triggers collection, but the heap object for a direct buffer is tiny — a few hundred bytes. The heap can be nearly empty while gigabytes of direct memory are held by DirectByteBuffer objects that haven't been collected yet.
The direct memory limit: -XX:MaxDirectMemorySize (defaults to -Xmx value). When it's exceeded, System.gc() is triggered to try freeing direct buffers. If that doesn't free enough, OutOfMemoryError: Direct buffer memory.
Monitoring direct memory:
// Via JMX
BufferPoolMXBean directPool = ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)
.stream()
.filter(p -> p.getName().equals("direct"))
.findFirst()
.orElseThrow();
System.out.println("Direct memory used: " + directPool.getMemoryUsed());
System.out.println("Direct buffer count: " + directPool.getCount());
Or via Micrometer: jvm.buffer.memory.used{id="direct"}.
Growing direct buffer count with stable heap is the signature of a direct buffer leak. The usual cause: ByteBuffer.allocateDirect() in a code path that runs frequently, with the buffers not being released explicitly and the heap not experiencing enough GC pressure to clean them up.
The fix for high-frequency allocation: pool direct buffers. Netty's PooledByteBufAllocator is the production solution for NIO-heavy applications. For application code, a simple pool:
// Thread-safe pool: ConcurrentLinkedDeque instead of ArrayDeque, since
// acquire/release are typically called from multiple threads
private static final Deque<ByteBuffer> BUFFER_POOL = new ConcurrentLinkedDeque<>();
public static ByteBuffer acquire(int capacity) {
    ByteBuffer buf = BUFFER_POOL.poll();
    if (buf == null) {
        return ByteBuffer.allocateDirect(capacity);
    }
    if (buf.capacity() < capacity) {
        BUFFER_POOL.push(buf); // too small for this caller; return it rather than lose it
        return ByteBuffer.allocateDirect(capacity);
    }
    buf.clear();
    return buf;
}
public static void release(ByteBuffer buf) {
    if (BUFFER_POOL.size() < 64) { // cap the pool so it can't grow without bound
        BUFFER_POOL.push(buf);
    }
    // over the cap: drop the reference and let the buffer's Cleaner free it
}
Thread stack leaks
Each Java thread has a stack. Default stack size is 512KB–1MB depending on platform and JVM flags (-Xss to configure). A thread pool with 200 threads holds 100–200MB in thread stacks alone — not on the heap, not in Metaspace.
Thread leaks — threads created but never stopped — are a form of native memory leak:
// Thread created per request, never joined or pooled
public void handleRequest(Request request) {
Thread worker = new Thread(() -> process(request));
worker.start();
// worker is started but never tracked — if process() hangs, this thread leaks
}
Each leaked thread holds its stack in native memory. 1,000 leaked threads reserve 500MB–1GB of stack space (committed lazily, page by page, as each stack is actually used), invisible to heap analysis.
Monitoring: jvm.threads.live and jvm.threads.daemon via Micrometer. A growing thread count that doesn't return to baseline is a thread leak. jstack <pid> dumps all thread states — look for hundreds of threads in WAITING or TIMED_WAITING with the same stack trace, indicating stuck or leaked threads.
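The same counters are available in-process via the standard ThreadMXBean, which is what Micrometer reads under the hood. A minimal sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadWatch {
    public static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // A peak far above the current count is also a clue: something
        // created a burst of threads at some point in this process's life.
        System.out.println("live=" + threads.getThreadCount()
                + " daemon=" + threads.getDaemonThreadCount()
                + " peak=" + threads.getPeakThreadCount());
    }
}
```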
Fix: always use bounded thread pools (ThreadPoolExecutor with a fixed max), set thread timeouts, and monitor pool queue depth alongside thread count.
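A sketch of the bounded-pool fix, with illustrative sizes (tune core, max, and queue depth for your workload):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedExecutor {
    public static ExecutorService create() {
        return new ThreadPoolExecutor(
                8, 32,                        // core and hard maximum thread count
                60, TimeUnit.SECONDS,         // idle threads above core die off
                new ArrayBlockingQueue<>(1000),              // bounded queue
                new ThreadPoolExecutor.CallerRunsPolicy());  // back-pressure when full
    }
}
```

CallerRunsPolicy makes the submitting thread do the work when the queue is full, which slows producers down instead of dropping tasks or growing memory without limit.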
JNI and native library allocations
Code that calls native libraries through JNI (System.loadLibrary) can allocate memory in the native heap that is entirely invisible to the JVM. This memory has no GC, no heap dump visibility, and no JMX monitoring. The native library is responsible for freeing it.
Leaks in this category come from:
- JNI code that allocates and doesn't free on error paths
- Native library bugs
- Incorrect use of JNI global references (NewGlobalRef without a corresponding DeleteGlobalRef)
JNI global references keep Java objects alive outside the GC's knowledge — they're GC roots invisible to heap analysis. A leaked JNI global reference prevents both the Java object and everything it references from being collected.
Diagnosing native memory leaks requires OS-level tools: valgrind (Linux, high overhead), jemalloc with profiling enabled, or the JVM's built-in Native Memory Tracking (NMT), which must be enabled at startup:
-XX:NativeMemoryTracking=detail
Then query:
jcmd <pid> VM.native_memory detail
This breaks down native memory by category: Java heap, class metadata, thread stacks, code cache, GC internals, and "other" (which catches JNI allocations). Compare snapshots over time:
jcmd <pid> VM.native_memory baseline
# ... time passes ...
jcmd <pid> VM.native_memory detail.diff
The diff shows which categories have grown. Growth in "other" with stable heap and Metaspace points to JNI or native library allocations.
Code cache exhaustion
The JIT compiler stores compiled native code in the code cache — a fixed-size native memory region. Default size is 240MB (varies by JVM version and flags). When the code cache fills, the JIT stops compiling new methods. The JVM continues running but falls back to interpreted execution for new code — a sudden, severe throughput drop with no heap or GC anomaly.
Monitoring: jvm.compilation.time dropping to zero is a symptom. More directly:
jcmd <pid> Compiler.codecache
Or via JMX: java.lang:type=Compilation — TotalCompilationTime stops increasing when the JIT stops.
Fix: increase the code cache size:
-XX:ReservedCodeCacheSize=512m
For long-running services with many hot paths, 512MB is a more realistic default than 240MB. The code cache is native memory — it doesn't count against -Xmx — so increasing it is low-cost as long as the container has the headroom.
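Code cache occupancy is also readable in-process through the memory pool beans. A sketch (pool names are HotSpot-specific: several "CodeHeap '...'" pools on JDK 9+ with the segmented code cache, a single "Code Cache" pool on JDK 8):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class CodeCacheWatch {
    // Sums usage across all code cache segments. Track this against
    // ReservedCodeCacheSize; approaching the limit predicts JIT shutoff.
    public static long codeCacheUsedBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if ((name.startsWith("CodeHeap") || name.equals("Code Cache"))
                    && pool.getUsage() != null) {
                used += pool.getUsage().getUsed();
            }
        }
        return used;
    }
}
```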
The diagnostic sequence for growing RSS
When heap dumps come back clean but RSS grows:
1. Check Metaspace. jvm.memory.used{area="nonheap"}: is it stable after startup? Growing Metaspace across redeployments indicates class loader leaks.
2. Check direct buffers. jvm.buffer.memory.used{id="direct"} and jvm.buffer.count{id="direct"}: a growing count with a stable heap indicates direct buffer leaks.
3. Check thread count. jvm.threads.live: growing unboundedly indicates thread leaks.
4. Enable NMT and diff. -XX:NativeMemoryTracking=detail with periodic jcmd snapshots: a growing "other" category indicates JNI or native library leaks.
5. Check code cache. Compiler.codecache: a full cache with the JIT stopped explains sudden throughput degradation without memory growth.
The heap is only one compartment of a Java process's memory. RSS is the sum of all of them. Growing RSS with a clean heap dump is not a mystery — it's the other compartments, each with its own diagnostic path.