String Interning, the String Pool, and Memory in Java — What Actually Happens
by Eric Hanson, Backend Developer at Clean Systems Consulting
The three ways a String ends up in memory
Not all String objects are the same in Java. Where a string lives in memory and whether it shares identity with another string of the same content depends on how it was created.
String literals (any string written directly in source code) are recorded in the class file's constant pool and interned into the JVM's string pool (also called the string constant pool) when the class is loaded or, in HotSpot, lazily when the literal is first used. The pool is deduplicated: two class files containing the literal "pending" reference the same String object in the pool, not two separate objects.
new String(...) — explicitly allocating a string — always creates a new object on the heap, separate from the pool, even if an identical string already exists in the pool.
String.intern() — returns the pooled version of a string, adding it to the pool if not already present.
String a = "hello"; // pool
String b = "hello"; // same pool entry as a
String c = new String("hello"); // new heap object, not the pool entry
String d = c.intern(); // returns the pool entry — same as a and b
System.out.println(a == b); // true — same pool object
System.out.println(a == c); // false — c is a separate heap object
System.out.println(a == d); // true — d is the pool entry
System.out.println(a.equals(c)); // true — content is the same
This is why == for string comparison is a bug — two strings with identical content may or may not be the same object depending on how they were created. equals() always compares content. == compares identity.
Where the pool lives
Before Java 7, the string pool was in PermGen — a fixed-size memory region separate from the heap. This made aggressive interning dangerous: fill PermGen with interned strings and you get OutOfMemoryError: PermGen space.
Since Java 7, the string pool is on the heap. This means:
- Pool strings are subject to GC (though in practice, strings interned from literals are reachable through class metadata and rarely collected)
- The pool can grow as large as the heap allows
- -XX:StringTableSize controls the number of buckets in the pool's hash table (default 65536 in Java 11+, tunable for large-scale interning)
The pool is implemented as a hash table keyed by string content. intern() performs a lookup — O(1) average — and either returns the existing entry or inserts the new one.
How javac handles literals and constant expressions
The Java compiler performs compile-time string concatenation of literals. Constant string expressions are folded at compile time, not runtime:
String s1 = "hello" + " " + "world"; // compile-time: single literal "hello world"
String s2 = "hello world";
System.out.println(s1 == s2); // true — both reference the same pool entry
The compiler folds the concatenation into a single literal. Both s1 and s2 reference the same pool entry. If any operand is a variable (not a compile-time constant), folding doesn't apply:
String prefix = "hello";
String s3 = prefix + " world"; // runtime concatenation — new heap object
System.out.println(s2 == s3); // false — s3 is a separate heap object
final variables that are compile-time constants are treated as literals:
final String PREFIX = "hello";
String s4 = PREFIX + " world"; // compile-time constant — folded to "hello world"
System.out.println(s2 == s4); // true
This is a subtle distinction: final variables that are initialized with non-constant expressions — final String timestamp = LocalDateTime.now().toString() — are not compile-time constants and do not participate in constant folding.
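The distinction can be checked directly. The following is a minimal sketch; the class and field names are hypothetical, and the == comparisons are there only to expose object identity, not as a recommended way to compare strings:

```java
final class ConstantFoldingDemo {
    // Compile-time constant: final and initialized with a constant expression.
    static final String CONST_PREFIX = "hello";

    // final, but NOT a compile-time constant: the initializer is a method call.
    static final String RUNTIME_PREFIX =
            String.valueOf(new char[] {'h', 'e', 'l', 'l', 'o'});

    // javac folds this to the literal "hello world", so it is the pool entry.
    static boolean constantConcatIsPooled() {
        String s = CONST_PREFIX + " world";
        return s == "hello world";
    }

    // Concatenated at runtime: a new heap object, distinct from the pool entry.
    static boolean runtimeConcatIsPooled() {
        String s = RUNTIME_PREFIX + " world";
        return s == "hello world";
    }

    // intern() maps the runtime result back to the pool entry.
    static boolean internedRuntimeConcatIsPooled() {
        return (RUNTIME_PREFIX + " world").intern() == "hello world";
    }
}
```

Under these assumptions, only the first and third methods return true; the second returns false because the concatenation is not a constant expression.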
String concatenation and allocation
Before Java 9, the + operator on strings compiled to StringBuilder calls. Since Java 9, javac instead emits an invokedynamic call bootstrapped by StringConcatFactory, which chooses the concatenation strategy at runtime. Either way, each non-constant concatenation expression creates a new String object on the heap, not in the pool:
String result = "Order " + orderId + " status: " + status;
// On Java 8 and earlier, roughly equivalent to:
// new StringBuilder().append("Order ").append(orderId)
// .append(" status: ").append(status).toString()
In a hot path called millions of times, this allocates at least a new String and its backing array on every call (plus a StringBuilder on Java 8 and earlier). For logging, where the string may never be used because the log level is off, the allocation happens before the level check:
// Allocates the string even if DEBUG is disabled
logger.debug("Processing order " + orderId + " for user " + userId);
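// No message string is built if DEBUG is disabled: formatting happens only after the level check
logger.debug("Processing order {} for user {}", orderId, userId);
// Or with a supplier, evaluated only if needed (SLF4J 2.x fluent API; Log4j 2 accepts a Supplier in debug() directly):
logger.atDebug().log(() -> "Processing order " + orderId + " for user " + userId);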
SLF4J's parameterized logging ({} placeholders) defers string construction until after the level check. This is not a minor optimization in high-throughput services: a DEBUG statement in a method called 100,000 times per second creates at least 200,000 objects per second (each String plus its backing array) if the message is always constructed.
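The deferred-construction idea can be illustrated without any logging library. The sketch below uses a hypothetical TinyLogger, which is a stand-in written for this example and not an SLF4J or Log4j class, and counts how often the message is actually built:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class LazyLogDemo {
    /** Hypothetical stand-in for a logging facade; not a real library class. */
    static final class TinyLogger {
        private final boolean debugEnabled;
        final StringBuilder sink = new StringBuilder();

        TinyLogger(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        // The supplier is invoked only after the level check passes.
        void debug(Supplier<String> message) {
            if (debugEnabled) {
                sink.append(message.get()).append('\n');
            }
        }
    }

    /** Returns how many times the message string was built for one debug() call. */
    static int messagesBuilt(boolean debugEnabled) {
        AtomicInteger built = new AtomicInteger();
        TinyLogger logger = new TinyLogger(debugEnabled);
        long orderId = 42;
        String userId = "u-7";
        logger.debug(() -> {
            built.incrementAndGet();
            return "Processing order " + orderId + " for user " + userId;
        });
        return built.get();
    }
}
```

With the level disabled the concatenation never runs. A capturing lambda may still allocate a small object of its own, so the saving is the message string rather than every allocation.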
intern() — when it helps and when it backfires
intern() is appropriate when you have a large number of objects holding the same string values, and equality checks are frequent and performance-sensitive. The canonical case: a field that holds one of a small set of known values — status codes, category names, currency codes.
// Without interning — each deserialized record creates a new String
record.setStatus(jsonNode.get("status").asText()); // "pending", "shipped", etc.
// With interning — all records with status "pending" share one object
record.setStatus(jsonNode.get("status").asText().intern());
// Equality check becomes identity check
if (record.getStatus() == "pending") { ... } // valid after interning
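The memory saving: 10 million records each holding a separate "pending" string means 10 million String objects plus 10 million backing arrays. On a 64-bit JVM with compressed oops and compact strings, that is roughly 24 bytes per String object and another 24 bytes per 7-character byte[], so about 480 MB in total (about 240 MB for the String objects alone). With interning, they all reference one object.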
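The sharing effect can be observed at small scale by counting distinct objects by identity. This is an illustrative sketch: the class name is hypothetical, and new String(chars) merely simulates a value freshly produced by a deserializer.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class InternSharingDemo {
    /** Counts distinct String objects (by identity) among n runtime-built copies of "pending". */
    static int distinctObjects(int n, boolean intern) {
        // An identity-based set: two equal strings count twice if they are separate objects.
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        char[] chars = {'p', 'e', 'n', 'd', 'i', 'n', 'g'};
        for (int i = 0; i < n; i++) {
            String status = new String(chars); // simulates a freshly deserialized value
            identities.add(intern ? status.intern() : status);
        }
        return identities.size();
    }
}
```

Calling distinctObjects(1000, false) yields 1000 separate objects, while distinctObjects(1000, true) yields 1, which is the same effect the 10-million-record estimate describes.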
The identity-check optimization is real but dangerous as a practice — it works only if you can guarantee all strings in the comparison have been interned, which requires discipline across the entire codebase. Miss one new String(...) and == silently returns false. equals() is always safer.
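The risk: interning high-cardinality strings (user IDs, session tokens, request IDs) fills the pool with unique values. Before Java 7 these sat in PermGen and could exhaust it. Since Java 7, a runtime-interned string can be collected once nothing else references it, but for as long as the values stay referenced the table keeps growing, its bucket chains lengthen, and both intern() lookups and the GC's processing of the table slow down. This is the String.intern() memory leak described in the memory leaks article.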
The rule: intern strings only if the cardinality is bounded and small. Status codes, ISO currency codes, HTTP method names — these are safe to intern. Arbitrary user input, request identifiers, URLs — these are not.
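One way to enforce the bounded-cardinality rule is to canonicalize through an application-level map with a hard cap instead of calling intern() directly. The following is a minimal sketch of that alternative, not something the JVM provides; BoundedCanonicalizer and maxEntries are hypothetical names, and the cap is only approximate under concurrent access.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class BoundedCanonicalizer {
    private final Map<String, String> canonical = new ConcurrentHashMap<>();
    private final int maxEntries;

    BoundedCanonicalizer(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Returns a shared instance for known values; passes new values through once the cap is hit. */
    String canonicalize(String value) {
        if (value == null) {
            return null;
        }
        String existing = canonical.get(value);
        if (existing != null) {
            return existing;
        }
        if (canonical.size() >= maxEntries) {
            return value; // cap reached: refuse to grow, return the caller's own object
        }
        String raced = canonical.putIfAbsent(value, value);
        return raced != null ? raced : value;
    }

    int size() {
        return canonical.size();
    }
}
```

If high-cardinality input slips through, the map stops growing and callers simply get their own object back, so the worst case is a lost optimization rather than unbounded growth. Callers should still compare with equals(), since values past the cap are not shared.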
G1 string deduplication — automatic without interning
G1GC has a background string deduplication feature that identifies String objects with identical content and replaces their backing char[] (or byte[] since Java 9's compact strings) with a shared reference — without changing the String object's identity or moving it to the pool:
-XX:+UseStringDeduplication # requires -XX:+UseG1GC (default since Java 9)
String deduplication runs as part of the concurrent GC cycle. It identifies duplicate backing arrays and makes them reference the same underlying data. The String objects remain separate heap objects — == is still false — but they share backing storage.
This reduces heap usage for applications with many duplicate strings without the risks of intern(). The tradeoff: deduplication runs on the GC thread and has a small throughput cost. For applications with high string duplication (log processing, data pipelines, applications that parse the same field values repeatedly), the memory savings typically outweigh the cost.
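Monitor with (Java 8):
-XX:+PrintStringDeduplicationStatistics
On Java 9 and later that flag was removed in favor of unified logging; -Xlog:gc+stringdedup*=debug is the documented replacement, though the tag names vary between JDK releases, so check -Xlog:help on your JDK. Either way, the output reports how many strings were deduplicated and how much space was reclaimed.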
The equals() contract and pool assumptions
One final trap: code that assumes pool membership and uses == breaks when strings arrive from outside the pool:
// Brittle — works only if status was interned or is a literal
if (order.getStatus() == "PENDING") { ... }
// This works regardless of how status was created
if ("PENDING".equals(order.getStatus())) { ... }
// Putting the literal first also handles null safely — no NullPointerException
The equals() method is defined on content, not identity. It works correctly regardless of whether either string was interned, created with new, deserialized from JSON, read from a database, or produced by concatenation. == works correctly only for strings you can guarantee are pool entries, which in practice means only comparisons between literals and other compile-time constants. The pool itself is JVM-wide, so the limitation is not about class loaders: it is that any string built at runtime is outside the pool unless it was explicitly interned.
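A small sketch of the failure mode, assuming a status value that arrives as bytes from outside the JVM. The class and method names are hypothetical, and fromWire() only simulates a decoded network or database value:

```java
import java.nio.charset.StandardCharsets;

final class StatusCompareDemo {
    /** Simulates a status decoded from external bytes: always a new heap object. */
    static String fromWire() {
        byte[] bytes = "PENDING".getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Brittle: true only if status happens to be the pool entry. */
    static boolean isPendingByIdentity(String status) {
        return status == "PENDING";
    }

    /** Robust and null-safe: compares content, literal first. */
    static boolean isPending(String status) {
        return "PENDING".equals(status);
    }
}
```

Here isPendingByIdentity(fromWire()) is false even though the content matches, while isPending(fromWire()) is true and isPending(null) is false rather than throwing.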
The practical takeaway: use equals() for string comparison in all application code. Use intern() only for deliberate memory optimization on bounded-cardinality strings, with awareness of the pool growth risk. Let G1's deduplication handle the rest if memory pressure from duplicate strings is a measured problem.