Collectors, flatMap, and Reduce in Java Streams — The Operations That Take More Than a Minute to Learn
by Eric Hanson, Backend Developer at Clean Systems Consulting
flatMap — the operation most developers reach for too late
map transforms each element into one element. flatMap transforms each element into zero or more elements — a stream — and flattens all those streams into one:
// map produces Stream<List<LineItem>> — a stream of lists
orders.stream()
.map(Order::getLineItems) // each order becomes a List<LineItem>
// flatMap produces Stream<LineItem> — a flat stream of all line items
orders.stream()
.flatMap(order -> order.getLineItems().stream()) // each order contributes multiple elements
.filter(item -> item.getQuantity() > 1)
.collect(Collectors.toList());
The distinction: map is one-to-one. flatMap is one-to-many, then flattened. When the transformation produces a collection and you want to work with the elements of that collection, flatMap is correct. Using map produces nested streams that require unwrapping.
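To make the nesting concrete, here is a minimal, runnable sketch using plain strings instead of the Order domain (the names are illustrative, not from the article):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapVsMap {
    // map: one-to-one — each word becomes one List<String>, so the
    // result is a nested List<List<String>>
    static List<List<String>> mapped(List<String> words) {
        return words.stream()
                .map(w -> List.of(w.split("")))
                .collect(Collectors.toList());
    }

    // flatMap: one-to-many, flattened — each word contributes its
    // letters directly to a single flat stream
    static List<String> flattened(List<String> words) {
        return words.stream()
                .flatMap(w -> Stream.of(w.split("")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("ab", "cd");
        System.out.println(mapped(words));    // [[a, b], [c, d]]
        System.out.println(flattened(words)); // [a, b, c, d]
    }
}
```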
flatMap also handles optional flattening — a common pattern when filtering and transforming in one step:
// Map each ID to an Optional<Order>, then flatten to only present values
List<Order> found = orderIds.stream()
.map(id -> orderRepository.findById(id)) // Stream<Optional<Order>>
.flatMap(Optional::stream) // Stream<Order> — only present values
.collect(Collectors.toList());
Optional::stream (Java 9+) returns a stream of one element if present, empty stream if absent. flatMap with Optional::stream is the idiomatic way to filter and unwrap optionals in a single operation.
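On Java 8, where Optional::stream does not exist, the same filter-and-unwrap is conventionally written as a filter/map pair. A runnable sketch, with a map lookup standing in for the repository (the lookup is an assumption for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class OptionalUnwrap {
    // Stand-in for orderRepository.findById — a plain map lookup
    static final Map<Long, String> ORDERS = Map.of(1L, "ord-1", 3L, "ord-3");

    static Optional<String> findById(Long id) {
        return Optional.ofNullable(ORDERS.get(id));
    }

    // Java 8 idiom: filter out empties, then unwrap.
    // get() is safe here only because isPresent was checked first.
    static List<String> findAll(List<Long> ids) {
        return ids.stream()
                .map(OptionalUnwrap::findById)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(findAll(List.of(1L, 2L, 3L))); // [ord-1, ord-3]
    }
}
```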
reduce — building a result from a stream
reduce is the general accumulation operation. It takes an identity value and an associative combining function, and folds all elements into a single result:
// Sum of order totals
long total = orders.stream()
.mapToLong(Order::getTotal)
.reduce(0L, Long::sum);
// String concatenation — illustrative only, use Collectors.joining in practice
String joined = Stream.of("a", "b", "c")
.reduce("", (acc, s) -> acc + s); // "abc"
The identity value must be an identity for the combining function — 0 for addition, 1 for multiplication, "" for concatenation. If no elements are in the stream, the identity is returned.
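The identity requirement is easy to break. A non-identity seed gives a wrong answer sequentially — and a differently wrong one in parallel, where the seed is applied once per chunk of the split (illustrative values, not from the article):

```java
import java.util.stream.Stream;

public class ReduceIdentity {
    static int sumWithSeed(int seed) {
        return Stream.of(1, 2, 3).reduce(seed, Integer::sum);
    }

    public static void main(String[] args) {
        // Correct: 0 is the identity for addition
        System.out.println(sumWithSeed(0));  // 6

        // Wrong: 10 is not an identity, so it leaks into the result
        System.out.println(sumWithSeed(10)); // 16

        // Worse in parallel: the non-identity seed is applied to each
        // chunk of the split, so the error compounds unpredictably
        int parallel = Stream.of(1, 2, 3).parallel().reduce(10, Integer::sum);
        System.out.println(parallel); // >= 16, depends on how the stream splits
    }
}
```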
The two-argument reduce returns T. The one-argument version (no identity) returns Optional<T> — empty if the stream is empty:
Optional<Order> mostExpensive = orders.stream()
.reduce((a, b) -> a.getTotal() > b.getTotal() ? a : b);
When to use reduce vs specialized operations. For numeric reductions, use mapToInt/mapToLong/mapToDouble followed by sum(), average(), min(), max() — they're more readable and avoid boxing:
// Prefer this
long total = orders.stream().mapToLong(Order::getTotal).sum();
// Over this
long total = orders.stream().map(Order::getTotal).reduce(0L, Long::sum);
reduce earns its place for non-numeric accumulations where no specialized method exists — totaling BigDecimal values, combining immutable result objects, or finding an element by custom comparison. Accumulation into a mutable container is collect's job, not reduce's: reduce assumes the combining function returns new values rather than mutating its arguments.
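BigDecimal is the canonical case: there is no mapToBigDecimal specialization, so reduce with BigDecimal.ZERO as the identity is the standard way to total monetary values. A minimal sketch:

```java
import java.math.BigDecimal;
import java.util.List;

public class BigDecimalSum {
    // No primitive specialization exists for BigDecimal, so reduce is
    // idiomatic here: ZERO is the additive identity, add is associative
    static BigDecimal total(List<BigDecimal> amounts) {
        return amounts.stream()
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    public static void main(String[] args) {
        List<BigDecimal> amounts = List.of(
                new BigDecimal("19.99"),
                new BigDecimal("5.01"),
                new BigDecimal("100.00"));
        System.out.println(total(amounts)); // 125.00
    }
}
```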
The Collectors toolkit
Collectors is where streams become genuinely powerful for data processing. The operations beyond toList() and toSet():
groupingBy — grouping into a Map
groupingBy groups elements by a classifier function, producing Map<K, List<V>>:
Map<String, List<Order>> byStatus = orders.stream()
.collect(Collectors.groupingBy(Order::getStatus));
// { "pending" -> [...], "shipped" -> [...], "cancelled" -> [...] }
The downstream collector argument transforms the grouped lists:
// Count per status instead of list
Map<String, Long> countByStatus = orders.stream()
.collect(Collectors.groupingBy(
Order::getStatus,
Collectors.counting()
));
// Sum of totals per status
Map<String, Long> totalByStatus = orders.stream()
.collect(Collectors.groupingBy(
Order::getStatus,
Collectors.summingLong(Order::getTotal)
));
// Map to a different value type
Map<String, List<Long>> idsByStatus = orders.stream()
.collect(Collectors.groupingBy(
Order::getStatus,
Collectors.mapping(Order::getId, Collectors.toList())
));
Multi-level grouping — group by status, then by customer:
Map<String, Map<Long, List<Order>>> byStatusThenCustomer = orders.stream()
.collect(Collectors.groupingBy(
Order::getStatus,
Collectors.groupingBy(Order::getCustomerId)
));
partitioningBy — binary grouping
partitioningBy is groupingBy specialized to a Predicate classifier — the result map always has exactly two keys, true and false:
Map<Boolean, List<Order>> partition = orders.stream()
.collect(Collectors.partitioningBy(order -> order.getTotal() > 10_000));
List<Order> largeOrders = partition.get(true);
List<Order> smallOrders = partition.get(false);
Slightly more efficient than groupingBy with a boolean classifier because the result map is always exactly two entries — and both keys are guaranteed present, so partition.get(true) and partition.get(false) never return null even when one side is empty. groupingBy would simply omit a key with no elements.
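Like groupingBy, partitioningBy accepts a downstream collector. A runnable sketch with a simplified Order record (the record and its fields are assumptions for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionDemo {
    record Order(long id, long total) {}

    // Count each side of the partition instead of collecting lists
    static Map<Boolean, Long> countBySize(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.partitioningBy(
                        o -> o.total() > 10_000,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order(1, 25_000),
                new Order(2, 3_000),
                new Order(3, 500));
        Map<Boolean, Long> counts = countBySize(orders);
        System.out.println(counts.get(true));  // 1
        System.out.println(counts.get(false)); // 2
    }
}
```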
joining — string construction
Collectors.joining replaces StringBuilder loops for building strings from stream elements. It accepts only CharSequence elements, so non-String values such as a numeric id must be converted first:
String csv = orders.stream()
.map(order -> String.valueOf(order.getId()))
.collect(Collectors.joining(", "));
// "1, 2, 3"
String wrapped = orders.stream()
.map(order -> String.valueOf(order.getId()))
.collect(Collectors.joining(", ", "[", "]"));
// "[1, 2, 3]"
toMap — explicit key and value extraction
toMap builds a Map with explicit key and value functions:
Map<Long, Order> ordersById = orders.stream()
.collect(Collectors.toMap(Order::getId, Function.identity()));
The third argument handles duplicate keys — required if keys might collide:
// Keep the higher-value order on collision
Map<Long, Order> highValueByCustomer = orders.stream()
.collect(Collectors.toMap(
Order::getCustomerId,
Function.identity(),
(existing, replacement) ->
existing.getTotal() > replacement.getTotal() ? existing : replacement
));
Without the merge function, duplicate keys throw IllegalStateException. This is intentional — toMap is strict about key uniqueness by default. The exception tells you that you have duplicate keys that require a decision; it's better than silently overwriting.
The fourth argument specifies the map implementation — useful when insertion order matters:
Map<Long, Order> ordered = orders.stream()
.collect(Collectors.toMap(
Order::getId,
Function.identity(),
(a, b) -> a,
LinkedHashMap::new // maintains insertion order
));
Custom collectors — when the built-ins don't fit
Collector.of() builds a custom collector when Collectors doesn't have what you need:
// Collector that builds an ImmutableList (Guava)
Collector<Order, ImmutableList.Builder<Order>, ImmutableList<Order>> toImmutableList =
Collector.of(
ImmutableList::builder, // supplier — creates the mutable accumulator
ImmutableList.Builder::add, // accumulator — adds an element
(b1, b2) -> b1.addAll(b2.build()), // combiner — merges two accumulators (parallel)
ImmutableList.Builder::build // finisher — converts accumulator to result
);
ImmutableList<Order> immutable = orders.stream().collect(toImmutableList);
The four functions: supplier creates a new mutable container, accumulator adds one element to the container, combiner merges two containers (used in parallel streams — must be associative), finisher converts the container to the final result type.
A more practical custom collector — collecting into a statistics object:
record OrderStats(long count, long totalValue, long maxValue) {}
Collector<Order, long[], OrderStats> statsCollector = Collector.of(
() -> new long[3], // [count, sum, max]
(arr, order) -> {
arr[0]++;
arr[1] += order.getTotal();
arr[2] = Math.max(arr[2], order.getTotal());
},
(a, b) -> new long[]{a[0]+b[0], a[1]+b[1], Math.max(a[2], b[2])},
arr -> new OrderStats(arr[0], arr[1], arr[2])
);
OrderStats stats = orders.stream().collect(statsCollector);
This computes count, sum, and max in a single pass with a primitive array accumulator — no boxing, no multiple stream passes.
teeing — splitting a stream into two collectors
Collectors.teeing (Java 12+) processes a stream through two collectors simultaneously and merges the results:
record Summary(List<Order> large, long smallTotal) {}
Summary summary = orders.stream().collect(
Collectors.teeing(
Collectors.filtering(o -> o.getTotal() > 10_000, Collectors.toList()),
Collectors.filtering(o -> o.getTotal() <= 10_000,
Collectors.summingLong(Order::getTotal)),
Summary::new
)
);
teeing replaces two separate stream passes or a partitioningBy with downstream collection when the two halves produce different result types. It processes each element exactly once, feeding it to both downstream collectors.
The single-pass principle
The practical value of complex collectors — custom collectors, teeing, nested groupingBy — is computing multiple results in a single iteration over the data. Each additional stream pass over a large collection adds cost. When you find yourself writing stream().filter(x).count() followed by stream().filter(x).collect(toList()), that's two passes where one suffices:
// Two passes
long count = orders.stream().filter(Order::isPending).count();
List<Order> pending = orders.stream().filter(Order::isPending).collect(Collectors.toList());
// One pass with teeing
record PendingResult(long count, List<Order> orders) {}
PendingResult result = orders.stream()
.filter(Order::isPending)
.collect(Collectors.teeing(
Collectors.counting(),
Collectors.toList(),
PendingResult::new
));
For in-memory collections this is a minor optimization. For streams backed by database queries, file reads, or network data, avoiding multiple passes is a correctness concern as much as a performance one.