Collectors, flatMap, and Reduce in Java Streams — The Operations That Take More Than a Minute to Learn

by Eric Hanson, Backend Developer at Clean Systems Consulting

flatMap — the operation most developers reach for too late

map transforms each element into one element. flatMap transforms each element into zero or more elements — a stream — and flattens all those streams into one:

// map produces Stream<List<LineItem>> — a stream of lists
orders.stream()
      .map(Order::getLineItems)   // each order becomes a List<LineItem>

// flatMap produces Stream<LineItem> — a flat stream of all line items
orders.stream()
      .flatMap(order -> order.getLineItems().stream()) // each order contributes multiple elements
      .filter(item -> item.getQuantity() > 1)
      .collect(Collectors.toList());

The distinction: map is one-to-one. flatMap is one-to-many, then flattened. When the transformation produces a collection and you want to work with the elements of that collection, flatMap is correct. Using map produces nested streams that require unwrapping.

flatMap also handles optional flattening — a common pattern when filtering and transforming in one step:

// Map each ID to an Optional<Order>, then flatten to only present values
List<Order> found = orderIds.stream()
    .map(id -> orderRepository.findById(id)) // Stream<Optional<Order>>
    .flatMap(Optional::stream)               // Stream<Order> — only present values
    .collect(Collectors.toList());

Optional::stream (Java 9+) returns a stream of one element if present, empty stream if absent. flatMap with Optional::stream is the idiomatic way to filter and unwrap optionals in a single operation.
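On Java 8, where Optional::stream is unavailable, the same unwrap takes a filter-then-map pair. A minimal sketch using a plain Map lookup as a stand-in for a repository:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical lookup standing in for orderRepository.findById
Map<String, String> db = Map.of("1", "ord-1", "3", "ord-3");

List<String> found = Stream.of("1", "2", "3")
    .map(id -> Optional.ofNullable(db.get(id))) // Stream<Optional<String>>
    .filter(Optional::isPresent)                // drop empty optionals
    .map(Optional::get)                         // unwrap — safe after the filter
    .collect(Collectors.toList());
// found contains "ord-1" and "ord-3"
```

The isPresent/get pair is the one place where calling Optional::get without an explicit check is defensible, because the preceding filter guarantees presence.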

reduce — building a result from a stream

reduce is the general accumulation operation. It takes an identity value and an associative combining function, and folds all elements into a single result:

// Sum of order totals
long total = orders.stream()
    .mapToLong(Order::getTotal)
    .reduce(0L, Long::sum);

// String concatenation — illustrative only, use Collectors.joining in practice
String joined = Stream.of("a", "b", "c")
    .reduce("", (acc, s) -> acc + s); // "abc"

The identity value must be an identity for the combining function — 0 for addition, 1 for multiplication, "" for concatenation. If the stream is empty, the identity is returned.
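A quick way to see why the identity matters: seeding a sum with a non-identity value skews the result, and in a parallel stream the seed may be applied once per chunk, compounding the error. A minimal sketch:

```java
import java.util.stream.LongStream;

// Correct: 0 is the identity for addition
long ok = LongStream.of(1, 2, 3).reduce(0L, Long::sum);      // 6

// Wrong: 10 is not an identity — the sequential result is already off by 10,
// and a parallel stream may fold the 10 in once per split
long skewed = LongStream.of(1, 2, 3).reduce(10L, Long::sum); // 16 sequentially
```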

The two-argument reduce returns T. The one-argument version (no identity) returns Optional<T> — empty if the stream is empty:

Optional<Order> mostExpensive = orders.stream()
    .reduce((a, b) -> a.getTotal() > b.getTotal() ? a : b);

When to use reduce vs specialized operations. For numeric reductions, use mapToInt/mapToLong/mapToDouble followed by sum(), average(), min(), max() — they're more readable and avoid boxing:

// Prefer this
long total = orders.stream().mapToLong(Order::getTotal).sum();

// Over this
long total = orders.stream().map(Order::getTotal).reduce(0L, Long::sum);

reduce earns its place for non-numeric accumulations where no specialized method exists — building a combined result type or selecting an element by custom comparison. For accumulating into a mutable container, prefer collect instead: reduce expects the accumulator to return a new value each step, not mutate its arguments.
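A typical non-numeric case is BigDecimal, which has no primitive stream specialization, so reduce with BigDecimal.ZERO as the identity is the standard idiom:

```java
import java.math.BigDecimal;
import java.util.stream.Stream;

BigDecimal total = Stream.of(new BigDecimal("19.99"), new BigDecimal("5.01"))
    .reduce(BigDecimal.ZERO, BigDecimal::add); // ZERO is the additive identity
// total is 25.00
```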

The Collectors toolkit

Collectors is where streams become genuinely powerful for data processing. The operations beyond toList() and toSet():

groupingBy — grouping into a Map

groupingBy groups elements by a classifier function, producing Map<K, List<V>>:

Map<String, List<Order>> byStatus = orders.stream()
    .collect(Collectors.groupingBy(Order::getStatus));
// { "pending" -> [...], "shipped" -> [...], "cancelled" -> [...] }

The downstream collector argument transforms the grouped lists:

// Count per status instead of list
Map<String, Long> countByStatus = orders.stream()
    .collect(Collectors.groupingBy(
        Order::getStatus,
        Collectors.counting()
    ));

// Sum of totals per status
Map<String, Long> totalByStatus = orders.stream()
    .collect(Collectors.groupingBy(
        Order::getStatus,
        Collectors.summingLong(Order::getTotal)
    ));

// Map to a different value type
Map<String, List<Long>> idsByStatus = orders.stream()
    .collect(Collectors.groupingBy(
        Order::getStatus,
        Collectors.mapping(Order::getId, Collectors.toList())
    ));

Multi-level grouping — group by status, then by customer:

Map<String, Map<Long, List<Order>>> byStatusThenCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::getStatus,
        Collectors.groupingBy(Order::getCustomerId)
    ));
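groupingBy also has a three-argument form that takes a map supplier, useful when key ordering matters. A sketch over plain strings, grouping into a TreeMap so keys come back sorted:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

Map<Integer, List<String>> byLength = Stream.of("pear", "fig", "plum", "kiwi")
    .collect(Collectors.groupingBy(
        String::length,         // classifier
        TreeMap::new,           // map supplier — keys in sorted order
        Collectors.toList()     // downstream collector
    ));
// { 3=[fig], 4=[pear, plum, kiwi] }
```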

partitioningBy — binary grouping

partitioningBy is a special case of groupingBy with a Predicate — always produces a map with two keys: true and false:

Map<Boolean, List<Order>> partition = orders.stream()
    .collect(Collectors.partitioningBy(order -> order.getTotal() > 10_000));

List<Order> largeOrders = partition.get(true);
List<Order> smallOrders = partition.get(false);

Slightly more efficient than groupingBy with a boolean classifier, because the result map always has exactly two entries.
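Like groupingBy, partitioningBy accepts a downstream collector, so each half can be reduced further. A sketch counting each side over plain numbers:

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

Map<Boolean, Long> counts = Stream.of(5, 12_000, 40, 25_000)
    .collect(Collectors.partitioningBy(
        n -> n > 10_000,        // predicate: large vs small
        Collectors.counting()   // downstream: count each partition
    ));
// { false=2, true=2 }
```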

joining — string construction

Collectors.joining replaces StringBuilder loops for building strings from stream elements:

String csv = orders.stream()
    .map(order -> String.valueOf(order.getId())) // joining needs CharSequence elements
    .collect(Collectors.joining(", "));
// "1, 2, 3"

String wrapped = orders.stream()
    .map(order -> String.valueOf(order.getId()))
    .collect(Collectors.joining(", ", "[", "]"));
// "[1, 2, 3]"

toMap — explicit key and value extraction

toMap builds a Map with explicit key and value functions:

Map<Long, Order> ordersById = orders.stream()
    .collect(Collectors.toMap(Order::getId, Function.identity()));

The third argument handles duplicate keys — required if keys might collide:

// Keep the higher-value order on collision
Map<Long, Order> highValueByCustomer = orders.stream()
    .collect(Collectors.toMap(
        Order::getCustomerId,
        Function.identity(),
        (existing, replacement) ->
            existing.getTotal() > replacement.getTotal() ? existing : replacement
    ));

Without the merge function, duplicate keys throw IllegalStateException. This is intentional — toMap is strict about key uniqueness by default. The exception tells you that you have duplicate keys that require a decision; it's better than silently overwriting.
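The strictness is easy to demonstrate — a minimal sketch with two colliding keys and no merge function:

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

boolean rejected = false;
try {
    Stream.of("cat", "dog")                              // both have length 3
        .collect(Collectors.toMap(String::length, s -> s)); // no merge function
} catch (IllegalStateException e) {
    rejected = true; // "Duplicate key ..." — the collision is surfaced, not silently resolved
}
```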

The fourth argument specifies the map implementation — useful when insertion order matters:

Map<Long, Order> ordered = orders.stream()
    .collect(Collectors.toMap(
        Order::getId,
        Function.identity(),
        (a, b) -> a,            // keep the first on collision
        LinkedHashMap::new      // maintains insertion order
    ));

Custom collectors — when the built-ins don't fit

Collector.of() builds a custom collector when Collectors doesn't have what you need:

// Collector that builds an ImmutableList (Guava)
Collector<Order, ImmutableList.Builder<Order>, ImmutableList<Order>> toImmutableList =
    Collector.of(
        ImmutableList::builder,              // supplier — creates the mutable accumulator
        ImmutableList.Builder::add,          // accumulator — adds an element
        (b1, b2) -> b1.addAll(b2.build()),  // combiner — merges two accumulators (parallel)
        ImmutableList.Builder::build         // finisher — converts accumulator to result
    );

ImmutableList<Order> immutable = orders.stream().collect(toImmutableList);

The four functions: supplier creates a new mutable container, accumulator adds one element to the container, combiner merges two containers (used in parallel streams — must be associative), finisher converts the container to the final result type.

A more practical custom collector — collecting into a statistics object:

record OrderStats(long count, long totalValue, long maxValue) {}

Collector<Order, long[], OrderStats> statsCollector = Collector.of(
    () -> new long[3],                          // [count, sum, max]
    (arr, order) -> {
        arr[0]++;
        arr[1] += order.getTotal();
        arr[2] = Math.max(arr[2], order.getTotal());
    },
    (a, b) -> new long[]{a[0]+b[0], a[1]+b[1], Math.max(a[2], b[2])},
    arr -> new OrderStats(arr[0], arr[1], arr[2])
);

OrderStats stats = orders.stream().collect(statsCollector);

This computes count, sum, and max in a single pass with a primitive array accumulator — no boxing, no multiple stream passes.
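Before writing a collector like this, check the built-ins: Collectors.summarizingLong already produces count, sum, min, max, and average in one pass via LongSummaryStatistics. The custom collector earns its keep only when you need a domain-specific result type:

```java
import java.util.LongSummaryStatistics;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain longs standing in for order totals
LongSummaryStatistics stats = Stream.of(100L, 250L, 75L)
    .collect(Collectors.summarizingLong(Long::longValue));
// stats.getCount() == 3, stats.getSum() == 425, stats.getMax() == 250
```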

teeing — splitting a stream into two collectors

Collectors.teeing (Java 12+) processes a stream through two collectors simultaneously and merges the results:

record Summary(List<Order> large, long smallTotal) {}

Summary summary = orders.stream().collect(
    Collectors.teeing(
        Collectors.filtering(o -> o.getTotal() > 10_000, Collectors.toList()),
        Collectors.filtering(o -> o.getTotal() <= 10_000,
            Collectors.summingLong(Order::getTotal)),
        Summary::new
    )
);

teeing replaces two separate stream passes or a partitioningBy with downstream collection when the two halves produce different result types. It processes each element exactly once, feeding it to both downstream collectors.

The single-pass principle

The practical value of complex collectors — custom collectors, teeing, nested groupingBy — is computing multiple results in a single iteration over the data. Each additional stream pass over a large collection adds cost. When you find yourself writing stream().filter(x).count() followed by stream().filter(x).collect(toList()), that's two passes where one suffices:

// Two passes
long count     = orders.stream().filter(Order::isPending).count();
List<Order> pending = orders.stream().filter(Order::isPending).collect(Collectors.toList());

// One pass with teeing
record PendingResult(long count, List<Order> orders) {}
PendingResult result = orders.stream()
    .filter(Order::isPending)
    .collect(Collectors.teeing(
        Collectors.counting(),
        Collectors.toList(),
        PendingResult::new
    ));

For in-memory collections this is a minor optimization. For streams backed by database queries, file reads, or network data, avoiding multiple passes is a correctness concern as much as a performance one.
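For IO-backed sources the correctness point is concrete: a Stream can be consumed only once, so a "second pass" over the same stream object fails outright. A sketch with an in-memory stream — the behavior is identical for a stream from Files.lines:

```java
import java.util.stream.Stream;

Stream<String> lines = Stream.of("a", "b", "c"); // stands in for Files.lines(path)
long count = lines.count();                      // first (and only) terminal operation

boolean reuseFailed = false;
try {
    lines.forEach(System.out::println);          // second terminal operation — not allowed
} catch (IllegalStateException e) {
    reuseFailed = true; // "stream has already been operated upon or closed"
}
```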
