ActiveRecord Query Patterns That Actually Scale

by Eric Hanson, Backend Developer at Clean Systems Consulting

The query that works fine in development

Development databases have hundreds of rows. Production databases have millions. A query that returns in 2ms against a seeded development database can take 8 seconds against production data — not because the query changed, but because ActiveRecord's convenient abstractions hide the execution plan until it matters.

The patterns below are about building queries that remain correct and performant as data grows. Some are optimizations. Several are correctness issues that happen to surface as performance problems.

Select only what you need

Model.all fetches every column. For tables with binary columns, text blobs, or JSON fields with significant payload, this means pulling megabytes of data you discard immediately:

# Fetches every column including avatar_data (binary), bio (text), settings (jsonb)
users = User.where(active: true)

# Fetches only what the operation needs
users = User.where(active: true).select(:id, :email, :name)

The cost isn't just network transfer. On a table with 2M rows averaging 20KB per row, fetching every column moves up to 40GB of data. Most of that bulk is wide values — text, jsonb, binary — that PostgreSQL stores out of line (TOAST) and only reads when the column is actually selected; wide rows also mean fewer rows per heap page, so even filtered scans touch more pages. And if every selected column is covered by an index, the planner can answer the query with an index-only scan that never touches the table heap.

select returns AR objects with only the specified attributes populated. Accessing an unselected attribute raises ActiveModel::MissingAttributeError immediately. This is the right behavior — it tells you when downstream code has a wider dependency than expected.

For bulk reads where you don't need AR objects at all, pluck is faster:

# Returns an array of email strings — no AR object allocation
User.where(active: true).pluck(:email)

# Returns an array of [id, email] arrays
User.where(active: true).pluck(:id, :email)

pluck executes immediately and returns primitives. You can't chain further AR methods after it. Use select when the relation needs to stay lazy; use pluck when you're at the end of the chain and want raw values.

exists? vs count vs any? — pick the right check

Three ways to check if records exist, with meaningfully different SQL:

User.where(role: :admin).count    # SELECT COUNT(*) — always scans
User.where(role: :admin).any?     # SELECT 1 ... LIMIT 1 — stops at first match
User.where(role: :admin).exists?  # SELECT 1 ... LIMIT 1 — same execution

count forces a full aggregate scan even when you only need a boolean. For large tables with many matching records, count does significantly more work than necessary. Use exists? or any? whenever the question is "does at least one record match?" not "how many records match?"

any? and exists? produce identical SQL when called on a relation. exists? also accepts conditions directly: User.exists?(role: :admin) — useful for one-off checks without building a relation first.

The antipattern to avoid:

# Loads all matching records into memory to check length
if User.where(role: :admin).all.length > 0

This is a full table scan into memory for a boolean check. Even count is better than this, and exists? is better still.

Avoid N+1 at the query design level

includes solves N+1 for known associations. But the N+1 that matters most in mature codebases is the one introduced by methods that look like model accessors:

orders.each do |order|
  puts order.user.subscription_tier  # N+1 if :user not eager-loaded
  puts order.most_recent_payment.amount  # N+1 if this is a method firing a query
end

The most_recent_payment case is the harder one. If it's defined on Order as:

def most_recent_payment
  payments.order(created_at: :desc).first
end

Then includes(:payments) doesn't save you: chaining .order on the association builds a fresh relation and fires a new query per order, so the N+1 remains. Rewriting the method as payments.max_by(&:created_at) would use the preload, but only by loading every payment for every order — expensive for orders with many payments. The correct pattern for "the most recent of an association" is a has_one with a scope, or a subquery:

# has_one approach — eager-loadable with includes
has_one :most_recent_payment, -> { order(created_at: :desc) }, class_name: "Payment"

# Then eager-loadable
Order.includes(:most_recent_payment)

For complex cases, a lateral join or subquery brings the work into a single database round-trip:

Order.joins(
  "JOIN LATERAL (
    SELECT * FROM payments
    WHERE payments.order_id = orders.id
    ORDER BY created_at DESC
    LIMIT 1
  ) latest_payment ON true"
).select("orders.*, latest_payment.amount AS latest_payment_amount")

This requires raw SQL but executes in one query regardless of result set size. For reporting queries over large datasets, this matters more than the code elegance.

find_each and find_in_batches for full-table operations

Loading a large AR collection with .each loads all matching records into memory simultaneously:

# Loads 500k records at once — triggers major GC mid-loop
Order.where(status: :pending).each { |o| process(o) }

find_each batches the query and processes in configurable batch sizes:

Order.where(status: :pending).find_each(batch_size: 500) { |o| process(o) }

Two important constraints: find_each always batches by primary key, ignoring any order you've applied (with a logged warning, or an error if error_on_ignored_order is set), and limit handling on the outer relation varies by Rails version — older versions ignore it. find_in_batches has the same primary-key ordering constraint, but yields each batch as an array, so you can at least impose an order within a batch:

Order.where(status: :pending).find_in_batches(batch_size: 500) do |batch|
  batch.sort_by(&:created_at).each { |o| process(o) }
end

Sorting within each batch costs O(batch_size log batch_size) per batch — O(n log batch_size) overall — rather than O(n log n) over the full set, and produces only per-batch ordering. That's acceptable for batch processing, not for UI queries that need a total order.

Use database functions instead of Ruby

A common pattern that moves work from the database to Ruby unnecessarily:

# Ruby does the grouping — database returns all records
orders.group_by { |o| o.created_at.to_date }

# Database does the grouping — Ruby receives counts
Order.group("DATE(created_at)").count

The database has indexes, can parallelize aggregation, and returns a small hash instead of potentially millions of objects. Any aggregation — sum, count, average, group — belongs in the database. Ruby does it after the fact as a fallback, not as a first choice.

Date truncation, string manipulation, conditional aggregation — PostgreSQL handles all of these. Don't filter large result sets in Ruby when a WHERE clause does it in the database:

# Wrong — fetches everything, filters in Ruby
User.all.select { |u| u.created_at > 30.days.ago }

# Correct — database filters before returning
User.where("created_at > ?", 30.days.ago)

Indexes for the queries you actually run

The queries above are pointless on large tables without supporting indexes. Three index patterns worth understanding explicitly:

Partial indexes — index a subset of rows matching a condition:

CREATE INDEX idx_active_users ON users (email) WHERE deleted_at IS NULL;

A partial index on active users in a soft-delete table is a fraction of the size of the full index, so scans over it are faster — but the planner will only use it for queries whose WHERE clause implies the index condition (here, deleted_at IS NULL). Add one when a large percentage of rows is logically inactive.

Composite indexes — index multiple columns together:

CREATE INDEX idx_orders_user_status ON orders (user_id, status);

A composite index on (user_id, status) satisfies WHERE user_id = ? AND status = ? in a single index scan. Column order matters: the index is useful for queries that filter on user_id alone, but not for queries that filter on status alone. Lead with the column used in equality conditions most frequently.

Covering indexes — include additional columns in the index so the query is satisfied from the index alone without touching the table:

CREATE INDEX idx_users_email_covering ON users (email) INCLUDE (id, name);

INCLUDE (PostgreSQL 11+) adds columns to the index leaf pages without affecting sort order. A query SELECT id, name FROM users WHERE email = ? can be answered entirely from the index — no table heap access. Use covering indexes for hot read paths where the query columns are known and stable.

In Rails migrations:

add_index :users, :email, where: "deleted_at IS NULL", name: "idx_active_users_email"
add_index :orders, [:user_id, :status]
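Rails 7.1 also exposed the covering-index INCLUDE clause through add_index on the PostgreSQL adapter — worth verifying against the Rails version you're running:

```ruby
# Covering index from a migration; the :include option is PostgreSQL-only
# and, to my knowledge, available from Rails 7.1 — verify on your version.
add_index :users, :email, include: [:id, :name], name: "idx_users_email_covering"
```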

Preventing count queries on large tables

An unfiltered COUNT(*) on a table with millions of rows forces PostgreSQL to visit every row, or at best every entry of an index (unlike MySQL's MyISAM, which stored the row count in table metadata). For dashboard displays showing total record counts, this query can take seconds:

# Potentially seconds on a large table
User.count

Three alternatives depending on the use case:

counter_cache — Rails maintains a count column on the parent model, incremented and decremented by callbacks. Correct for association counts. Incorrect if records are inserted or deleted outside those callbacks — raw SQL, insert_all, delete_all.

pg_stat_user_tables — PostgreSQL maintains approximate row counts in system tables. Fast, slightly stale:

result = ActiveRecord::Base.connection.execute(
  "SELECT reltuples::bigint FROM pg_class WHERE relname = 'users'"
)
result.first["reltuples"]

Accurate within a few percent immediately after an ANALYZE. Useful for "approximately X users" dashboard displays.

Materialized views or periodic background jobs — compute and store the count on a schedule. Precise, always slightly behind, correct for any table regardless of insert source.

The query review checklist

Before any query goes to production on a table expected to exceed 100k rows:

Run EXPLAIN ANALYZE against production data (or a production-scale clone). Confirm the query planner uses an index for the primary filter. Check for Seq Scan on large tables — that's the signal an index is missing or not being used.

Check the result set size. A query that returns 50k rows to Ruby for filtering in memory is a query that should be reworked with a WHERE clause.

Check join cardinality. A join between two large tables on an unindexed column produces a nested loop or hash join over millions of rows. Index both sides of every join.

These aren't optimizations you apply later. They're queries you write correctly the first time, because adding an index to a table with 10M rows in production requires careful scheduling and CREATE INDEX CONCURRENTLY (algorithm: :concurrently plus disable_ddl_transaction! in a Rails migration) to avoid locking writes.
