Your Table Structure Is Making Your Queries Harder Than They Need to Be

by Eric Hanson, Backend Developer at Clean Systems Consulting

When the query is the symptom, not the problem

You're in a code review. The query under discussion has four CTEs, a self-join, and a comment that says "this is the only way to get this data." Someone asks why it's so complicated. The answer is always some variant of: "because of how the table is structured."

Complex queries are often schema problems in disguise. When data is stored in a shape that doesn't match how it needs to be queried, developers add complexity at the query layer to compensate. Understanding that the root cause is structural — not a lack of SQL skill — is the first step to actually fixing it.

Mismatched granularity

The most common structural mismatch: the table stores data at the wrong level of granularity for your primary query.

-- Table stores one row per event, including event metadata that repeats
CREATE TABLE user_events (
  id BIGINT PRIMARY KEY,
  user_id BIGINT,
  event_type VARCHAR(100),
  event_data JSONB,
  session_id UUID,
  session_started_at TIMESTAMPTZ,   -- Repeated for every event in the session
  session_device_type VARCHAR(50),  -- Repeated for every event in the session
  created_at TIMESTAMPTZ
);

Query: "How many sessions per device type this week?"

-- Have to deduplicate sessions because session data is repeated per event
SELECT session_device_type, COUNT(DISTINCT session_id) AS session_count
FROM user_events
WHERE created_at >= NOW() - INTERVAL '7 days'
GROUP BY session_device_type;

The COUNT(DISTINCT session_id) is expensive — it requires sorting or hashing all matching rows. This query's complexity and cost are a direct result of storing session attributes at event granularity.

The structural fix: separate sessions and events.

CREATE TABLE sessions (
  id UUID PRIMARY KEY,
  user_id BIGINT REFERENCES users(id),
  device_type VARCHAR(50),
  started_at TIMESTAMPTZ
);

CREATE TABLE user_events (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  session_id UUID REFERENCES sessions(id),
  event_type VARCHAR(100),
  event_data JSONB,
  created_at TIMESTAMPTZ
);

Now the session query is straightforward:

SELECT device_type, COUNT(*) AS session_count
FROM sessions
WHERE started_at >= NOW() - INTERVAL '7 days'
GROUP BY device_type;
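Getting from the old shape to the new one takes a backfill. A minimal sketch, assuming the renamed original table (here called old_user_events, a hypothetical name) still holds the repeated session columns; DISTINCT ON is PostgreSQL-specific:

-- Backfill one sessions row per session_id from the repeated columns.
-- Ordering by created_at makes the earliest event's metadata canonical.
INSERT INTO sessions (id, user_id, device_type, started_at)
SELECT DISTINCT ON (session_id)
  session_id, user_id, session_device_type, session_started_at
FROM old_user_events
ORDER BY session_id, created_at;

Run it inside the same transaction as the cutover, or re-run it incrementally for sessions created after the first pass.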

EAV: the schema that makes every query hard

Entity-Attribute-Value (EAV) is the pattern where attributes are stored as rows rather than columns:

CREATE TABLE product_attributes (
  product_id BIGINT,
  attribute_name VARCHAR(100),
  attribute_value TEXT,
  PRIMARY KEY (product_id, attribute_name)
);

-- Data looks like:
-- (1, 'color', 'red')
-- (1, 'size', 'L')
-- (1, 'weight_kg', '0.5')

The flexibility comes at the cost of every query becoming a pivot operation:

-- Find all red, size-L products — requires self-joins per attribute
SELECT p.id
FROM products p
JOIN product_attributes pa_color ON p.id = pa_color.product_id
  AND pa_color.attribute_name = 'color' AND pa_color.attribute_value = 'red'
JOIN product_attributes pa_size ON p.id = pa_size.product_id
  AND pa_size.attribute_name = 'size' AND pa_size.attribute_value = 'L';
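When you have to live with EAV for a while, conditional aggregation is often cheaper than stacking self-joins: one scan over the table instead of one join per attribute. A sketch against the table above, using PostgreSQL's FILTER clause (MAX(CASE ...) works on other databases):

-- Pivot attributes into per-product values in a single scan, then filter
SELECT product_id
FROM product_attributes
GROUP BY product_id
HAVING MAX(attribute_value) FILTER (WHERE attribute_name = 'color') = 'red'
   AND MAX(attribute_value) FILTER (WHERE attribute_name = 'size') = 'L';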

EAV trades query simplicity for schema flexibility. It's appropriate when the set of attributes is genuinely unknown and varies wildly per entity. It's overused when a JSON column would achieve the same flexibility with simpler queries:

-- PostgreSQL JSONB: flexible attributes with indexable structure
CREATE TABLE products (
  id BIGINT PRIMARY KEY,
  name TEXT NOT NULL,
  attributes JSONB
);

-- Index a specific attribute for fast filtering
CREATE INDEX idx_products_color ON products ((attributes->>'color'));

-- Query is straightforward
SELECT id FROM products WHERE attributes->>'color' = 'red' AND attributes->>'size' = 'L';

JSONB in PostgreSQL supports GIN indexes for arbitrary key queries and operator-based filtering. It's not a replacement for properly structured columns, but it's a far better fit than EAV for semi-structured data.
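If filters can arrive on arbitrary keys rather than a known handful, a single GIN index covers them all via the containment operator. A sketch, assuming the products table above; whether it beats per-key expression indexes depends on your data, so benchmark before committing:

-- One GIN index supports containment queries on any combination of keys
CREATE INDEX idx_products_attributes ON products USING GIN (attributes);

-- @> asks: does attributes contain this JSON fragment?
SELECT id
FROM products
WHERE attributes @> '{"color": "red", "size": "L"}';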

The wide table that needs too many nulls

A table with 80 columns where only 15 are populated for any given row is a sign that multiple entity subtypes are being stored in one table.

CREATE TABLE content (
  id BIGINT PRIMARY KEY,
  type VARCHAR(50),          -- 'article', 'video', 'podcast'
  title TEXT,
  body TEXT,                 -- NULL for video/podcast
  video_url TEXT,            -- NULL for article/podcast
  video_duration_seconds INT, -- NULL for article/podcast
  audio_url TEXT,            -- NULL for article/video
  audio_bitrate_kbps INT,    -- NULL for article/video
  ...
);

This is a type hierarchy problem. The correct pattern is a shared base table with type-specific tables:

CREATE TABLE content (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  type VARCHAR(50) NOT NULL,
  title TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE articles (
  content_id BIGINT PRIMARY KEY REFERENCES content(id) ON DELETE CASCADE,
  body TEXT NOT NULL
);

CREATE TABLE videos (
  content_id BIGINT PRIMARY KEY REFERENCES content(id) ON DELETE CASCADE,
  video_url TEXT NOT NULL,
  duration_seconds INT NOT NULL
);

The tradeoff: you now need a join to get the full record for a specific type. But your columns are all meaningful, your NOT NULL constraints are enforced, and queries against a single type don't load irrelevant columns.
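Fetching a full record for one type is then a single join, and no type filter is needed because the subtype table already restricts the rows. A sketch against the tables above:

-- Full video records: base columns plus video-specific columns
SELECT c.id, c.title, c.created_at, v.video_url, v.duration_seconds
FROM content c
JOIN videos v ON v.content_id = c.id;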

The practical audit

Pick your most complex query — the one that took longest to write or that developers are afraid to modify. Trace backwards: why is it complex? Is it aggregating data that exists at the wrong granularity? Is it pivoting EAV rows into columns? Is it joining subtypes out of a nullable wide table? The complexity almost always points at a structural mismatch in the schema. Fix the structure, and the query simplifies itself.
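If you're not sure which query to start with, PostgreSQL's pg_stat_statements extension (when enabled) ranks queries by cumulative cost. A sketch; the total_exec_time and mean_exec_time columns are PostgreSQL 13+ (earlier versions call them total_time and mean_time):

-- Top 5 queries by total execution time since stats were last reset
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;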
