Your Table Structure Is Making Your Queries Harder Than They Need to Be
by Eric Hanson, Backend Developer at Clean Systems Consulting
When the query is the symptom, not the problem
You're in a code review. The query under discussion has four CTEs, a self-join, and a comment that says "this is the only way to get this data." Someone asks why it's so complicated. The answer is always some variant of: "because of how the table is structured."
Complex queries are often schema problems in disguise. When data is stored in a shape that doesn't match how it needs to be queried, developers add complexity at the query layer to compensate. Understanding that the root cause is structural — not a lack of SQL skill — is the first step to actually fixing it.
Mismatched granularity
The most common structural mismatch: the table stores data at the wrong level of granularity for your primary query.
-- Table stores one row per event, including event metadata that repeats
CREATE TABLE user_events (
id BIGINT PRIMARY KEY,
user_id BIGINT,
event_type VARCHAR(100),
event_data JSONB,
session_id UUID,
session_started_at TIMESTAMPTZ, -- Repeated for every event in the session
session_device_type VARCHAR(50), -- Repeated for every event in the session
created_at TIMESTAMPTZ
);
Query: "How many sessions per device type this week?"
-- Have to deduplicate sessions because session data is repeated per event
SELECT session_device_type, COUNT(DISTINCT session_id) AS session_count
FROM user_events
WHERE created_at >= NOW() - INTERVAL '7 days'
GROUP BY session_device_type;
The COUNT(DISTINCT session_id) is expensive — it requires sorting or hashing all matching rows. This query's complexity and cost are a direct result of storing session attributes at event granularity.
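The usual response is to patch this at the query layer — for example, deduplicating the sessions in a subquery first. A sketch of that kind of workaround (PostgreSQL's DISTINCT ON): it avoids COUNT(DISTINCT), but it still scans every event row and adds a layer of indirection to maintain:

-- Query-layer workaround: deduplicate session rows before counting.
-- No COUNT(DISTINCT), but still a full scan of event-granularity data.
SELECT session_device_type, COUNT(*) AS session_count
FROM (
  SELECT DISTINCT ON (session_id) session_id, session_device_type
  FROM user_events
  WHERE created_at >= NOW() - INTERVAL '7 days'
) deduped_sessions
GROUP BY session_device_type;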
The structural fix: separate sessions and events.
CREATE TABLE sessions (
id UUID PRIMARY KEY,
user_id BIGINT REFERENCES users(id),
device_type VARCHAR(50),
started_at TIMESTAMPTZ
);
CREATE TABLE user_events (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
session_id UUID REFERENCES sessions(id),
event_type VARCHAR(100),
event_data JSONB,
created_at TIMESTAMPTZ
);
Now the session query is straightforward:
SELECT device_type, COUNT(*) AS session_count
FROM sessions
WHERE started_at >= NOW() - INTERVAL '7 days'
GROUP BY device_type;
EAV: the schema that makes every query hard
Entity-Attribute-Value (EAV) is the pattern where attributes are stored as rows rather than columns:
CREATE TABLE product_attributes (
product_id BIGINT,
attribute_name VARCHAR(100),
attribute_value TEXT,
PRIMARY KEY (product_id, attribute_name)
);
-- Data looks like:
-- (1, 'color', 'red')
-- (1, 'size', 'L')
-- (1, 'weight_kg', '0.5')
That flexibility comes at the cost of every query being a pivot operation:
-- Find all red, size-L products — requires self-joins per attribute
SELECT p.id
FROM products p
JOIN product_attributes pa_color ON p.id = pa_color.product_id
AND pa_color.attribute_name = 'color' AND pa_color.attribute_value = 'red'
JOIN product_attributes pa_size ON p.id = pa_size.product_id
AND pa_size.attribute_name = 'size' AND pa_size.attribute_value = 'L';
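The self-join count grows with every attribute you filter on. The other common formulation, conditional aggregation, trades the joins for a GROUP BY — but it's still a pivot in disguise. A sketch against the same table:

-- Same filter via aggregation: one scan instead of N self-joins
SELECT product_id
FROM product_attributes
WHERE (attribute_name, attribute_value) IN (('color', 'red'), ('size', 'L'))
GROUP BY product_id
HAVING COUNT(*) = 2;  -- row count must equal the number of required attributes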
EAV trades query simplicity for schema flexibility. It's appropriate when the set of attributes is genuinely unknown and varies wildly per entity. It's overused when a JSON column would achieve the same flexibility with simpler queries:
-- PostgreSQL JSONB: flexible attributes with indexable structure
CREATE TABLE products (
id BIGINT PRIMARY KEY,
name TEXT NOT NULL,
attributes JSONB
);
-- Index a specific attribute for fast filtering
CREATE INDEX idx_products_color ON products ((attributes->>'color'));
-- Query is straightforward
SELECT id FROM products WHERE attributes->>'color' = 'red' AND attributes->>'size' = 'L';
JSONB in PostgreSQL supports GIN indexes for arbitrary key queries and operator-based filtering. It's not a replacement for properly structured columns, but it's a far better fit than EAV for semi-structured data.
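A sketch of the GIN variant: one index on the whole column serves containment queries on any key, so you don't need a separate expression index per attribute (the index name here is illustrative):

-- A single GIN index covers containment queries on arbitrary keys
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);

-- The @> containment operator can use the GIN index
SELECT id FROM products
WHERE attributes @> '{"color": "red", "size": "L"}';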
The wide table that needs too many nulls
A table with 80 columns where only 15 are populated for any given row is a sign that multiple entity subtypes are being stored in one table.
CREATE TABLE content (
id BIGINT PRIMARY KEY,
type VARCHAR(50), -- 'article', 'video', 'podcast'
title TEXT,
body TEXT, -- NULL for video/podcast
video_url TEXT, -- NULL for article/podcast
video_duration_seconds INT, -- NULL for article/podcast
audio_url TEXT, -- NULL for article/video
audio_bitrate_kbps INT, -- NULL for article/video
...
);
This is a type hierarchy problem. The correct pattern is a shared base table with type-specific tables:
CREATE TABLE content (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
type VARCHAR(50) NOT NULL,
title TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE articles (
content_id BIGINT PRIMARY KEY REFERENCES content(id) ON DELETE CASCADE,
body TEXT NOT NULL
);
CREATE TABLE videos (
content_id BIGINT PRIMARY KEY REFERENCES content(id) ON DELETE CASCADE,
video_url TEXT NOT NULL,
duration_seconds INT NOT NULL
);
The tradeoff: you now need a join to get the full record for a specific type. But your columns are all meaningful, your NOT NULL constraints are enforced, and queries against a single type don't load irrelevant columns.
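What that join looks like against the schema above — note there's no type check and no NULL filtering, because a row in videos is a video by construction:

-- Full record for one subtype: one join, every column meaningful
SELECT c.id, c.title, v.video_url, v.duration_seconds
FROM content c
JOIN videos v ON v.content_id = c.id
WHERE c.created_at >= NOW() - INTERVAL '30 days';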
The practical audit
Pick your most complex query — the one that took longest to write or that developers are afraid to modify. Trace backwards: why is it complex? Is it aggregating data that exists at the wrong granularity? Is it pivoting EAV rows into columns? Is it joining subtypes out of a nullable wide table? The query's complexity is almost always pointing at a structural mismatch in the schema. Fix the structure; the query simplifies itself.
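One concrete way to start the trace in PostgreSQL is to put the suspect query under EXPLAIN ANALYZE and look for the operations the structure forces on it (this uses the session-counting query from earlier as an example):

-- Large sorts or hash aggregates feeding a COUNT(DISTINCT ...) suggest a
-- granularity mismatch; repeated self-joins on one table suggest an EAV pivot.
EXPLAIN (ANALYZE, BUFFERS)
SELECT session_device_type, COUNT(DISTINCT session_id) AS session_count
FROM user_events
WHERE created_at >= NOW() - INTERVAL '7 days'
GROUP BY session_device_type;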