Good Database Design Is Boring Until Bad Design Breaks Everything
by Arif Ikhsanudin, Backend Developer
The schema that seemed fine
Two years into a product, your team is blocked. You need to add multi-currency support, but orders.total is stored as a DECIMAL(10,2) with the currency hardcoded in application logic. You need to query events by region, but users.address is a single TEXT field that was never parsed into structured components. You want to archive old data, but the primary key is an auto-increment integer and you're approaching the INT max value with no migration strategy.
These aren't bugs. They're schema decisions that felt fine when the product was small and became load-bearing walls you can't easily move. Good database design is invisible because it never forces these conversations. Bad design makes them unavoidable.
Principle 1: Store data in its smallest meaningful unit
-- Problematic: forces application to parse; makes WHERE clauses fragile
CREATE TABLE users (
id BIGINT PRIMARY KEY,
full_name TEXT, -- "John Smith" — unsortable by last name
address TEXT, -- "123 Main St, Springfield, IL 62701"
created_at TEXT -- "2024-01-15T10:30:00" stored as string
);
-- Better: each fact in its own column
CREATE TABLE users (
id BIGINT PRIMARY KEY,
first_name VARCHAR(100),
last_name VARCHAR(100),
address_line1 VARCHAR(255),
address_city VARCHAR(100),
address_state CHAR(2),
address_postal_code VARCHAR(20),
address_country CHAR(2),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Storing structured data as free-form text trades a few minutes of schema design for years of brittle string parsing. The day you need to GROUP BY country or ORDER BY last_name, you'll regret every shortcut.
Principle 2: Use surrogate keys, but know their limits
Auto-increment integers (SERIAL, BIGSERIAL, AUTO_INCREMENT) are fast, space-efficient, and ordered. But they have two problems:
Exhaustion: INT maxes out at ~2.1 billion. High-volume tables hit this. Use BIGINT (up to ~9.2 quintillion) from the start. The storage cost difference (4 bytes vs 8 bytes per row) is trivial compared to a forced emergency migration.
Leakability and predictability: Sequential IDs in URLs expose business data (order volume, user count) and enable enumeration attacks. For externally-exposed IDs, use UUIDs or ULIDs (Universally Unique Lexicographically Sortable Identifiers — ULID preserves time-ordering while being random enough to prevent enumeration).
-- PostgreSQL: UUID primary key with pgcrypto
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id BIGINT NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
UUIDs add 16 bytes vs 8 for BIGINT, and random UUIDs cause index fragmentation (B-tree insertions are random rather than sequential). ULIDs and UUID v7 (time-ordered) reduce this fragmentation.
Principle 3: Model for your queries, not just your entities
Schema design textbooks focus on entities and relationships. Production schema design must also account for query patterns.
If your most frequent query is "get all active subscriptions for a user," and subscriptions is a child table with 10M rows, you need an index on (user_id, status) — not just user_id. That index should exist from day one, not be added after the first slow-query incident.
Before finalizing a schema, write out the five most frequent queries you expect. Verify that each query can use indexes and doesn't require a full table scan. This takes 20 minutes and prevents hours of remediation.
Principle 4: Don't conflate application concerns with schema concerns
-- Soft delete: status stored in the same table as the live data
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ;
Soft deletes are a common pattern that creates silent correctness risks. Every query on users now needs WHERE deleted_at IS NULL — and if any query misses that filter, it silently returns deleted users. This is an application concern (we want to retain data for audit) leaking into the schema in a way that creates ongoing query complexity.
Alternatives: a separate deleted_users table (clean but complicates restore), row-level security policies (PostgreSQL's CREATE POLICY can enforce the filter at the database level automatically), or partitioning (separate partitions for active vs archived rows).
None of these are universally right. The point is to make the tradeoff consciously, not by default.
Principle 5: Plan for change
The schema you design today will be migrated dozens of times. Design for migrability:
- Add
NOT NULLconstraints carefully — adding them later on large tables requires a table rewrite or a multi-step migration (add nullable, backfill, add constraint). In PostgreSQL 12+,NOT NULLon columns with a default can be added without a rewrite for new tables. - Avoid wide tables where columns are frequently NULL for specific row types — this is a signal for subtype tables or a polymorphic design.
- Name foreign keys explicitly so migrations can reference them unambiguously.
-- Implicit FK name (DB generates it, varies by engine)
FOREIGN KEY (user_id) REFERENCES users(id)
-- Explicit FK name (deterministic, referenceable in migrations)
CONSTRAINT fk_orders_user_id FOREIGN KEY (user_id) REFERENCES users(id)
The boring work that keeps things working
Spend a few hours at the start of every significant feature designing the schema before writing application code. Write the queries that schema needs to support. Check that indexes cover those queries. Choose data types based on what the data actually is, not what's easiest. The schema will outlast the application code that uses it.