Stop Storing Everything in One Table. Normalization Exists for a Reason.

by Eric Hanson, Backend Developer at Clean Systems Consulting

The single-table trap

You're building an invoicing system. Time pressure is real, so you create one table:

CREATE TABLE invoices (
  id BIGINT PRIMARY KEY,
  invoice_number VARCHAR(50),
  customer_name VARCHAR(255),
  customer_email VARCHAR(255),
  customer_address TEXT,
  item_description TEXT,
  item_quantity INT,
  item_unit_price DECIMAL(10,2),
  item_total DECIMAL(10,2),
  invoice_date DATE,
  due_date DATE,
  status VARCHAR(50)
);

This works. Until the customer changes their email and you have to update 400 invoice rows. Or until you realize that one invoice has multiple line items, and you're storing comma-separated descriptions in item_description. Or until you need to query "all invoices with at least one item over $100" and you're doing substring parsing in SQL.

These are update anomalies and structural limitations — the exact problems normalization was designed to prevent.

What normalization actually means

Normalization is the process of organizing a schema so that each fact is stored in exactly one place. It's defined by "normal forms" — each higher form eliminates a specific class of anomaly.

First Normal Form (1NF): Each column contains a single atomic value. No repeating groups, no comma-separated lists.

-- Violates 1NF: multiple values in one column
item_descriptions TEXT  -- "Widget A, Widget B, Widget C"

-- Satisfies 1NF: one row per item
CREATE TABLE invoice_items (
  id BIGINT PRIMARY KEY,
  invoice_id BIGINT REFERENCES invoices(id),
  description TEXT,
  quantity INT,
  unit_price DECIMAL(10,2)
);

Second Normal Form (2NF): Every non-key column depends on the entire primary key. Relevant when you have composite primary keys.

If you have a table order_products(order_id, product_id, product_name, quantity), then product_name depends only on product_id, not on the full composite key (order_id, product_id). This is a partial dependency — violates 2NF. Fix it by extracting products(id, name).

Third Normal Form (3NF): No non-key column depends on another non-key column (no transitive dependencies).

-- Violates 3NF: zip_code determines city and state
orders(id, zip_code, city, state, total)

-- Satisfies 3NF: city/state stored once, referenced by zip
zip_codes(zip_code PRIMARY KEY, city, state)
orders(id, zip_code REFERENCES zip_codes, total)

In practice, strictly enforcing 3NF for things like zip codes is often overkill — you'd need to maintain a reference table for every derived value. Use judgment. The principle matters: avoid storing the same fact in multiple places.

The cost of denormalization

Update anomalies: A customer changes their address. If the address is stored in every order row, you need to update thousands of rows atomically. Miss one, and your data is inconsistent.

Redundant storage: Customer name and email stored in every invoice row. If you have 1 million invoices for 10,000 customers, you're storing each customer's name 100 times on average. This inflates table size, bloats backups, and wastes buffer pool space.

Query complexity: "Find all customers who changed their email since their first order" becomes a self-join or a subquery nightmare when emails are stored per-order. It's a trivial query when emails live in a customers table with an audit log.

When denormalization is actually correct

Normalization is not an absolute rule. Denormalization is sometimes the right choice, but it should be a deliberate architectural decision, not a default.

Read-heavy reporting tables: A data warehouse fact table storing pre-computed denormalized records for analytics queries is appropriate. You're trading write overhead (ETL processes) for read simplicity (no joins in analytics queries).

Performance-critical hot paths: If a query runs 100,000 times per second and the normalized version requires three joins, adding a redundant denormalized column with explicit application-level update logic may be warranted. Document this explicitly — it's technical debt with a known cost.

Event sourcing and audit logs: Immutable event records should store the state at the time of the event, not references to mutable entities. An order event should record the product name and price at purchase time, not reference the current product record.

The normalized invoicing schema

CREATE TABLE customers (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL UNIQUE,
  address TEXT
);

CREATE TABLE invoices (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  customer_id BIGINT NOT NULL REFERENCES customers(id),
  invoice_number VARCHAR(50) NOT NULL UNIQUE,
  invoice_date DATE NOT NULL,
  due_date DATE NOT NULL,
  status VARCHAR(50) NOT NULL DEFAULT 'draft'
);

CREATE TABLE invoice_items (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  invoice_id BIGINT NOT NULL REFERENCES invoices(id),
  description TEXT NOT NULL,
  quantity INT NOT NULL CHECK (quantity > 0),
  unit_price DECIMAL(10,2) NOT NULL CHECK (unit_price >= 0)
);

Now updating a customer email is one row. Querying "all invoices with an item over $100" is a simple join with an indexed filter. And the schema enforces the business rule that quantities must be positive, at the database level, not just in application code.

The practical standard

Apply normalization to 3NF as your baseline. Denormalize explicitly and document why when you do. If you find yourself writing application code to keep redundant columns in sync, that's a strong signal you've denormalized without the architectural justification to support it.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across US, EU, and APAC timezones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.

Tell us about your project

Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia

More articles

Clients Who Change Scope Every Hour: How to Stay Sane

Scope creep can feel like running on a treadmill that keeps speeding up. Here’s how to survive—and even thrive—when clients can’t decide.

Read more

Spring Data Repository Design — When findBy Methods Are Enough and When They're Not

Spring Data's derived query methods eliminate boilerplate for simple queries. They become unreadable for complex ones and break entirely for dynamic filtering. Here is where each approach belongs and how to recognize when you've outgrown derived queries.

Read more

Spring Boot API Security Hardening — Headers, Input Validation, and the Vulnerabilities That Slip Through

Authentication and authorization are necessary but not sufficient for API security. Mass assignment, excessive data exposure, injection vulnerabilities, and missing security headers are the gaps that survive code review and appear in penetration tests.

Read more

Designing with Java Enums — When They're the Right Model and When They're Not

Java enums are more capable than most developers use them for, but that capability has limits. Here is a clear-eyed look at what enums do well, where they break down, and the design decisions that determine which side you end up on.

Read more