Stop Storing Everything in One Table. Normalization Exists for a Reason.

by Arif Ikhsanudin, Backend Developer

The single-table trap

You're building an invoicing system. Time pressure is real, so you create one table:

CREATE TABLE invoices (
  id BIGINT PRIMARY KEY,
  invoice_number VARCHAR(50),
  customer_name VARCHAR(255),
  customer_email VARCHAR(255),
  customer_address TEXT,
  item_description TEXT,
  item_quantity INT,
  item_unit_price DECIMAL(10,2),
  item_total DECIMAL(10,2),
  invoice_date DATE,
  due_date DATE,
  status VARCHAR(50)
);

This works. Until the customer changes their email and you have to update 400 invoice rows. Or until you realize that one invoice has multiple line items, and you're storing comma-separated descriptions in item_description. Or until you need to query "all invoices with at least one item over $100" and you're doing substring parsing in SQL.

These are update anomalies and structural limitations — the exact problems normalization was designed to prevent.

What normalization actually means

Normalization is the process of organizing a schema so that each fact is stored in exactly one place. It's defined by "normal forms" — each higher form eliminates a specific class of anomaly.

First Normal Form (1NF): Each column contains a single atomic value. No repeating groups, no comma-separated lists.

-- Violates 1NF: multiple values in one column
item_descriptions TEXT  -- "Widget A, Widget B, Widget C"

-- Satisfies 1NF: one row per item
CREATE TABLE invoice_items (
  id BIGINT PRIMARY KEY,
  invoice_id BIGINT REFERENCES invoices(id),
  description TEXT,
  quantity INT,
  unit_price DECIMAL(10,2)
);

Second Normal Form (2NF): Every non-key column depends on the entire primary key. Relevant when you have composite primary keys.

If you have a table order_products(order_id, product_id, product_name, quantity), then product_name depends only on product_id, not on the full composite key (order_id, product_id). This is a partial dependency — violates 2NF. Fix it by extracting products(id, name).

Third Normal Form (3NF): No non-key column depends on another non-key column (no transitive dependencies).

-- Violates 3NF: zip_code determines city and state
orders(id, zip_code, city, state, total)

-- Satisfies 3NF: city/state stored once, referenced by zip
zip_codes(zip_code PRIMARY KEY, city, state)
orders(id, zip_code REFERENCES zip_codes, total)

In practice, strictly enforcing 3NF for things like zip codes is often overkill — you'd need to maintain a reference table for every derived value. Use judgment. The principle matters: avoid storing the same fact in multiple places.

The cost of denormalization

Update anomalies: A customer changes their address. If the address is stored in every order row, you need to update thousands of rows atomically. Miss one, and your data is inconsistent.

Redundant storage: Customer name and email stored in every invoice row. If you have 1 million invoices for 10,000 customers, you're storing each customer's name 100 times on average. This inflates table size, bloats backups, and wastes buffer pool space.

Query complexity: "Find all customers who changed their email since their first order" becomes a self-join or a subquery nightmare when emails are stored per-order. It's a trivial query when emails live in a customers table with an audit log.

When denormalization is actually correct

Normalization is not an absolute rule. Denormalization is sometimes the right choice, but it should be a deliberate architectural decision, not a default.

Read-heavy reporting tables: A data warehouse fact table storing pre-computed denormalized records for analytics queries is appropriate. You're trading write overhead (ETL processes) for read simplicity (no joins in analytics queries).

Performance-critical hot paths: If a query runs 100,000 times per second and the normalized version requires three joins, adding a redundant denormalized column with explicit application-level update logic may be warranted. Document this explicitly — it's technical debt with a known cost.

Event sourcing and audit logs: Immutable event records should store the state at the time of the event, not references to mutable entities. An order event should record the product name and price at purchase time, not reference the current product record.

The normalized invoicing schema

CREATE TABLE customers (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL UNIQUE,
  address TEXT
);

CREATE TABLE invoices (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  customer_id BIGINT NOT NULL REFERENCES customers(id),
  invoice_number VARCHAR(50) NOT NULL UNIQUE,
  invoice_date DATE NOT NULL,
  due_date DATE NOT NULL,
  status VARCHAR(50) NOT NULL DEFAULT 'draft'
);

CREATE TABLE invoice_items (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  invoice_id BIGINT NOT NULL REFERENCES invoices(id),
  description TEXT NOT NULL,
  quantity INT NOT NULL CHECK (quantity > 0),
  unit_price DECIMAL(10,2) NOT NULL CHECK (unit_price >= 0)
);

Now updating a customer email is one row. Querying "all invoices with an item over $100" is a simple join with an indexed filter. And the schema enforces the business rule that quantities must be positive, at the database level, not just in application code.

The practical standard

Apply normalization to 3NF as your baseline. Denormalize explicitly and document why when you do. If you find yourself writing application code to keep redundant columns in sync, that's a strong signal you've denormalized without the architectural justification to support it.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across US, EU, and APAC timezones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.

Tell us about your project

Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia

More articles

Secrets in Your Pipeline Are a Security Risk You Cannot Ignore

CI/CD pipelines require credentials to do their job — and that makes them a high-value target. How you store, inject, and rotate those secrets determines whether your pipeline is a security asset or a liability.

Read more

Recovering From a Public Mistake (Like a Website Crash)

Seeing your website go down in front of everyone is a stomach-dropping moment. But a public mistake doesn’t have to be a career-ender—it can be a chance to show professionalism and resilience.

Read more

How to Share Your Story Without Feeling Embarrassed

Talking about your experiences—successes and failures alike—can feel awkward. But sharing your story is one of the fastest ways to connect and grow.

Read more

Reducing API Complexity in Spring Boot — Consolidation, Query Parameters, and the Endpoints Worth Removing

Every endpoint is a permanent contract the moment a client integrates against it. API surface area grows easily and shrinks painfully. Here is how to keep it smaller from the start and how to reduce it when it has already grown.

Read more