Why Your CI Pipeline Takes Forever and What to Do About It
by Eric Hanson, Backend Developer at Clean Systems Consulting
The 45-Minute Build and What It's Actually Costing You
Your CI pipeline takes 45 minutes. Across a 10-person team pushing 3 PRs per day each, that's 22.5 hours of wall-clock wait time per day — not counting the context-switching overhead of each developer resuming work after getting feedback. At a fully-loaded developer cost of $150/hour, a 45-minute pipeline costs roughly $3,375/day in lost developer throughput. Your runners cost $40/day.
Nobody runs this math. They just accept that "CI is slow" and move on. This article is about not moving on.
Profile Before You Optimize
The biggest mistake in pipeline optimization is guessing where the time goes. Most engineers assume tests are the bottleneck. Often they're not.
Start by pulling the last 50 pipeline runs and extracting per-job timing. GitHub Actions exposes this via the API; Jenkins has the Build Timeline plugin; most platforms have something equivalent. Plot the duration distribution per job, not just the mean.
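A minimal sketch of that extraction for GitHub Actions, assuming the gh CLI and jq are installed; OWNER/REPO and ci.yml are placeholders for your repository and workflow file:
# Sketch: per-job durations (in seconds) for the last 50 runs of one workflow
# Assumes the gh CLI and jq; OWNER/REPO and ci.yml are placeholders
gh api "repos/OWNER/REPO/actions/workflows/ci.yml/runs?per_page=50" --jq '.workflow_runs[].id' |
while read -r run_id; do
  gh api "repos/OWNER/REPO/actions/runs/${run_id}/jobs" --jq \
    '.jobs[] | select(.completed_at != null) |
     "\(.name)\t\((.completed_at | fromdateiso8601) - (.started_at | fromdateiso8601))"'
done | sort
Dump the output into a spreadsheet or a quick histogram; the goal is to see which job dominates and how much it varies from run to run.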
In most pipelines, 60–70% of the total duration comes from one or two jobs. Common culprits that are not the test suite:
- Docker image builds without layer caching — rebuilding the entire image on every run because COPY . . comes before RUN pip install (or the Maven equivalent)
- Dependency downloads — Maven or npm fetching hundreds of MB on every run because caching isn't configured
- Slow test environment startup — Testcontainers pulling and starting a full PostgreSQL image that takes 30 seconds before the first test runs
- Sequential execution of jobs that have no actual dependency on each other
Fix whichever of these is largest first. Don't touch the test suite until the infrastructure overhead is eliminated.
Dependency Caching: The Fastest Win
Caching build dependencies should be the first optimization in any pipeline. The return is immediate, the implementation is low-risk, and the impact is often 3–8 minutes per run.
# GitHub Actions: Gradle dependency caching
- uses: actions/setup-java@v4
  with:
    java-version: '21'
    distribution: 'temurin'
    cache: 'gradle'   # Caches ~/.gradle/caches and ~/.gradle/wrapper

# GitHub Actions: Maven dependency caching
- uses: actions/setup-java@v4
  with:
    java-version: '21'
    distribution: 'temurin'
    cache: 'maven'   # Caches ~/.m2/repository
The setup-java action generates the cache key automatically from a hash of your build files (pom.xml for Maven, the Gradle build scripts for Gradle). If the built-in option doesn't cover your setup, use actions/cache directly with a key derived from the hash of your pom.xml or build.gradle; other CI platforms have an equivalent cache step that takes the same keying strategy.
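The explicit version is a single actions/cache step. A minimal sketch for Maven (adjust the path and key pattern for Gradle):
# Explicit alternative: actions/cache keyed on a hash of the build files (Maven shown)
- uses: actions/cache@v4
  with:
    path: ~/.m2/repository
    key: maven-${{ runner.os }}-${{ hashFiles('**/pom.xml') }}
    restore-keys: |
      maven-${{ runner.os }}-
The restore-keys prefix lets a run fall back to the most recent cache when pom.xml has changed, so only the delta is downloaded.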
Docker Build Optimization
If your pipeline builds a Docker image, the build time is likely dominated by layer cache misses. The fix is ordering your Dockerfile so that frequently-changing layers (your application code) come after infrequently-changing layers (your dependencies).
# Slow: invalidates the dependency layer on every code change
FROM eclipse-temurin:21-jdk
WORKDIR /app
COPY . .
RUN ./gradlew dependencies --no-daemon

# Fast: dependencies layer is cached across most builds
FROM eclipse-temurin:21-jdk AS builder
WORKDIR /app
COPY gradlew build.gradle settings.gradle ./
COPY gradle/ gradle/
# Warm dependency cache layer
RUN ./gradlew dependencies --no-daemon
COPY src/ src/
RUN ./gradlew bootJar --no-daemon

FROM eclipse-temurin:21-jre
WORKDIR /app
COPY --from=builder /app/build/libs/*.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
With this ordering, the dependency layer is rebuilt only when build.gradle changes — which is infrequent. Code changes only rebuild the last two layers.
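Layer ordering only pays off if the cache survives between runs, and on ephemeral CI runners the local Docker layer cache usually doesn't. One way to persist it on GitHub Actions is BuildKit's GitHub Actions cache backend via docker/build-push-action; a sketch, with myapp as a placeholder image name:
# Persist the BuildKit layer cache across runs using the GitHub Actions cache backend
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v6
  with:
    context: .
    push: false
    tags: myapp:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
mode=max caches intermediate layers from every stage, not just the final image, which is what makes the warmed dependency layer reusable on the next run.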
Parallelizing What Doesn't Need to Be Sequential
Most pipelines run checks sequentially that have no dependency on each other. Linting doesn't need to wait for tests. Security scanning doesn't need to wait for the Docker build.
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps: [...]
  lint:
    runs-on: ubuntu-latest
    steps: [...]
  security-scan:
    runs-on: ubuntu-latest
    steps: [...]
  build-image:
    needs: [unit-tests]   # Only this one needs tests to pass first
    runs-on: ubuntu-latest
    steps: [...]
With the three independent jobs running in parallel, a pipeline that took 30 minutes sequentially might finish in around 12: wall-clock time collapses to the longest chain, here unit-tests followed by build-image. Total runner minutes stay roughly the same, so this is effectively free if your plan already allows concurrent runners.
When the Tests Actually Are the Bottleneck
If after addressing infra overhead the tests themselves are slow, the fixes are more involved but still tractable:
- Test sharding: split the test suite across N runners. pytest supports this via plugins such as pytest-split or pytest-shard; JUnit 5 supports @Tag filtering for manual sharding. Some platforms (Buildkite, Nx Cloud) handle sharding automatically based on historical timing data. See the sketch after this list.
- Test ordering: run the historically slowest tests first so failures surface early rather than after a 20-minute wait.
- Database test isolation: if each test creates and tears down a full database schema, switch to using transactions that roll back instead. This can reduce integration test suite time by 50% on large suites.
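As referenced in the sharding item above, a matrix job is usually the simplest mechanism for running shards. The sketch below assumes the pytest-split plugin (its --splits/--group flags divide the suite); any tool that can run a deterministic slice of the suite fits the same shape:
# Sketch: 4-way test sharding with a job matrix (assumes the pytest-split plugin)
test:
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    matrix:
      shard: [1, 2, 3, 4]
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: '3.12'
    - run: pip install -r requirements.txt pytest-split
    - run: pytest --splits 4 --group ${{ matrix.shard }}
pytest-split can also balance shards by recorded timings if you store a durations file in the repo; without one it splits by test count, which is usually good enough to start.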
The target is a critical path under 10 minutes. Profile, fix the largest contributor, measure, repeat. Most teams reach the target in three or four iterations.