Part 4 · Lesson 11
Production-Grade

Secrets, Zero-Downtime & Rollbacks

Why `.env` files end up in git history and how to avoid it.

advanced · 13 min read · Updated 2026-04-11

Secrets: stop writing them in `.env`

A `.env` file on the server containing `DATABASE_URL=postgres://...` is the most common setup, and it is fine as long as it is `chmod 600`, owned by root, and never checked in. The classic mistake is committing a "template" `.env.example` that still contains a real production key, which happens all the time; keep only placeholder values in it. For a bigger setup there are two common upgrades. Docker secrets mount values into the container as files instead of environment variables. SOPS (originally a Mozilla project) encrypts the values in a file with age or GPG and decrypts them at deploy time, so you can commit the encrypted file and keep the key separate. Simple and boring beats clever every time.
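The two rules above (mode 600, never tracked by git) are easy to check mechanically. The following is a minimal sketch, not a prescribed tool: `env_is_private` and `env_is_untracked` are hypothetical helper names, and the file path is whatever your deploy uses.

```shell
# Sketch: sanity checks for a server-side .env file.
# env_is_private and env_is_untracked are hypothetical helper names.

# Succeeds only if the file exists and its mode is exactly 600.
# `find -perm 600` (no leading - or /) matches the exact permission bits.
env_is_private() {
  local file="$1"
  [ -f "$file" ] && [ -n "$(find "$file" -prune -perm 600)" ]
}

# Succeeds only if git is NOT tracking the file.
# Note: this also succeeds when run outside a git repository.
env_is_untracked() {
  local file="$1"
  ! git ls-files --error-unmatch -- "$file" >/dev/null 2>&1
}
```

A deploy script could call `env_is_private .env && env_is_untracked .env` and abort if either check fails.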

Zero-downtime with Docker Compose

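The Compose file format has a built-in pattern for rolling updates that most people miss: the `deploy.update_config` block. It takes effect when the stack runs in Swarm mode (`docker stack deploy`); plain `docker compose up` ignores it, and the `--no-recreate` flag is unrelated (it only leaves existing containers untouched). Use the deploy block: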

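services:
  app:
    image: ghcr.io/owner/app:latest
    deploy:
      update_config:
        order: start-first   # start the replacement before stopping the old task
        parallelism: 2       # update at most two tasks at a time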

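With `order: start-first`, the orchestrator (Swarm) starts the new container, waits for its health check to pass, and only then stops the old one. A Traefik load balancer that discovers containers through Docker picks up the new one along the way. Requests in flight finish on the old container, provided the app shuts down gracefully when it receives the stop signal. New requests go to the new one. Zero dropped requests, if your health checks are real.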
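The "waits for its health check" step is essentially a polling loop with a deadline. As a rough sketch of that idea (the helper name `wait_until` is hypothetical, and this is not how the orchestrator is implemented internally):

```shell
# Sketch: retry a command once per second until it succeeds or the
# timeout (in seconds) is reached. wait_until is a hypothetical helper.
wait_until() {
  local timeout_s="$1"; shift
  local elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1   # gave up: the command never succeeded in time
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Succeeds when Docker reports the named container as healthy.
# Assumes the container defines a HEALTHCHECK.
container_is_healthy() {
  [ "$(docker inspect -f '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}
```

For example, `wait_until 60 container_is_healthy app` would block for up to a minute on a container named `app` (an assumed name) and fail if it never becomes healthy.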

Rollback in 30 seconds

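Rollback is only fast if you can answer two questions in seconds: "What was the previous image SHA?" and "How do I point production at it?" The first is solved by always tagging CI builds with `:$SHA` in GHCR. The second is solved by not editing compose.yml by hand: reference `${APP_IMAGE_TAG}` in the image line and load its value from a small file the deploy script reads. Rolling back is then `echo "APP_IMAGE_TAG=abc123" > .env.image && docker compose --env-file .env.image up -d`. Compose only auto-loads a file named `.env`, so any other file has to be passed with `--env-file` (which then replaces the default `.env` for variable substitution). A good deploy pipeline makes this a one-command action.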
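One way to make both questions answerable from the box itself is to keep the previous tag file next to the current one. This is a sketch under assumptions: the helper names `set_image_tag` and `deploy_tag` are hypothetical, tags are assumed to be lowercase hex SHAs, and compose.yml is assumed to reference `${APP_IMAGE_TAG}`.

```shell
# Sketch: write the image tag file, keeping the previous one as <file>.prev.
# set_image_tag and deploy_tag are hypothetical helper names.
set_image_tag() {
  local tag="$1"
  local file="${2:-.env.image}"
  case "$tag" in
    ''|*[!0-9a-f]*)
      echo "refusing tag '$tag': expected a lowercase hex SHA" >&2
      return 1
      ;;
  esac
  # Remember what was deployed before, so a rollback target is always on disk.
  if [ -f "$file" ]; then
    cp "$file" "$file.prev"
  fi
  printf 'APP_IMAGE_TAG=%s\n' "$tag" > "$file"
}

# Point production at a tag and apply it.
deploy_tag() {
  local file="${2:-.env.image}"
  set_image_tag "$1" "$file" &&
    docker compose --env-file "$file" up -d
}
```

With this in place, `cat .env.image.prev` answers the first question and `deploy_tag <sha>` answers the second.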

The pre-flight checklist

Before every production push: run `docker compose config` (validates the file and prints the resolved configuration), run the test suite in CI against the built image, bring the stack up under a separate staging project name on the same box with a different subdomain, and watch the health check flip to healthy. Skipping any of those is how you discover at 11 PM that production uses a different Postgres major version than staging.
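The on-box parts of this checklist can be chained so that the first failure stops the push. A minimal sketch, assuming a staging project named `staging` (an illustrative name) whose routing and environment are already configured separately; the CI test step runs elsewhere and is not shown. `run_step` and `preflight` are hypothetical helpers.

```shell
# Sketch: run a labelled step and report whether it passed.
run_step() {
  local label="$1"; shift
  printf '==> %s\n' "$label"
  if "$@"; then
    printf '    ok\n'
  else
    printf '    FAILED: %s\n' "$label" >&2
    return 1
  fi
}

# Stops at the first failing step because of the && chain.
preflight() {
  # --quiet validates the Compose file without printing it.
  run_step "validate compose file" docker compose config --quiet &&
  # --wait blocks until the services are running/healthy.
  run_step "staging up, wait for healthy" docker compose -p staging up -d --wait
}
```

Calling `preflight && echo "safe to deploy"` from the deploy script keeps the checklist from being skipped under time pressure.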

Blast radius and single points of failure

One VPS means one single point of failure: if the box dies, so does your app. That is okay for most side projects; it is not okay for paying customers. The graceful next step is a second VPS running the same Compose stack and a DNS-level failover (Cloudflare Load Balancing, $5/mo, or a paid DNS provider with health checks). Before that, just accept the tradeoff and make sure your backups and restore drill actually work: they buy you a worst-case recovery time objective (RTO) of maybe an hour.

Key takeaways

  • Chmod 600 on `.env`, encrypt with SOPS if you want to commit it
  • `order: start-first` under `deploy.update_config` gives you zero-downtime deploys when the stack runs in Swarm mode
  • Tag CI images with `:$SHA` so rollback is one command away
  • One VPS is one SPOF — accept it, or add a second box and DNS failover
