You know what a container is and why it exists. Now let's open the hood. This article walks through what a container image really is — the Dockerfile that describes it, the layers that make it efficient, how those two fit together, and then takes you through building, sharing, and finally dissecting a real-world image piece by piece. By the end you should be able to look at any image and know exactly what it contains.

Image vs. Container — the Core Distinction

The single most important concept in this article is the difference between an image and a container. They are related but they are not the same thing, and confusing them causes endless frustration.

An image is the recipe — a read-only package sitting on disk that contains everything the application needs: an operating system base, the runtime, libraries, your code, and metadata. It never changes after it has been built. A container is what you get when you run that recipe — a live process with its own thin writable layer on top of the image. One image can produce many independent containers running at the same time.

💡 The Analogy — Image vs. Container

Think of an image as a baking recipe printed on a card. The card never changes — you can read it a hundred times and it stays the same. A container is the actual cake you bake from that recipe. You can bake ten cakes from the same card, each one independent. And crucially: if you eat the cake (delete the container), the recipe card (the image) is still there, ready to make another one.

The practical consequence that surprises many beginners: any files you create inside a container disappear when you delete the container. The container's writable layer is temporary. If you need data to survive — for a database, for uploaded files — you need volumes.

IMAGE (read-only, immutable, sits on disk)
    Layer 4: App code
    Layer 3: Dependencies
    Layer 2: Runtime
    Layer 1: Base OS

        ↓ docker run

Container A (running): its own writable layer ✎ on top of the shared, read-only image layers
Container B (running): its own writable layer ✎ on top of the shared, read-only image layers

Both containers share the same image. Each container gets its own writable layer but shares the image's read-only layers → fast startup, low disk use, full isolation. Delete a container and its writable layer is gone.
One image → two containers. Each container adds its own thin writable layer but shares all the read-only image layers underneath.
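To make the distinction concrete, here is a minimal Python sketch, a toy model rather than how Docker stores anything. It assumes an image can be pictured as a read-only mapping of paths to contents and a container as a private writable dict stacked on top; the paths and the run_container helper are made up for illustration.

```python
from collections import ChainMap
from types import MappingProxyType

# Toy model: an image is a read-only mapping of path -> content (hypothetical paths).
IMAGE = MappingProxyType({
    "/app/app.py": "print('hello')",
    "/etc/os-release": "Debian",
})


def run_container(image):
    """Start a 'container': a fresh writable dict layered over the image."""
    writable_layer = {}
    # Reads fall through to the image; writes land in the first mapping only.
    return ChainMap(writable_layer, image)


container_a = run_container(IMAGE)
container_b = run_container(IMAGE)

container_a["/tmp/upload.txt"] = "only in A"

print("/tmp/upload.txt" in container_a)  # A sees its own file
print("/tmp/upload.txt" in container_b)  # B has a separate writable layer
print("/tmp/upload.txt" in IMAGE)        # the image itself never changed

# Dropping container A discards its writable layer; the image is still there.
del container_a
print(sorted(IMAGE))
```

The file written in container A lives only in A's writable dict, which is exactly why it vanishes with the container unless it is stored in a volume.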

The Dockerfile — Your Image's Recipe

A Dockerfile is a plain text file that describes, step by step, how to build a container image. Think of it as a cooking recipe: each line is one instruction, and the order matters. Almost every image you have ever pulled (PostgreSQL, nginx, Ubuntu, your colleague's app) was built from a Dockerfile written by somebody, somewhere.

Here is a complete, working Dockerfile for a small Python web app. It's only seven lines, but each one matters:

    FROM python:3.12-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    COPY . .
    EXPOSE 8080
    CMD ["python", "app.py"]

Let's go through what is happening. Each line is an instruction, and the whole Dockerfile language has fewer than twenty of them. These seven cover the vast majority of real-world cases:

FROM · Base image · FROM python:3.12-slim

Picks the starting point, usually a Linux distribution plus a language runtime.

WORKDIR · Set directory · WORKDIR /app

Like cd: sets the directory where the following instructions will run, and creates it if it doesn't exist yet.

COPY · Add files · COPY . /app

Copies files from your project into the image during build.

RUN · Execute a command · RUN pip install ...

Runs a shell command during build (installs, compiles…). Each one creates a layer.

ENV · Environment variable · ENV PORT=8080

Sets an environment variable available during build and at runtime.

EXPOSE · Document a port · EXPOSE 8080

Documents which port the app uses. Doesn't actually open it; read on.

CMD · Default command · CMD ["python", "app.py"]

The command that runs when a container starts. Last CMD wins.

A few details that trip up beginners on their first Dockerfile:

  • Build-time vs. run-time. The instructions FROM, WORKDIR, COPY, and RUN all execute once, when you build the image. CMD is different — it doesn't run during build. It is stored as metadata and only executes when a container is started from the finished image.
  • EXPOSE does not actually open a port. Despite the name, it is purely documentation: a note that says "this app listens on port 8080." To make the port reachable from the outside world, you pass -p 8080:8080 when running docker run. We'll see this in action shortly.
  • Only the last CMD matters. If you write several CMD lines, Docker silently ignores all but the last one. Use it exactly once, at the bottom of your Dockerfile.
  • Order matters a lot — and that's because of layers, which we get to next.
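The build-time versus run-time split and the "last CMD wins" rule can be sketched in a few lines of Python. This is a deliberately simplified reader, not Docker's parser: it ignores line continuations, ENTRYPOINT, and most instructions, and the extra CMD line with debug.py is a hypothetical addition to show the override.

```python
DOCKERFILE = """\
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["python", "debug.py"]
CMD ["python", "app.py"]
"""

# Instructions that do their work while the image is being built.
BUILD_TIME = {"FROM", "WORKDIR", "COPY", "RUN"}


def summarize(dockerfile_text):
    """Return (build-time steps, effective CMD) for a simple Dockerfile."""
    build_steps, effective_cmd = [], None
    for line in dockerfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        instruction, _, argument = line.partition(" ")
        if instruction.upper() in BUILD_TIME:
            build_steps.append(line)
        elif instruction.upper() == "CMD":
            # Stored as metadata only; a later CMD replaces an earlier one.
            effective_cmd = argument
    return build_steps, effective_cmd


steps, cmd = summarize(DOCKERFILE)
print(len(steps), "build-time steps")
print("container will start with:", cmd)
```

Only the second CMD survives, and nothing in it runs until a container is started.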

Once your Dockerfile is ready, you turn it into an image with one command, run in the directory that contains it:

$ docker build -t my-app:1.0 .

The -t my-app:1.0 flag gives your image a name and a version tag. The dot at the end (easy to miss!) is the build context: the directory whose files Docker is allowed to copy into the image, and where it looks for the Dockerfile by default. Docker then walks through your instructions one by one and produces an image stored on your machine, ready to run.

Layers — What an Image Really Is

Now that you have seen a Dockerfile, here is what Docker is actually doing with it behind the scenes.

Each instruction that changes the filesystem (RUN, COPY, ADD, and WORKDIR when it has to create the directory) produces a separate layer; instructions such as EXPOSE and CMD only record metadata. A layer is a snapshot of what changed on the filesystem at that step: a "delta" that captures only the differences from the layer below it. An image is not one big file. It is a stack of these layer-deltas, stitched together to look like a single coherent filesystem when a container runs.

Take the Python Dockerfile from the previous section. Docker turns it into roughly these layers, stacked from bottom to top:

  • Layer 1: everything in python:3.12-slim (a slim Debian Linux + Python 3.12) — from FROM
  • Layer 2: the /app directory exists — from WORKDIR
  • Layer 3: requirements.txt sits inside /app — from the first COPY
  • Layer 4: Python packages are installed — from RUN pip install
  • Layer 5: the rest of the source code is in place — from the second COPY
💡 The Analogy — How layers work

Imagine a stack of transparent sheets of glass. The bottom sheet is the base OS. The next sheet adds the programming language runtime. The next adds your dependencies. The top sheet adds your application code. Each sheet only contains what changed from the sheet below it. The container looks down through all the sheets at once and sees a complete filesystem — but each sheet can be shared across many different stacks. If you swap the top sheet (your app code), you only re-make that one sheet, not the whole stack.
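The "looking down through the sheets" idea can be sketched as a merge of deltas. The Python below is a toy model under stated assumptions: each layer is a dict of path to content, None marks a path deleted in that layer (real layers use special whiteout entries for this), and every path and file content is invented for illustration.

```python
# Each layer records only what changed relative to the layers below it.
LAYERS = [
    {"/etc/os-release": "Debian", "/usr/bin/python3": "<binary>"},  # base + runtime
    {"/app/requirements.txt": "flask"},                             # first COPY
    {"/usr/lib/python3/flask/__init__.py": "<package>"},            # RUN pip install
    {"/app/app.py": "v1", "/app/requirements.txt": None},           # hypothetical top delta
]


def merged_view(layers):
    """Flatten a bottom-to-top stack of deltas into the filesystem a container sees."""
    filesystem = {}
    for delta in layers:
        for path, content in delta.items():
            if content is None:
                filesystem.pop(path, None)  # removed in this layer
            else:
                filesystem[path] = content  # added or overwritten
    return filesystem


view = merged_view(LAYERS)
print(sorted(view))

# Swapping only the top sheet leaves the three lower layers untouched.
new_stack = LAYERS[:-1] + [{"/app/app.py": "v2"}]
print(merged_view(new_stack)["/app/app.py"])
```

Replacing the last delta changes what the container sees without touching, or rebuilding, anything underneath it.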

The single most important property of layers is this: layers are content-addressed and shared. Docker identifies each layer by a SHA256 hash of its contents — a unique fingerprint. If ten different images on your machine all start FROM python:3.12-slim, that base layer is stored exactly once on disk and reused by all ten. When you pull a new image from a registry, Docker only downloads the layers it doesn't already have. This is why a fresh Node.js app image might download in a few seconds: the underlying Node.js and Debian layers were already on your machine from a previous pull.
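Here is a small Python sketch of content addressing, assuming a layer can be stood in for by a plain byte string. The LayerStore class and its method names are hypothetical; the point is only that identical content hashes to the same key and is therefore stored, and downloaded, once.

```python
import hashlib


class LayerStore:
    """Toy content-addressed store: each blob is kept once, keyed by its SHA256."""

    def __init__(self):
        self.blobs = {}

    def add(self, content: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)  # identical content is stored once
        return digest

    def missing(self, digests):
        """Return the digests a pull would still have to download."""
        return [d for d in digests if d not in self.blobs]


store = LayerStore()
base = b"contents of a shared base layer"  # stand-in bytes, not a real layer

# Ten hypothetical images: each shares the base layer and adds one layer of its own.
images = [[store.add(base), store.add(f"app code {i}".encode())] for i in range(10)]

print(len(store.blobs))  # 11 blobs: 1 shared base + 10 app layers

# Pulling an eleventh image that reuses the base needs only its new layer.
new_layer = "sha256:" + hashlib.sha256(b"app code 10").hexdigest()
print(store.missing([images[0][0], new_layer]) == [new_layer])
```

Twenty layer references end up as eleven stored blobs, and a further image built on the same base would need just one new download.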

How Dockerfile and Layers Work Together — Layer Caching

Layers are not just a storage trick. They are also a build-speed trick. This is where the Dockerfile and layers really come together.

When you change something in your project and run docker build again, Docker walks through each instruction in the Dockerfile and asks: did anything change for this step? If nothing has changed, Docker reuses the cached layer from the previous build, skipping the work entirely. If something has changed, Docker re-runs that instruction — and every instruction after it — because those later layers might depend on what changed.

That single rule has a huge practical consequence: the order of instructions in your Dockerfile directly controls how fast your rebuilds are. Put things that change rarely (the base image, dependency installs) before things that change often (your application code).

Layer Order Matters for Build Speed

❌ Slow rebuilds

    FROM python:3.12-slim
    # app code first
    COPY . .
    RUN pip install -r requirements.txt

Edit any source file → reinstall ALL dependencies, every time.

✓ Fast rebuilds

    FROM python:3.12-slim
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    # app code last
    COPY . .

Edit source → only the last layer rebuilds; dependencies stay cached ✓

Figure 1 — Put rarely-changing instructions at the top of the Dockerfile and frequently-changing ones at the bottom. A change to any instruction invalidates the cache for every instruction after it.
The golden rule of layer ordering: instructions that change rarely (base images, dependency installs) go before instructions that change often (your application code). This single habit can cut rebuild times from minutes to seconds.
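The caching rule above can be simulated in Python. This is a sketch under a simplifying assumption, namely that a step's cache key depends on the previous step's key, the instruction text, and the bytes of any files it copies; Docker's real builder is more elaborate. The helper names and file contents are made up.

```python
import hashlib


def cache_key(parent_key, instruction, input_bytes=b""):
    """Key for one step: previous step's key + instruction text + copied file contents."""
    h = hashlib.sha256()
    h.update(parent_key.encode())
    h.update(instruction.encode())
    h.update(input_bytes)
    return h.hexdigest()


def build(steps, cache):
    """Walk the steps in order; return the instructions that had to be re-run."""
    rebuilt, parent = [], ""
    for instruction, input_bytes in steps:
        key = cache_key(parent, instruction, input_bytes)
        if key not in cache:
            cache.add(key)
            rebuilt.append(instruction)
        parent = key  # a changed step changes the key of every later step
    return rebuilt


def fast_order(reqs, src):
    return [("FROM python:3.12-slim", b""),
            ("COPY requirements.txt .", reqs),
            ("RUN pip install -r requirements.txt", b""),
            ("COPY . .", reqs + src)]


def slow_order(reqs, src):
    return [("FROM python:3.12-slim", b""),
            ("COPY . .", reqs + src),
            ("RUN pip install -r requirements.txt", b"")]


for name, order in (("fast", fast_order), ("slow", slow_order)):
    cache = set()
    build(order(b"flask", b"app v1"), cache)          # first build fills the cache
    print(name, build(order(b"flask", b"app v2"), cache))  # rebuild after a source edit
```

After a source-only edit, the fast ordering re-runs just the final COPY, while the slow ordering re-runs the COPY and the dependency install that follows it.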

Multi-stage builds: keeping final images tiny

There is one more layer trick worth knowing. To build certain applications, especially compiled ones in Go, Java, or Rust, you need compilers, build tools, and test frameworks. But none of those need to ship in the final image that runs in production. They add hundreds of megabytes and give attackers more software to exploit.

Multi-stage builds let you use multiple FROM statements in a single Dockerfile. You build your application in a "heavy" stage that contains the toolchain, then copy only the finished artifact into a tiny final stage. Everything from the heavy stage is thrown away. Here is the pattern for a Go application:

    # Stage 1: build, heavy (~800 MB with the Go toolchain)
    FROM golang:1.25 AS builder
    WORKDIR /src
    COPY . .
    # CGO_ENABLED=0 yields a static binary, which an empty scratch image requires
    RUN CGO_ENABLED=0 go build -o /app

    # Stage 2: production, tiny (~10 MB, just the binary)
    FROM scratch
    COPY --from=builder /app /app
    CMD ["/app"]

The final image contains only the compiled binary — no Go compiler, no source code, no package manager. Docker's own documentation shows this exact pattern shrinking a Go "hello world" image from 805 MB down to 8 MB — a 99% reduction.

Three quick wins for any Dockerfile:
  • Add a .dockerignore file (like .gitignore) to exclude .git, node_modules, and any .env files from the build context.
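  • Combine related shell commands: RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*. Cleaning up in the same RUN keeps the apt package lists out of the layer; a cleanup in a later RUN would not shrink the image, because the earlier layer would still contain the files.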
  • Put dependency installs before code copies, as we just saw above.

Building and Sharing — A Practical Walkthrough

Knowing how to write a Dockerfile is half the picture. The other half is sharing the image you built, so it can run on other people's machines — your colleagues' laptops, a CI server, a production cluster, anywhere.

For that you need a registry: a server that stores images and lets others download them. The default public registry is Docker Hub at hub.docker.com, and you can sign up for a free account in under a minute. There are other registries — GitHub's GHCR, Amazon's ECR, Google's Artifact Registry, and more — and we'll compare them in a separate article. For this walkthrough, Docker Hub is the easiest place to start.

Here is the whole flow at a glance:

Build once · Push once · Pull anywhere

  1 · Your laptop (docker build): reads your Dockerfile and creates an image locally on your machine.
        ↓ docker push
  2 · Docker Hub (hub.docker.com): your image is stored here, publicly accessible to anyone in the world (unless you made it private).
        ↓ docker pull (automatic)
  3 · Anyone's machine (docker run): downloads any missing layers automatically and starts a container.

The same image runs identically on every machine that pulls it.
Figure 2 — The build / push / pull workflow. The registry sits in the middle as the distribution hub.

Now the actual commands, in the order you would run them on the day you publish your first image:

    # 1. Build the image, tagged with your Docker Hub username
    $ docker build -t alice/weather-app:1.0 .
    [+] Building 12.3s (10/10) FINISHED
     => => writing image sha256:a3ed95caeb02...
     => => naming to docker.io/alice/weather-app:1.0

    # 2. Log in to Docker Hub (only needed the first time on a machine)
    $ docker login
    Username: alice
    Password: ********
    Login Succeeded

    # 3. Push your image to Docker Hub
    $ docker push alice/weather-app:1.0
    The push refers to repository [docker.io/alice/weather-app]
    1.0: digest: sha256:a3ed95caeb02ffe... size: 1862

That's it. The image is now public on Docker Hub. Anyone in the world can run it with a single command — and they don't even need to pull it first, because docker run downloads any missing pieces automatically:

    $ docker run -p 8080:8080 alice/weather-app:1.0
    Unable to find image 'alice/weather-app:1.0' locally
    1.0: Pulling from alice/weather-app
    [...image layers download...]
    Status: Downloaded newer image for alice/weather-app:1.0
     * Running on http://0.0.0.0:8080

A few details worth knowing on your first push:

  • The image name alice/weather-app:1.0 follows the format <username>/<repository>:<tag>. The username scopes the image to your Docker Hub account.
  • By default, images pushed to Docker Hub are public — anyone can pull them. To keep an image private, you need to create a private repository on Docker Hub before pushing to it.
  • Notice how docker push only transfers the layers Docker Hub doesn't already have. If you push three versions of the same app over time, the base image and dependency layers are uploaded only once. The same happens in reverse for pulls.
  • The example above used the -p 8080:8080 flag at run time. This is what actually publishes the port to the outside world — remember, the EXPOSE in the Dockerfile was only documentation.

This same workflow — build once, push once, pull anywhere — is the foundation of nearly every modern deployment pipeline. We'll explore choosing the right registry, securing them, and automating pushes from CI in a separate article.

Tags and Digests: Naming Your Images

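In the example above, we pushed an image called alice/weather-app:1.0. The 1.0 part is the tag: a human-readable label for that specific version of the image. A full image reference looks like registry/repository:tag, for example docker.io/library/nginx:1.25. Docker fills in the parts you leave out, so nginx:1.25 means the same image, python:3.12-slim resolves to docker.io/library/python:3.12-slim, and a reference with no tag at all gets the tag latest.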
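To see how a short name expands into a full reference, here is a Python sketch of the splitting rules. It is a simplified approximation of how Docker normalises names, not a validator, and the parse_reference function is a hypothetical helper written for this article.

```python
def parse_reference(ref: str) -> dict:
    """Split an image reference into registry, repository, tag and digest."""
    name, _, digest = ref.partition("@")

    # A tag is a ':' after the last '/', so a registry port is not mistaken for one.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        name, tag = name[:colon], name[colon + 1:]
    else:
        tag = ""

    # The first path component counts as a registry host only if it looks like one.
    first, slash, rest = name.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = "docker.io", name

    # Official images on Docker Hub live under the implicit "library/" namespace.
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository

    if not tag and not digest:
        tag = "latest"
    return {"registry": registry, "repository": repository, "tag": tag, "digest": digest}


for ref in ("alice/weather-app:1.0", "python:3.12-slim", "localhost:5000/app"):
    print(ref, "->", parse_reference(ref))
```

The last example shows why the tag has to be looked for after the final slash: the colon in localhost:5000 belongs to the registry address, not to a tag.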

Tags are convenient, but they come with a hidden danger: tags are mutable. Anyone with push access can re-point a tag to a completely different image at any time. The nginx:latest tag you pulled this morning may not be the same image as the nginx:latest someone else pulls tomorrow afternoon. Production systems that depend on latest can change behaviour silently overnight.

Tag vs. Digest · Mutable vs. Immutable

Tag (can change): nginx:latest
    Today → v1.25
    Tomorrow → v1.26 (?)
    Next week → v1.27 (?)
    ⚠ Not reproducible

Digest (never changes): nginx@sha256:a3ed95caeb02...
    SHA256 fingerprint of the exact image contents. If content changes, the hash changes. Same hash = identical image, every time, anywhere.
    ✓ Fully reproducible · tamper-evident
Figure 3 — A tag is a sticky note that anyone can move. A digest is a fingerprint mathematically tied to the exact contents of the image.

The fix is the digest: a SHA256 hash of the image's manifest, the small document that lists the hashes of the image's configuration and of every layer. You pin to it like this: python:3.12-slim@sha256:06a3f7b1.... If anything inside the image changes, even a single byte, a layer hash changes, so the manifest and its digest change too. The same digest, anywhere in the world, on any day, refers to exactly the same bits.
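The difference between a movable tag and a fixed digest fits in a short Python sketch. It assumes a registry can be modelled as two dictionaries and that a byte string can stand in for an image; ToyRegistry and its methods are hypothetical names, not a real registry API.

```python
import hashlib


def digest_of(content: bytes) -> str:
    return "sha256:" + hashlib.sha256(content).hexdigest()


class ToyRegistry:
    """Stores content by digest; tags are just movable pointers to digests."""

    def __init__(self):
        self.by_digest = {}
        self.tags = {}

    def push(self, tag: str, content: bytes) -> str:
        digest = digest_of(content)
        self.by_digest[digest] = content
        self.tags[tag] = digest  # re-pointing an existing tag is allowed
        return digest

    def pull(self, ref: str) -> bytes:
        digest = ref if ref.startswith("sha256:") else self.tags[ref]
        content = self.by_digest[digest]
        # The receiver can re-hash the bytes and compare: that is the tamper evidence.
        if digest_of(content) != digest:
            raise ValueError("content does not match its digest")
        return content


registry = ToyRegistry()
pinned = registry.push("latest", b"release A")  # stand-in bytes for an image
registry.push("latest", b"release B")           # someone moves the tag

print(registry.pull("latest"))  # whatever the tag points at now
print(registry.pull(pinned))    # always the content that was pinned
```

Pulling by tag returns the newer content after the tag moves, while pulling by the pinned digest keeps returning the original bytes.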

For development, tags are fine — they're readable and convenient. For anything that runs in CI or production, you should pin to digests. The maintenance pain (digests are long and unmemorable) is solved by tools like Docker Scout, Renovate, and Dependabot, which automatically track new versions and update the digests in your Dockerfile via pull requests.

Dissecting a Real Image — Inside postgres:16

To finish, let's open up a real, well-known image and see what's actually inside one. The official postgres:16 image — running countless databases in production around the world — is a great case study because it is public, well-engineered, and small enough to grasp at a glance.

When you run docker pull postgres:16, Docker fetches roughly a dozen Dockerfile-level layers that unpack to about 425 MB on disk (layers travel compressed, so the download itself is smaller). Those layers can be grouped logically into seven distinct purposes:

What's Inside postgres:16 (~425 MB total, grouped into seven logical layers)

↑ TOP: closest to the running container
  L7 · Entrypoint & defaults (init script & CMD) · < 1 MB
  L6 · PostgreSQL 16 server + client tools (postgres, psql, pg_dump, libs) · ~280 MB, ~66% of image
  L5 · PostgreSQL apt repo (package source + GPG keys) · < 1 MB
  L4 · gosu utility (root → postgres user switcher) · ~2 MB
  L3 · Locale & timezone (UTF-8 + tzdata) · < 1 MB
  L2 · System user "postgres" (dedicated, non-root) · < 1 MB
  L1 · Debian Bookworm slim (FROM base image) · ~30 MB
↓ BOTTOM: the foundation

Read bottom-up: L1 is built first, L7 last. Each layer adds on top of the previous one.
Figure 4 — A logical view of the postgres:16 image. The actual database engine (L6) is two-thirds of the entire image; everything else is supporting setup.

Let's read the diagram from the bottom up — the order in which Docker actually builds the image:

L1 · Debian Bookworm slim (~30 MB) — a minimal Debian Linux user space. No Python, no compilers, no documentation — just enough Linux to run programs. This is the FROM of the postgres Dockerfile, the foundation on which everything else is built. Because many other images share this same base, the layer is often already on your machine from a previous pull.

L2 · System user "postgres" (under 1 MB) — a dedicated user account. The PostgreSQL server should never run as root, so the image creates a non-privileged postgres user that the daemon will switch to before starting.

L3 · Locale and timezone setup (under 1 MB) — UTF-8 locale and the C standard library's locale data. Without this, PostgreSQL would struggle with text in non-English languages — accents, non-Latin scripts, sorting rules, and so on.

L4 · gosu utility (~2 MB) — a tiny tool used by the startup script to drop from root to the postgres user when starting the daemon. It is the safer cousin of sudo for use inside containers.

L5 · PostgreSQL apt sources and GPG keys (under 1 MB) — configures the official PostgreSQL Debian repository so that apt can install Postgres straight from the upstream source, rather than the older version shipped with Debian itself.

L6 · PostgreSQL 16 server + tools (~280 MB) — this is the bulk of the image: the actual postgres daemon, the client tools (psql, pg_dump, pg_restore, etc.), and every shared library they depend on. Roughly two-thirds of the image's total size lives in this single logical layer.

L7 · Entrypoint script + defaults (under 1 MB) — a shell script called docker-entrypoint.sh that runs when the container starts. It checks whether a database has already been initialised, creates an empty cluster if not, applies any custom configuration you provided through environment variables, then hands control over to the postgres daemon.
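The first-run behaviour of the entrypoint can be sketched in Python. This is a hypothetical illustration of the decision it makes, not a translation of docker-entrypoint.sh: it assumes an initialised cluster can be recognised by a PG_VERSION file in the data directory, and it writes a stand-in file instead of really creating a database.

```python
import tempfile
from pathlib import Path


def entrypoint(data_dir: Path, command=("postgres",)):
    """Return the actions a first-run-aware entrypoint would take."""
    actions = []
    marker = data_dir / "PG_VERSION"  # assumed marker of an initialised cluster
    if not marker.exists():
        data_dir.mkdir(parents=True, exist_ok=True)
        marker.write_text("16\n")     # stand-in for real cluster initialisation
        actions.append("initialise cluster")
    actions.append("start " + " ".join(command))
    return actions


with tempfile.TemporaryDirectory() as volume:
    data_dir = Path(volume) / "data"
    first = entrypoint(data_dir)   # empty volume: initialise, then start
    second = entrypoint(data_dir)  # same volume again: start only

print(first)
print(second)
```

The second call skips initialisation because the data directory already holds a cluster, which is the same reason a container restarted on an existing volume starts faster than the very first one.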

On top of the layers, the image also carries metadata — small pieces of information Docker reads when starting a container from it:

  • EXPOSE 5432 — documents the standard PostgreSQL port
  • No USER instruction: the container starts as root, and docker-entrypoint.sh uses gosu to switch to the postgres user before starting the daemon
  • VOLUME /var/lib/postgresql/data — declares where the database files live, encouraging you to mount a persistent volume there
  • ENTRYPOINT docker-entrypoint.sh — the script that runs when the container starts
  • CMD ["postgres"] — the default command passed to the entrypoint

Why does any of this matter? Because once you can read an image's layers, you understand exactly what's running inside your container — no magic. You can predict where data persists (in /var/lib/postgresql/data, which is why you mount a volume there), why port 5432 needs publishing, why the first startup is slightly slower than later ones (the entrypoint initialises the cluster on first run), and which Linux distribution your queries are actually running on top of (Debian Bookworm).

This pattern — minimal base, dedicated user, system setup, application binaries, entrypoint script, metadata — repeats across nearly every well-built image you'll encounter. The official nginx image follows it. The redis image follows it. Once you've written a few images of your own, yours will start to look like this too.

Main References

  1. Docker Documentation. Dockerfile reference.
    docs.docker.com/reference/dockerfile
  2. Docker Documentation. Images and layers — how storage drivers work.
    docs.docker.com/engine/storage/drivers
  3. Docker Documentation. Multi-stage builds.
    docs.docker.com/build/building/multi-stage
  4. Docker Documentation. Image tags, pinning by digest.
    docs.docker.com/reference/cli/docker/image/pull
  5. Docker Official Images — postgres. The Dockerfile behind the postgres:16 image.
    github.com/docker-library/postgres