
Reducing Docker Image Size

An Introduction to Docker Images

A Docker image is a list of references to read-only layers, where each layer represents the differences from the previous layer. When a container is run from an image, an additional “thin” writable layer is added on top of the image's list of layers. This writable layer, often referred to as the “container layer”, contains all changes made by the running container, such as modified, deleted and newly created files. The Docker storage driver is responsible for applying the layers in order, providing a single “unified” representation of the resulting filesystem. To find out which storage driver you are using, you can run:

$ docker info 2>&1 | grep "Storage Driver"

If changes have been made to a container, the commit command can be used to add the “container layer” to the image's list of read-only layer references, producing a new image that incorporates those changes. The commit command can also do more than this, such as modifying the environment variables associated with the image.

The commit command takes the form:

$ docker commit [OPTIONS] CONTAINER [REPOSITORY[:TAG]]
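As a rough sketch of the environment variable case, the --change option of docker commit accepts Dockerfile instructions such as ENV. The helper name commit_with_env and its argument order are made up for illustration; they are not part of Docker.

```shell
# Hypothetical helper (not part of Docker): commit a container's writable
# layer as a new image and set one environment variable on that image.
# Arguments: container name or ID, target REPOSITORY[:TAG], KEY=value pair.
commit_with_env() {
    container="$1"
    target="$2"
    env_pair="$3"
    # --change applies a Dockerfile instruction (here ENV) to the new image.
    docker commit --change "ENV ${env_pair}" "${container}" "${target}"
}
```

For example, commit_with_env demo demo:with-env APP_ENV=production would commit a container named demo (an illustrative name) as the image demo:with-env with APP_ENV set.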

How Are Images Built?

Docker provides a build command which executes the list of instructions defined in a Dockerfile. As the Dockerfile is processed, each instruction that can modify the filesystem, such as RUN, ADD or COPY, creates a read-only layer that is added to the image's list of references; metadata-only instructions such as CMD are recorded in the image configuration and do not appear in the layer list. The “FROM” instruction allows a Dockerfile to extend the image it references, adding further steps to those that image already defines. To create an entirely new Docker image, the special case “FROM scratch” is provided. This is a no-op as of Docker 1.5.0 and does not add a layer to the image being built. As an example, the Debian Jessie Dockerfile currently looks like:

FROM scratch
ADD rootfs.tar.xz /
CMD ["/bin/bash"]

To pull the Debian Jessie image down from the default docker.io registry:

$ docker pull debian:jessie

With the debian:jessie image pulled locally, we can look at its size and the layers that the image is composed of:

$ docker inspect $(docker images debian:jessie -q)
  ...  
  "Size": 123035276,
  ...
  "RootFS": {
      "Type": "layers",
      "Layers": [
          "sha256:fe4c16cbf7a4c70a5462654cf2c8f9f69778db280f235229bd98cf8784e878e4"
      ]
  }
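If only the size and the layer count are of interest, the inspect output can be trimmed with a Go template. This is a minimal sketch, assuming a Docker CLI recent enough to provide docker image inspect; the helper name image_summary is hypothetical.

```shell
# Hypothetical helper (not part of Docker): print an image's size in bytes
# and the number of entries in its RootFS layer list.
# Argument: image reference, e.g. debian:jessie.
image_summary() {
    image="$1"
    # .Size and .RootFS.Layers are the same fields shown in the JSON above;
    # len is a built-in Go template function that counts the list entries.
    docker image inspect \
        --format 'size={{.Size}} layers={{len .RootFS.Layers}}' \
        "${image}"
}
```

For the image inspected above, image_summary debian:jessie should print size=123035276 layers=1.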

Avoid Unnecessary Layers

Because each RUN instruction in a Dockerfile adds a new layer to the image, the resulting image size can be counter-intuitive at first. For example, if we construct a Dockerfile with one instruction that creates a 10MB file and another that deletes it, we might expect the image size to be unchanged from the parent's. Let's see what happens:

# Dockerfile
FROM debian:jessie
RUN dd if=/dev/zero of=/root/test.file bs=1024 count=10240
RUN rm /root/test.file

$ docker build . -t test:size
$ docker inspect $(docker images test:size -q)
  ...   
  "Size": 133521036,
  ...
  "RootFS": {
      "Type": "layers",
      "Layers": [
          "sha256:fe4c16cbf7a4c70a5462654cf2c8f9f69778db280f235229bd98cf8784e878e4",
          "sha256:bbc0457423101b167b672a5fd38ecbfd096cc683d06a9283662e0116d3c1247b",
          "sha256:90ef4236fcddd1192b50804a977d1dfcb5451fc4a38843edd66db1cba7f3ff40"
      ]
  }

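We can see that two additional layers have been added to the list, one for each RUN instruction, and that the resulting image is 10MB larger than the Debian Jessie image used as a base. The file's data remains in the layer created by the first RUN instruction; the second layer only records that the file was deleted.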
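The arithmetic can be checked directly from the figures above. This small sketch uses only the dd parameters and the two Size values already shown; the variable names are my own.

```shell
# Bytes written by dd: block size (bs=1024) times block count (count=10240).
file_bytes=$(( 1024 * 10240 ))

# "Size" values reported by docker inspect for the two images.
base_size=123035276
two_run_size=133521036
growth=$(( two_run_size - base_size ))

echo "dd wrote:     ${file_bytes} bytes"
echo "image growth: ${growth} bytes"
```

Both lines report 10485760 bytes, so the image grew by exactly the size of the deleted file.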

Wrapping the two RUN instructions up into one results in an image whose size is unchanged from the parent's, with a total of two layers:

# Dockerfile
FROM debian:jessie
RUN dd if=/dev/zero of=/root/test.file bs=1024 count=10240 \
    && rm /root/test.file

  ...
  "Size": 123035276,
  ...
  "RootFS": {
      "Type": "layers",
      "Layers": [
          "sha256:fe4c16cbf7a4c70a5462654cf2c8f9f69778db280f235229bd98cf8784e878e4",
          "sha256:1e0f9edcb7a4cc121369e94f8d15f56c7b8afe7163ade6ff8de71885b9e64735"
      ]
  }

Apt Cache Cleanup

When updating Apt package lists using apt-get update, repository and package data is cached in /var/lib/apt/lists. The contents can be safely deleted after installing all required packages, as they will be regenerated the next time apt-get update is run.

When installing packages using Apt, a cache of downloaded archive files is kept in /var/cache/apt. Interestingly, the Debian Jessie Docker image includes hooks for DPkg::Post-Invoke and APT::Update::Post-Invoke which delete the .deb and .bin cache files, automatically tidying up in the style of apt-get clean. The implementation can be seen in debootstrap, with the hooks ordinarily added to /etc/apt/apt.conf.d/docker-clean.

To remove the Apt repository and package cache without adding a further layer, delete it in the same RUN instruction that installs the packages. For example:

# Dockerfile
FROM debian:jessie
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

Apt --no-install-recommends

When using apt to install packages, the default behaviour is to install any packages that are recommended.

What does recommended mean?

From the Debian Policy Manual:

Recommends
This declares a strong, but not absolute, dependency.
The Recommends field should list packages that would be found together with this one in all but unusual installations.

Apt provides a --no-install-recommends argument which prevents recommended packages from being installed automatically. This may result in a smaller image if it avoids installing packages that are not required. As an example:

# Dockerfile
FROM debian:jessie
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*

As a note of caution, the above skips the installation of the ca-certificates, krb5-locales and libsasl2-modules packages, and ca-certificates is needed if SSL certificate validation is required. Check which recommended packages are being skipped before relying on this option.
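One way to keep the size saving without losing certificate validation is to name ca-certificates explicitly alongside curl. The sketch below writes such a Dockerfile into a temporary build context; the directory handling and the curl:minimal tag are illustrative choices of mine.

```shell
# Write a Dockerfile into a throwaway build context (a temporary directory).
context_dir="$(mktemp -d)"
cat > "${context_dir}/Dockerfile" <<'EOF'
FROM debian:jessie
# Skip recommended packages, but install ca-certificates explicitly so that
# curl can still validate SSL certificates.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*
EOF

# To build it (requires Docker; the tag is illustrative):
#   docker build -t curl:minimal "${context_dir}"
echo "Dockerfile written to ${context_dir}/Dockerfile"
```

The other two recommended packages, krb5-locales and libsasl2-modules, are still skipped.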

Use Alpine Linux

Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc and busybox.

The latest Alpine Linux Docker image comes in at under 5MB, compared with around 123MB for the current Debian Jessie image. There may be disadvantages to using Alpine Linux, including the potential need to familiarise oneself with a different package manager. I have found issues running PHP compiled against musl libc, such as the lack of GLOB_BRACE support. A more thorough discussion of Alpine Linux as a base image will be the main focus of a future blog post.
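To give a flavour of the different package manager, here is a rough Alpine counterpart to the curl examples above, written the same way into a temporary build context. It is only a sketch: the untagged alpine base image, the directory handling and the curl:alpine tag are my own illustrative choices.

```shell
# Write an Alpine-based Dockerfile into a throwaway build context.
alpine_context_dir="$(mktemp -d)"
cat > "${alpine_context_dir}/Dockerfile" <<'EOF'
FROM alpine
# apk is Alpine's package manager. With --no-cache the package index is
# fetched for this command only and not stored in the image, so there is
# no separate cleanup step.
RUN apk add --no-cache curl
EOF

# To build it (requires Docker; the tag is illustrative):
#   docker build -t curl:alpine "${alpine_context_dir}"
echo "Dockerfile written to ${alpine_context_dir}/Dockerfile"
```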