Slimming down Dockerfiles: Decrease the size of Gitlab’s CI runner from 900 to 420 mbMar 22 2015
I’ve been leveraging Gitlab CI for our continuous integration needs, running both the CI site and CI runners on our CoreOS cluster in docker containers. It’s working well. On the runner side, after cloning the ci-runner repositroy and running a
docker build -t base-runner . , I was a little disappointed with the size of the runner. It weighed in at 900MB, a fairly hefty size for something that should be a lightweight process. I’ve built the ci-runner dockerfile with the name “base-runner”:
base-runner latest aaf8a1c6a6b8 2 weeks ago 901.1 MB
The dockerfile is well documented and organized, but I immediately noticed some things which cause dockerfile bloat. There are some great resources on slimming down docker files, including optimizing docker images and the docker-alpine project. The advice comes down to:
- Use the smallest possible base layer (usually Ubuntu is not needed)
- Eliminate, or at least reduce, layers
- Avoid extraneous cruft, usually due to excessive packages
Let’s make some minor changes to see if we can slim down this image. At the top of the dockerfile, we see the usual apt-get commands:
# Update your packages and install the ones that are needed to compile Ruby RUN apt-get update -y RUN apt-get upgrade -y RUN apt-get install -y curl libxml2-dev libxslt-dev libcurl4-openssl-dev libreadline6-dev libssl-dev patch build-essential zlib1g-dev openssh-server libyaml-dev libicu-dev # Download Ruby and compile it RUN mkdir /tmp/ruby RUN cd /tmp/ruby && curl --silent ftp://ftp.ruby-lang.org/pub/ruby/2.0/ruby-2.0.0-p481.tar.gz | tar xz RUN cd /tmp/ruby/ruby-2.0.0-p481 && ./configure --disable-install-rdoc && make install
RUN command creates a separate layer, and nothing is cleaned up. These artifacts will stay with the container unnecessarily. Running another
RUN rm -rf /tmp won’t help, because the history is still there. We need things gone and without a trace. We can “flatten” these commands and add some cleanup commands while preserving readability:
# Update your packages and install the ones that are needed to compile Ruby # Download Ruby and compile it RUN apt-get update -y && apt-get upgrade -y && apt-get install -y curl libxml2-dev libxslt-dev libcurl4-openssl-dev libreadline6-dev libssl-dev patch build-essential zlib1g-dev openssh-server libyaml-dev libicu-dev && mkdir /tmp/ruby && cd /tmp/ruby && curl --silent ftp://ftp.ruby-lang.org/pub/ruby/2.0/ruby-2.0.0-p481.tar.gz | tar xz && cd /tmp/ruby/ruby-2.0.0-p481 && ./configure --disable-install-rdoc && make install && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/*
There’s only one run command, and the last two lines cleanup the apt-get downloads and
tmp space. Let’s see how we well we do:
$ docker images | grep base-runner base-runner latest 2a454f84e4e8 About a minute ago 566.9 MB
Not bad; with one simple modification we went from 902mb to 566mb. This change comes at the cost of build speed. Because there’s no previously cached layer, we always start from the beginning. When creating docker files, I usually start with multiple run commands so history is preserved while I’m working on the file, but then concatenate everything at the end to minimize cruft.
566mb is a good start, but can we do better? The goal of this build is to install the ci-runner. This requires Ruby and some dependencies, all documented on the ci-runner’s readme. As long as we’re meeting those requirements, we’re good to go. Let’s switch to debian:wheezy. We’ll also need to tweak the locale setting for debian. Our updated dockerfile starts with this:
# gitlab-ci-runner FROM debian:wheezy MAINTAINER Michael Hamrah <email@example.com> # Get rid of the debconf messages ENV DEBIAN_FRONTEND noninteractive # Update your packages and install the ones that are needed to compile Ruby # Download Ruby and compile it RUN apt-get update -y && apt-get upgrade -y && apt-get install -y locales curl libxml2-dev libxslt-dev libcurl4-openssl-dev libreadline6-dev libssl-dev patch build-essential zlib1g-dev openssh-server libyaml-dev libicu-dev && mkdir /tmp/ruby && cd /tmp/ruby && curl --silent ftp://ftp.ruby-lang.org/pub/ruby/2.0/ruby-2.0.0-p481.tar.gz | tar xz && cd /tmp/ruby/ruby-2.0.0-p481 && ./configure --disable-install-rdoc && make install && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/*
Let’s check this switch:
base-runner latest 40a1465ebaed 3 minutes ago 490.3 MB
Better. A slight modification can slim this down some more; the dockerfile builds ruby from source. Not only does this take longer, it’s not needed: we can just include the
ruby-dev packages; on debian:wheezy these are good enough for running the ci-runner. By removing the install-from-source commands we can get the image down to:
base-runner latest bb4e6306811d About a minute ago 423.6 MB
This now more than 50% less then the original, with a minimal amount of tweaking.
Pushing Even Further
Normally I’m not looking for an absolute minimal container. I just want to avoid excessive bloat, and some simple commands can usually go a long way. I also find it best to avoid packages in favor of pre-built binaries. As an example I do a lot of work with Scala, and have an sbt container for builds. If I were to install the SBT package from debian I’d get a container weighing in at a few hundred megabytes. That’s because the SBT package pulls in a lot of dependencies: java, for one. But if I already have a jre, all I really need is the sbt jar file and a bash script to launch. That considerably shrinks down the dockerfile size.
When selecting a base image, it’s important to realize what you’re getting. A linux distribution is simply the linux kernel and an opinionated configuration of packages, tools and binaries. Ubuntu uses aptitude for package management, Fedora uses Yum. Centos 6 uses a specific version of the kernel, while version 7 uses another. You get one set of packages with Debian, another with Ubuntu. That’s the power of specific communities: how frequently things are updated, how well they’re maintained, and what you get out-of-box. A docker container jails a process to only see a specific part of the filesystem; specifically, the container’s file system. Using a distribution ensures that the required libraries and support binaries are there, in place, where they should be. But major distributions aren’t designed to run specific applications; their general-purposes servers that are designed to run a variety of apps and processes. If you want to run a single process there’s a lot that comes with an OS you don’t need.
There’s a re-emergence of lightweight linux distributions in the docker world first popularized with embedded systems. You’re probably familiar with busybox useful for running one-off bash commands. Because of busybox’s embedded roots, it’s not quite intended for packages. Alpine Linux is another alternative which features its own official registry. It’s still very small, based on busybox, and has its own package system. I tried getting gitlab’s ci-runner working with alpine, but unfortunately some of the ruby gems used by ci-runner require GNU packages which aren’t available, and I didn’t want to compile them manually. In terms of time/benefit, I can live with 400mb and move on to something else. For most things you can probably do a lot with Alpine and keep your containers really small: great for doing continuous deploys to a bunch of servers.
The bottom line is know what you need. If you want a minimal container, build up, rather than slim down. You usually need the runtime for your application (if any), your app, and supporting dependencies. Know those dependencies, and avoid cruft when you can.