Michael Paltsev

Mastering Dockerfile Efficiency: Optimizing Layers, Security, and Best Practices

Updated: Dec 25, 2023


Most of us have worked with Docker, or at least heard of it. We've used and written Dockerfiles, but usually we've never gone deep into them. That is, we know how to read a Dockerfile and how to write a simple one, but do we know how to get the most out of it? In this article, I hope to show you at least one thing that will improve your knowledge of Dockerfiles and help you get the most out of them.


Image Layers

Let's start with the one thing that we care about the most. Well, maybe not the most, but we certainly pay attention to it whenever we need to pull or push an image - the size of the image.


We know that the layers in the image add up to its size, but how does it work?


First of all, a layer is created for every instruction in our Dockerfile. But not all instructions are treated equally: all of them create an image layer, but only some make the layer take up space.


Instructions such as FROM, COPY, ADD, RUN, and others will create a layer that will take up space, while instructions such as ENTRYPOINT, ENV, LABEL, and WORKDIR will not.


Let's examine the following Dockerfile (a minimal example; the exact instructions and file names are illustrative):
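```dockerfile
# A minimal example Dockerfile with five instructions;
# only FROM and ADD will contribute to the image size
FROM ubuntu:22.04
LABEL maintainer="example"
ENV GREETING="hello"
ADD my_file /app/
ENTRYPOINT ["cat", "/app/my_file.txt"]
```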

If we build it, assuming we have a tar.gz archive named my_file that contains a my_file.txt file, we'll get a new image (the tag name below is arbitrary):
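```bash
# "layers-demo" is just an example tag
docker build -t layers-demo .
```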

Before we examine the new image, let's have a look at the output of the build command. We can see that the instructions in the Dockerfile have been translated into steps: we have 5 instructions and 5 matching steps.

Every step is reported as running somewhere. These are intermediate containers, and the filesystem changes they produce when executed form a layer. But we can also see that some are removed right after execution. Those that are removed will not count toward the size of the image.


Let's take a closer look by running:
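```bash
# List the filesystem layers of the image we just built
# (the "layers-demo" tag is the example tag from above)
docker inspect --format '{{json .RootFS.Layers}}' layers-demo
```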

This will output the layers in the image. You'll notice that there are only two layers there. Presumably, one is the layer that represents the parent image and the other is the layer that was generated by the ADD instruction. But that is not entirely correct.

If we take a look at the history of the image by running
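```bash
# Shows every instruction that built the image and each layer's size
docker history layers-demo
```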

we'll see that one of the layers is a layer from our parent image.


Okay. Enough with this layer thing. We've got it: some instructions produce layers that count toward the image size, making them important, while others do not.


Again, that statement is only partially correct. Layers in an image can be seen as read-only entries in a stack, where each layer is cached and the one stacked above it holds only the diff.

Thus, if we add a new instruction before the ENTRYPOINT in our Dockerfile, Docker will reuse the cached layers of the previous build and produce a new image.

But changing the order of the instructions in a Dockerfile, or even modifying one of the "non-important" instructions, can generate a completely new image with different layers. This is important because pushing and pulling images whose layers are already cached on the other side decreases the transfer size - faster transfers that cost less.

In addition, because layers are read-only, if one instruction adds a file and a later one deletes it, the image will not get smaller - the file still lives in the earlier layer.

One way to keep unwanted files out of the image in the first place is a file called .dockerignore. It is very similar to .gitignore, but is used by Docker to exclude files and/or directories from the build context so they never reach the image.
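A minimal .dockerignore might look like this (the entries are just examples):

```
# .dockerignore - keep VCS data, secrets, and local artifacts
# out of the build context
.git
.env
*.log
secrets/
```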


To sum this section up, here are the best practices for Dockerfile instructions:

  • Structure the Dockerfile so that the instructions that change the most, e.g., instructions that add your binaries to the image, are placed at the end of the file, while those that rarely change are placed at the beginning.

  • When you handle files in your instructions, it's a good idea to group those instructions so that you can remove unneeded files and save space. E.g., when you install a package with a package manager, clear the cache in the same instruction and not in the one after (see the sketch after this list).

  • Use .dockerignore to not add unwanted files.
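Here is the grouping idea from the second bullet as a sketch (a Debian-based image and the curl package are assumed purely for illustration):

```dockerfile
# Bad: the cache is deleted in a later layer,
# so the image doesn't get any smaller
RUN apt-get update && apt-get install -y --no-install-recommends curl
RUN rm -rf /var/lib/apt/lists/*

# Good: install and clean up in the same instruction,
# so the package cache never lands in any layer
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
```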


Parent Image

We talked about image layers and mentioned the parent image. If you want a smaller image, you should start with a small parent image. But how should we pick one?


We first need to know that there are two different terms: a Parent Image and a Base Image. The difference between the two is just in the first line of their Dockerfile:

  • Base images are built FROM scratch

  • Parent Images are built FROM either Base Images or other Parent Images

Because most of the time we use Parent Images, we'll be using this term.

Now that we know this important piece of information, let's bust a myth. The kernel that you use comes not from the parent image but from the host machine. This means that the only thing the parent image brings with it is its utilities and libraries, while it relies heavily on the kernel of the host. This also means that you won't be able to run Windows-based containers on a non-Windows host (and usually the host must be of a matching Windows version).


Usually, this is not an issue. Most Docker images run on Linux, and the way Docker runs on Windows and Mac is by utilizing a VM that runs Linux, i.e., WSL 2 on Windows and a full VM on Mac.


The issue arises when you try to run something that was compiled for a specific architecture on a different one, e.g., x86 binaries on ARM-based machines. Docker Desktop, however, can usually handle this through emulation.


So what are the best practices for choosing the parent image?

  • Use an image from a trusted source. The best way is to create your own Docker repository, but if you don't have one or don't want to put effort into it, use the Docker Hub.

  • Always use a specific tag for your parent image, or even better, its digest (we'll cover that in the security section).

  • Your parent image should be very small and should have only files that are needed to build your Dockerfile and run your application.

  • Your parent image should match the architecture on which you are going to run your Docker container.

If you can't find such a parent image, expand your search. But if you've done your research and nothing fits, you should create one yourself.


There are two ways of doing that:

  1. Use a parent or a base image and create a new parent image.

  2. Create a new base image from scratch. But this is for a different blog post.

Using a Multi-Stage Build

We've talked about decreasing the image size by using small parent images. But sometimes you need to either install a package in your Docker image or even worse, build one from scratch.


Does that mean that we need to use parent images that have package managers or that are capable of compiling code?


The short answer is no. For this, we have the multi-stage build. And it is going to help us with the task of keeping our image as clean and small as possible.


As the name implies, you can break your image build into different stages, where each stage generates a piece that later stages can use, without everything else in that stage affecting your final result.


Let's see an example. Suppose you want to build a Java application and deploy it in a Docker container. You'll need a JDK and Maven to build it and then, to save some space, only a JRE to run it. How should we do it?


Before multi-stage builds, we had to create two different Dockerfiles: one for building the code and the other for executing it. Now, we can combine the two into one Dockerfile, which in turn improves our ability to maintain it. Below you can see an example of a Dockerfile with a multi-stage build (a sketch; the image tags, paths, and jar name are assumptions about the project):
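```dockerfile
# Stage 1: build the application with a JDK + Maven image
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /build
COPY pom.xml .
COPY src ./src
# Assumes the pom is configured to produce target/app.jar
RUN mvn -B package -DskipTests

# Stage 2: run it on a JRE-only image
FROM eclipse-temurin:17-jre
WORKDIR /app
COPY --from=builder /build/target/app.jar ./app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
```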

Here we go, we have a multi-stage build. The first stage, named builder, is responsible for building the code, while the second stage is only used to run it.


Running:
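```bash
# "my-app" is an arbitrary tag for this example
docker build -t my-app .
```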

will create a Docker image that contains only a JRE and our packaged application.


You can even have more than two stages. Consider a scenario where you need to generate multiple images for your application, e.g., one that runs with an experimental setting or a debug flag, as sketched below.
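For example, we could append a third stage to the same Dockerfile (the debug flags shown are illustrative):

```dockerfile
# Stage 3: the same application, started with remote debugging enabled
FROM eclipse-temurin:17-jre AS debug
WORKDIR /app
COPY --from=builder /build/target/app.jar ./app.jar
ENTRYPOINT ["java", "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005", "-jar", "app.jar"]
```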

Then, running:
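```bash
# --target stops the build at the named stage
# (plus whatever stages it depends on)
docker build --target debug -t my-app:debug .
```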

will build only the builder stage and the debug image.


In conclusion, the following rules should tell you when you should consider using a multi-stage build:

  • You need to use resources in build-time that are not needed in runtime.

  • You want to generate different images and you want to reuse the common stages between them.

Security

In the previous sections, we talked about image layers, multi-stage builds, and the parent image. Now it's time to talk about something very important that ties together everything we've covered - security.


Vulnerabilities

The parent image you use can bring vulnerabilities with it. We've already talked about using a specific tag (which is not latest) for your parent images. This keeps your builds consistent: changes are introduced only when you decide to change the tag, not when a new version gets the latest tag. It also keeps new vulnerabilities out of your image, but the vulnerabilities that are already present are there to stay.


The same applies to packages that you install during the build: always pin a specific version of every package that you install.


So how do we recognize vulnerabilities as soon as possible? The answer is by running security scans in our CI pipeline and failing the build if they find anything. For that, you can use an open-source tool by the name of grype - https://github.com/anchore/grype/ - in the following way:
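```bash
# Scan the example image from earlier; --fail-on makes grype exit
# with an error on findings of the given severity or higher,
# which fails the CI step
grype my-app --fail-on high
```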

In this example, we've scanned our image using grype, which told us that no vulnerabilities were found.


Let's try a different one:
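```bash
# An older, end-of-life image is likely to surface plenty of findings
grype node:14
```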


Here, on the other hand, we can see that there are multiple vulnerabilities. A useful feature of this tool is the FIXED-IN column, which shows the package version in which each vulnerability was fixed.


Using Digests instead of Tags

Usually, when we use a parent image, we refer to it by its name and tag. But how can we reliably say that the image we've downloaded is the correct one? After all, we can name and tag our images any way we please, so creating a malicious image and giving it the name and tag of a trusted one is a very easy task. And there are ways to route your traffic to a different Docker registry, so these kinds of threats are real. Therefore, the question we need to ask is: how can we defend ourselves?


Digests come to the rescue!


Different images, even if they have the same name and tag, get different digests, which are created when you push an image to a V2 registry. When you use a digest instead of a tag, you can rely on the Docker engine to verify the digest of the pulled image. This protects you from impersonated images.
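In a Dockerfile, pinning by digest looks like this (the sha256 value below is a placeholder; docker images --digests prints the real one):

```dockerfile
# The digest uniquely identifies the image content, regardless of its tag;
# replace the placeholder with the digest from your registry
FROM ubuntu:22.04@sha256:<digest-printed-by-your-registry>
```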


Don't run as Root

We never run as root on our computers, so why would we do that in a Docker container?

You should prefer to create a specific user in your Dockerfile, grant it permissions over the files it needs, and execute the application as that user:
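A minimal sketch, assuming an Alpine-based image and a binary called my-app (both names are illustrative):

```dockerfile
FROM alpine:3.19
# Create an unprivileged user and group (the name "app" is arbitrary)
RUN addgroup -S app && adduser -S -G app app
# Give that user ownership only over the files it actually needs
COPY --chown=app:app my-app /app/my-app
# Every following instruction, and the container itself, runs as "app"
USER app
ENTRYPOINT ["/app/my-app"]
```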

Don't Expose the Data

We never want to expose any data, but when we build containers it is very easy to accidentally add sensitive files to the image. To avoid that, it's a good idea to use the .dockerignore file and to avoid copying entire directories.

All in all, the rule of thumb is: less is more :)


Linting

We've talked about different topics and their best practices, but we haven't yet mentioned an important tool that can help us validate our Dockerfiles - the linter.


One such open-source linter is called hadolint. Let's see it in action.


First, we need to download it:
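```bash
# One option is to pull the official hadolint image from Docker Hub
docker pull hadolint/hadolint
```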

And then, let's run grype to verify that there are no known vulnerabilities there:
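```bash
# Scan the linter image itself before we use it
grype hadolint/hadolint
```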

We can now run hadolint on our Dockerfile. Let's try it with our latest multi-stage Dockerfile:
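```bash
# The hadolint image reads the Dockerfile from stdin
docker run --rm -i hadolint/hadolint < Dockerfile
```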

You can see that it was able to find 3 mistakes: we didn't pin versions when installing dependencies, and we didn't use a WORKDIR when copying files.


As you can see, and probably already know, linting can greatly improve your ability to write good Dockerfiles.


Conclusion

In this article, we've gone over some of the more common areas of Dockerfiles. We've learned about the layers and their importance in the build flow of the image and its size. We've also learned about the parent image and how to use a multi-stage build to our advantage. All of this came in handy when we talked about security best practices.


Hope you've enjoyed it as much as I've enjoyed writing it.
