Table of contents
Tips for writing Dockerfiles - basic setup, caching, the root user, service organization, networking, and resource limits
The content here is under the Attribution 4.0 International (CC BY 4.0) license
Docker has revolutionized how developers build and deploy applications, and it is one of the most popular container engines (Burns et al., 2019). Docker supports different programming languages and runs natively on Linux. As opposed to virtual machines, which mimic an entire operating system, Docker containers run on Linux namespaces, removing the overhead that virtual machines carry, such as boot time: a virtual machine needs time to boot, while Docker is a service that starts on the host operating system.
Unlike the official best practices (DockerHub, n.d.) for writing Dockerfiles, the goal here is to share tips on how to approach writing them; this is not a beginner's guide to how Docker works or how to use it.
For context, this post is loosely connected to the playlist (Marabesi, 2020) I’ve built to keep track of Docker-related topics.
NOTE: if you are starting with Docker, have a look at the curriculum (Srivastav, 2021) first to get used to the basics. Personally, I have a presentation covering Docker basics available as well (Marabesi, 2017).
NOTE 2: if you are interested in how the Node.js Docker image is built, have a look at the official git repository (Nodejs, 2021).
Docker images and services
Docker images are the base for the containers, which in turn are the running instances of the image. The image is built from a Dockerfile, which is a set of instructions that the docker engine uses to build the image.
1. Make the basic setup with a standard image
Building Docker images requires some previous knowledge of the Docker platform and at least an understanding of a few instructions such as RUN, COPY, and FROM. Depending on how those instructions are used, the generated image can be big or small. Docker Hub offers ready-to-use images without the need to build one, and they come in variants classified (from the biggest to the smallest) as: standard, slim, and alpine. The following snippet depicts a Dockerfile with the standard image:
# standard image, and also the biggest compared to the next two variants
FROM node:23
WORKDIR /var/www/app
COPY . .
RUN npm ci && npm run build
EXPOSE 5000
CMD npm run serve
Dockerfile with slim image:
# slim image, smaller, but it also has fewer dependencies installed by default
FROM node:23-slim
WORKDIR /var/www/app
COPY . .
RUN npm ci && npm run build
EXPOSE 5000
CMD npm run serve
Dockerfile with alpine image:
# alpine image, the smallest, but it has drawbacks such as missing dependencies the code may need
FROM node:23-alpine
WORKDIR /var/www/app
COPY . .
RUN npm ci && npm run build
EXPOSE 5000
CMD npm run serve
Usually, the setup using the standard image is faster, as it comes with almost everything needed to run the program. The alpine version, on the other hand, ships with just the core and nothing else, which in many cases will prevent the program from running, depending on its dependencies.
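To see the difference in practice, one way (a sketch; exact sizes vary by release) is to pull the three variants and compare them side by side:
# pull the three variants and list them with their sizes
docker pull node:23
docker pull node:23-slim
docker pull node:23-alpine
docker images node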
npm ci
In this example, the npm ci command is used to install the dependencies. Here are the reasons for that:
- It is optimized for continuous integration and installs dependencies much faster by skipping certain checks and steps that npm install performs.
- It automatically removes the node_modules directory before installing, guaranteeing a clean install that matches package-lock.json exactly.
The recommended approach is to start with the standard image, then move to the slim image and finally to the alpine image. This way, it is easier to debug issues that might arise from missing dependencies or configurations.
2. Caching
Caching in Docker is used to avoid re-fetching dependencies over and over again when they don’t change. To achieve that, the Docker layer system can be used to trick the engine into reusing cached layers (Amigoscode & with Nana, 2020).
FROM node:23-slim
WORKDIR /var/www/app
# copying the manifests first caches the npm dependency layer
COPY package*.json ./
RUN npm ci
# source changes no longer invalidate the cached dependency layer above
COPY . .
RUN npm run build
EXPOSE 5000
CMD npm run serve
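A quick way to observe the cache at work is to build twice; the second build after a source-only change reuses the dependency layer (illustrative commands, my-app is a placeholder tag):
# the first build populates the layer cache
docker build -t my-app .
# after changing only source files, the COPY package*.json and npm ci layers are reused
docker build -t my-app .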
3. The root user
The root user is the default user that the container runs as, which makes it easier to set up permissions to access files or to apply configurations. This is usually a bad practice (Tal & Goldberg, 2021): the container should not run as the root user due to security issues (Fisher, 2019; Fisher, 2021). However, setting up a different user with fewer permissions can make the image setup a bit harder.
If no user is given (as in the last three Dockerfiles shown in the previous section), Docker will build the image using root, which of course has security issues. To fix this, Docker offers the USER instruction.
FROM node:23-slim
# create the app directory owned by the unprivileged node user so npm can write to it
RUN mkdir -p /var/www/app && chown node:node /var/www/app
WORKDIR /var/www/app
# build and run as the node user instead of root
USER node
COPY --chown=node:node package*.json ./
COPY --chown=node:node . .
RUN npm ci && npm run build
EXPOSE 5000
CMD npm run serve
This tip relies on the same approach as the previous one: first make it work with the root user, then work through the permissions with a specific user. This is a common approach when building Docker images, as it is easier to get the image working first and then start to improve it. However, current images are not always built with the root user, so omitting the user is feasible for some of them. Make sure to check it before building the image.
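One way to check which user the container actually runs as (assuming the image was tagged my-app) is to override the command:
# should print "node" rather than "root"
docker run --rm my-app whoami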
4. Separate concerns, avoid building different services into one image
As a best practice, the recommended way to build containers is: one container equals one process. This avoids problems when it comes to managing them, as (DockerHub, n.d.) describes in the section “Decouple applications”. It means that each container should run a single service, such as a web server, a database, or a message broker. This approach allows for better scalability, maintainability, and flexibility in deploying and managing applications for local development, as the sketch below illustrates.
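For instance, a minimal docker-compose sketch along these lines keeps each concern in its own container (hypothetical service names; it assumes the Node.js app from the earlier Dockerfiles plus a PostgreSQL database):
version: '2'
services:
  web:
    build: . # the Node.js app from the previous sections
    ports:
      - 5000:5000
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example # placeholder credential for local development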
5. Set up the Dockerfile first, and then move to docker-compose (if needed)
Usually, docker-compose is the next step when building services to use with Docker, though developers tend to skip the first step, which is to understand how the image works on its own, before moving on to compose.
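In practice, that means building and running the image directly before writing any compose file (my-app is a placeholder tag):
# build the image and run it on its own first
docker build -t my-app .
docker run --rm -p 5000:5000 my-app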
6. Networking and sharing hosts
Docker creates its own network interface, through which containers communicate with each other. However, there are scenarios in which this behavior is not desired. For example, a database: as the database holds state (the data), it is often provided externally (RDS, MongoDB Atlas, etc.). By default, the container cannot reach services through the host’s network interface (inside the container, localhost is the container itself), which in turn can block the database connection. There are two possible options for that: the --network flag or the --add-host flag.
# using --network flag
docker run --rm --network=host nginx
There is a side effect of using the --network=host flag: it bypasses the network Docker creates automatically, and the container runs as if it were on the host. This impacts the port the application runs on and therefore prevents blue-green deployments (Marcus, 2019), which require two instances of the same app running, each on its own port.
NOTE: host network mode is not available on Docker Desktop for Mac/Windows.
The --add-host flag gives the flexibility needed to overcome the port issue. The flag maps a specific hostname to an IP; the following example maps localhost inside the container to the host’s IP (192.168.1.102 here).
# using --add-host flag
docker run --rm --add-host=localhost:192.168.1.102 nginx
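The mapping can be confirmed by inspecting the container’s /etc/hosts with the same flag, just overriding the command:
# the added entry shows up in the container's hosts file
docker run --rm --add-host=localhost:192.168.1.102 nginx cat /etc/hosts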
7. dockerignore
The .dockerignore file is used to specify which files and directories should be ignored when building the Docker image. This is useful to avoid copying unnecessary files into the image, which can make it larger and slower to build. The .dockerignore file works similarly to a .gitignore file, allowing you to specify patterns for files and directories to exclude.
# .dockerignore
node_modules/
Using the ignore file can help to reduce the size of the image and speed up the build process by excluding files that are not needed in the container. For example, if you have a node_modules directory that contains all the dependencies of your application, you might want to exclude it from the image to avoid copying unnecessary files.
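A slightly fuller .dockerignore for a Node.js project might look like this (the exact entries depend on the project):
# .dockerignore - illustrative entries for a typical Node.js project
# dependencies are reinstalled inside the image by npm ci
node_modules/
# version control history is not needed in the image
.git/
# local log files
*.log
# local secrets should not be baked into the image
.env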
8. Security
Security is a crucial aspect of Docker images, as they can be vulnerable to attacks if not properly configured. Docker Scout is a command-line tool that can be used to scan Docker images for known vulnerabilities and provide recommendations for fixing them.
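For example, assuming the Docker Scout CLI plugin is installed, the images from the earlier sections could be scanned like this:
# high-level summary of known CVEs in the image
docker scout quickview node:23-slim
# detailed list of CVEs and recommendations
docker scout cves node:23-slim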
Docker compose
This section focuses on docker-compose only.
1. Different docker compose files for different environments
Docker compose files are used to describe the container orchestration, and sometimes different behavior is needed based on the environment the application runs in. For example, in development mode the database container might be needed, but in production it might not be the case.
For that, it is possible to create different docker-compose files for each environment. For example, for development, staging and production we might have:
- development: docker-compose-dev.yml
- staging and production: docker-compose-deploy.yml
It is also possible to share configuration among the compose files, so it might make sense to create a docker-compose.yml as the base for the two files previously mentioned.
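Compose supports this through multiple -f flags, where files on the right override or extend the base (assuming the file names above):
# development: base file plus the dev override
docker compose -f docker-compose.yml -f docker-compose-dev.yml up
# staging and production: base file plus the deploy override
docker compose -f docker-compose.yml -f docker-compose-deploy.yml up -d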
2. CPU and memory limit
Sometimes we want to limit the resources of a given container; this fits scenarios in which it is needed to measure the application’s performance, or an environment with constrained resources.
docker-compose offers two properties for that: mem_limit and cpus. mem_limit is a hard limit, meaning the container will not consume more memory even if more is available. cpus, on the other hand, limits how many of the machine’s cores the container can use.
version: '2.2' # cpus requires compose file format 2.2 or later
services:
  webserver:
    image: nginx
    mem_limit: 1024m # hard memory limit
    cpus: 0.8 # fraction of the host's CPU cores
    ports:
      - 80:80
      - 443:443
  testable:
    build:
      context: ./webapp
    user: 'node'
Limiting the resources of a container can help to avoid resource contention and ensures that the container does not consume more than it should. This can be especially important in production environments where multiple containers run on the same host. I personally use this approach to check how the application behaves under low resources.
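Once the services are up, the enforced limits can be observed at runtime:
# the MEM USAGE / LIMIT column should show the 1GiB cap for the webserver service
docker compose up -d
docker stats --no-stream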
Related subjects
- Studying the Practices of Deploying Machine Learning Projects on Docker
- Docker Caching — Introduction to Docker Layers
- The Magic of Docker Desktop is Now Available on Linux (official documentation available at Install Docker Desktop on Ubuntu)
References
- Burns, B., Beda, J., & Hightower, K. (2019). Kubernetes: up and running: dive into the future of infrastructure. O’Reilly Media.
- DockerHub. (n.d.). Best practices for writing Dockerfiles. Retrieved May 23, 2020, from https://docs.docker.com/develop/develop-images/dockerfile_best-practices
- Marabesi, M. (2020). Docker. https://youtube.com/playlist?list=PLN7yVcqYnDlX7EzsleJ1jD_D0q-cYIP7r
- Srivastav, P. (2021). A comprehensive tutorial on getting started with Docker! https://docker-curriculum.com
- Marabesi, M. (2017). Docker 101 - Getting started. https://www.slideshare.net/marabesi/docker-101-getting-started
- Nodejs. (2021). docker-node. https://github.com/nodejs/docker-node
- Amigoscode, & with Nana, T. (2020). Docker and Kubernetes - Full Course for Beginners. https://youtu.be/Wf2eSG3owoA?t=6332
- Tal, L., & Goldberg, Y. (2021). 10 best practices to containerize Node.js web applications with Docker. https://snyk.io/blog/10-best-practices-to-containerize-nodejs-web-applications-with-docker
- Fisher, B. (2019). Docker and Node.js Best Practices from Bret Fisher at DockerCon. http://www.youtube.com/watch?v=Zgx0o8QjJk4
- Fisher, B. (2021). Top 4 Tactics To Keep Node.js Rockin’ in Docker. https://www.docker.com/blog/keep-nodejs-rockin-in-docker
- Marcus. (2019). How to do Zero Downtime Deployments of Docker Containers. https://coderbook.com/@marcus/how-to-do-zero-downtime-deployments-of-docker-containers