The rise of containerization has been a revolutionary development for many organizations. Being able to deploy applications of any kind on a standardized platform with robust tooling and low overhead is a clear advantage over many of the alternatives. Viewing container images as a packaging format also allows users to take advantage of pre-built images, shared and audited publicly, to reduce development time and rapidly deploy new software.
However useful shared public images are, most users will also require custom images that define how to run their own tools and services. Whether customizing readily available software, packaging and running internal tools, or creating images as a release medium for your own projects, creating images is a fundamental part of the container paradigm. In this guide, we’ll talk about how to create your own images and some of the considerations to keep in mind as you do.
Container images are static bundles of files that represent everything a container runtime, like Docker, needs to run a container. Images include the filesystem layout, all of the required applications and dependencies, and configuration.
Each image is built from either a parent image (an image used as the starting point for the new image) or from an empty pseudo-image called scratch. Most parent images typically provide a filesystem structure that resembles a minimal Linux system, package management tools, and the core functionality that you’d expect from a command line environment. Parent images are available for most popular Linux distributions, often in a variety of configurations. Images are also available preconfigured for different programming languages and ecosystems.
Container images are built by applying “layers” onto previous images. Each filesystem layer represents a point-in-time record of the filesystem state after certain actions. Images that have common ancestry share filesystem layers, allowing for reduced overhead and greater consistency between images.
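If you want to see this layering in practice, the docker history command lists the layers of any local image, along with the instruction that created each layer and its size. For example, once you have pulled an image (such as the ubuntu:18.04 image used later in this guide), you can inspect its layers with:

docker history ubuntu:18.04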
There are a few different ways to create container images. One of the easiest ways to get started is to create images interactively. You can run a container image with an interactive shell, perform the actions needed to get the operating system into the desired state, and then save the result. This is a good way to test ideas and validate your processes.
To begin, start up a container using your chosen parent image. We need to pass in a few arguments to the docker run command to start the container with the correct configuration. Pass the --interactive or -i flag to indicate that the container’s STDIN should be opened. Additionally, we need to use the --tty or -t flag to allocate a pseudo-TTY to be able to run interactive commands. Lastly, we need to spawn an actual shell like /bin/bash so that we have an interface to interact with the container.
To demonstrate, we can start up an Ubuntu 18.04 container with a Bash shell session by typing:
docker run -it ubuntu:18.04 /bin/bash
Docker will check for the Ubuntu 18.04 image locally and, if necessary, pull in any missing or stale image layers from Docker Hub. After all of the required layers are available, Docker will allocate a pseudo-TTY and start a Bash shell, dropping you into a new session within the container:
Unable to find image 'ubuntu:18.04' locally
18.04: Pulling from library/ubuntu
f476d66f5408: Pull complete
8882c27f669e: Pull complete
d9af21273955: Pull complete
f5029279ec12: Pull complete
Digest: sha256:70fc21e832af32eeec9b0161a805c08f6dddf64d341748379de9a527c01b6ca1
Status: Downloaded newer image for ubuntu:18.04
root@b9f0772826d1:/#
From here, you can make changes to the file system to reflect the environment you need. As a simple example, we can add a message to a file:
echo 'hello there!' > /message
When you are finished with your changes, exit the session to back out of the container and return to your local shell:
exit
From here, we can take a look at our exited container by asking Docker to list all containers, including those that have exited:
docker ps --all
The output will list the recently terminated container.
CONTAINER ID   IMAGE          COMMAND       CREATED          STATUS                     PORTS   NAMES
3ccb8e54fb03   ubuntu:18.04   "/bin/bash"   15 seconds ago   Exited (0) 6 seconds ago           trusting_hofstadter
If you do not provide a name for your container instance, Docker generates a random name for you. In this instance, our container has been named trusting_hofstadter. We can use this or the container ID (3ccb8e54fb03 here) to refer to this specific container instance.
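If you’d prefer a predictable name instead of a generated one, you can pass the --name flag to docker run when starting the container. The name my_container below is just an arbitrary example:

docker run --name my_container -it ubuntu:18.04 /bin/bash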
If you are happy with your changes, you can save the image you created using the docker commit command. To do so, you need to provide the container name or the container ID from the last command as well as the name you want to use for the saved image. Here, we’ll name our image hello_there for simplicity:
docker commit trusting_hofstadter hello_there
sha256:ca6c6aeaa7dd0eea9d9e2590f02a41143827103daaafa0c6a74022976a211570
If we check the list of available container images on our system, our new image will be among the results:
docker image ls
REPOSITORY    TAG      IMAGE ID       CREATED          SIZE
hello_there   latest   b3d29b4e3601   13 seconds ago   69.9MB
ubuntu        18.04    7698f282e524   8 days ago       69.9MB
Now we can check whether our message is present within the image by running a container that displays the file we saved:
docker run hello_there cat /message
hello there!
Here, we passed the cat /message command to a new container spawned from our image to display the contents of the message file within the image.
If this is the action we want to execute automatically whenever the image is run, we can update the image with that command by modifying one of the container image’s attributes as we commit a new image. We’ll base this off of our most recent container (which we can get the ID of with docker ps -lq) and commit the new image as hello_there:fixed:
docker commit --change='CMD ["cat", "/message"]' $(docker ps -lq) hello_there:fixed
sha256:4310c3d569f5d338828296f6a0e26155f8015a26a5814688a0751f8432c20351
Now, we can run containers from the new image without specifying a command to execute at runtime:
docker run hello_there:fixed
This is a quick way to interactively create images to test out ideas, figure out dependencies, etc.
While creating images interactively is sometimes more comfortable for beginners, it does have some serious disadvantages.
Images created interactively don’t provide a clean record of what actions were taken to create the image. This makes maintaining and updating the image very challenging over time. Since the process relies on logging in and completing a series of steps manually, it also is more error prone. Due to the way that container images are built up with a new layer for each additional action, it’s also easy to accidentally create unoptimized images with extra layers that are not needed. And finally, as we saw above with the --change flag, we often have to manipulate the final image using additional commands to access fairly basic Docker functionality.
In most real-world situations, it’s preferable to create images from a Dockerfile instead. A Dockerfile is a plain text file that contains the instructions that tell the Docker build engine how to create an image: which parent image to start from, what actions to perform at build time, and how containers created from the image should behave at runtime.
Once a Dockerfile is defined, the docker build command can interpret it and combine it with a build context — a file path or URL representing a working directory — to create a new container image. This process enables simpler automation and leaves a good record of the actions taken to create the image.
The Dockerfile can be checked into source control and builds can be generated automatically by CI/CD processes as part of the development and release cycle. Furthermore, you have access to the full range of Docker image instructions at build time instead of having to specify changes after the image is built as we did when getting our interactive image to automatically run a command at start. Overall, this method of building images is self-documenting and offers more repeatability and flexibility than building images interactively.
To get an understanding of the general format of a Dockerfile, let’s walk through a very simple example by recreating our previous image.
Start by creating and moving into a directory that’ll serve as our build context:
mkdir ~/hello_world
cd ~/hello_world
As mentioned above, the build context is a path or URL on the host system that is accessible to Docker during the build process. This is useful for copying files to the container image, for example. It’s important to note that the entire build context is sent to the Docker daemon at build time, so if you choose a directory with a large number of unnecessary files and subdirectories, it can increase the build time and resource usage during the build process substantially for no purpose.
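One way to keep the build context lean without restructuring your project is a .dockerignore file placed in the root of the context. It works much like a .gitignore file: any paths matching its patterns are excluded from what gets sent to the daemon. The patterns below are only examples; adjust them to your own project:

# .dockerignore (example patterns)
.git
*.log
node_modules/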
Inside our new, clean build context, use your favorite text editor to create and open a file called Dockerfile to define the container image:
nano Dockerfile
Inside, specify the parent image we want to use as our starting point using the FROM instruction. We’ll use the ubuntu:18.04 image, just as we did in the interactive image:
FROM ubuntu:18.04
Next, we can create the /message file within the image using the RUN instruction:
FROM ubuntu:18.04
RUN echo 'hello there!' > /message
Finally, we can specify the default action for containers spawned from this image by defining a CMD:
FROM ubuntu:18.04
RUN echo 'hello there!' > /message
CMD ["cat", "/message"]
Save and close the file when you are finished. Now, create a new image from the Dockerfile using the docker build command. We can set the tag for the image to hello_there:first_dockerfile by including the -t flag. Notice the dot at the end of the command, indicating that the current directory should be used as the build context for the new image:
docker build -t hello_there:first_dockerfile .
Sending build context to Docker daemon  2.048kB
Step 1/3 : FROM ubuntu:18.04
 ---> 7698f282e524
Step 2/3 : RUN echo 'hello there!' > /message
 ---> Running in 6b7c30da3b51
Removing intermediate container 6b7c30da3b51
 ---> 06d07578c714
Step 3/3 : CMD ["cat", "/message"]
 ---> Running in 61cf938b6554
Removing intermediate container 61cf938b6554
 ---> bf7f352c9604
Successfully built bf7f352c9604
Successfully tagged hello_there:first_dockerfile
Upon running the command, the Docker build process will set the context to the current directory and look for a Dockerfile. It will then interpret the instructions within to set up an environment and build the image according to the definition.
Once the image is built, we can create a container from it in much the same way as we did last time:
docker run hello_there:first_dockerfile
The results are similar, but with greater repeatability, accountability, and control over the process.
We can enhance our first Dockerfile to make it more flexible and, at the same time, demonstrate how the build context can influence the resulting image with a few minor modifications.
Instead of hard coding the message within the Dockerfile, we can place the message in a separate file located within our build context. This helps us separate data from the actual build process.
Inside of the directory with your Dockerfile, create a file called message with a new message inside. The easiest way to do this is to echo a string directly to a new filename:
echo 'message stored in build context' > message
Now, we have our message in a dedicated file instead of embedded within our Dockerfile. We need to adjust the build process to reflect this change though. Open the Dockerfile with your text editor to make the change:
We need to change the second line, RUN echo 'hello there!' > /message, to refer to the external file we created. We can copy the file from the build context into the image using the COPY instruction. Since the following command looks for the message at /message, we’ll copy the file there:
FROM ubuntu:18.04
COPY message /message
CMD ["cat", "/message"]
The first argument to COPY refers to the file in the build context, while the second argument refers to the filesystem location within the actual image.
When you are finished, build a new image using a different tag:
docker build -t hello_there:second_dockerfile .
Sending build context to Docker daemon  3.072kB
Step 1/3 : FROM ubuntu:18.04
 ---> 7698f282e524
Step 2/3 : COPY message /message
 ---> cd3c3e6866c1
Step 3/3 : CMD ["cat", "/message"]
 ---> Running in 868b88329e89
Removing intermediate container 868b88329e89
 ---> 8156f474f6a3
Successfully built 8156f474f6a3
Successfully tagged hello_there:second_dockerfile
When you start a container based on the new image, you should see the message stored in the message file on your computer:
docker run hello_there:second_dockerfile
message stored in build context
Note: The message printed by the container is fixed at build time. If you change the message file after building, you will need to rebuild the image to update the output.
Now that you’ve gotten a bit of experience with some very simple Dockerfile instructions, it’s worthwhile to take a closer look at the most common instructions available. For each one, we’ll determine whether it primarily affects the image build stage or the container runtime stage, and describe its general use.
We’ll start with a few operations typically found towards the beginning of the Dockerfile:
The FROM instruction must be the first item in a Dockerfile. It specifies a parent image that your image will start from. You can specify any image found on Docker Hub, a different image registry, or any local image. You specify the image using the <image_name>:<tag_name> format.
For instance, to base your image off the official Ubuntu 18.04 image, you can specify the ubuntu repository on Docker Hub with the 18.04 tag, like this:
FROM ubuntu:18.04
. . .
The ubuntu images are maintained by the official Ubuntu team, so the repository is not namespaced under an individual’s account. To pull images from a repository that is not one of these official, top-level repositories, you typically have to specify the account namespace before the image name, separated by a slash.
For instance, to use an image called test_image with the tag v1 from a demouser account, you would use the following syntax:
FROM demouser/test_image:v1
. . .
While the vast majority of images use the FROM instruction to specify a parent image, you can also use FROM scratch to indicate that your image should not have a parent image. The scratch image is a pseudo-image that indicates that you want to start without a parent image. In this case, the image’s filesystem is completely blank and everything must be built from the ground up to create the image layers and filesystem. You most often see FROM scratch used in the official base images offered by Docker.
The LABEL instruction provides the ability to add key-value metadata to the image. This allows you to include arbitrary information about your image that might be useful for auditing or with automated processes. The LABEL instruction can be used many times within a single Dockerfile.
The basic syntax for adding a LABEL to your image is the following:
. . .
LABEL <key>=<value>
. . .
You can include multiple labels on the same line or you can add an additional LABEL instruction for subsequent items.
These two are effectively the same:
. . .
LABEL key1="value 1" key2="value 2"
. . .
. . .
LABEL key1="value 1"
LABEL key2="value 2"
. . .
Values that include spaces must be enclosed within quotes as demonstrated above.
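Once an image is built, you can verify the labels it carries using docker image inspect with a Go template filter:

docker image inspect --format '{{ json .Config.Labels }}' <image_name>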
The ADD instruction is a way to copy files from the local filesystem, a remote URL, or a local compressed archive file to a location within the image filesystem. This allows you to add arbitrary files from local and remote sources into your image. The ADD instruction is very similar to the COPY instruction we will learn more about later.
The syntax of the basic ADD instruction looks like this:
. . .
ADD <source> <destination>
. . .
The source in this case can refer to files or directories within the build context, or to a remote URL. Sources can contain wildcard characters, which will be expanded at build time to generate a list of matching files to copy. The source can also be a compressed or uncompressed tar archive on the local system; in this case, Docker will automatically extract the archive and copy its contents to the destination.
The destination can be an absolute or relative path. If a relative path is given, it is understood to be in relation to the WORKDIR, an instruction we’ll talk about later. If a destination ends with a trailing slash, it is interpreted as a directory. Otherwise, it is taken to be the name the file should be copied as in the image filesystem.
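As a brief illustration of these behaviors, the following hypothetical instructions use placeholder file names. Note that remote URLs are downloaded as-is; only local tar archives are automatically extracted:

. . .
# Downloaded to the destination; remote sources are never auto-extracted
ADD https://example.com/app.tar.gz /tmp/app.tar.gz
# A local tar archive is extracted into the destination directory
ADD local-archive.tar.gz /opt/app/
. . .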
The COPY instruction looks incredibly similar to the ADD instruction at first glance, and overlaps in purpose. The difference is that COPY only works with local files and provides no automatic archive extraction.
While these restrictions may feel limiting, COPY is the recommended option for any scenario it can handle because its behavior is less ambiguous and easier to interpret.
The COPY syntax mirrors that of ADD:
. . .
COPY <source> <destination>
. . .
The only difference is that the source here can only refer to local files or directories, which will be copied without extraction to the destination. Like ADD, sources can use wildcard characters to match multiple files.
The COPY command can also take a --from= argument for use in multi-stage builds. This allows files to be copied from a previous image into the current image when the build process uses multiple images to produce the final image.
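To sketch how this fits together, here is a minimal two-stage build. The image names, paths, and build command are illustrative, but the AS naming and the --from=builder reference reflect the actual multi-stage syntax:

# Stage 1: build an application using a full toolchain image
FROM golang:1.12 AS builder
WORKDIR /src
COPY . .
RUN go build -o /bin/app .

# Stage 2: copy only the compiled artifact into a smaller image
FROM ubuntu:18.04
COPY --from=builder /bin/app /usr/local/bin/app
CMD ["app"]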
The ENV instruction is used to set an environment variable during the build and run stages of the image. This allows you to set variables that will influence the image during the build process or that will be available to processes when containers are actually running.
The ENV instruction can use either of the following syntaxes:
. . .
ENV <key> <value>
ENV <key>=<value>
. . .
The second form can contain multiple key-value pairings on the same line, separated by a space.
Once the environment variable is set, it is interpreted as it would be in any standard shell environment for the rest of the build process and in the containers spawned from the image.
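As a quick sketch, a variable set with ENV can be referenced both by later build instructions and by the container’s default command. The APP_HOME name here is an arbitrary example:

. . .
ENV APP_HOME=/opt/app
# The variable is available at build time
RUN mkdir -p $APP_HOME
# And at runtime, since this CMD goes through a shell
CMD ["sh", "-c", "echo $APP_HOME"]
. . .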
The EXPOSE instruction communicates which ports the container’s services are listening on. This does not affect the way the container is built or run in any way but rather informs the container’s user which ports are being used. The user can then choose to publish the ports, exposing them to the network, if desired.
The EXPOSE instruction is basically a dedicated mechanism for communicating with the container user about your image’s network ports. The basic syntax can be any of the following:
. . .
EXPOSE <port>
EXPOSE <port>/tcp
EXPOSE <port>/udp
. . .
If no protocol is specified, as in the first format, TCP is assumed. To expose the same port for both protocols, a separate line for each is required.
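For instance, an image for a service that listens for HTTP traffic and also answers DNS queries might declare:

. . .
EXPOSE 80
EXPOSE 53/tcp
EXPOSE 53/udp
. . .

Remember that this is purely documentation: the person running the container still publishes ports at runtime, for example with docker run -P to publish all exposed ports to random host ports, or -p 8080:80 to map a specific one.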
The RUN instruction is one of the most common instructions within any Dockerfile. The operation will execute the statements given to it within the image’s filesystem environment during the build process. This is the primary mechanism for making changes to build your image. It can run any commands available within the image’s filesystem.
The RUN instruction has two separate forms, the choice of which impacts its execution:
. . .
RUN <command>
RUN ["<executable>", "<arg1>", ..., "<argn>"]
. . .
The first format, providing a raw command and arguments after the RUN instruction, will execute the given command in a shell session. This is the simpler format that executes in much the same way as it would on the command line. Since the command is executed by a shell, normal shell processing of wildcards, environment variables, etc. take place during execution.
The second format, which provides an executable and a list of arguments within a JSON array is not executed in a shell environment. This means that any behavior that relies on shell interpretation will not function correctly unless you choose to use a shell as the initial executable. This style of execution can be more predictable and can help you avoid side effects in situations where you do not need to rely on shell processing.
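The difference is easiest to see side by side. In the shell form below, the command substitution is processed by the shell; in the exec form, the arguments are passed to the executable untouched:

. . .
# Shell form: runs via /bin/sh -c, so $(date) is expanded
RUN echo "built on $(date)" > /build-date
# Exec form: executed directly, with no shell processing
RUN ["touch", "/marker-file"]
. . .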
The USER instruction controls what user, and optionally group, commands are executed as in the image environment. This can affect both the build and runtime process as every subsequent command that is run in the container environment (RUN, CMD, and ENTRYPOINT) will be executed by the provided user. The USER instruction can be used multiple times to switch users for certain commands.
By default, all instructions that manipulate the container file system are run as root. This makes configuration easy, but running containers as root can have severe security implications.
The syntax for the USER instruction is as follows:
USER <user_name_or_ID>:<group_name_or_ID>
The user and group components can either specify the name or the numerical ID within the image’s filesystem. Keep in mind that any user or group referenced must already exist on the image. You might need to execute commands using RUN prior to using USER to configure the identities you require.
The group element is optional and if it is left out, the colon can also be omitted. If the group is not specified, Docker will use the user’s primary group or the root group if that is undefined.
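As a short sketch, a Dockerfile might create an unprivileged account and then switch to it for the remaining instructions. The appuser and appgroup names are arbitrary examples:

FROM ubuntu:18.04
# The user and group must exist before USER can reference them
RUN groupadd -r appgroup && useradd -r -g appgroup appuser
USER appuser:appgroup
CMD ["whoami"]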
The VOLUME instruction is responsible for creating mount points within the container image’s filesystem for mounting external volumes from the host or other locations. This is the primary filesystem-based method for sharing data between the host and container or between containers. The VOLUME instruction specifies the internal mount point, but does not map it to any specific location on the host. Instead, this mapping is specified at runtime.
The VOLUME instruction can take either a series of strings separated by spaces or a JSON array. For instance, these two will produce the same result:
VOLUME ["/data/vol1", "/data/vol2"] VOLUME /data/vol1 /data/vol2
It is important not to think of the VOLUME instruction as working like the mount command in Linux. With the VOLUME instruction, all actions that interact with the data within the volume must be performed prior to declaring the volume: any data creation or copying performed after the VOLUME line will be discarded. Instead, perform your data actions on the mount point first and then mark it with VOLUME afterwards, as in the sketch below.
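Here is a short sketch of that ordering:

FROM ubuntu:18.04
# Populate the mount point first...
RUN mkdir -p /data/vol1 && echo 'seed data' > /data/vol1/seed.txt
# ...then declare it as a volume
VOLUME /data/vol1
# Anything written to /data/vol1 after this line would be discarded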
The WORKDIR instruction declares the directory context for instructions like RUN, CMD, ENTRYPOINT, COPY, and ADD. It specifies the location on the filesystem where these commands should be executed. This has implications for any relative paths or commands that perform actions in relation to the current directory.
The WORKDIR instruction will automatically create the directory and any parent directories necessary. It can be used as many times as desired throughout the Dockerfile to change the context as required for the build and runtime instructions.
To declare a WORKDIR, you can use the following syntax:
WORKDIR <filesystem_path>
Usually, it is best to provide an absolute path to remove ambiguity. If a relative path is given, it will be interpreted as relative to the previous WORKDIR value.
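For instance, the relative paths in this hypothetical snippet all resolve against /opt/app:

. . .
WORKDIR /opt/app
# Copies into /opt/app/config
COPY config ./config
# Runs with /opt/app as the current directory
RUN ls -al
. . .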
The CMD instruction provides the default execution instructions for when a container is run from the image. While this can be overridden at runtime, this is the primary way to specify what should happen when a user executes docker run on your image. Since CMD helps specify the runtime command for the container, it can only be used once in a Dockerfile.
The CMD instruction can be used on its own or in conjunction with the ENTRYPOINT instruction we’ll discuss next. Because of this flexibility, the CMD instruction has a few different syntax variations that can affect how it is interpreted.
The first, called the “shell” form, simply lists out the commands and arguments as a string as they’d be given on the command line:
CMD ls -al /var/log | wc -l
This variant of the instruction is executed by passing the line directly to /bin/sh -c. This means that the string is interpreted and processed by the shell, which allows for piping, substitution, and any other shell magic that might alter the meaning of the command. This is a good format to use for simple commands that use the existing environment.
The second format is called the “exec” form. This is the recommended format for most use cases as it is predictable and avoids unintentional shell behavior that can change the execution of the command. The syntax of the exec form uses a JSON array with the executable as the first element and the parameters as each subsequent element:
CMD ["full/path/to/executable", "param1", "param2"]
Unlike the shell form, this form executes the first element directly and passes the remaining elements as arguments. This means no shell substitution or string manipulation is performed. This is a clean way of ensuring that the commands provided are executed exactly as written.
The third format looks similar to the exec form and is used in conjunction with the ENTRYPOINT instruction (we cover this next). If an ENTRYPOINT is provided, the JSON array specified by CMD will be interpreted as arguments to the entry point command. This means that the CMD will contain only parameters with no executable:
ENTRYPOINT ["/path/to/entrypoint"] CMD ["param1", "param2"]
This format allows users to easily override the specific arguments to the entry point command at runtime. This is useful if you want to provide a default executable and a default set of arguments, but allow the user to override just the arguments or override the entire command if desired. If your image should almost always run a certain command but might require different arguments at runtime, this is a good way of configuring that.
The ENTRYPOINT instruction allows you to configure a container that can be run as an executable by default. The target of an ENTRYPOINT instruction is the command and parameters that should always run, unless overridden, when the container is started. This allows you to operate the container as if you were operating the command or script specified by the instruction.
The ENTRYPOINT instruction has two syntaxes, similar to CMD.
The first syntax is a string that will be passed to /bin/sh -c. As before, this will cause the string to be interpreted by the shell, so string manipulation, substitution, etc. will take place. The format looks like this:
ENTRYPOINT ls -al
The second format is the recommended alternative. It uses a JSON array to specify the command and any parameters. These will be executed directly without using a shell. This means that no variable substitution or other shell behavior will take place. However, it has the advantage of being predictable and making it possible to coordinate with an associated CMD instruction.
If your Dockerfile includes both an ENTRYPOINT and a CMD instruction, they both must use the exec form. When both instructions are present, the ENTRYPOINT will be interpreted as the command and required parameters. The CMD will be interpreted as the default, easily overridable parameters.
As a simple example, we can imagine a Dockerfile with the following instructions:
FROM ubuntu:18.04
ENTRYPOINT ["ls", "-al"]
CMD ["/var/log"]
In this scenario, by default, when the resulting image is run, it will execute the command ls -al /var/log:
docker build -t entry_cmd_test .
docker run -it entry_cmd_test
total 272
drwxr-xr-x 3 root root   4096 May 15 14:06 .
drwxr-xr-x 1 root root   4096 May 15 14:07 ..
-rw-r--r-- 1 root root   3788 May 15 14:07 alternatives.log
drwxr-xr-x 2 root root   4096 May 15 14:07 apt
-rw-r--r-- 1 root root  35330 May 15 14:06 bootstrap.log
-rw-rw---- 1 root utmp      0 May 15 14:06 btmp
-rw-r--r-- 1 root root 178733 May 15 14:07 dpkg.log
-rw-r--r-- 1 root root   3232 May 15 14:06 faillog
-rw-rw-r-- 1 root utmp  29492 May 15 14:06 lastlog
-rw------- 1 root root   6464 May 15 14:06 tallylog
-rw-rw-r-- 1 root utmp      0 May 15 14:06 wtmp
We can easily change the directory being targeted by providing a parameter when we run that will override the one specified in CMD:
docker run -it entry_cmd_test /run
total 20
drwxr-xr-x 1 root root 4096 May 15 21:20 .
drwxr-xr-x 1 root root 4096 Aug 21 16:28 ..
drwxrwxrwt 2 root root 4096 May 15 14:06 lock
drwxr-xr-x 2 root root 4096 May 15 14:06 mount
drwxr-xr-x 2 root root 4096 May 15 21:20 systemd
-rw-rw-r-- 1 root utmp    0 May 15 14:06 utmp
The ENTRYPOINT executable is still in effect even though we’ve overridden the parameters that were defined in CMD. This can provide some pretty interesting flexibility in how you construct your images.
Each time you issue a RUN instruction, Docker executes the command and commits the results as an additional layer for your image. Image layers are extremely important to understand as you build images. Each additional layer adds additional size to your image. So being mindful of what instructions create layers can help you reduce image bloat.
The other important thing to understand about image layers is how they affect Docker’s build cache. Docker uses a build cache to reduce the time it spends rebuilding images. During a rebuild, if it determines that nothing has changed in a layer, it will use the cached layer and move on. If it determines that a change has happened, it will invalidate the cached layer and all subsequent layers and re-execute the instructions from that point on. Docker cannot perfectly determine when it’s best to invalidate its cache, so you must think about how your instructions affect the cache. By crafting your instructions carefully, you can manipulate the caching behavior to rely on the cache when it is safe and to bust the cache when you need to rebuild fresh.
A good example of this interplay between reducing image size and busting outdated layers is when installing software from a repository. With many Linux distributions including Debian, Ubuntu, and Alpine Linux, installing software is conventionally broken down into a multi-part process. First, the local package indexes are updated by pulling down the latest information about the packages available from remote repositories. Afterwards, specific packages can be installed with the package manager using the local index to request the appropriate software from the remote repository.
An initial implementation of this process in a Dockerfile might look something like this:
. . .
RUN apt update
RUN apt install -y <package1>
. . .
This works well on our first run, but it does have some problems. When we translate these processes into RUN instructions, we want to reduce the resulting image layer size and make sure that the build process does not reuse an outdated image layer on rebuilds when a significant change has occurred.
Let’s walk through what happens if we add another package to the installation line and then rebuild the image:
. . .
RUN apt update
RUN apt install -y <package1> <package2>
. . .
In this case, Docker will begin evaluating the instructions to see where it can use image layers it already has cached to save time and resources. Since the apt update line hasn’t changed, it assumes that the resulting layer is still acceptable and will not re-run that instruction. Next, it will skip down to the apt install line. Since a new package has been added, it knows that it cannot use the previous layer for that instruction, so it reruns the command. However, since the apt update was not executed this time, the package index available at that time may be outdated. One or both of the packages the system is requesting to download might no longer be available. In this case, Docker’s cached layer has prevented our rebuild from executing successfully.
If we have to add a new package to our install process, we need to make sure that the packaging command also reruns the package index update so that the build does not try to install using stale package information. We can achieve this by bundling the two packaging commands into a single RUN instruction. That way, if we change the installed package list, Docker will re-run both the update and install commands together. Since the instruction that installs the packages also updates the package index, we’ll always install with a fresh package list:
. . .
RUN apt update && apt install -y <package1> <package2>
. . .
While this ensures that we always use the latest index to install packages, we don’t need that index present on our completed image. Since the index files have served their purpose and are no longer necessary, we can string an additional command on the end of our RUN instruction to clean up the filesystem location where the index files are stored. This will prevent these files from being committed into the image layer when the RUN instruction completes:
. . .
RUN apt update && apt install -y <package1> <package2> && rm -r /var/lib/apt/lists/*
. . .
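As these chained commands grow, backslash line continuations keep the instruction readable while still producing a single layer:

. . .
RUN apt update && \
    apt install -y <package1> <package2> && \
    rm -r /var/lib/apt/lists/*
. . .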
At this point, we’ve reduced this image layer to include all of the packages we need but none of the extra files that we aren’t using. The layer will automatically be rebuilt when we change the packages we need, but it will be reused in other cases, speeding up our builds.
There is plenty more to learn about building your own images. With some reading and experimentation, it is usually possible to get a working Dockerfile that can build an image to your specifications. Once you’ve reached that goal, you can begin looking for opportunities to optimize your image to reduce the image size or build time. These are not usually of too much concern when you’re working on your own computer or deploying to a few machines, but they become incredibly important as you begin to use containers within CI/CD pipelines and automatic deployments.