This class is an introduction to containers and Docker. The main goal is to explain the purpose of containers, their benefits, and how to use them. We'll also take a look at how the AWS ECS service works and what it is for.
Contents
A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.
A container is supposed to run a single main process. Once the process is completed/stopped, the container will exit.
Both virtual machines and containers provide resource isolation, which prevents one instance (VM or container) from accessing resources that belong to another instance, even when both are running on the same hardware.
Despite both having resource isolation, they work in different ways: while the virtual machine virtualizes the hardware, the container virtualizes the operating system. This makes the container a lot more portable and efficient.
Containers are an abstraction at the app layer that packages code and dependencies together.
Multiple containers can run on the same machine and share the OS kernel with other containers, each running as an isolated process in user space. Containers take up less space than VMs (containers are typically tens of MBs in size), can handle more applications, and require fewer VMs and operating systems.
Virtual machines (VMs) are an abstraction of physical hardware turning one server into many servers.
A hypervisor is a process that separates a computer’s operating system and applications from the underlying physical hardware. The hypervisor allows multiple VMs to run on a single machine.
Each VM includes a full copy of an operating system, the application, binaries, and libraries, and the full size can be on the scale of GBs.
Before talking about the benefits of containers, it's important to make clear that not all kinds of workloads are a good fit for a container. Because of the nature of containers, an application with multiple processes and services running on a single machine is not a good fit for a single container.
Once you confirm that a specific service is a good fit for a container, the benefits of a container include:
- Containers are normally small compared to VMs, so it's easy to move them around
- Because everything required by the application is inside the container, the execution of the container will be the same anywhere
- Speeds up the development process, since developers can have multiple containers running on their local computer to simulate a production environment that is very close to the real one (in terms of functionality)
Docker is the most common container option on the market and has thousands of public images available to package your application, but there are other options like the ones listed below.
Docker is a tool that facilitates the creation, deployment, and execution of applications by using containers. It is composed of the Docker daemon and the Docker client.
The Docker daemon exposes a REST API that is accessed by the Docker client. We use the Docker client to submit instructions to the Docker daemon so it can run containers, create new images, delete existing containers, connect to a running container, and so on.
The Docker client is the command line tool used to interact with the Docker daemon. With the client you can do things like create new Docker images, interact with containers (start, stop, delete, or execute commands in a running container), interact with images (delete, list, etc.), and much more.
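As a quick illustration of this client/daemon split, running docker version makes the client call the daemon's API and print information about both sides:

docker version    # prints a "Client" section and a "Server" (daemon) section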
Below are some of the most common commands used with the Docker client, followed by a short example session.
- docker run
- Create and run a new container
- docker build
- Build a new image from a Dockerfile
- docker exec
- Execute a command on a running container
- docker images
- List the images stored on the local system managed by the Docker daemon
- docker ps
- List the running containers
- docker container [action]
- Execute an action related to containers
- docker image [action]
- Execute an action related to images
- docker logs [container id]
- Fetch the logs of a container
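To show how these commands fit together, here is a short, hedged example session (the public httpd:2.4 image and the container name web are illustrative choices, not part of the class materials):

# run an Apache httpd container in the background, mapping host port 8080 to container port 80
docker run -d --name web -p 8080:80 httpd:2.4

# list running containers and check the logs of the new one
docker ps
docker logs web

# open a shell inside the running container
docker exec -it web /bin/sh

# stop and remove the container
docker container stop web
docker container rm web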
Reference for all Docker client commands and options: https://docs.docker.com/engine/reference/commandline/docker/
A container registry is a place where images can be stored, and it can be public or private. Examples of registries include Docker Hub and Amazon ECR.
Registries are used to share an image that was created for a specific purpose. A registry can be used to store an image with specific prerequisites to run multiple applications, like:
- an image with apache+php+modules to run multiple php applications
- an image with a specific version of jdk to run multiple java applications
- an image with a specific version of mysql to run multiple databases
It can also be used to store the final version of your application (the artifact) that will run in production:
- an image with apache+php+modules+yourapp
- an image with python binaries and your application to run a machine learning algorithm
- an image with terraform cli that executes any terraform code
A Docker registry is often used as a tool in the middle of the CI/CD process, since the Docker image created during the CI pipeline needs to be stored somewhere so it can be used during the CD pipeline. More on this in class #6.
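In practice, publishing to a registry comes down to tagging the image with the registry's address and pushing it. A hedged example (registry.example.com and team/website are placeholder names):

# pull a public image from Docker Hub, the default registry
docker pull php:7.4-apache

# tag a local image with a private registry address and push it there
docker tag website:01 registry.example.com/team/website:01
docker push registry.example.com/team/website:01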
A Dockerfile is a set of instructions (like a recipe) used to create a new Docker image. These instructions install the application prerequisites into the image and add all the application-related files.
A Dockerfile is always based on an existing image, and the instructions included in the Dockerfile are applied on top of that base image. Example of a Dockerfile:
# base image with the Go toolchain
FROM golang:1.14rc1-buster
# copy the pre-built binaries from the local bin/ directory into the image
COPY bin/ /
# adjust permissions on the application binary and its log files
RUN chmod 544 /app/app1.bin && \
    chmod 500 /var/logs/*
# default command executed when the container starts
CMD [ "/app/app1.bin", "run" ]
- FROM
- The base image that will be used to build the new image
- COPY
- Copy files from the local filesystem into the container filesystem
- ADD
- Similar to the COPY instruction, but it can also copy directly from a remote URL (http or https)
- RUN
- Execute a command inside the container during the image build. The binary that will be executed must already be present in the image
- ENV
- Sets an environment variable in the container
- EXPOSE
- Informs Docker that the container will listen on the specified TCP or UDP ports
- USER
- Specify which user account will be used to run the commands inside the container when it runs
- CMD
- Sets the default command and default parameters that will be executed when the container starts. These values can be overridden at container execution time
- ENTRYPOINT
- Used when the container needs to be run as an executable; arguments passed at run time are appended to the entrypoint
- WORKDIR
- Sets the working directory for the RUN, CMD, ENTRYPOINT, COPY and ADD instructions that follow it
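To see how several of these instructions combine, here is a hedged, hypothetical example (the python:3.9-slim base image, file names and port are illustrative, not part of the class materials):

# base image providing the Python runtime
FROM python:3.9-slim

# environment variable and working directory for all following instructions
ENV APP_ENV=production
WORKDIR /opt/app

# install the application prerequisites, then copy the application files
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# document the listening port and drop root privileges
EXPOSE 8000
USER nobody

# run the application as an executable; CMD supplies overridable default arguments
ENTRYPOINT ["python", "server.py"]
CMD ["--port", "8000"]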
Reference to all Dockerfile Instructions: https://docs.docker.com/engine/reference/builder/
Simple Dockerfile:
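The Dockerfile itself is not reproduced in these notes; a minimal sketch of what it could look like, assuming the official httpd:2.4 base image and a local html/ directory with the site content (both are assumptions, not the actual class file):

# serve a static website with Apache httpd
FROM httpd:2.4
# copy the site content into the web server's document root
COPY html/ /usr/local/apache2/htdocs/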
docker build -t website:01 .
docker run -it -p 8081:80 website:01
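With the run command above, the website becomes reachable at http://localhost:8081 on the host, because -p 8081:80 maps host port 8081 to port 80 inside the container.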
An Image is an ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime. An image typically contains a union of layered filesystems stacked on top of each other. An image does not have state and it never changes.
Each instruction in a Dockerfile adds a new layer during the build of the image.
An image is supposed to have everything required (prerequisites and binaries) to run a specific application. Ideally, we aim for small images, to make the container as portable as possible.
Below are three examples of Dockerfiles that create images with the same purpose, but the difference in how each image is created (which instructions we put in the Dockerfile) can make a huge difference in the final result. Let's build three Docker images with those Dockerfiles.
docker build -t image:01 -f artifacts/Dockerfile.Image1 .
docker build -t image:02 -f artifacts/Dockerfile.Image2 .
docker build -t image:03 -f artifacts/Dockerfile.Image3 .
docker images | grep ^image
docker history image:01
docker history image:02
docker history image:03
The objective of those three images is the same: have the contents of the httpd-2.4.41.tar.gz file uncompressed into the /tmp folder.
Even though all three Dockerfiles achieve the objective, there are huge differences in how the objective is reached in each approach. Additionally, the final result of each image is quite different (the final image size varies by almost 50% between the biggest and the smallest image). This shows the importance of writing a Dockerfile properly.
The third approach, which is the most appropriate for our objective, uses the multi-stage build functionality. You can learn more about this approach here.
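The actual Dockerfile.Image3 lives in the class artifacts; as a hedged sketch of the idea, assuming an ubuntu:18.04 base image and the tarball available locally, a multi-stage build could look like this:

# build stage: unpack the tarball here, so neither the tarball nor the intermediate layers end up in the final image
FROM ubuntu:18.04 AS builder
WORKDIR /tmp
COPY httpd-2.4.41.tar.gz /tmp/
RUN tar xzf /tmp/httpd-2.4.41.tar.gz -C /tmp

# final stage: copy only the extracted files from the build stage
FROM ubuntu:18.04
COPY --from=builder /tmp/httpd-2.4.41 /tmp/httpd-2.4.41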
- Always use a tag in the image you'll use (FROM instruction).
The tag represents a specific image and is supposed to be immutable, meaning that the image behind a given tag will always be the same. This is important: by specifying a versioned tag (instead of the latest tag), you know exactly which base image will be used during the build of your own image.
Keep the same mindset when creating your own images. Once an image is created and published with a specific tag, that tag should belong to that artifact only. Any new image published should use a different tag.
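As a small, hedged illustration (the image name and registry address are placeholders), each release gets its own immutable tag instead of overwriting an existing one:

# build and publish a versioned, immutable tag for this artifact
docker build -t registry.example.com/team/myapp:1.4.2 .
docker push registry.example.com/team/myapp:1.4.2

# the next release gets a new tag instead of reusing 1.4.2
docker build -t registry.example.com/team/myapp:1.5.0 .
docker push registry.example.com/team/myapp:1.5.0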
- Because of the way images are built (in layers), keep the instructions that change less frequently at the top of your Dockerfile
During the build of an image, Docker can reuse layers created previously. This only happens when all the lower layers are the same.
Consider two very similar images, where the only difference between them is a RUN step. Even though this step only adds a single 13-byte file to the image, it affects all the layers that come after it.
Observe that the apt-get update steps are exactly the same and add the same number of MB to the image; however, they have different SHAs (79b65ac314b1 in the first image and a70625894939 in the second).
IMAGE CREATED CREATED BY SIZE
bc8d9068fd51 25 seconds ago /bin/sh -c rm -rf /tmp/httpd-2.4.41.tar.gz 0B
510793020658 27 seconds ago /bin/sh -c tar xzvf /tmp/httpd-2.4.41.tar.gz 39.5MB
2ae1c163c167 29 seconds ago /bin/sh -c #(nop) ADD cc9f7bcc45b8069f007672… 9.27MB
fa26d5204572 35 seconds ago /bin/sh -c apt-get -y install curl 14.3MB
79b65ac314b1 About a minute ago /bin/sh -c apt-get update 27.9MB
c3234c0c7372 About a minute ago /bin/sh -c echo "image change" > /tmp/devops… 13B
5ba0d7847404 2 days ago /bin/sh -c #(nop) WORKDIR /tmp 0B
ccc6e87d482b 4 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
<missing> 4 weeks ago /bin/sh -c mkdir -p /run/systemd && echo 'do… 7B
<missing> 4 weeks ago /bin/sh -c set -xe && echo '#!/bin/sh' > /… 745B
<missing> 4 weeks ago /bin/sh -c [ -z "$(apt-get indextargets)" ] 987kB
<missing> 4 weeks ago /bin/sh -c #(nop) ADD file:08e718ed0796013f5… 63.2MB
IMAGE CREATED CREATED BY SIZE
95a90787b890 2 days ago /bin/sh -c rm -rf /tmp/httpd-2.4.41.tar.gz 0B
1631588fac44 2 days ago /bin/sh -c tar xzvf /tmp/httpd-2.4.41.tar.gz 39.5MB
7141d3a52fe8 2 days ago /bin/sh -c #(nop) ADD cc9f7bcc45b8069f007672… 9.27MB
06e2b5d8b3e7 2 days ago /bin/sh -c apt-get -y install curl 14.3MB
a70625894939 2 days ago /bin/sh -c apt-get update 27.9MB
5ba0d7847404 2 days ago /bin/sh -c #(nop) WORKDIR /tmp 0B
ccc6e87d482b 4 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
<missing> 4 weeks ago /bin/sh -c mkdir -p /run/systemd && echo 'do… 7B
<missing> 4 weeks ago /bin/sh -c set -xe && echo '#!/bin/sh' > /… 745B
<missing> 4 weeks ago /bin/sh -c [ -z "$(apt-get indextargets)" ] 987kB
<missing> 4 weeks ago /bin/sh -c #(nop) ADD file:08e718ed0796013f5… 63.2MB
Considering the behavior shown above, it's important to place the steps that always produce an identical result before the steps that produce a different result on each build. For example, if you're building an image for an application under development and your image build has the following steps:
1. Copying the application files
2. Installing the application prerequisites (apache+php) with specific versions
3. Installing updates on the image
In that case, you can assume that step 2 will always produce the same result. Even if you build the image multiple times a day, step 3 will most likely produce the same result as well. On the other hand, step 1 will produce a different result on almost every build, because the application files change constantly during development.
So, ordering the steps as 2-3-1 in your Dockerfile will let Docker reuse the layers from steps 2 and 3 on almost every build, making the build process faster and sparing you from downloading and installing the application prerequisites and updates on every build you do.
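A hedged sketch of that 2-3-1 ordering, assuming a PHP application and a local app/ directory (the base image and paths are illustrative):

FROM php:7.4-apache

# step 2: install application prerequisites (rarely changes, so the layer is reused)
RUN apt-get update && apt-get -y install curl unzip

# step 3: install image updates (changes occasionally)
RUN apt-get -y upgrade

# step 1: copy the application files (changes on almost every build, so it goes last)
COPY app/ /var/www/html/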
ECS (Elastic Container Service) is a fully managed container orchestration service available on AWS. A container orchestration tool is responsible for coordinating and managing all aspects of the container execution lifecycle.
As an example, it can be configured to run 2 replicas of a specific image, and whenever something goes wrong with one of the replicas, the orchestration tool will notice and spin up a new container, so that 2 replicas keep running at all times.
ECS is composed mainly of three components: ECS Cluster, ECS Service and ECS Task Definition.
The ECS Cluster is a group of EC2 instances that will be used to run containers. ECS clusters are Region-specific, but a cluster can span multiple AZs in that Region, providing a highly available container solution.
The ECS Service is responsible for running and maintaining the requested number of tasks (containers) of a specific image in the cluster.
It's also responsible for associating the running containers with a specific load balancer, so the traffic that reaches the service can be balanced across multiple containers.
The ECS Task definition describes the container execution parameters.
The Task Definition contains information about the image used, the resources (CPU and memory) that will be made available to the container, the AWS IAM role assumed by the container, and the volumes to be mounted, so whenever a new container is created it always uses the same configuration.
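As a hedged sketch of how the three components fit together with the AWS CLI (the cluster, service, task definition and file names below are placeholders, not the class setup):

# 1. create a cluster to run the containers on
aws ecs create-cluster --cluster-name demo-cluster

# 2. register a task definition described in a local JSON file
aws ecs register-task-definition --cli-input-json file://taskdef.json

# 3. create a service that keeps 2 copies of the task running in the cluster
aws ecs create-service \
    --cluster demo-cluster \
    --service-name demo-service \
    --task-definition demo-task \
    --desired-count 2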
- You can also run containers on AWS ECS through Fargate, which provides serverless compute for containers. With Fargate, you don't need to manage compute instances to run your containers; they run on a serverless engine.
- ECS fully integrates with AWS load balancers and Auto Scaling, so you can balance requests among multiple containers and scale your ECS service by increasing the number of replicas of your task definition based on metrics like CPU and memory. If your application receives a larger number of users, more containers will be available to support that load.
ECR (Elastic Container Registry) is a fully managed Docker container registry to store your Docker images. It is fully integrated with ECS and eliminates the need to operate and manage your own container repository tool.
All images are stored in a highly available and scalable architecture. ECR also integrates with IAM, so you can use roles and policies to manage the resource-level permissions for each repository.
In order to push an image to ECR, you can follow this guide.
In order to pull an image from ECR, you can follow this guide.
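As a hedged example of what those guides boil down to (the account ID 123456789012, region us-east-1 and repository name website are placeholders):

# authenticate the Docker client against your ECR registry
aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# tag the local image with the ECR repository URI and push it
docker tag website:01 123456789012.dkr.ecr.us-east-1.amazonaws.com/website:01
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/website:01

# pull the image back from ECR
docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/website:01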
- What is a container and its benefits
- What is a container registry and what it's used for
- How to run a container
- How to create a new container image
- How to access a service running in a container from our local machine
- What is ECS and its purpose
- What is ECR and how we pull and push images from/to it
- List of other container tools
- ECS, Deep Dive. https://www.youtube.com/watch?v=qbEPae8YNbs.