Docker containers can be stopped and started again. Changes made to the file system in a running container survive this deliberate stop and start cycle; data in memory and running processes do not. A container that crashes may not be restartable at all, and if it can be restarted its file system will be in an undetermined state. When you start a container after it was stopped, it goes through its full startup routine. If heavy-duty processes need to be started (a database server process, for example), this startup time can be substantial: many seconds or even dozens of seconds.
Linux has a mechanism called CRIU, or Checkpoint/Restore In Userspace. Using this tool, you can freeze a running application (or part of it) and checkpoint it as a collection of files on disk. You can then use the files to restore the application and run it exactly as it was at the time of the freeze. See https://criu.org/Main_Page for details. Docker CE has (experimental) support for CRIU. This means that using straightforward docker commands we can take a snapshot of a running container (docker checkpoint create <container name> <checkpoint name>). At a later moment, we can start this snapshot as the same container (docker start --checkpoint <checkpoint name> <container name>) or as a different container.
The container that is started from a checkpoint is in the same state (memory and processes) as the container was when the checkpoint was created. Additionally, the startup time of the container from the snapshot is very short (subsecond); for containers with fairly long startup times, this rapid startup can be a huge boon.
In this article, I describe my initial steps with CRIU and Docker. I got it to work. I did run into an issue with recent versions of Docker CE (17.12 and 18.x), so I reverted to Docker CE 17.04. I also ran into an issue with an older version of CRIU, so I built the latest version at the time of writing (3.8.1) instead of using the one shipped with the Ubuntu Xenial 64 distribution (2.6).
I will demonstrate how I start a container that clones a GitHub repository and starts a simple REST API as a Node application; this takes 10 or more seconds. This application counts the number of GET requests it handles (by keeping some state in memory). After handling a number of requests, I create a checkpoint for this container. Next, I make a few more requests, all the while watching the counter increase. Then I stop the container and start a fresh container from the checkpoint. The container is running again very quickly (within 700 ms), so it clearly leverages the container state at the time the snapshot was created. It continues counting requests from the point where the snapshot was created, apparently inheriting its memory state. Just as expected and desired.
Note: a checkpoint does not capture changes in the file system made in a container. Only the memory state is part of the snapshot.
Note 2: Kubernetes does not yet provide support for checkpoints. That means that a pod cannot start a container from a checkpoint.
In a future article I will describe a use case for these snapshots – in automated test scenarios and complex data sets.
The steps I went through (on my Windows 10 laptop using Vagrant 2.0.3 and VirtualBox 5.2.8):
- use Vagrant to create an Ubuntu 16.04 LTS (Xenial) Virtual Box VM with Docker CE 18.x
- downgrade Docker from 18.x to 17.04
- configure Docker for experimental options
- install CRIU package
- try out simple scenario with Docker checkpoint
- build CRIU latest version
- try out somewhat more complex scenario with Docker checkpoint (that failed with the older CRIU version)
Create Ubuntu 16.04 LTS (Xenial) Virtual Box VM with Docker CE 18.x
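My Windows 10 laptop already has Vagrant 2.0.3 and Virtual Box 5.2.8. Using the vagrantfile from the GitHub repo for this article (https://github.com/lucasjellema/docker-checkpoint-first-steps/blob/master/vagrantfile), I create the VM that is my Docker host for this experiment.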
After creating (and starting) the VM with
vagrant up
I connect into the VM with
vagrant ssh
ending up at the command prompt, ready for action.
And just to make sure we are pretty much up to date, I run
sudo apt-get upgrade
Downgrade Docker CE to Release 17.04
At the time of writing there is an issue with recent Docker versions (at least 17.09 and higher; see https://github.com/moby/moby/issues/35691), and for that reason I downgrade to version 17.04 (as described here: https://forums.docker.com/t/how-to-downgrade-docker-to-a-specific-version/29523/4 ).
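To tell quickly whether a given Docker version falls in the affected range, the version strings can be compared. This is only a sketch: it assumes that 17.09 is the first affected release (the article only states "at least 17.09 and higher"), the function name checkpoint_friendly_version is hypothetical, and it relies on the version sort (sort -V) from GNU coreutils as found on Ubuntu.

```shell
# Hypothetical helper: succeeds when the given Docker version is older than 17.09,
# which this sketch ASSUMES to be the first release affected by the checkpoint issue.
checkpoint_friendly_version() {
  version="$1"
  first_affected="17.09"
  # An identical version string counts as affected.
  [ "$version" != "$first_affected" ] || return 1
  # sort -V orders version strings; if our version sorts first, it is older.
  oldest=$(printf '%s\n%s\n' "$version" "$first_affected" | sort -V | head -n 1)
  [ "$oldest" = "$version" ]
}

# Usage:
#   checkpoint_friendly_version "$(docker version -f '{{.Server.Version}}')" \
#     || echo "this Docker version may be affected by the checkpoint issue"
```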
First remove the version of Docker installed by the vagrant provider:
sudo apt-get autoremove -y docker-ce \
  && sudo apt-get purge docker-ce -y \
  && sudo rm -rf /etc/docker/ \
  && sudo rm -f /etc/systemd/system/multi-user.target.wants/docker.service \
  && sudo rm -rf /var/lib/docker \
  && sudo systemctl daemon-reload
then install the desired version:
sudo apt-cache policy docker-ce
sudo apt-get install -y docker-ce=17.04.0~ce-0~ubuntu-xenial
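The exact version string depends on the distribution and on the package repositories that are configured, so the pinned string above may not be found on another system. A minimal sketch for looking up a matching string: it filters the output of apt-cache madison docker-ce by a version prefix. The helper name pick_docker_version is hypothetical, and the sketch assumes madison's "package | version | source" column layout.

```shell
# Hypothetical helper: read `apt-cache madison docker-ce` output on stdin and
# print the first version string that starts with the given prefix.
# Assumes madison's pipe-separated "package | version | source" columns.
pick_docker_version() {
  prefix="$1"
  awk -F'|' -v p="$prefix" '
    { v = $2; gsub(/^ +| +$/, "", v) }   # trim spaces around the version column
    index(v, p) == 1 { print v; exit }   # first version starting with the prefix
  '
}

# Usage:
#   version=$(apt-cache madison docker-ce | pick_docker_version "17.04")
#   [ -n "$version" ] && sudo apt-get install -y "docker-ce=$version"
```

If the helper prints nothing, no package with that prefix is offered by the configured repositories.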
Configure Docker for experimental options
Support for checkpoints leveraging CRIU is an experimental feature in Docker. In order to make use of it, the experimental options have to be enabled. This is done by editing the Docker daemon configuration file (as described in https://stackoverflow.com/questions/44346322/how-to-run-docker-with-experimental-functions-on-ubuntu-16-04):
sudo nano /etc/docker/daemon.json
add
{ "experimental": true }
Press CTRL+X, select Y and press Enter to save the new file.
Restart the Docker service:
sudo service docker restart
Check with
docker version
if experimental is indeed enabled.
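The same configuration and check can be scripted instead of done by hand in nano. The sketch below is an assumption-laden alternative, not the method used in this article: the function names are hypothetical, and the check relies on the Go-template output option of docker version reporting a Server.Experimental field.

```shell
# Print the minimal daemon configuration that enables experimental features.
experimental_config() {
  printf '{ "experimental": true }\n'
}

# Succeeds when the Docker daemon reports that experimental mode is on.
# Assumes `docker version -f '{{.Server.Experimental}}'` prints true or false.
experimental_enabled() {
  [ "$(docker version -f '{{.Server.Experimental}}' 2>/dev/null)" = "true" ]
}

# Usage (this overwrites an existing daemon.json, so merge by hand if one exists):
#   experimental_config | sudo tee /etc/docker/daemon.json
#   sudo service docker restart
#   experimental_enabled && echo "experimental features are on"
```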
Install CRIU package
The simple approach, and how it should work, is to install the CRIU package:
sudo apt-get install criu
(see for example in https://yipee.io/2017/06/saving-and-restoring-container-state-with-criu/)
For me, this installs version 2.6 of the CRIU package. That proves sufficient for some actions, but not for others.
Try out simple scenario with Docker checkpoint on CRIU
At this point we have Docker 17.04, Ubuntu 16.04 with CRIU 2.6. And that combination can give us a first feel for what the Docker Checkpoint mechanism entails.
Run a simple container that writes a counter value to the console once every second (and then increases the counter):
docker run --security-opt=seccomp:unconfined --name cr -d busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
check on the values:
docker logs cr
create a checkpoint for the container:
docker checkpoint create --leave-running=true cr checkpoint0
leave the container running for a while and check the logs again
docker logs cr
now stop the container:
docker stop cr
and restart/recreate the container from the checkpoint:
docker start --checkpoint checkpoint0 cr
Check the logs:
docker logs cr
You will find that the log is resumed at the value (19) where the checkpoint was created.
Build CRIU latest version
When I tried a more complex scenario (see next section), I ran into an issue with the packaged CRIU version (2.6). I could work around that issue by building the latest version of CRIU on my Ubuntu Docker host. Here are the steps I went through to accomplish that, following these instructions: https://criu.org/Installation.
First, remove the currently installed CRIU package:
sudo apt-get autoremove -y criu \
  && sudo apt-get purge criu -y
Then, prepare the build environment:
sudo apt-get install build-essential \
  && sudo apt-get install gcc \
  && sudo apt-get install libprotobuf-dev libprotobuf-c0-dev protobuf-c-compiler protobuf-compiler python-protobuf \
  && sudo apt-get install pkg-config python-ipaddr iproute2 libcap-dev libnl-3-dev libnet-dev --no-install-recommends
Next, clone the GitHub repository for CRIU:
git clone https://github.com/checkpoint-restore/criu
Navigate into the criu directory that contains the code base
cd criu
and build the criu package:
make
When make is done, I can run CRIU :
sudo ./criu/criu check
to see if the installation is successful. The final message printed should be: Looks Good (despite perhaps one or more warnings).
Use
sudo ./criu/criu -V
to learn about the version of CRIU that is currently installed.
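The version check can also be automated, for example as a guard in a provisioning script. This is a sketch under assumptions: the helper name criu_at_least is hypothetical, it assumes the version output contains a line of the form "Version: 3.8.1", and 3.8.1 is simply the version that worked for me, not an established minimum.

```shell
# Hypothetical helper: read `criu -V` output on stdin and succeed when the
# reported version is at least the given minimum.
# Assumes the output contains a line of the form "Version: 3.8.1".
criu_at_least() {
  minimum="$1"
  found=$(awk '/^Version:/ { print $2; exit }')
  [ -n "$found" ] || return 1
  # sort -V (GNU coreutils) orders version strings; the minimum must sort first.
  lowest=$(printf '%s\n%s\n' "$found" "$minimum" | sort -V | head -n 1)
  [ "$lowest" = "$minimum" ]
}

# Usage:
#   sudo ./criu/criu -V | criu_at_least 3.8.1 && echo "CRIU is recent enough"
```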
Note: the CRIU instructions describe the following steps to install criu system wide. This does not seem to be needed in order for Docker to leverage CRIU from the docker checkpoint commands.
sudo apt-get install asciidoc xmlto
sudo make install
criu check
Now we are ready to take on the more complex scenario that failed before with an issue in the older CRIU version.
A More complex scenario with Docker Checkpoint
This scenario failed with the older CRIU version (2.6). Building the latest version of CRIU on my Ubuntu Docker host, as described in the previous section, worked around the problem.
In this case, I run a container based on a Docker Container image for running any Node application that is downloaded from a GitHub Repository. The Node application that the container will download and run handles simple HTTP GET requests: it counts requests and returns the value of the counter as the response to the request. This container image and this application were introduced in an earlier article: https://technology.amis.nl/2017/05/21/running-node-js-applications-from-github-in-generic-docker-container/
Here you see the command to run the container, to be called reqctr2:
docker run --name reqctr2 -e "GIT_URL=https://github.com/lucasjellema/microservices-choreography-kubernetes-workshop-june2017" -e "APP_PORT=8080" -p 8005:8080 -e "APP_HOME=part1" -e "APP_STARTUP=requestCounter.js" lucasjellema/node-app-runner
It takes about 15 seconds for the application to start up and handle requests.
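To find out when the application is actually ready, rather than guessing at the 15 seconds, the endpoint can be polled from a second shell. A minimal sketch, assuming curl is installed; the helper name wait_for_http and the default timeout of 30 seconds are hypothetical. Note that each successful poll is itself a GET request, so it should bump the request counter by one.

```shell
# Hypothetical helper: poll a URL once per second until it answers or the
# timeout (in seconds, default 30) runs out.
wait_for_http() {
  url="$1"
  timeout="${2:-30}"
  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # -s: quiet, -f: treat HTTP errors as failure, -o /dev/null: discard the body
    if curl -sf -o /dev/null "$url"; then
      echo "ready after ${waited}s"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "no response within ${timeout}s" >&2
  return 1
}

# Usage (port 8005 is the host port mapped in the docker run command above):
#   wait_for_http http://localhost:8005/ 60
```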
Once the container is running, requests can be sent from outside the VM – from a browser running on my laptop for example – to be handled by the container, at http://192.168.188.106:8005/.
After a number of requests, the counter is at 21:
At this point, I create a checkpoint for the container:
docker checkpoint create --leave-running=true reqctr2 checkpoint1
I now make a few additional requests in the browser, bringing the counter to a higher value:
At this point, I stop the container – and subsequently start it again from the checkpoint:
docker stop reqctr2
docker start --checkpoint checkpoint1 reqctr2
It takes less than a second for the container to continue running.
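To put a number on "less than a second", the restore command can be wrapped in a small timer. This is a sketch: the helper name time_ms is hypothetical, and it relies on the %N (nanoseconds) format of GNU date as shipped with Ubuntu. It measures how long the command takes to return, not how long it takes before the application serves its first request.

```shell
# Hypothetical helper: run a command and print how long it took in milliseconds.
# Relies on GNU date's %N (nanoseconds); the command's own output is discarded.
time_ms() {
  start=$(date +%s%N)
  "$@" >/dev/null
  status=$?
  end=$(date +%s%N)
  echo $(( ($end - $start) / 1000000 ))
  return "$status"
}

# Usage:
#   time_ms docker start --checkpoint checkpoint1 reqctr2
```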
When I make a new request, I do not get 1 (as a fresh container would return), nor 43 (the result I would get if the previous container were still running). Instead, I get the next value counting from the state of the container that was captured in the snapshot. Note: because I make the GET request from the browser and the browser also tries to retrieve the favicon, the counter is increased by two every time I press refresh in the browser.
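The favicon side effect can be avoided by making the requests with curl instead of a browser, since curl only fetches the URL it is given. A minimal sketch, assuming curl is available on the machine that sends the requests; the helper name send_requests is hypothetical.

```shell
# Hypothetical helper: send N GET requests to the counter and print each
# response on its own line. curl does not request a favicon, so each call
# should increase the counter by exactly one.
send_requests() {
  url="$1"
  count="$2"
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -s "$url"
    echo
    i=$((i + 1))
  done
}

# Usage:
#   send_requests http://192.168.188.106:8005/ 3
```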
Note: I can get a list of all checkpoints that have been created for a container. Clearly, I should put some more effort in a naming convention for those checkpoints:
docker checkpoint ls reqctr2
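One possible naming convention, purely as an illustration, is container name plus purpose plus a UTC timestamp. The helper name checkpoint_name and the convention itself are hypothetical; the sketch only uses letters, digits and hyphens to stay on the safe side with what Docker accepts as a checkpoint name.

```shell
# Hypothetical naming convention: <container>-<purpose>-<UTC timestamp>.
checkpoint_name() {
  container="$1"
  purpose="${2:-manual}"
  echo "${container}-${purpose}-$(date -u +%Y%m%dT%H%M%SZ)"
}

# Usage:
#   docker checkpoint create --leave-running=true reqctr2 "$(checkpoint_name reqctr2 warm)"
```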
The flow I went through in this scenario can be summarized as follows.
The starting point: Windows laptop with Vagrant and Virtual Box. A VM has been created by Vagrant with Docker inside. The correct version of Docker and of the CRIU package have been set up.
Then these steps are run through:
- Start Docker container based on an image with Node JS runtime
- Clone GitHub Repository containing a Node JS application
- Run the Node JS application – ready for HTTP Requests
- Handle HTTP Requests from a browser on the Windows Host machine
- Create a Docker Checkpoint for the container – a snapshot of the container state
- The checkpoint is saved on the Docker Host – ready for later use
- Start a container from the checkpoint. This container starts instantaneously, no GitHub clone and application startup are required; it resumes from the state at the time of creating the checkpoint
- The container handles HTTP requests – just like its checkpointed predecessor
Resources
Sources are in this GitHub repo: https://github.com/lucasjellema/docker-checkpoint-first-steps
Article on CRIU: http://www.admin-magazine.com/Archive/2014/22/Save-and-Restore-Linux-Processes-with-CRIU
Also: on CRIU and Docker: https://yipee.io/2017/06/saving-and-restoring-container-state-with-criu/.
Docs on Checkpoint and Restore in Docker: https://github.com/docker/cli/blob/master/experimental/checkpoint-restore.md
Home of CRIU: https://criu.org/Main_Page; page on Docker support: https://criu.org/Docker; install CRIU package on Ubuntu: https://criu.org/Packages#Ubuntu
Install and Build CRIU Sources: https://criu.org/Installation
Docs on Vagrant’s Docker provisioning: https://www.vagrantup.com/docs/provisioning/docker.html
Article on downgrading Docker : https://forums.docker.com/t/how-to-downgrade-docker-to-a-specific-version/29523/4
Configure Docker for experimental options: https://stackoverflow.com/questions/44346322/how-to-run-docker-with-experimental-functions-on-ubuntu-16-04
Issue with Docker and Checkpoints (at least in 17.09-18.03): https://github.com/moby/moby/issues/35691
You write “using this vagrant file” but you don’t include any vagrantfile there…
Hi Shtu, You will find the vagrant file in the GitHub repo: https://github.com/lucasjellema/docker-checkpoint-first-steps/blob/master/vagrantfile
Kind regards, Lucas
Hi Lucas,
I tried the following console command but I was not able to get the version of docker you mentioned. I’m running Mint 19 mate. I don’t know if the bug you mentioned has been fixed(seems not) but can you tell me the commands or instructions needed to get the version you recommend? [below is my console output]
root@mint:/home/aries# sudo apt-cache policy docker-ce
docker-ce:
Installed: (none)
Candidate: (none)
Version table:
root@mint:/home/aries# sudo apt-get install -y docker-ce=17.04.0~ce-0~ubuntu-xenial
Reading package lists… Done
Building dependency tree
Reading state information… Done
E: Version ‘17.04.0~ce-0~ubuntu-xenial’ for ‘docker-ce’ was not found