First steps with Docker Checkpoint – to create and restore snapshots of running containers

Docker containers can be stopped and started again. Changes made to the file system in a running container survive such a deliberate stop and start cycle. Data in memory and running processes obviously do not. A container that crashes cannot simply be restarted, and even if it can, its file system may be in an undetermined state. When you start a container after it was stopped, it goes through its full startup routine. If heavyweight processes need to be started, such as a database server process, this startup can take substantial time: many seconds or even dozens of seconds.

Linux has a mechanism called CRIU, or Checkpoint/Restore In Userspace. Using this tool, you can freeze a running application (or part of it) and checkpoint it as a collection of files on disk. You can then use these files to restore the application and run it exactly as it was at the time of the freeze. See https://criu.org/Main_Page for details. Docker CE has (experimental) support for CRIU. This means that with straightforward docker commands we can take a snapshot of a running container (docker checkpoint create <container name> <checkpoint name>). At a later moment, we can start this snapshot as the same container (docker start --checkpoint <checkpoint name> <container name>) or as a different container.

The container that is started from a checkpoint is in the same state (memory and processes) as the container was when the checkpoint was created. Additionally, starting a container from a checkpoint is very fast (subsecond); for containers with fairly long startup times, this rapid startup can be a huge boon.

In this article, I describe my first steps with CRIU and Docker. I got it to work. I did run into an issue with recent versions of Docker CE (17.12 and 18.x), so I fell back to Docker CE 17.04. I also ran into an issue with an older version of CRIU, so I built the latest version of CRIU at the time of writing (3.8.1) instead of using the one shipped with the Ubuntu 16.04 (Xenial) 64-bit distribution (2.6).

I will demonstrate how I start a container that clones a GitHub repository and runs a simple REST API as a Node application; this takes 10 seconds or more. This application counts the number of GET requests it handles (by keeping some state in memory). After it has handled a number of requests, I create a checkpoint for this container. Next, I make a few more requests, all the while watching the counter increase. Then I stop the container and start a fresh container from the checkpoint. The container is up and running very quickly, within 700 ms, so it clearly leverages the container state captured in the snapshot. It continues counting requests from the point where the snapshot was created, apparently inheriting its memory state. Just as expected and desired.

Note: a checkpoint does not capture changes in the file system made in a container. Only the memory state is part of the snapshot.

Note 2: Kubernetes does not yet provide support for checkpoints. That means that a pod cannot start a container from a checkpoint.

In a future article I will describe a use case for these snapshots – in automated test scenarios and complex data sets.

The steps I went through (on my Windows 10 laptop using Vagrant 2.0.3 and VirtualBox 5.2.8):

  • use Vagrant to create an Ubuntu 16.04 LTS (Xenial) VirtualBox VM with Docker CE 18.x
  • downgrade Docker from 18.x to 17.04
  • configure Docker for experimental options
  • install CRIU package
  • try out simple scenario with Docker checkpoint
  • build CRIU latest version
  • try out somewhat more complex scenario with Docker checkpoint (that failed with the older CRIU version)

 

Create Ubuntu 16.04 LTS (Xenial) VirtualBox VM with Docker CE 18.x

My Windows 10 laptop already has Vagrant 2.0.3 and VirtualBox 5.2.8. Using the following Vagrantfile, I create the VM that is my Docker host for this experiment:

 

After creating (and starting) the VM with

vagrant up

I connect into the VM with

vagrant ssh

ending up at the command prompt, ready for action.

Just to make sure the VM is up to date, I run

sudo apt-get update && sudo apt-get upgrade


Downgrade Docker CE to Release 17.04

At the time of writing there is an issue with recent Docker versions (17.09 and later, see https://github.com/moby/moby/issues/35691), and for that reason I downgrade to version 17.04 (as described here: https://forums.docker.com/t/how-to-downgrade-docker-to-a-specific-version/29523/4).

First, remove the version of Docker installed by the Vagrant Docker provisioner:

sudo apt-get autoremove -y docker-ce \
&& sudo apt-get purge docker-ce -y \
&& sudo rm -rf /etc/docker/ \
&& sudo rm -f /etc/systemd/system/multi-user.target.wants/docker.service \
&& sudo rm -rf /var/lib/docker \
&&  sudo systemctl daemon-reload

Then list the versions of docker-ce available from the repository, and install the desired one:

sudo apt-cache policy docker-ce

sudo apt-get install -y docker-ce=17.04.0~ce-0~ubuntu-xenial
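A possible follow-up, not part of the original steps: holding the docker-ce package prevents a later apt-get upgrade from silently reinstalling a release affected by the checkpoint issue. The sketch below is a shell function that assumes Ubuntu's dpkg and apt-mark tools; the function name pin_docker is hypothetical, and the version string is the one passed to apt-get install above.

```shell
# Hypothetical helper: verify docker-ce is at the pinned version, then hold it
# so "apt-get upgrade" leaves it alone. Version string as installed above.
DOCKER_PIN_VERSION="17.04.0~ce-0~ubuntu-xenial"

pin_docker() {
  # "dpkg -s" prints the package's control fields, including "Version: ..."
  installed=$(dpkg -s docker-ce 2>/dev/null | sed -n 's/^Version: //p')
  if [ "$installed" != "$DOCKER_PIN_VERSION" ]; then
    echo "docker-ce is '${installed:-not installed}', expected $DOCKER_PIN_VERSION" >&2
    return 1
  fi
  sudo apt-mark hold docker-ce
}
```

Paste the function into the shell (or source it from a file) and run pin_docker; sudo apt-mark unhold docker-ce reverses the hold.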

 

Configure Docker for experimental options

Support for checkpoints leveraging CRIU is an experimental feature in Docker. To use it, experimental features have to be enabled in the Docker daemon. This is done as follows (as described in https://stackoverflow.com/questions/44346322/how-to-run-docker-with-experimental-functions-on-ubuntu-16-04):

 

sudo nano /etc/docker/daemon.json

and add the following content:

{
"experimental": true
}

Press CTRL+X, select Y and press Enter to save the new file.

restart the docker service:

sudo service docker restart

Check with

docker version

whether experimental features are indeed enabled: the Server section of the output should report Experimental: true.
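As an alternative to editing the file in nano, the same change can be scripted. This is a minimal sketch that assumes daemon.json does not yet contain other settings, because it overwrites the file; the function names and the DAEMON_JSON variable are hypothetical. The check relies on docker version accepting a Go template through --format.

```shell
# Hypothetical helpers for the experimental-mode step.
# Assumption: daemon.json holds no other settings yet; it is overwritten.
DAEMON_JSON="${DAEMON_JSON:-/etc/docker/daemon.json}"

enable_experimental() {
  # tee runs under sudo so the file can be written into /etc/docker
  echo '{ "experimental": true }' | sudo tee "$DAEMON_JSON" > /dev/null &&
    sudo service docker restart
}

experimental_enabled() {
  # The server reports "true" when experimental features are on
  [ "$(docker version --format '{{.Server.Experimental}}')" = "true" ]
}
```

Run enable_experimental once, then experimental_enabled && echo enabled to confirm.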

 

Install CRIU package

The simple approach to CRIU, and how it should work, is to install the CRIU package:

sudo apt-get install criu

(see for example in https://yipee.io/2017/06/saving-and-restoring-container-state-with-criu/)

For me, this installation results in version 2.6 of CRIU. That proves sufficient for some actions, but not for others.


 

Try out simple scenario with Docker checkpoint on CRIU

At this point we have Ubuntu 16.04 with Docker 17.04 and CRIU 2.6. That combination can give us a first feel for what the Docker checkpoint mechanism entails.

Run a simple container that writes a counter value to the console once every second (and then increments the counter):

docker run --security-opt=seccomp:unconfined --name cr -d busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'

check on the values:

docker logs cr

create a checkpoint for the container:

docker checkpoint create  --leave-running=true cr checkpoint0


leave the container running for a while and check the logs again

docker logs cr


now stop the container:

docker stop cr

and restart/recreate the container from the checkpoint:

docker start --checkpoint checkpoint0 cr

Check the logs:

docker logs cr

You will find that the counter in the log resumes at the value (19) it had when the checkpoint was created.
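The simple scenario can also be replayed in one go. This sketch assumes that no container named cr exists yet (remove a previous one with docker rm -f cr); the function name and the sleep durations are arbitrary choices, not part of the original steps.

```shell
# Hypothetical helper that replays the busybox scenario above in one go.
demo_checkpoint_restore() {
  # Counter container, exactly as in the docker run command above
  docker run --security-opt=seccomp:unconfined --name cr -d busybox \
    /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done' || return 1
  sleep 5   # let the counter advance before taking the checkpoint
  docker checkpoint create --leave-running=true cr checkpoint0 || return 1
  sleep 5   # the container keeps counting after the checkpoint
  docker stop cr || return 1
  docker start --checkpoint checkpoint0 cr || return 1
  sleep 2
  # The log should show the counter continuing from the checkpointed value
  docker logs cr
}
```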

 

Build CRIU latest version

When I tried a more complex scenario (see the next section), I ran into an issue with CRIU 2.6. I could work around that issue by building the latest version of CRIU on my Ubuntu Docker host. Here are the steps I went through, following these instructions: https://criu.org/Installation.

First, remove the currently installed CRIU package:

sudo apt-get autoremove -y criu \
&& sudo apt-get purge criu -y

Then, prepare the build environment:

sudo apt-get install build-essential \
&& sudo apt-get install gcc   \
&& sudo apt-get install libprotobuf-dev libprotobuf-c0-dev protobuf-c-compiler protobuf-compiler python-protobuf \
&& sudo apt-get install pkg-config python-ipaddr iproute2 libcap-dev  libnl-3-dev libnet-dev --no-install-recommends

Next, clone the GitHub repository for CRIU:

git clone https://github.com/checkpoint-restore/criu

Navigate into the criu directory that contains the code base:

cd criu

and build CRIU:

make
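Building master gives whatever version is current at that moment. To reproduce the version used here, the build can be pinned to a release tag. This sketch assumes the 3.8.1 release is tagged v3.8.1 in the checkpoint-restore/criu repository; build_criu and CRIU_TAG are hypothetical names. The function body runs in a subshell, so its cd does not change the caller's directory.

```shell
# Hypothetical helper: clone CRIU and build a specific release tag.
# Assumption: the release is tagged v3.8.1 in the repository.
CRIU_TAG="${CRIU_TAG:-v3.8.1}"

build_criu() (
  # Subshell body: the cd below does not affect the calling shell
  git clone https://github.com/checkpoint-restore/criu &&
    cd criu &&
    git checkout "$CRIU_TAG" &&
    make -j"$(nproc)"   # use all CPU cores for the build
)
```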

When make is done, I can run

sudo ./criu/criu check

to see whether CRIU works on this host. The final message printed should be "Looks good." (possibly preceded by one or more warnings).

Use

sudo ./criu/criu -V

to learn which version of CRIU has just been built.

Note: the CRIU instructions describe the following steps to install criu system wide. This does not seem to be needed in order for Docker to leverage CRIU from the docker checkpoint commands.

sudo apt-get install asciidoc  xmlto
sudo make install
criu check
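Docker does not call CRIU directly: runc, which Docker uses to run containers, invokes the criu binary it finds on the PATH by default. Whether or not CRIU is installed system-wide, it is therefore worth checking which binary and version that is. This is a sketch: CRIU_BIN and the function names are hypothetical, the version comparison relies on GNU sort -V, and 3.8.1 is only the version that worked in this article, not a documented minimum.

```shell
# Hypothetical helpers: report the CRIU version and compare it to 3.8.1.
CRIU_BIN="${CRIU_BIN:-criu}"   # or ./criu/criu for the freshly built binary

criu_version() {
  # "criu -V" prints a line "Version: X.Y[.Z]"
  "$CRIU_BIN" -V | sed -n 's/^Version: *//p'
}

# version_at_least A B: succeed when version A >= version B (GNU sort -V)
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_criu() {
  v=$(criu_version)
  if version_at_least "$v" 3.8.1; then
    echo "CRIU $v OK"
  else
    echo "CRIU $v is older than 3.8.1" >&2
    return 1
  fi
}
```

Running command -v criu shows which binary is on the PATH; CRIU_BIN=./criu/criu check_criu checks the build directory instead.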

Now we are ready to take on the more complex scenario that failed with the older CRIU version.

A more complex scenario with Docker Checkpoint

This scenario failed with the older CRIU version (2.6), probably because of the issue mentioned earlier; with the freshly built CRIU it works.

In this case, I run a container based on a Docker image that can run any Node application downloaded from a GitHub repository. The Node application that the container downloads and runs handles simple HTTP GET requests: it counts requests and returns the value of the counter as the response. This container image and this application were introduced in an earlier article: https://technology.amis.nl/2017/05/21/running-node-js-applications-from-github-in-generic-docker-container/

Here is the command to run the container, to be called reqctr2:

docker run --name reqctr2 -e "GIT_URL=https://github.com/lucasjellema/microservices-choreography-kubernetes-workshop-june2017" -e "APP_PORT=8080" -p 8005:8080 -e "APP_HOME=part1"  -e "APP_STARTUP=requestCounter.js"   lucasjellema/node-app-runner


It takes about 15 seconds for the application to start up and handle requests.

Once the container is running, requests can be sent from outside the VM – from a browser running on my laptop for example – to be handled  by the container, at http://192.168.188.106:8005/.
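Since the application needs roughly 15 seconds before it answers, a small polling loop avoids guessing when to send the first request. This is a sketch; wait_for_app and the 60-second default timeout are hypothetical. curl -f treats HTTP error responses as failures, so the loop ends only on a successful response.

```shell
# Hypothetical helper: wait_for_app URL [TIMEOUT_SECONDS]
# Polls URL once per second until it answers successfully or time runs out.
wait_for_app() {
  url="$1"
  timeout="${2:-60}"
  elapsed=0
  # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body
  until curl -fs -o /dev/null "$url"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "no response from $url after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "$url is up after about ${elapsed}s"
}
```

For example: wait_for_app http://192.168.188.106:8005/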

After a number of requests, the counter is at 21.

At this point, I create a checkpoint for the container:

docker checkpoint create  --leave-running=true reqctr2 checkpoint1


I now make a few additional requests in the browser, bringing the counter to a higher value. At this point, I stop the container and subsequently start it again from the checkpoint:

docker stop reqctr2
docker start --checkpoint checkpoint1 reqctr2


It takes less than a second for the container to continue running.
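To put a number on this, the restore can be timed. This sketch assumes GNU date, which supports %N for nanoseconds (as on Ubuntu 16.04); restore_timed is a hypothetical name. It measures only how long the docker start command takes to return, not how long until the first request is answered.

```shell
# Hypothetical helper: restore_timed CONTAINER CHECKPOINT
# Requires GNU date for %N (nanoseconds).
restore_timed() {
  container="$1"
  checkpoint="$2"
  start=$(date +%s%N)
  docker start --checkpoint "$checkpoint" "$container" > /dev/null || return 1
  end=$(date +%s%N)
  # Convert the nanosecond difference to milliseconds
  echo "restored $container from $checkpoint in $(( (end - start) / 1000000 )) ms"
}
```

For example: docker stop reqctr2 && restore_timed reqctr2 checkpoint1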

When I make a new request, I get neither 1 (the result from a fresh container) nor 43 (the result I would get if the previous container were still running). Instead, I get the next value starting from the state of the container that was captured in the snapshot. Note: because I make the GET request from the browser and the browser also tries to retrieve the favicon, the counter increases by two every time I press refresh in the browser.
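Requesting the counter with curl instead of a browser avoids the extra favicon request, so the counter should advance by one per request. This is a sketch; hit and APP_URL are hypothetical names, and the default URL is the VM address used above (inside the VM, http://localhost:8005/ should work as well, given the -p 8005:8080 mapping).

```shell
# Hypothetical helper: send N GET requests (default 1), print each response.
APP_URL="${APP_URL:-http://192.168.188.106:8005/}"

hit() {
  n="${1:-1}"
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s "$APP_URL"   # one request per iteration; curl fetches no favicon
    echo                 # newline after each response body
    i=$((i + 1))
  done
}
```

For example, hit 5 before and after the restore shows exactly where the counter continues.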

Note: I can get a list of all checkpoints that have been created for a container. Clearly, I should put more effort into a naming convention for those checkpoints:

docker checkpoint ls reqctr2

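One possible naming convention, offered as an assumption rather than anything Docker prescribes, is the container name plus a timestamp. checkpoint_now is a hypothetical helper; it prints the generated name so it can be passed to docker start --checkpoint later.

```shell
# Hypothetical helper: checkpoint_now CONTAINER
# Creates a checkpoint named CONTAINER-YYYYMMDD-HHMMSS and prints that name.
checkpoint_now() {
  container="$1"
  name="$container-$(date +%Y%m%d-%H%M%S)"
  docker checkpoint create --leave-running=true "$container" "$name" > /dev/null &&
    echo "$name"
}
```

For example: cp=$(checkpoint_now reqctr2), and later docker start --checkpoint "$cp" reqctr2. Old checkpoints can be removed with docker checkpoint rm <container> <checkpoint>.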

The flow I went through in this scenario is as follows.

The starting point: a Windows laptop with Vagrant and VirtualBox. Vagrant has created a VM with Docker inside. The correct versions of Docker and CRIU have been set up.

Then these steps are run through:

  1. Start Docker container based on an image with Node JS runtime
  2. Clone GitHub Repository containing a Node JS application
  3. Run the Node JS application – ready for HTTP Requests
  4. Handle HTTP Requests from a browser on the Windows Host machine
  5. Create a Docker Checkpoint for the container – a snapshot of the container state
  6. The checkpoint is saved on the Docker Host – ready for later use
  7. Start a container from the checkpoint. This container starts instantaneously, no GitHub clone and application startup are required; it resumes from the state at the time of creating the checkpoint
  8. The container handles HTTP requests – just like its checkpointed predecessor
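In step 6 the checkpoint stays on the Docker host. Unless a --checkpoint-dir is given to docker checkpoint create, Docker stores it under the container's directory in its data root. The sketch below prints that location, assuming the default /var/lib/docker data root (overridable through the hypothetical DOCKER_ROOT variable); checkpoint_dir is a hypothetical name.

```shell
# Hypothetical helper: checkpoint_dir CONTAINER
# Prints the default on-disk location of the container's checkpoints.
checkpoint_dir() {
  id=$(docker inspect --format '{{.Id}}' "$1") || return 1
  echo "${DOCKER_ROOT:-/var/lib/docker}/containers/$id/checkpoints"
}
```

For example, sudo ls "$(checkpoint_dir reqctr2)" should list checkpoint1 along with any other checkpoints of reqctr2.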

 

Resources

Sources are in this GitHub repo: https://github.com/lucasjellema/docker-checkpoint-first-steps

Article on CRIU: http://www.admin-magazine.com/Archive/2014/22/Save-and-Restore-Linux-Processes-with-CRIU

Also: on CRIU and Docker: https://yipee.io/2017/06/saving-and-restoring-container-state-with-criu/.

Docs on Checkpoint and Restore in Docker: https://github.com/docker/cli/blob/master/experimental/checkpoint-restore.md

 

Home of CRIU: https://criu.org/Main_Page; page on Docker support: https://criu.org/Docker; install CRIU package on Ubuntu: https://criu.org/Packages#Ubuntu

Install and Build CRIU Sources: https://criu.org/Installation

 

Docs on Vagrant’s Docker provisioning: https://www.vagrantup.com/docs/provisioning/docker.html

Article on downgrading Docker : https://forums.docker.com/t/how-to-downgrade-docker-to-a-specific-version/29523/4

Configure Docker for experimental options: https://stackoverflow.com/questions/44346322/how-to-run-docker-with-experimental-functions-on-ubuntu-16-04

Issue with Docker and Checkpoints (at least in 17.09-18.03): https://github.com/moby/moby/issues/35691
