Lazy Loading with AWS Fargate

Introduction

Some time ago my colleague informed me that AWS had announced a new feature [1]: starting containers faster with AWS Fargate. The idea is to start the container before all of its layers have been downloaded from the container registry (ECR). That sounds promising, but does it work? And if so, will the users of the containers notice? In this blog post I will try to find out, using the blog post that AWS wrote about this feature [2].

How do they do it?

AWS now supports Seekable OCI (SOCI), which adds metadata to the ECR image. The advantage is that the container image itself doesn’t need to be changed: the metadata is stored in a different location. When the container starts, AWS Fargate loads only the layers that it needs to start the container. Once the container is running, the remaining layers are downloaded when they are needed. According to AWS, the container can start up to 50% faster, which sounds great!

How do we do it?

There are two ways of getting the metadata into ECR. One is to add metadata for every image that is added to ECR, by starting a Lambda function when the appropriate EventBridge event is triggered. The disadvantage is that metadata is also added for images that are not suitable for Lazy Loading. The alternative is to download and install the software on a Virtual Machine and choose which images will be used for Lazy Loading. I followed this last path, using an Amazon Linux 2023 VM that is based on ARM.
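The disadvantage of the Lambda route, indexing every image, could in principle be softened by filtering inside the function. The sketch below is only an illustration of that idea, not the function that AWS provides: the repository names are hypothetical, and the event fields are the ones I would expect in an "ECR Image Action" event from EventBridge. The actual building of the SOCI index is left out.

```python
# Hypothetical allow-list: only these repositories should get a SOCI index.
LAZY_LOADING_REPOSITORIES = {"lazyloading-debian", "lazyloading-wordpress"}


def should_index(event: dict) -> bool:
    """Return True for a successful ECR push to an allow-listed repository."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.ecr"
        and detail.get("action-type") == "PUSH"
        and detail.get("result") == "SUCCESS"
        and detail.get("repository-name") in LAZY_LOADING_REPOSITORIES
    )


def handler(event: dict, context: object) -> dict:
    """Lambda entry point: it only decides; building the index is out of scope."""
    if not should_index(event):
        return {"index_requested": False}
    detail = event["detail"]
    # Placeholder: a real function would start the SOCI index build here.
    return {
        "index_requested": True,
        "repository": detail["repository-name"],
        "tag": detail.get("image-tag"),
    }
```

With a filter like this, images in other repositories would be skipped, at the price of having to maintain the allow-list.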

I wrote a CloudFormation template [3] to show what is going on. CloudFormation will download and install the nerdctl tool, together with buildkit. We cannot use Docker because SOCI needs a container tool that is based on containerd. Downloading and installing buildkit and the SOCI tool is less straightforward than installing Docker, because this software is not part of a distribution (yet?). I had to download the software via curl and then untar it. For buildkit I added a systemd service file to run buildkit in the background.

I’m following the same steps as the AWS blog post, but I added two steps. The first extra step is to build the container myself. The second is in the test phase: I don’t just look at the startup time of the container, I also look at how long it takes to use the software in the container.

Containers

I created test containers for Alpine, Debian, Ubuntu and AlmaLinux, each with Apache HTTP installed on it. To make it attractive for SOCI to add metadata to the layer, I added two very big files to these containers: one of 100 MB and one of 200 MB of random characters.
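How the big files are generated is not shown in this post (the template in [3] has the details). As a minimal sketch, a file of random characters could be produced like this. The function name and the file names in the usage note are hypothetical, and 1 MB is taken as 1024 * 1024 bytes, which is an assumption.

```python
import random
import string

# Random letters and digits; random content presumably compresses poorly,
# so the resulting image layer stays large.
ALPHABET = (string.ascii_letters + string.digits).encode("ascii")


def write_random_text_file(path: str, size_bytes: int, chunk_size: int = 1024 * 1024) -> None:
    """Write size_bytes of random ASCII letters and digits to path, chunk by chunk."""
    remaining = size_bytes
    with open(path, "wb") as out:
        while remaining > 0:
            n = min(chunk_size, remaining)
            out.write(bytes(random.choices(ALPHABET, k=n)))
            remaining -= n
```

For example, write_random_text_file("big100.txt", 100 * 1024 * 1024) and write_random_text_file("big200.txt", 200 * 1024 * 1024) would create the two files, which a Dockerfile could then copy into their own layer.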

Tests

During the tests I looked at the way the Lazy Loading is implemented: are standard layers of the operating system Lazy Loaded? Are my installation files of Apache HTTP lazy loaded? You can find the results in this table:

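| OS | Default layers in OS | OS layers lazy loaded | HTTP lazy loaded | Big files layer lazy loaded |
|---|---|---|---|---|
| Alpine | 1 | 0 of 1 | 0 of 1 | 1 of 1 |
| Debian | 1 | 1 of 1 | 1 of 1 | 1 of 1 |
| Ubuntu | 1 | 1 of 1 | 1 of 1 | 1 of 1 |
| AlmaLinux | 1 | 1 of 1 | 1 of 1 | 1 of 1 |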
Lazy loaded image layers

I measured both the startup time and the time it takes to download the big files. In this table you see the minimum, average and maximum time that the containers take to start. I also added the time it takes for the curl command to download the 200 MB file. Following the AWS blog post, I ran each test five times per operating system:

| OS | Startup time without SOCI (min/avg/max) | Startup time with SOCI (min/avg/max) | Time to curl the 200 MB file without SOCI (min/avg/max) | Time to curl the 200 MB file with SOCI (min/avg/max) |
|---|---|---|---|---|
| Alpine | 19.207 / 21.099 / 24.986 | 12.740 / 15.362 / 21.946 | 0.38 / 0.39 / 0.41 | 1.40 / 1.50 / 1.56 |
| Debian | 19.719 / 21.475 / 27.816 | 13.139 / 13.344 / 14.047 | 0.38 / 0.39 / 0.39 | 1.29 / 1.31 / 1.34 |
| Ubuntu | 18.574 / 20.394 / 23.605 | 14.328 / 14.870 / 15.787 | 0.39 / 0.39 / 0.40 | 1.34 / 1.44 / 1.57 |
| AlmaLinux | 18.774 / 23.155 / 36.882 | 12.301 / 13.415 / 16.286 | 0.39 / 0.39 / 0.39 | 1.18 / 1.32 / 1.42 |
| Alpine with WordPress | 16.468 / 17.945 / 19.475 | 13.426 / 14.049 / 14.994 | n/a | n/a |
Start time and curl time
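To compare these numbers with the "up to 50% faster" claim, the relative reduction of the average startup time can be worked out from the table. The sketch below does that; the helper names are mine, and the values are the averages copied from the table above.

```python
from statistics import mean


def summarize(samples: list[float]) -> tuple[float, float, float]:
    """Return (min, avg, max) for a list of timings, as used in the tables."""
    return min(samples), mean(samples), max(samples)


def reduction_percent(without_soci: float, with_soci: float) -> float:
    """Relative reduction of the SOCI value compared to the non-SOCI value."""
    return (without_soci - with_soci) / without_soci * 100.0


# Average startup times from the table: (without SOCI, with SOCI).
AVERAGE_STARTUP = {
    "Alpine": (21.099, 15.362),
    "Debian": (21.475, 13.344),
    "Ubuntu": (20.394, 14.870),
    "AlmaLinux": (23.155, 13.415),
    "Alpine with WordPress": (17.945, 14.049),
}

for name, (without, with_soci) in AVERAGE_STARTUP.items():
    print(f"{name}: {reduction_percent(without, with_soci):.1f}% shorter average startup")
```

With these averages the reduction lies roughly between 22% and 42%, so noticeable, but below the 50% mentioned as the maximum.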

Starting the container faster comes at a cost: the first time the big file is downloaded, the container is significantly slower than without SOCI. In all cases, the second download of the file is as fast as without SOCI. I didn’t add the big file layer to the container with WordPress.

I found the results somewhat strange: I expected the Alpine SOCI downloads to be faster than the SOCI downloads of the other distributions. The OS layers of Alpine are loaded from the start, so fewer files have to be loaded dynamically than in the other distributions, where the OS files are loaded dynamically as well. That’s obviously not how it works.

WordPress

The next question is whether users will notice this difference. A file of 200 MB is rather big, so the differences might be smaller when the files are smaller. On the other hand, when more files are loaded dynamically and each of these files is requested separately, then the overhead of Lazy Loading might increase.

To see how users might experience this, I added WordPress to a new Alpine Dockerfile. I also started MySQL in a container running on the test instance. Running the database on the instance saves about 12 minutes of deployment time compared with RDS, and for this test it works fine.

I created a WordPress database, and when you deploy the stack you can use that data to see the effects of Lazy Loading on the website. The first user might notice the effect a little, but after seeing the numbers above I had feared that the delays would be more significant. End users would probably not notice anything if we switched all our images to SOCI containers without telling anyone. It is strange, however, that in the WordPress environment a difference remains between a Lazy Loaded container whose files have already been downloaded and a container without Lazy Loading. That difference was not there when I downloaded just one big file. It is not very big: only about 0.1 seconds.

In the WordPress test I first go to a web page with some text and two images, and I measure the download time of both the text and the two images. Then I go to a second page with one image. The results are in the table below.


| | Curl of page without SOCI (min/avg/max) | Curl of page with SOCI (min/avg/max) | Curl of image without SOCI (min/avg/max) | Curl of image with SOCI (min/avg/max) |
|---|---|---|---|---|
| Alpine + WordPress, 1st page | 0.41 / 0.44 / 0.46 | 0.75 / 0.81 / 0.85 | 0.00 / 0.02 / 0.04 | 0.03 / 0.03 / 0.03 |
| Alpine + WordPress, 2nd page | 0.42 / 0.45 / 0.47 | 0.46 / 0.51 / 0.54 | 0.00 / 0.01 / 0.04 | 0.00 / 0.00 / 0.00 |
Curl time of pages and images from a WordPress site
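The measurements in this post were taken with curl. As a rough equivalent, the sketch below times the first request separately from the following ones, because Lazy Loading mainly affects the first access. The function names are my own, and this is not the script that produced the table.

```python
import time
import urllib.request
from typing import Callable


def fetch(url: str) -> bytes:
    """Download the complete response body, comparable to a plain curl call."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def timed(action: Callable[[], object]) -> float:
    """Run action once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def first_and_repeats(action: Callable[[], object], repeats: int = 4) -> tuple[float, list[float]]:
    """Time the first call separately from the calls that follow it."""
    first = timed(action)
    return first, [timed(action) for _ in range(repeats)]
```

A call such as first_and_repeats(lambda: fetch(url)), with url pointing at a page of the freshly started container, would give one "cold" timing and four "warm" ones to compare.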

I initially thought that loading the images would take more time than loading the text of the blog post. When I looked into this, the images turned out to be smaller than the PHP and HTML text on the page. When the page loads, the container also has to make a connection to the WordPress database. The database is on the test server, so loading a page generates extra network traffic between the container and the test server.

Load order

I added a small test to the WordPress container. When a layer that is not Lazy Loaded defines a file, and a later, Lazy Loaded layer defines the same file again, will my end users then see the older version of that file? I was happy to see that this wasn’t the case: when you load the /lazy.html page, you will always see “lazy”. If this weren’t the case, I wouldn’t dare to use SOCI, because the results of a Lazy Loaded container could then differ from the results of a non-Lazy Loaded container.
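The behaviour observed here matches the usual layering rule for container images: the topmost layer that defines a path wins, regardless of when its content arrives. The toy model below illustrates only that rule. It is not how the SOCI snapshotter is implemented, it ignores deleted files, and the content of the earlier layer is a hypothetical value.

```python
from typing import Optional


def resolve(layers: list[dict[str, str]], path: str) -> Optional[str]:
    """Return the content of path from the topmost layer that defines it."""
    for layer in reversed(layers):  # last layer in the image is on top
        if path in layer:
            return layer[path]
    return None


image_layers = [
    {"/lazy.html": "not lazy"},  # earlier layer, loaded at start (hypothetical content)
    {"/lazy.html": "lazy"},      # later layer, Lazy Loaded
]

print(resolve(image_layers, "/lazy.html"))
```

In this model the answer depends only on the order of the layers, which is the property the /lazy.html test checks.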

Costs

The costs of SOCI depend on the size of the layers that are lazy loaded and the number of files that are present in those layers. There are no extra running costs for using SOCI in an image.

Conclusion

When I read the AWS article, I hoped that the layers that were Lazy Loaded would be downloaded directly after starting the container. This is unfortunately not the case. Lazy Loading can have a considerable effect on the speed of the container for the first user. In a test with WordPress this effect was hardly noticeable for end users. This means that you have to be aware of this effect, and that it makes sense to test it with your own applications and see if you notice any differences.

The positive effect of SOCI on autoscaled systems behind a Load Balancer will be low: the Load Balancer takes a lot of time to check if a container is healthy and the few seconds that one might save on starting an extra container can then be ignored.
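A back-of-the-envelope calculation shows why. The health check settings below (one check every 30 seconds, 5 successful checks before a target counts as healthy) are assumptions chosen for illustration, not values from my setup, so check the settings of your own target group. The startup times are the Debian averages from the startup table.

```python
def seconds_until_healthy(interval_s: float, healthy_threshold: int) -> float:
    """Rough upper bound: the target must pass this many checks, one per interval."""
    return interval_s * healthy_threshold


def total_reduction_percent(start_without: float, start_with: float, health_s: float) -> float:
    """Reduction of startup plus health check time, in percent."""
    without = start_without + health_s
    with_soci = start_with + health_s
    return (without - with_soci) / without * 100.0


# Assumed health check settings (illustrative only).
HEALTH_S = seconds_until_healthy(interval_s=30, healthy_threshold=5)

# Average Debian startup times: 21.475 s without SOCI, 13.344 s with SOCI.
print(f"{total_reduction_percent(21.475, 13.344, HEALTH_S):.1f}% less time until traffic is served")
```

Under these assumptions a saving of about 8 seconds is set against roughly 150 seconds of health checking, so the total time until the container receives traffic shrinks by only a few percent.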

Links

[1] Announcement: https://aws.amazon.com/about-aws/whats-new/2023/07/aws-fargate-container-startup-seekable-oci/

[2] Blog: https://aws.amazon.com/blogs/aws/aws-fargate-enables-faster-container-startup-using-seekable-oci/

[3] You can find the template in my GitHub repository: https://github.com/FrederiqueRetsema/Blogs-2023 (directory lazyloading).

Photo by Priscilla Du Preez on Unsplash
