In this blog, I will show how you can use the Serverless Application Model (SAM) to get a presigned upload URL to AWS S3 that can be used exactly once [1]. In AWS it is possible to use a presigned URL to upload files, but such a URL is valid for a specified duration and can be used multiple times within that period. This doesn’t mean, however, that one-time upload URLs cannot be implemented in AWS.
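For reference, generating such a presigned upload URL with boto3 looks roughly like this (a minimal sketch; the bucket name and key are placeholders):

import boto3

s3 = boto3.client("s3")

# The URL allows a PUT of this specific key for 30 seconds.
# Within that period it can be used as often as the client likes.
url = s3.generate_presigned_url(
    ClientMethod="put_object",
    Params={"Bucket": "my-upload-bucket", "Key": "upload/myfile1.txt"},
    ExpiresIn=30,
)
print(url)

The ExpiresIn parameter only limits the period in which the URL is valid, not the number of uploads; that is exactly the gap the solutions in this series try to close.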
Issues with previous solutions
There are situations where the S3 event for an uploaded file arrives only after several seconds, sometimes even after more than a minute. When you have a bucket without versioning, you might get just one event for two or more uploaded files: when the second file is uploaded fast enough and the events don’t fire for every file, the first file is overwritten.
The combination might also occur: you get an error because of duplicate files, but you don’t see a file with the upload/ prefix. The second uploaded file is then processed by the instance of the Lambda function that processed the first uploaded file.
S3 with versioning
When you read this, or when you read the previous two blogs, you might think: “why are we not using S3 with versioning to solve this problem?” The general idea is very simple: when you upload the first file, no previous versions are present, so the file can be processed. When a second file is uploaded, a previous version does exist and that version can be ignored. One of the advantages of this solution is that you keep the contents of every uploaded file, even though only the first one is accepted and processed. This blog will explore this solution.
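To illustrate the behaviour this solution relies on, the sketch below enables versioning on a bucket and shows that a second upload to the same key does not overwrite the first (the bucket and key names are placeholders):

import boto3

s3 = boto3.client("s3")
bucket = "my-upload-bucket"
key = "upload/myfile1.txt"

# Turn on versioning; from now on every PUT creates a new version.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

s3.put_object(Bucket=bucket, Key=key, Body=b"first upload")
s3.put_object(Bucket=bucket, Key=key, Body=b"second upload")

# Both versions are kept; nothing is overwritten.
for version in s3.list_object_versions(Bucket=bucket, Prefix=key)["Versions"]:
    print(version["VersionId"], version["LastModified"], version["IsLatest"])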
I wrote a delete function that deletes the uploaded files after the timeout has expired; the full code is in the repository [1], a sketch is shown below.
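In this sketch the environment variable, the event field and the handler signature are assumptions of mine, not the exact implementation:

import os
import boto3

s3 = boto3.client("s3")
BUCKET = os.environ.get("UPLOAD_BUCKET", "my-upload-bucket")   # assumed environment variable

def lambda_handler(event, context):
    # Remove every version of the uploaded file, so that neither the first
    # upload nor any ignored later uploads stay behind under upload/.
    key = event["key"]                                         # assumed event field
    response = s3.list_object_versions(Bucket=BUCKET, Prefix=key)
    for version in response.get("Versions", []):
        s3.delete_object(Bucket=BUCKET, Key=version["Key"],
                         VersionId=version["VersionId"])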
The architecture is very similar to the previous blogs with DynamoDB and Memcached [2]. In this blog, I will mostly talk about the differences between the previous versions and the S3 versioning solution. As before, I use the Serverless Application Model (SAM) as an improvement on CloudFormation.
Changes in MoveFirstUploadToAccepted
Unfortunately the version IDs are not sequential numbers starting at 1: they are assigned randomly. This means that we have to find out whether there are multiple versions, using the list_object_versions API call. This call returns all versions of the object, ordered by the time they were stored (most recent first), so we can determine which version was uploaded first. When the version ID we get in the event data of the Lambda function is the same as the version ID of the first uploaded file, we can copy this file to a new file with the accepted/ prefix. When the version IDs are different, we know that the data in the event parameter of the Lambda function refers to a second, third, … uploaded file.
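In boto3 terms the check can look roughly like this; the event fields come from the S3 notification, the upload/ and accepted/ prefixes are taken from this blog, and the function names are mine (the real code is in the repository [1]):

import boto3

s3 = boto3.client("s3")

def is_first_upload(bucket, key, event_version_id):
    # list_object_versions returns the versions of a key newest first,
    # so the last entry for this key is the first uploaded version.
    response = s3.list_object_versions(Bucket=bucket, Prefix=key)
    versions = [v for v in response.get("Versions", []) if v["Key"] == key]
    return event_version_id == versions[-1]["VersionId"]

def lambda_handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    version_id = record["s3"]["object"]["versionId"]

    if is_first_upload(bucket, key, version_id):
        # First upload: copy it to the accepted/ prefix.
        s3.copy_object(Bucket=bucket,
                       CopySource={"Bucket": bucket, "Key": key, "VersionId": version_id},
                       Key=key.replace("upload/", "accepted/", 1))
    # A second, third, ... upload is ignored here.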
In the previous examples, the file is deleted directly after copying when it is the first uploaded file. In this solution that would complicate the algorithm: when we find the first version of the file, we would also have to check under the accepted/ prefix whether the file has already been copied before. I therefore decided not to delete the file before the timeout has expired, and I created a CloudWatch scheduled event that starts one minute after the timeout ends. I added one minute to the timeout value because the cron scheduler works in minutes, not in seconds; without that extra minute, the file could be deleted before it is processed because of rounding errors.
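A sketch of how such a one-off scheduled event can be created with boto3 is shown below; the timeout value, the rule name and the way the filename is passed to the delete function are assumptions for illustration:

from datetime import datetime, timedelta, timezone
import json
import boto3

events = boto3.client("events")
TIMEOUT_IN_SECONDS = 30          # assumed timeout of the presigned URL

def schedule_delete(key, delete_lambda_arn):
    # Fire once, one minute after the timeout has expired; the cron scheduler
    # works with whole minutes, so the extra minute avoids rounding problems.
    run_at = datetime.now(timezone.utc) + timedelta(seconds=TIMEOUT_IN_SECONDS, minutes=1)
    cron = "cron({} {} {} {} ? {})".format(run_at.minute, run_at.hour,
                                           run_at.day, run_at.month, run_at.year)

    rule_name = "delete-" + key.replace("/", "-")
    events.put_rule(Name=rule_name, ScheduleExpression=cron, State="ENABLED")
    events.put_targets(Rule=rule_name,
                       Targets=[{"Id": "1",
                                 "Arn": delete_lambda_arn,
                                 "Input": json.dumps({"key": key})}])

Note that the delete function also needs a resource-based permission that allows the rule to invoke it.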
Which solution is the best?
When you can use S3 versioning, this solution is definitely the best: you store all the different uploads instead of at most two, and you can look at all the versions of the files with the uploads/ prefix to see what is going on. Is there a man-in-the-middle attack and, if so, what data does it send? Or is it your own software on the client side that fails for some reason?
When it is not possible for compliance reasons to use S3 versioning on your bucket, you can use the DynamoDB solution for situations with few file uploads and the Memcached solution for situations with many file uploads.
Play along
As usual, you can find all the code in a GitHub repository [1]. You can find the URL of the function that delivers the one-time upload URL in the outputs of the CloudFormation stack after you deploy the package.yaml file. Use the API key section of the API Gateway (as described in the first blog in this series) to get the deployed API key. The client directory contains some scripts that can be used to simulate certain situations:
- upload.py: normal upload, will succeed
- upload2x.py: will upload two different file versions for the same S3 filename
- upload3x.py: will upload three different file versions for the same S3 filename
- upload_with_delay.py: will wait a little bit too long (40 seconds) between receiving the upload URL and uploading the file
All these scripts have the same parameters. The first parameter is the URL that delivers the one-time presigned URL, the second parameter is the API key, and the last parameter is the file you want to upload. upload2x.py takes two filenames, upload3x.py takes three. For example:
cd client
python upload.py https://2lmht3jedj.execute-api.eu-west-1.amazonaws.com/Prod/getpresignedurl r2aVK769WH8IQse0cs5A17hbNKkqUEVK1tJ4hKFr myfile1.txt
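Roughly, a client like upload.py performs two steps: it requests the one-time presigned URL with the API key, and then uploads the file. The sketch below assumes the endpoint is called with GET and returns a presigned PUT URL in a JSON field called url; check the scripts in the repository for the exact details:

import sys
import requests

get_url, api_key, filename = sys.argv[1], sys.argv[2], sys.argv[3]

# Step 1: ask the API Gateway endpoint for a one-time presigned upload URL.
response = requests.get(get_url, headers={"x-api-key": api_key})
response.raise_for_status()
presigned_url = response.json()["url"]        # assumed field name

# Step 2: upload the file with a plain HTTP PUT to the presigned URL.
with open(filename, "rb") as f:
    result = requests.put(presigned_url, data=f)
print(result.status_code)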
This series…
In the previous blogs, I showed solutions with DynamoDB and Memcached in combination with an S3 bucket without versioning.
Links
[1] GitHub repository: https://github.com/FrederiqueRetsema/AMIS-Blog-AWS, directory OneTimeUploadUrlMemcached
[2] Previous blogs: “Using one-time upload URLs in AWS using DynamoDB”: https://technology.amis.nl/aws/using-one-time-upload-urls-in-aws-using-dynamodb/ and “Using one-time upload URLs in AWS with Memcached”: https://technology.amis.nl/aws/using-one-time-upload-urls-in-aws-with-memcached/