In this blog, I will show how you can use SAM (the Serverless Application Model) to get a presigned upload URL for AWS S3 that can be used exactly once [1]. In AWS you can use a presigned URL to upload files: the URL is valid for a specified duration and, within that time, it can be used multiple times. This doesn’t mean, however, that one-time upload URLs cannot be implemented in AWS.
DynamoDB solution
I solved this puzzle by first creating the Lambda function “GetPresignedURL”, which generates the URL. By default, the URL is valid for 30 seconds. A random filename is created; this filename also contains the date and time, which makes troubleshooting easier for the operators who have to work with the application later. The filename also gets the uploads/ prefix. The full filename is added to a DynamoDB table that contains the valid filenames. When this is done, the URL and the fields that are necessary to upload the file are generated and sent back to the client.
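To make the flow concrete, here is a minimal sketch of what such a function could look like in Python with boto3. The environment variable names and the filename format are assumptions on my side; the real “GetPresignedURL” code is in the repository [1]:

import json
import os
import random
import string
from datetime import datetime

import boto3

s3 = boto3.client('s3')
dynamodb = boto3.client('dynamodb')

def lambda_handler(event, context):
    # Bucket and table names are assumed to come in via environment variables
    bucket = os.environ['SOURCE_BUCKET']
    dynamodb_table = os.environ['DYNAMODB_TABLE']

    # Random filename with date and time, under the uploads/ prefix
    random_part = ''.join(random.choices(string.ascii_letters, k=6))
    full_filename = ('uploads/UploadFilename-' +
                     datetime.utcnow().strftime('%Y%m%d_%H%M%S') +
                     '-' + random_part)

    # Register the filename as a valid (not yet used) upload
    dynamodb.put_item(
        TableName=dynamodb_table,
        Item={'FullFilename': {'S': full_filename}}
    )

    # Presigned POST: the URL plus the fields the client has to send along,
    # valid for 30 seconds by default
    presigned_post = s3.generate_presigned_post(
        Bucket=bucket,
        Key=full_filename,
        ExpiresIn=30
    )

    return {'statusCode': 200, 'body': json.dumps({'url': presigned_post})}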
The client now has a URL that it can use to upload the file. The permissions in S3 are the same as the IAM permissions of the Lambda function that generated the URL.
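As an illustration of the client side, this is roughly what an upload could look like, assuming the requests library, an HTTP GET on the API and the API key in the x-api-key header (the actual upload.py script is in the repository [1]; the URL and key below are the example values used later in this blog):

import requests

api_url = 'https://2lmht3jedj.execute-api.eu-west-1.amazonaws.com/Prod/getpresignedurl'
api_key = 'szeFFEKZV84viT9WsggEo47sdI8CLxgM97siRd9F'

# Ask the API for a one-time presigned POST (URL plus fields)
presigned = requests.get(api_url, headers={'x-api-key': api_key}).json()['url']

# Upload the file: the POST only succeeds when all returned fields are sent along
with open('myfile1.txt', 'rb') as f:
    upload = requests.post(presigned['url'],
                           data=presigned['fields'],
                           files={'file': ('myfile1.txt', f)})

print(upload.status_code)  # 204 when the upload succeeded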
When the file is uploaded, an S3 event is triggered. This event calls the Lambda function “MoveFirstUploadToAcceptedPrefix”. This function first deletes the filename from the DynamoDB table. When this succeeds, the file is copied to the accepted/ prefix. Another S3 “ObjectCreated” event will then trigger the “ProcessAcceptedFile” Lambda function. S3 tells the difference between files that are uploaded by clients and files that are copied by the Lambda function by looking at the prefix of the files. The “ProcessAcceptedFile” Lambda function doesn’t do much: it just writes the bucket name and the full filename to the CloudWatch logs.
Deleting an item in DynamoDB
The delete_item call in DynamoDB deletes an item. When the item doesn’t exist, the call will not raise an error. Within our application, however, we need to know whether the item existed before the deletion. When this is the first upload of the file, the filename exists as an item in DynamoDB, so we can delete the item and copy the file. When the same Lambda function is called again for a second upload of the same filename, the item no longer exists in DynamoDB and we should just ignore this second upload. We can use a ConditionExpression for that:
import boto3

dynamodb = boto3.client('dynamodb')

# dynamodb_table and full_filename come from the environment and the S3 event.
# The ConditionExpression makes the call fail with a
# ConditionalCheckFailedException when the item no longer exists.
response = dynamodb.delete_item(
    TableName=dynamodb_table,
    Key={
        'FullFilename': {'S': full_filename}
    },
    ConditionExpression='attribute_exists(FullFilename)'
)
I also added some code that deletes the uploaded file from the uploads/ directory only for the first uploaded version. This leaves the last uploaded duplicate file in the uploads/ directory. When you are searching for differences in content between the first uploaded file and the last upload, the first one has the accepted/ prefix and the last one has the uploads/ prefix.
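Putting these pieces together, a minimal sketch of the “MoveFirstUploadToAcceptedPrefix” handler could look like the code below. The names and the exact structure are assumptions; the real code is in the repository [1]:

import os

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')
dynamodb = boto3.client('dynamodb')

def lambda_handler(event, context):
    dynamodb_table = os.environ['DYNAMODB_TABLE']

    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        full_filename = record['s3']['object']['key']   # uploads/...

        try:
            # Only succeeds for the first upload of this filename
            dynamodb.delete_item(
                TableName=dynamodb_table,
                Key={'FullFilename': {'S': full_filename}},
                ConditionExpression='attribute_exists(FullFilename)'
            )
        except ClientError as e:
            if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
                # Duplicate upload: leave the file in uploads/ and raise,
                # so the "Errors" metric of this Lambda function goes up
                print('ERROR: duplicate upload of ' + full_filename)
            raise

        # First upload: copy the file to accepted/ and remove it from uploads/
        accepted_filename = full_filename.replace('uploads/', 'accepted/', 1)
        s3.copy_object(Bucket=bucket,
                       CopySource={'Bucket': bucket, 'Key': full_filename},
                       Key=accepted_filename)
        s3.delete_object(Bucket=bucket, Key=full_filename)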
Monitoring
Though it is possible to look in the S3 uploads/ folder for files that remain there, it is better to have a metric that warns you when someone tries to upload several files under the same name. The Lambda function “MoveFirstUploadToAcceptedPrefix” raises an error when the filename cannot be deleted from the DynamoDB table. The “Errors” metric of this Lambda function then triggers a CloudWatch alarm. This alarm sends data to an SNS topic, which sends an SMS to inform operators that this is happening. The operator can search for ERROR in all log streams to find which file generated the error and take it from there.
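The alarm itself is created by the template in the repository; purely as an illustration of what it monitors, here is the same idea expressed in boto3 terms (the alarm name and SNS topic ARN are made up for this example):

import boto3

cloudwatch = boto3.client('cloudwatch')

# Alarm as soon as MoveFirstUploadToAcceptedPrefix reports one or more errors
cloudwatch.put_metric_alarm(
    AlarmName='DuplicateUploadAlarm',                   # assumed name
    Namespace='AWS/Lambda',
    MetricName='Errors',
    Dimensions=[{'Name': 'FunctionName',
                 'Value': 'MoveFirstUploadToAcceptedPrefix'}],
    Statistic='Sum',
    Period=60,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:eu-west-1:123456789012:OperatorsTopic']  # assumed topic ARN
)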
It might also be useful to look at the lead time of the different steps in this process. For this reason, I included the X-Ray layer [2]. The image doesn’t contain the S3 actions from end users: from the X-Ray image it seems that the client directly calls the “MoveFirstUploadToAcceptedPrefix” Lambda function. Unfortunately, this also makes it impossible to determine the total lead time between the upload of the file and the final processing. Before I took a screenshot of this X-Ray image, I started several upload scripts. The upload.py script uploaded just one file; the upload2x.py and upload3x.py scripts gave the (expected) errors, which resulted in the (partly) yellow bars around some circles in the X-Ray dashboard.
Security!
We have now secured access to the S3 bucket by using one-time presigned URLs. But I didn’t yet talk about limiting access to the API that gives us the one-time presigned URL. I used an API key for that.
One of the disadvantages of SAM is that when it creates sub-resources based on a main resource, it is impossible to refer to these sub-resources in the rest of the template. As an example: when you create the API Gateway as part of creating a Lambda function, it is impossible to refer to this API Gateway in the creation of the API key. The same is true when you create the API Gateway and the keys in one go (as is done in this example) and then want to print the value of the key as output of the CloudFormation template. Fortunately, it is not too difficult to find the API keys after the CloudFormation template has been deployed. Use the following commands; if you don’t use profiles in your environment, just omit the last parameter. First, use
aws apigateway get-api-keys --profile amis
to get the API key ID,
and with that ID, get the value via the following command:
aws apigateway get-api-key --api-key 73bazc2g18 --include-value --profile amis
You can now use the upload script in the client directory. The URL is in the outputs of the CloudFormation template that you used to install the environment.
python upload.py https://2lmht3jedj.execute-api.eu-west-1.amazonaws.com/Prod/getpresignedurl szeFFEKZV84viT9WsggEo47sdI8CLxgM97siRd9F myfile1.txt
The API key also helps with throttling the connection, and it enforces a maximum number of requests per day.
Security?
One thing that kept bugging me was the remark that the client that uses the URL has the same permissions as the Lambda function that was used to generate the URL. Does this also mean that, when we use the information from the URL, we can put a file with any name in the uploads/ directory of the S3 bucket? And can we connect to the S3 environment and insert new files there via the AWS API? Let’s try this out!
First, change the Lambda function to allow for URLs that are valid for 30 minutes instead of 30 seconds. You can do this by changing the parameter “TimeoutInSeconds” from 30 to 1800 seconds. Then, use the upload script to upload a file. The result we get back from the API Gateway might look like this:
[…]
Content: b'{"url": {"url": "https://onetimepresigneduploadurldynamodb-sourcebucket-d4o0jwmqrpbw.s3.amazonaws.com/", "fields": {"key": "uploads/UploadFilename-20210120_142416-5biYhU", "AWSAccessKeyId": "ASIAQTBTMULUGEHRPLDA", "x-amz-security-token": "IQo3b32pZ2luX2VjELf//////////wEaCWV1LXdlc3QtMSJIMEYCIQDDie3Y2bv3...goF1CPF", "policy": "eyJl...J9XX0=", "signature": "qNL3...x2U="}}}'
Status code: 204
Content: b''
[...]
Open a sandbox environment: use, for example, the new CloudShell feature in AWS. Then use the aws configure command to set the AWSAccessKeyId as the access key and the security token as the secret access key.
When you then try to upload a file to the S3 bucket using the AWS API, you get an error message that the access key doesn’t exist (even when you use the same filename as the signed URL uses):
In the image, I tried the following commands. When you follow along, fill in the values that you get as output from the upload.py script in your own environment:
aws configure (with the AWSAccessKeyId and the x-amz-security-token from the result of calling the API Gateway)
echo "File 1" >> file1.txt
aws s3 cp ./file1.txt s3://onetimepresigneduploadurldynamodb-sourcebucket-iujf5kt6tm81/uploads/UploadFilename-20210120_1421416-5biYhU
Fortunately, the result is an error: “An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records”.
One last try: let’s change the upload script a little bit and add the text ‘-test’ to the upload URL:
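How the script is changed exactly is not important for the test; as a sketch (reusing requests and the presigned response from the client example earlier in this blog), appending ‘-test’ to the returned URL is enough:

# Tamper with the presigned URL before using it
# ('presigned' is the parsed API response from the earlier client example)
tampered_url = presigned['url'] + '-test'

with open('myfile1.txt', 'rb') as f:
    upload = requests.post(tampered_url,
                           data=presigned['fields'],
                           files={'file': ('myfile1.txt', f)})

print(upload.status_code)   # not 204: S3 rejects the tampered request
print(upload.content)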
When you now try to use the upload script, you will see an error message:
I’m glad that this doesn’t work. I have more confidence in the way signed upload URLs work now that I’ve done these tests…
Play along
As usual, you can find all the code in a GitHub repository [1]. You can use the CloudFormation GUI and upload the package.yaml file in your own account. Acknowledge the three checkboxes: that CloudFormation may create IAM resources, that it may create IAM resources with custom names, and that it might require the CAPABILITY_AUTO_EXPAND capability. The URL to get the one-time signed URL can be found in the outputs of the CloudFormation template. You can find some scripts in the client subdirectory that can be used to simulate certain situations:
- upload.py: normal upload, will succeed (as long as you don’t change it)
- upload2x.py: will upload two different files to the same filename
- upload3x.py: will upload three different files to the same filename
- upload_with_delay.py: will wait a little bit too long (40 seconds) between getting the URL and uploading the file
All these scripts have the same parameters. The first parameter is the URL to get the signed URL. The second parameter is the API key (see above for how to get this). The last parameter is the file (or files) you want to upload. For example:
python upload.py https://2lmht3jedj.execute-api.eu-west-1.amazonaws.com/Prod/getpresignedurl szeFFEKZV84viT9WsggEo47sdI8CLxgM97siRd9F myfile1.txt
You might also want to try to upload bigger files (up to 2 GB) and see that the upload will not fail as long as it starts within 30 seconds. This is even true when the upload itself takes minutes.
Next time…
DynamoDB is a good solution when you don’t have that many files per day. When you want to upload a lot of files per day, the costs of read capacity units (RCUs) and write capacity units (WCUs) might become too high. We don’t need to store the data on disk anyway: the filename only needs to be stored for a few seconds. It might be cheaper to have a virtual machine running an in-memory database. Next time, I’ll rewrite this solution to use an ElastiCache database.
This series…
This is the first blog in a series of three blogs. I solved this issue by using:
- DynamoDB (this blog)
- Memcached (next blog)
- S3 versioning (last blog)
Links
[1] GitHub repository: https://github.com/FrederiqueRetsema/AMIS-Blog-AWS , directory “One time presigned upload url DynamoDB”
[2] See also: https://technology.amis.nl/aws/aws-lambda-shared-libraries-and-sam/ and https://technology.amis.nl/aws/aws-shop-example-x-ray/