Stop DELETE_IN_PROGRESS when custom resources fail in AWS CloudFormation

Frederique Retsema

A script to stop CloudFormation from waiting when you have made a small error in a custom resource (Lambda function).

Introduction

Most people who write CloudFormation templates will recognize this. When you make a small error in your custom resource Lambda function, the stack either keeps creating the resource forever, or it rolls back and you then wait forever for the custom resource to finish. Even though you can see in CloudWatch that the Lambda execution has stopped, CloudFormation just waits… The Lambda function never reports back to CloudFormation that the execution failed. Because of this, CloudFormation simply waits for up to an hour until… it times out. Nothing can be done from the AWS console to speed this up…

There is a solution to this, which is described on this AWS site [1]. You have to search for some parameters in the CloudWatch logs and then use those parameters in a curl command. The curl command tells CloudFormation that the custom resource was deleted successfully.
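As a rough sketch of what that curl command does, the Python below builds the JSON response that CloudFormation expects for a custom resource and sends it with an HTTP PUT to the pre-signed ResponseURL. The function names `build_response_body` and `send_response` and the default reason text are made up for this illustration; this is not the code from the AWS article or from my script. The fallback to the logical ID when no PhysicalResourceId was logged is also an assumption.

```python
import json
import urllib.request


def build_response_body(event, status="SUCCESS", reason="Signalled manually"):
    """Build the response document for a custom resource request."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,
        # Assumption: when no physical ID was logged, reuse the logical ID.
        "PhysicalResourceId": event.get(
            "PhysicalResourceId", event["LogicalResourceId"]
        ),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }


def send_response(event, status="SUCCESS"):
    """PUT the response to the pre-signed ResponseURL from the logged event."""
    body = json.dumps(build_response_body(event, status)).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=body,
        method="PUT",
        # Empty content type, as the cfnresponse helper module sends it.
        headers={"Content-Type": ""},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Here `event` is the dictionary that the Lambda function printed to CloudWatch. Keep in mind that the ResponseURL is pre-signed and expires, so an old log record cannot be used for this.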

When you use the same Lambda function multiple times in a CloudFormation template, or when you have a lot of logging, doing this manually is tedious. But if you don’t do it, you have to wait for up to an hour before the AWS console allows you to delete the stack again and then ignore these custom resources.

Script

Wouldn’t it be nice to have a script that crawls through your CloudWatch logs and gets the information for the curl command? A script that then executes the curl command automatically to inform CloudFormation that the deletion was successful? Well, I wrote such a script; you can find it in my GitHub repository [2].

The script assumes that you start any custom resource Lambda function with a print(event) (without any text in front of it):

    def lambda_handler(event, context):
        print(event)  # log the raw event, with no text in front of it

Before using my stop script, you should change the variables in the script. These variables are:

export REQUEST_TYPE="Create"
export RESOURCEGROUP_LIKE="CreateDeleteEndpointFunction"

In my case, it was a Create in the CreateDeleteEndpointFunction. In the log group name, this is preceded by “/aws/lambda/” and the name of the CloudFormation stack, and followed by a random character string. An example of a log group name is “/aws/lambda/NetworkVPCPublicWrite-CreateDeleteEndpointFunction-taGOKBMYfYAC”.

When you have multiple different Lambda functions that fail, you can use “lambda” as a search string for log groups as well. This ends the execution of all the custom resources, because all log groups of custom resources start with /aws/lambda/.
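The log group matching could be sketched in Python as follows. This assumes a boto3 CloudWatch Logs client is passed in; the function name `matching_log_groups` is hypothetical, and the script in the repository may do this differently.

```python
def matching_log_groups(logs_client, name_fragment):
    """Return the names of Lambda log groups that contain name_fragment."""
    names = []
    paginator = logs_client.get_paginator("describe_log_groups")
    # Only look at log groups written by Lambda functions.
    for page in paginator.paginate(logGroupNamePrefix="/aws/lambda/"):
        for group in page["logGroups"]:
            if name_fragment in group["logGroupName"]:
                names.append(group["logGroupName"])
    return names
```

With boto3 installed and credentials configured, a call would look like `matching_log_groups(boto3.client("logs"), "CreateDeleteEndpointFunction")`. Passing "lambda" as the fragment matches every log group under /aws/lambda/.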

The script searches all log streams in the matching log groups for “RequestType”. Within those records it looks for the REQUEST_TYPE (‘Create’, ‘Update’ or ‘Delete’) that you specified in the variable. This is an example of such a record:

{'RequestType': 'Create', 'ServiceToken': 'arn:aws:lambda:us-east-1:615377834974:function:NetworkVPCPublicWrite-CreateDeleteEndpointFunction-taGOKBMYfYAC', 'ResponseURL': 'https://cloudformation-custom-resource-response-useast1.s3.amazonaws.com/arn%3Aaws%3Acloudformation%3Aus-east-1%3A615377834974%3Astack/NetworkVPCPrivateWriteAZb/2badc4e0-fdd2-11eb-8474-124aa02dc2ed%7CExecuteCreateDeleteEndpointFunctionSSM%7C4325a6ab-f286-4ff8-8de8-0539ed74630a?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20210815T140948Z&X-Amz-SignedHeaders=host&X-Amz-Expires=7200&X-Amz-Credential=AKIA6L7Q4OWT7UXIGK6R%2F20210815%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=27f8e6f697b2e20c3eaa58299f19f9e539952c762ba093aeead849c9180e6023', 'StackId': 'arn:aws:cloudformation:us-east-1:615377834974:stack/NetworkVPCPrivateWriteAZb/2badc4e0-fdd2-11eb-8474-124aa02dc2ed', 'RequestId': '4325a6ab-f286-4ff8-8de8-0539ed74630a', 'LogicalResourceId': 'ExecuteCreateDeleteEndpointFunctionSSM', 'ResourceType': 'Custom::ExecuteCreateDeleteEndpointFunctionSSM', 'ResourceProperties': {'ServiceToken': 'arn:aws:lambda:us-east-1:615377834974:function:NetworkVPCPublicWrite-CreateDeleteEndpointFunction-taGOKBMYfYAC', 'VpcId': 'vpc-084b8058b1f485342', 'ServiceName': 'com.amazonaws.us-east-1.ssm', 'SecurityGroupId': 'sg-0fb3dbc3a45b64398', 'PolicyDocument': '"{ \n    "Version": "2012-10-17",\n    "Statement": [{\n        "Effect": "Allow",\n        "Action": "*",\n        "Resource": "*",\n        "Principal": "*"\n    }]\n }"\n'}}

From these records, the fields RequestId, ResponseURL, StackId, LogicalResourceId and PhysicalResourceId (if present) are filtered. The script uses them for the curl command as described on the AWS site.
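To illustrate the filtering step: because the record is the output of print(event), it is a Python dictionary literal (single quotes), not JSON. A minimal sketch of parsing such a line could look like this. The names `WANTED_FIELDS` and `extract_fields` are made up for this example and do not come from the script.

```python
import ast

WANTED_FIELDS = (
    "RequestId",
    "ResponseURL",
    "StackId",
    "LogicalResourceId",
    "PhysicalResourceId",
)


def extract_fields(log_message, request_type):
    """Parse one print(event) log line; return the wanted fields or None."""
    try:
        event = ast.literal_eval(log_message.strip())
    except (ValueError, SyntaxError):
        return None  # not a printed dict, e.g. a START, END or REPORT line
    if not isinstance(event, dict):
        return None
    if event.get("RequestType") != request_type:
        return None
    # PhysicalResourceId is only copied when the record contains it.
    return {name: event[name] for name in WANTED_FIELDS if name in event}
```

`ast.literal_eval` only evaluates literals, so it is safe to use on log text, unlike `eval`.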

This script saved me hours of waiting. It also saved me dozens of minutes of digging through CloudWatch logs to get the right parameters for the curl command. I hope you like it as well!

Links

1) AWS site: how to solve DELETE_IN_PROGRESS: https://aws.amazon.com/premiumsupport/knowledge-center/cloudformation-lambda-resource-delete/

2) GitHub repository: https://github.com/FrederiqueRetsema/AMIS-Blog-AWS, directory “Stop wait for custom resource in CloudFormation”
