Introduction
Most people who write CloudFormation templates will recognize this. When you make a small error in your custom resource Lambda function, the stack either keeps creating the resource forever, or it rolls back and you then wait forever for the custom resource to be deleted. Even though you can see in CloudWatch that the Lambda execution has stopped, CloudFormation just keeps waiting, because the Lambda function never reported back to CloudFormation that the execution failed. CloudFormation therefore simply waits for up to an hour until it times out. Nothing can be done from the AWS console to speed this up.
There is a solution to this, which is described on this AWS site [1]. You have to look up some parameters in the CloudWatch logs, and then you can use those parameters in a curl command. The curl command informs CloudFormation that the custom resource was deleted successfully.
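What the curl command does can be sketched in Python as follows. This is a minimal sketch and not the script from the repository: the function names `build_response_body` and `send_response`, the default reason text and the placeholder physical ID are my own assumptions. The body fields are the ones CloudFormation expects in a custom resource response, and their values come from the event that the Lambda function logged.

```python
import json
import urllib.request


def build_response_body(event, status="SUCCESS", reason="Manually completed"):
    """Build the JSON body that CloudFormation expects on the ResponseURL."""
    body = {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,  # free text (assumed wording)
        # Create events carry no PhysicalResourceId, so fall back to a
        # placeholder value (assumption, any stable string will do).
        "PhysicalResourceId": event.get("PhysicalResourceId", "manually-completed"),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }
    return json.dumps(body)


def send_response(event, status="SUCCESS"):
    """PUT the response to the pre-signed S3 URL from the logged event."""
    data = build_response_body(event, status).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=data,
        method="PUT",
        # Send an empty Content-Type, as the AWS cfn-response helper module does.
        headers={"Content-Type": ""},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `send_response(event)` with the logged event should make CloudFormation stop waiting for that resource, as long as the pre-signed ResponseURL has not expired yet.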
When you use the same Lambda function multiple times in a CloudFormation template, or when you have a lot of logging, doing this manually is not much fun. But if you don’t do it, you have to wait for up to an hour before the AWS console lets you delete the stack again and skip these custom resources.
Script
Wouldn’t it be nice to have a script that crawls through your CloudWatch logs and collects the information for the curl command? A script that then runs the curl command automatically to inform CloudFormation that the deletion was successful? Well, I wrote such a script; you can find it in my GitHub repository [2].
The script assumes that you start any custom resource Lambda function with a print(event) (without any text in front of it):
def lambda_handler(event, context):
    print(event)
Before using my stop script, you should change the variables in the script. These variables are:
export REQUEST_TYPE="Create"
export RESOURCEGROUP_LIKE="CreateDeleteEndpointFunction"
In my case, it was a Create in the CreateDeleteEndpointFunction. CloudFormation puts “/aws/lambda/” and the name of the CloudFormation stack in front of this, and it puts a random character string behind it. An example of a log group name is “/aws/lambda/NetworkVPCPublicWrite-CreateDeleteEndpointFunction-taGOKBMYfYAC”.
When several different Lambda functions fail, you can also use “lambda” as the search string for log groups. This ends the execution of all custom resources, because the log groups of all custom resource Lambda functions start with /aws/lambda/.
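The effect of the search string can be illustrated with a small Python sketch. It assumes a plain substring match on the log group name. The helper names `select_log_groups` and `list_log_group_names` are hypothetical, the repository script may select log groups differently, and two of the names in the example list are made up.

```python
def select_log_groups(log_group_names, like):
    """Return the log group names that contain the search string."""
    return [name for name in log_group_names if like in name]


def list_log_group_names():
    """Fetch all log group names with boto3 (needs AWS credentials)."""
    import boto3  # imported lazily, so the function above works without boto3

    paginator = boto3.client("logs").get_paginator("describe_log_groups")
    return [
        group["logGroupName"]
        for page in paginator.paginate()
        for group in page["logGroups"]
    ]


names = [
    "/aws/lambda/NetworkVPCPublicWrite-CreateDeleteEndpointFunction-taGOKBMYfYAC",
    "/aws/lambda/SomeOtherStack-SomeOtherFunction-ABCDEFGHIJKL",  # hypothetical name
    "/aws/apigateway/welcome",  # hypothetical name
]
print(select_log_groups(names, "CreateDeleteEndpointFunction"))
print(select_log_groups(names, "lambda"))
```

The first call selects only the log group of the failing function. The second call selects both Lambda log groups, which is why “lambda” is a broad search string: it also matches Lambda functions that have nothing to do with custom resources.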
The script goes through all log streams and looks for records that contain “RequestType”. Within those records it looks for the REQUEST_TYPE (‘Create’, ‘Update’ or ‘Delete’) that you specified in the variable. This is an example of such a record:
{'RequestType': 'Create', 'ServiceToken': 'arn:aws:lambda:us-east-1:615377834974:function:NetworkVPCPublicWrite-CreateDeleteEndpointFunction-taGOKBMYfYAC', 'ResponseURL': 'https://cloudformation-custom-resource-response-useast1.s3.amazonaws.com/arn%3Aaws%3Acloudformation%3Aus-east-1%3A615377834974%3Astack/NetworkVPCPrivateWriteAZb/2badc4e0-fdd2-11eb-8474-124aa02dc2ed%7CExecuteCreateDeleteEndpointFunctionSSM%7C4325a6ab-f286-4ff8-8de8-0539ed74630a?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20210815T140948Z&X-Amz-SignedHeaders=host&X-Amz-Expires=7200&X-Amz-Credential=AKIA6L7Q4OWT7UXIGK6R%2F20210815%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=27f8e6f697b2e20c3eaa58299f19f9e539952c762ba093aeead849c9180e6023', 'StackId': 'arn:aws:cloudformation:us-east-1:615377834974:stack/NetworkVPCPrivateWriteAZb/2badc4e0-fdd2-11eb-8474-124aa02dc2ed', 'RequestId': '4325a6ab-f286-4ff8-8de8-0539ed74630a', 'LogicalResourceId': 'ExecuteCreateDeleteEndpointFunctionSSM', 'ResourceType': 'Custom::ExecuteCreateDeleteEndpointFunctionSSM', 'ResourceProperties': {'ServiceToken': 'arn:aws:lambda:us-east-1:615377834974:function:NetworkVPCPublicWrite-CreateDeleteEndpointFunction-taGOKBMYfYAC', 'VpcId': 'vpc-084b8058b1f485342', 'ServiceName': 'com.amazonaws.us-east-1.ssm', 'SecurityGroupId': 'sg-0fb3dbc3a45b64398', 'PolicyDocument': '"{ \n "Version": "2012-10-17",\n "Statement": [{\n "Effect": "Allow",\n "Action": "*",\n "Resource": "*",\n "Principal": "*"\n }]\n }"\n'}}
From these records, the script extracts the fields RequestId, ResponseURL, StackId, LogicalResourceId and PhysicalResourceId (if present). It uses them for the curl command as described on the AWS site.
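The filtering and extraction described above could look like this in Python. This is a sketch under assumptions, not the code from the repository: the function name `extract_fields` is hypothetical, and the example message is a shortened, made-up record. Because print(event) writes a Python dict and not JSON (note the single quotes in the record above), the sketch parses the message with `ast.literal_eval` instead of a JSON parser.

```python
import ast

FIELDS = ("RequestId", "ResponseURL", "StackId", "LogicalResourceId", "PhysicalResourceId")


def extract_fields(message, request_type):
    """Return the curl parameters from one log message, or None if it does not match."""
    if "RequestType" not in message:
        return None
    try:
        # print(event) writes a Python dict repr, which is not valid JSON.
        event = ast.literal_eval(message.strip())
    except (ValueError, SyntaxError):
        return None
    if not isinstance(event, dict) or event.get("RequestType") != request_type:
        return None
    # PhysicalResourceId can be absent, so keep only the fields that exist.
    return {name: event[name] for name in FIELDS if name in event}


# Shortened, made-up record in the same shape as the logged event.
message = (
    "{'RequestType': 'Create', 'ResponseURL': 'https://example.invalid/response', "
    "'StackId': 'arn:aws:cloudformation:us-east-1:123456789012:stack/demo/abc', "
    "'RequestId': '4325a6ab', 'LogicalResourceId': 'ExecuteCreateDeleteEndpointFunctionSSM'}"
)
print(extract_fields(message, "Create"))
print(extract_fields(message, "Delete"))
```

The first call returns the four fields that are present in the record. The second call returns None, because the record is a Create and not a Delete.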
This script saved me hours of waiting. It also saved me dozens of minutes of digging through CloudWatch logs to find the right parameters for the curl command. I hope you like it as well!
Links
1) AWS site: how to solve DELETE_IN_PROGRESS: https://aws.amazon.com/premiumsupport/knowledge-center/cloudformation-lambda-resource-delete/
2) GitHub repository: https://github.com/FrederiqueRetsema/AMIS-Blog-AWS, directory “Stop wait for custom resource in CloudFormation”