AWS Shop example: SNS duplicate messages

Introduction

Our shop example [1] is now in production, wohoo!!! When you are using our example program in production, you might see that some sales are updated multiple times in the database. This will not happen very often, but you want your sales to be processed once, not twice.

In this blog, I will show you how we might solve this. By doing so, I will show you two nice features of DynamoDB tables: the Time To Live (TTL) option and the condition expressions. But let’s first see where the duplicate messages are coming from.

When you look at the logs, you can see that when a message is sent to SNS, SNS might occasionally send it twice to a subscriber of the topic. AWS says in their FAQ [2]:

Although most of the time each message will be delivered to your application exactly once, the distributed nature of Amazon SNS and transient network conditions could result in occasional, duplicate messages at the subscriber end. Developers should design their applications such that processing a message more than once does not create any errors or inconsistencies.

Let’s do that. In our original shop example, we have two SNS topics: To_decrypt and To_update_db:

[Figure: Architecture without DNS]

When the sender adds a message ID and every step passes that ID along until it reaches the update_db function, the update_db function can check in the database whether the message has been processed before.

In my shop example, it is a fair assumption that all the cash registers send their data on the same day that the products are sold. The update_db function will first check whether the sales were made on the current day and, if so, it will check the AMIS-shops-message-ids table to see whether the message ID is already in that table. Only if this isn’t the case will the AMIS-shops table be updated and the message ID be added to the AMIS-shops-message-ids table. When either check fails, the message is ignored.
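The checks described above can be sketched in a few lines of Python. This is not the code from the repository: the function name handle_sale, its parameters and the in-memory set that stands in for the AMIS-shops-message-ids table are all assumptions for illustration. The sketch also ignores concurrency, which the section on condition expressions deals with.

```python
from datetime import date


def handle_sale(message_id, sale_date, today, seen_message_ids, apply_sale):
    """Apply a sale at most once; return True when it was applied.

    seen_message_ids is a plain set that stands in for the message-ids table,
    apply_sale is a callable that stands in for the update of the shops table.
    """
    if sale_date != today:
        # Assumption from the text: only sales of the current day are accepted
        return False
    if message_id in seen_message_ids:
        # Duplicate delivery: this message was processed before
        return False
    apply_sale()
    seen_message_ids.add(message_id)
    return True
```

Calling handle_sale twice with the same message ID applies the sale only the first time; the second call returns False.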

[Figure: Shops with shops message ids]

Content of the message id

We could put anything in the message ID. I chose to put in the time (in microseconds). I could have done this in the traditional way: year+month+day-hour+minute+second+microsecond. The disadvantage of this is that when we have large numbers of messages, the update_db function will always insert records in the same partition.

AWS therefore advises using a wide range of values, to make writing to the table faster. For that reason I switched the date and the time and also reversed the order of the attributes in the time: the format is now microsecond+second+minute+hour-year+month+day. 22 May 2020 07:02:22.235348 will look like 235348220207-20200522.
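A minimal sketch of how such a message ID could be built in Python. The function name make_message_id is hypothetical; the real sender script may construct the ID differently.

```python
from datetime import datetime


def make_message_id(moment: datetime) -> str:
    """Format a timestamp as microsecond+second+minute+hour-year+month+day."""
    # %f is the zero-padded microsecond (6 digits), followed by second,
    # minute and hour; the date comes after the dash
    return moment.strftime("%f%S%M%H-%Y%m%d")


print(make_message_id(datetime(2020, 5, 22, 7, 2, 22, 235348)))
# → 235348220207-20200522
```

Because the fastest-changing digits come first, two IDs that are generated shortly after each other start with very different values.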

Time to live

In the AMIS-shops-message-ids table, you will see the following record:

[Figure: Record in AMIS shops message ids]

You see an extra attribute, the time_to_live (TTL) attribute. It contains the epoch time (the number of seconds since 01-01-1970) at which the record becomes obsolete. The record might be deleted at that moment, or it might be deleted later: according to AWS, expired records are typically deleted within 48 hours (!) of expiration [3]. AWS will not charge you for the deletion of these records. The attribute can have any name.

In my shop example, I set the TTL to 23:59 of the day on which the records are received. When I tested this and looked in this table the next morning, all the records were gone.
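A value for 23:59 of the current day could be calculated as in the sketch below. The function name end_of_day_epoch is hypothetical, and the use of UTC is an assumption: the text above doesn't say which time zone the shop example uses.

```python
from datetime import datetime, timezone


def end_of_day_epoch(moment: datetime) -> int:
    """Return the epoch time (in seconds) of 23:59 on the day of `moment`."""
    end_of_day = moment.replace(hour=23, minute=59, second=0, microsecond=0)
    return int(end_of_day.timestamp())


# Assumption: the day boundary is taken in UTC
time_to_live = end_of_day_epoch(datetime.now(timezone.utc))
```

The result is an integer, so it has to be converted to a string before it is passed to DynamoDB as an "N" attribute.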

Condition expressions

When the update_db process adds the record to the AMIS-shops-message-ids table, this should be done in an atomic write. The reason is that when a message is sent twice, the two copies may be delivered to different update_db instances, which then try to write the same data at the same moment to the same tables. Without an atomic write, both instances could look in the table, both see that the record with the message ID is not there, and both conclude that they can go ahead with writing the same data. The put-item command does not mind: when the same data is written twice, the second put-item simply overwrites the record that was written by the first. By default, you will not get an error because of this.

There is a solution for this, which is called “condition expressions” [4]. You can use them to check some assertions before the action is carried out. They can be used for atomic updates and deletes as well. The code for the put-item with the condition expression becomes:

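import boto3

dynamodb = boto3.client("dynamodb")

# Store the message ID only when it is not in the table yet
response = dynamodb.put_item(
    TableName = "AMIS-shops-message-ids",
    Item = {
        'shop_id'      : { "S" : shop_id },
        'message_id'   : { "S" : message_id },
        # DynamoDB expects numbers ("N") to be passed as strings
        'time_to_live' : { "N" : str(time_to_live) }
    },
    # The write is rejected when an item with this key already exists
    ConditionExpression = "attribute_not_exists(message_id)"
)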

When the record is already in the table at the moment this put-item command is run, the insert is aborted and the program gets a ConditionalCheckFailedException.
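One way to deal with this exception is to treat it as the signal that the message is a duplicate. The sketch below is an illustration, not the code from the repository: the function name register_message_id is hypothetical, and the dynamodb parameter is assumed to be a boto3 DynamoDB client (for example boto3.client("dynamodb")), which exposes the exception class as dynamodb.exceptions.ConditionalCheckFailedException.

```python
def register_message_id(dynamodb, shop_id, message_id, time_to_live):
    """Return True when the message ID was newly stored, False for a duplicate."""
    try:
        dynamodb.put_item(
            TableName="AMIS-shops-message-ids",
            Item={
                "shop_id": {"S": shop_id},
                "message_id": {"S": message_id},
                # DynamoDB expects numbers ("N") to be passed as strings
                "time_to_live": {"N": str(time_to_live)},
            },
            ConditionExpression="attribute_not_exists(message_id)",
        )
    except dynamodb.exceptions.ConditionalCheckFailedException:
        # The record is already in the table: the message was processed before
        return False
    return True
```

The update_db function could then update the AMIS-shops table only when register_message_id returns True, and ignore the message otherwise.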

Conclusion

TTL is a mechanism to delete records when they are no longer of use. There are advantages and disadvantages of using it: the advantage is that it is available for free, you don’t pay for the deletion of the records. Using TTL also saves you the trouble of writing and scheduling a cleanup script.

The disadvantage is that the time is stored as an epoch value, so it is hard to see when a record will be deleted. In our shop example, this isn’t a big problem: you know that for all the records in the table the epoch time is 23:59 of the day on which the records were inserted. In other situations, using your own field with (year+month+day-)hour+minute(+second) and writing a cleanup function that you can schedule in CloudWatch might be more appropriate.

Play along

You can play along with this blog. I changed both the scripts in the VM and the tests in AWS. Use the shop-3 directory from the repository [5]. As before, be sure to destroy all objects from previous versions before starting the installation of the shop-3 objects. Try it out: start a performance test, look in the new AMIS-shops-message-ids table, wait a night and see that the next day all records are gone. Pure magic, I love it!!

Some end-of-year remarks about the AWS Shop Example

I am writing this paragraph in December 2020, about six months after I first published this blog. When I looked at some re:Invent presentations, I thought back to my AWS Shop example. Was sending a message ID from the client the best thing to do? Look at the answer in this blog [6].

Links

[1] Previous blogs:

– Introduction: https://technology.amis.nl/2020/04/26/example-application-in-aws-using-lambda/

– Lambda and IAM: https://technology.amis.nl/2020/04/29/aws-shop-example-lambda/

– SNS: https://technology.amis.nl/2020/05/02/aws-shop-about-the-aws-simple-notification-service-sns/

– DynamoDB: https://technology.amis.nl/2020/05/05/aws-shop-dynamodb-the-aws-nosql-database/

– API Gateway (1): https://technology.amis.nl/2020/05/09/aws-shop-api-gateway-1/

– API Gateway (2): https://technology.amis.nl/2020/05/13/aws-shop-example-api-gateway-2/

– Unit tests: https://technology.amis.nl/2020/05/21/aws-shop-example-unit-tests/

– Smoke- and performance tests: https://technology.amis.nl/2020/05/28/aws-shop-example-smoke-and-performance-tests/

– Step functions: https://technology.amis.nl/2020/05/31/aws-shop-example-step-functions/

[2] AWS FAQ about SNS (search for “How many times will a subscriber receive each message”): https://aws.amazon.com/sns/faqs/

[3] https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html

[4] https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.ConditionExpressions.html

[5] https://github.com/FrederiqueRetsema/AMIS-Blog-AWS , shop-3 directory

[6] https://technology.amis.nl/2020/12/22/some-end-of-year-remarks-about-the-aws-shop-example/