In the previous blog, I talked about unit tests of the AWS Shop example. Today, I will continue with a smoke test and a performance test.
If you have followed along, you have seen the smoke test several times: we ran it from the Vagrant VM, where it is called “encrypt_and_send.py”. The smoke test sends one message (with two records) to the API Gateway. Via the accept, decrypt and update_db Lambda functions (and two SNS topics), the DynamoDB table AMIS-shops is updated. In the shop-2 example, I added two records to the AMIS-shop database for the smoke and performance tests.
In this case, bringing the Python code to AWS isn’t enough. Up to now, all Lambda functions used libraries that are already in AWS: this is the case for system libraries like os, json, time etc. The boto3 library, which is the SDK for AWS, is also present by default. This isn’t the case, however, for the requests library. We use the requests library for sending a POST message to the REST-API of the API Gateway.
To include the requests library, we need to take the following steps:
– start a virtual environment
– install the requests library in the virtual environment
– find out if there are dependent libraries and download these as well
– remove the virtual environment
– zip the library with the Python file
It is important to zip the contents of the directory with the Python file, not the directory itself. When we look at the contents of the smoketest function, we can see the directories with the requests library and its dependent libraries:
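The difference between zipping the contents and zipping the directory can be sketched with the standard zipfile module. This is a minimal sketch, not the way the repository builds its packages: the function name and the file names in the usage note are made up for the illustration.

```python
import zipfile
from pathlib import Path


def zip_directory_contents(source_dir, zip_path):
    """Zip every file inside source_dir, using paths relative to source_dir.

    zip_path should be outside source_dir, otherwise the archive would
    try to include itself.
    """
    source = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # arcname drops the leading directory name, so the Python
                # file ends up at the root of the archive instead of in a
                # subdirectory.
                archive.write(path, arcname=path.relative_to(source).as_posix())
    return zip_path
```

With a directory smoketest that contains smoketest.py and a requests subdirectory, the archive lists smoketest.py and requests/... at the top level, not smoketest/smoketest.py.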
I’m using the AWS API Gateway URL (the “ugly” URL) for both the smoke test and the performance test. That way, you can follow along, even if you don’t have your own DNS domain in Route 53. If you used the smoke test in a company environment, you would (of course) use the company domain name for this.
The performance test does the same as the smoke test, but 100 times. It looks very much like the 100x.py script on the virtual machine.
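The idea of "the smoke test, but 100 times" can be sketched as a simple sequential loop. This is a simplified illustration and not the code from the repository: the function names are hypothetical, the encryption of the message is left out, and the URL and payload are placeholders that you pass in yourself.

```python
import time


def send_with_requests(url, payload):
    """POST one message with the requests library and return the HTTP status."""
    # Imported inside the function because requests is a third-party library:
    # it is only available when it is packaged together with the function.
    import requests

    response = requests.post(url, json=payload, timeout=10)
    return response.status_code


def run_performance_test(url, payload, count=100, send=send_with_requests):
    """Send the same message `count` times, one after the other."""
    statuses = []
    started = time.perf_counter()
    for _ in range(count):
        statuses.append(send(url, payload))
    elapsed = time.perf_counter() - started
    return {
        "sent": len(statuses),
        "ok": sum(1 for status in statuses if status == 200),
        "elapsed_seconds": elapsed,
    }
```

The `send` parameter makes it possible to try the loop with a stand-in function before pointing it at a real API Gateway URL. Because the messages are sent one by one, this loop does not create parallel load by itself.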
After doing the test, it is nice to have some statistics. I created a Lambda function AMIS_perftest_get_stats for that. Let’s start the performance test, wait for about 3-4 minutes and then start the statistics function and look at the CloudWatch output:
You see that all 101 records (one from the smoke test, 100 from the performance test) are in. The first record was sent at 08:45:07 (GMT, so 10:45:07 in the Netherlands). The last record was sent at 08:53:23. The statistics are based on the REPORT records: when you look at the AMIS_shop_accept logging, you will see something like this:
You can see that this is the first record that was sent: the duration was 931 ms, which is about 1 second. The records that followed were processed faster. We can also see that in the statistics about the duration: the fastest record was processed in 177 ms, the slowest was the record we just saw: 931 ms. On average, the duration is 217.91 ms.
When you look at the billed duration, you see that it is about 4 seconds more than the duration we actually used.
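To give an idea of how statistics like these can be derived from the REPORT records, here is a minimal sketch. It assumes the usual layout of a Lambda REPORT line (Duration, Billed Duration, Memory Size, Max Memory Used). It is not the code of AMIS_perftest_get_stats, and the request IDs and the billed durations in the sample lines are made up.

```python
import re
from statistics import mean

# Assumed layout of a REPORT line in the CloudWatch logs of a Lambda function.
REPORT_PATTERN = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed>[\d.]+) ms\s+"
    r"Memory Size: (?P<memory_size>\d+) MB\s+"
    r"Max Memory Used: (?P<max_memory_used>\d+) MB"
)

# Made-up sample lines; only the durations 931 and 177 ms come from the text.
SAMPLE_LINES = [
    "START RequestId: 11111111-1111-1111-1111-111111111111 Version: $LATEST",
    "REPORT RequestId: 11111111-1111-1111-1111-111111111111\tDuration: 931.00 ms"
    "\tBilled Duration: 1000 ms\tMemory Size: 128 MB\tMax Memory Used: 69 MB",
    "REPORT RequestId: 22222222-2222-2222-2222-222222222222\tDuration: 177.00 ms"
    "\tBilled Duration: 200 ms\tMemory Size: 128 MB\tMax Memory Used: 72 MB",
]


def parse_report(line):
    """Return the numbers of one REPORT line, or None for any other line."""
    match = REPORT_PATTERN.search(line)
    if match is None:
        return None
    return {
        "request_id": match.group("request_id"),
        "duration_ms": float(match.group("duration")),
        "billed_duration_ms": float(match.group("billed")),
        "memory_size_mb": int(match.group("memory_size")),
        "max_memory_used_mb": int(match.group("max_memory_used")),
    }


def duration_stats(lines):
    """Compute the count, minimum, maximum and average duration in ms."""
    reports = [report for report in map(parse_report, lines) if report]
    durations = [report["duration_ms"] for report in reports]
    if not durations:
        return {"count": 0, "min_ms": None, "max_ms": None, "avg_ms": None}
    return {
        "count": len(durations),
        "min_ms": min(durations),
        "max_ms": max(durations),
        "avg_ms": mean(durations),
    }
```

Calling `duration_stats(SAMPLE_LINES)` skips the START line and summarizes the two REPORT lines.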
Accept is the first Lambda in the row. In our performance test, we didn’t send records in parallel. You can also see this in the statistics: we had 100 records (under_limit_duration) that were faster than 500 ms (limit_duration) and just 1 record that was slower. This means that AWS had to load the function into memory once, and could then use the same instance for the next 100 records.
The memory that is used is between 69 and 72 MB, with an average of 71.48 MB. The configured memory (this is the slider in the Lambda function) was 128 MB. This means that we are safe: the way we use the function is unlikely to lead to memory problems, because there is a large gap between the amount of memory that we configured and the amount of memory that is actually used. We would like to be warned when the amount of used memory is higher than 100 MB, so the limit for the difference between configured and used memory is set to 28 MB (= 128 – 100). All 101 records use less than 100 MB, so each of them has more than 28 MB to spare (over_limit_diff_mem = 101).
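The two limits can be illustrated with a small counting function. This is a sketch under assumptions: the names under_limit_duration, over_limit_duration and over_limit_diff_mem come from the statistics output, while under_limit_diff_mem, the dictionary keys of the records and the handling of values that are exactly on a limit are my own choices for the illustration.

```python
def count_against_limits(reports, limit_duration=500, limit_diff_mem=28):
    """Count records against the duration limit (ms) and the memory gap limit (MB)."""
    counts = {
        "under_limit_duration": 0,
        "over_limit_duration": 0,
        "under_limit_diff_mem": 0,  # hypothetical name, not from the output
        "over_limit_diff_mem": 0,
    }
    for report in reports:
        # A slow record usually means that a new instance had to be started.
        if report["duration_ms"] < limit_duration:
            counts["under_limit_duration"] += 1
        else:
            counts["over_limit_duration"] += 1

        # diff_mem is the gap between configured and used memory.
        diff_mem = report["memory_size_mb"] - report["max_memory_used_mb"]
        if diff_mem > limit_diff_mem:
            counts["over_limit_diff_mem"] += 1
        else:
            counts["under_limit_diff_mem"] += 1
    return counts


# Two example records, based on the slowest and fastest values in the text.
example_reports = [
    {"duration_ms": 931.0, "memory_size_mb": 128, "max_memory_used_mb": 69},
    {"duration_ms": 177.0, "memory_size_mb": 128, "max_memory_used_mb": 72},
]
example_counts = count_against_limits(example_reports)
```

For these two records, one is over the duration limit and both have more than 28 MB of memory to spare.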
When we look at the statistics of the log group for decrypt, we see about the same values:
We see that this function takes a little more time than the accept function. That makes sense: the accept function just sends the event information that it gets to the SNS topic, it doesn’t do anything with the data. The decrypt function does some checks on the event and then decrypts the data. On average, the duration of the decrypt function is 450.14 ms, where the average of the accept function was 217.91 ms.
You can see that more than one decrypt function was active at the same time: there were six times that the function took more than 500 ms to do its job (over_limit_duration).
The average memory usage is about the same as the memory usage of the accept function (71.76 MB vs 71.48 MB). I would have expected that the decrypt function would have used more memory to decrypt the data.
Let’s look at the statistics for the update_db function:
We see that 8 functions have run at the same time and that the functions took 456.55 ms on average. This is a little more than the decrypt function uses. It therefore makes sense that there are more parallel update_db functions than there are parallel decrypt functions.
In the blog about Lambda I told you that when you increase the amount of memory that is available for the Lambda, this will also increase the speed. Now that we have a statistics function, we can ask ourselves: how fast will it be when we increase the memory to the maximum?
Let’s try this out: go to the Lambda functions AMIS-shop-accept, AMIS-shop-decrypt and AMIS-shop-update-db, scroll down to Basic settings and press the button Edit. Move the slider Memory (MB) to the maximum of 3008 MB and press “Save”:
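If you prefer a script over the console, the same change can be sketched with the boto3 call update_function_configuration. This is an alternative to the steps in this blog, not something the blog itself uses. The helper name is hypothetical, and the client is passed in so that the helper can be tried without AWS credentials.

```python
def set_lambda_memory(lambda_client, function_names, memory_mb):
    """Set the configured memory of each function and return the new sizes."""
    new_sizes = {}
    for name in function_names:
        # update_function_configuration returns the new function
        # configuration, which includes the MemorySize that is now in effect.
        response = lambda_client.update_function_configuration(
            FunctionName=name,
            MemorySize=memory_mb,
        )
        new_sizes[name] = response["MemorySize"]
    return new_sizes
```

With a real client this would be called as `set_lambda_memory(boto3.client("lambda"), ["AMIS-shop-accept", "AMIS-shop-decrypt", "AMIS-shop-update-db"], 3008)`, assuming that the functions in your account have exactly these names.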
After that, start the performance test. My outcomes for the accept function are:
The slowest message is processed in 119 milliseconds (instead of 931 ms), the fastest is processed in just 35 ms (instead of 177 ms). On average, the messages are processed in 51.49 ms (instead of 217 ms). You also see that the limit of 500 ms to detect whether a new instance has started doesn’t make sense anymore: I should lower this limit to, for example, 100 ms. When I do that, I see the following numbers:
For the decrypt function the statistics are:
The slowest message is processed in 170 milliseconds (instead of 1201 ms), the fastest is processed in 63 ms (instead of 383 ms). On average, the messages are processed in 88 ms (instead of 450 ms).
For the update_db function the statistics are:
The slowest message is processed in 173 milliseconds (instead of 1177 ms), the fastest is processed in 42 ms (instead of 335 ms). On average, the messages are processed in 69 ms (instead of 456 ms). It seems that fewer instances have started for the update_db function (5) than for the decrypt function (11). When I looked into this, it was very hard to see what a good limit value for the decrypt and the update_db functions should be. When the functions are slow, it is very clear where new instances have started. When the functions are very fast, this is less clear: the differences in duration are so small that they become irrelevant.
Of course, you pay a price for that: Lambda charges per request and per GB-second. The number of requests is the same (100 per performance test). You use much more memory (100 x 3.008 GB instead of 100 x 0.128 GB), but the processing time is lower (for the decrypt function: 100 x 88 ms instead of 100 x 450 ms). At the time of writing, the cost per GB-second is $0.0000166667 in Ireland. Because the price is per GB-second, the durations have to be converted from milliseconds to seconds. For my small test, the cost of processing at the higher speed is 100 (records) x 3.008 (GB) x 0.088 (seconds on average) x $0.0000166667 = $0.000441 instead of 100 x 0.128 x 0.450 x $0.0000166667 = $0.000096. This difference might seem small, but if you have many thousands of messages per day instead of the 100 messages in this test, you might decide to lower the amount of memory. You might also consider increasing it only for accepting the message, but not for the decrypt and process functions.
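The calculation can be written as a small function, with the durations converted from milliseconds to seconds because the price is per GB-second. This is a simplified sketch: it uses the average duration of the decrypt function, it treats 3008 MB as 3.008 GB, and it ignores the rounding to billed duration and the price per request. The function name is made up.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # price for Ireland quoted in this blog


def compute_cost(invocations, memory_gb, avg_duration_ms,
                 price_per_gb_second=PRICE_PER_GB_SECOND):
    """Estimate the compute cost in dollars for a number of invocations."""
    # The price is per GB-second, so milliseconds are converted to seconds.
    gb_seconds = invocations * memory_gb * (avg_duration_ms / 1000)
    return gb_seconds * price_per_gb_second


cost_max_memory = compute_cost(100, 3.008, 88)   # about $0.000441
cost_128_mb = compute_cost(100, 0.128, 450)      # about $0.000096
```

With these numbers, the run with the maximum amount of memory is roughly 4.6 times as expensive as the run with 128 MB, while the average duration is about 5 times shorter.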
You might have noticed that the way the lambda functions log their data is different to the way this is done in the shop-1 solution. Every line in the CloudWatch logs has an indication of the severity level. You can filter quite easily: click in the bar “Filter events” and then type INFO.
You will have the statistics without the DEBUG information.
In normal situations, you will have enough information when you read the INFO messages. When you are debugging, it can be very useful to have the exact responses from the calls to the SDK functions in the DEBUG entries of the log.
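Logging with a severity level on every line can be sketched with the standard logging module. The logger name, the line format and the messages below are assumptions for the illustration; the functions in the repository may format their lines differently.

```python
import io
import logging


def build_logger(stream, level=logging.DEBUG):
    """Create a logger that writes lines like '[INFO] message' to stream."""
    logger = logging.getLogger("amis_shop_example")  # hypothetical name
    logger.setLevel(level)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def only_level(log_text, level_name):
    """Keep the lines of one severity level, like typing INFO in the filter bar."""
    marker = f"[{level_name}]"
    return [line for line in log_text.splitlines() if marker in line]


stream = io.StringIO()
logger = build_logger(stream)
logger.info("Record stored in the table")
logger.debug("Raw response of the SDK call: {'ResponseMetadata': '...'}")
info_lines = only_level(stream.getvalue(), "INFO")
```

Here `info_lines` contains only the INFO line, while the DEBUG line with the raw response stays available in the full log for debugging.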
It’s nice to see how your functions behave under different conditions. I think it is very wise to have a lambda statistic function to see if the way your application is behaving in the production environment is the same as it is in the test environment.
If you want to play along, you can use my git repository for that. You might want to add a sleep in the decrypt function: what will happen to the number of parallel decrypt functions when you add a sleep in the decrypt function? What will happen to the number of parallel running update_db functions? (You might guess wrong here, just try it and think about what might have happened!)
– Lambda and IAM: https://technology.amis.nl/2020/04/29/aws-shop-example-lambda/
– API Gateway (1): https://technology.amis.nl/2020/05/09/aws-shop-api-gateway-1/
– API Gateway (2): https://technology.amis.nl/2020/05/13/aws-shop-example-api-gateway-2/
 I learned these steps via this Linux Academy training, https://linuxacademy.com/course/aws-certified-developer-associate-2018/ video “Lambda Function Packages”
 If you need screen prints to help you with this, please look at the previous blog about Lambda, see https://technology.amis.nl/2020/04/29/aws-shop-example-lambda/
 https://github.com/FrederiqueRetsema/AMIS-Blog-AWS , directory shop-2