Fri, 06 Jan 2017
Its more formal and slightly less catchy name is CloudWatch Events with a Scheduled Event Source and a Lambda Target… but we think “Lambda Cron” just rolls off the tongue a bit better.
CloudWatch Events is a service that lets you react to a variety of events inside your AWS environment by triggering one of a few different targets, among them an arbitrary Lambda function. One of the “event sources” is simply a schedule, written either as a rate expression or in cron syntax. Put those two things together, and you get a Lambda function invoked on a schedule… a.k.a. Lambda Cron.
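To make the two schedule styles concrete, here is a minimal Python sketch that builds both kinds of expression as strings. The helper names `rate_expression` and `cron_expression` are hypothetical and exist only for this illustration; the `rate(...)` and `cron(...)` wrappers and the six-field cron layout follow the AWS schedule expression syntax.

```python
# CloudWatch Events cron expressions have six space-separated fields:
# minutes, hours, day-of-month, month, day-of-week, year.
CRON_FIELD_COUNT = 6


def rate_expression(value: int, unit: str) -> str:
    """Build a rate expression such as 'rate(1 minute)' or 'rate(5 minutes)'.

    `unit` is given in its singular form: 'minute', 'hour', or 'day'.
    """
    if value < 1:
        raise ValueError("rate value must be a positive integer")
    if unit not in ("minute", "hour", "day"):
        raise ValueError(f"unsupported unit: {unit}")
    # The unit is singular for a value of 1 and plural otherwise.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


def cron_expression(fields: str) -> str:
    """Wrap six cron fields in the cron(...) form, checking only the field count."""
    if len(fields.split()) != CRON_FIELD_COUNT:
        raise ValueError("expected six space-separated fields")
    return f"cron({fields})"


# Two ways to ask for "every minute".
EVERY_MINUTE_RATE = rate_expression(1, "minute")
EVERY_MINUTE_CRON = cron_expression("0/1 * * * ? *")
```

Either string can be used as the schedule expression of a rule; the sketch only formats the strings and does not validate the individual cron fields.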
We ran a four-month test of Lambda Cron reliability and have some interesting data. But first a little background.
This feature was released in the fall of 2015 with a 5-minute minimum resolution and then quietly updated several months ago to a 1-minute resolution. At Trek10 we have found that, especially with the 1-minute resolution, “Lambda Cron” is a critical building block of many Serverless architectures. Some uses:
It’s a really compelling idea… very reliable cron without having to mess with crontab or keep a practically idle server sitting around just to run some periodic process. And it is much easier than trying to architect some highly available cron solution. It is also an easy way to get started with Lambda & Serverless: you can start by migrating the background processes in your system that are easier to decouple from your legacy application.
At Trek10, we’re obsessed with building highly reliable systems and were wondering how Lambda Cron stacks up. So we put together a small experiment to gather some data about the consistency and reliability of the service.
To test reliability, we set up a Lambda Cron in each of five AWS regions to run every minute and log the results to a DynamoDB table, and let it run for over four months. That gives us almost 1 million logged executions.
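A minimal sketch of what such a logging function could compute on each invocation, written in Python for illustration. It is not the code used in the experiment. It assumes the scheduled event carries its scheduled time as an ISO 8601 UTC string in the `time` field, and it prints the record instead of writing to DynamoDB. The names `build_log_record` and `lag_seconds` are hypothetical.

```python
import os
from datetime import datetime, timezone


def build_log_record(event: dict, now: datetime, region: str) -> dict:
    """Compare the scheduled time carried in the event with the clock time."""
    # Assumption: the event's "time" field looks like "2016-10-01T12:34:00Z".
    event_time = datetime.strptime(
        event["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    lag_seconds = (now - event_time).total_seconds()
    return {
        "region": region,
        "event_time": event_time.isoformat(),
        "actual_time": now.isoformat(),
        "lag_seconds": lag_seconds,
    }


def handler(event, context):
    """Lambda entry point for the sketch."""
    region = os.environ.get("AWS_REGION", "unknown")
    record = build_log_record(event, datetime.now(timezone.utc), region)
    print(record)  # stand-in for the DynamoDB write, which is omitted here
    return record
```

Keeping the comparison in a pure function makes it easy to check against a hand-built event before wiring up any storage.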
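We’re logging two different data points: the actual system time when the function runs, and the “event time” reported in the scheduled event. With a cron expression of 0/1 * * * ? *, the event time is the 0th second, on the minute, every time.
We got some interesting results…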
Out of almost 200,000 executions in each of five regions, we found only 2 to 15 intervals per region where we didn’t log an execution. And we can’t say for certain that CloudWatch Events failed to trigger… it may have been a Lambda function or DynamoDB error. So it’s safe to say that Lambda Cron has at least 99.99% reliability and perhaps as much as 99.999% or even, at least within a four-month window, possibly 100%. Pretty solid!
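The reliability figures follow from simple arithmetic on missed intervals. The sketch below shows one way to find the missing minutes in a list of logged event times and turn the miss count into a reliability fraction. The function names are hypothetical, and the approach (truncate each timestamp to the minute, then look for gaps) is an assumption about how such an analysis could be done, not a description of the original tooling.

```python
from datetime import datetime, timedelta
from typing import List


def missing_minutes(logged: List[datetime], start: datetime, end: datetime) -> List[datetime]:
    """Return every minute in [start, end) that has no logged execution."""
    # Truncate each logged event time to the minute it belongs to.
    seen = {t.replace(second=0, microsecond=0) for t in logged}
    missing = []
    current = start
    while current < end:
        if current not in seen:
            missing.append(current)
        current += timedelta(minutes=1)
    return missing


def reliability(expected: int, missed: int) -> float:
    """Fraction of expected executions that were actually logged."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return 1 - missed / expected


worst_case = reliability(200_000, 15)  # 15 missed intervals
best_case = reliability(200_000, 2)    # 2 missed intervals
```

With 200,000 expected executions, 15 misses works out to roughly 99.9925% and 2 misses to 99.999%, which matches the range quoted above.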
… Actually, the time it runs can vary quite a bit
Though it is a bit buried in the docs, AWS actually states this very clearly:
Due to the distributed nature of the CloudWatch Events and the target services, the delay between the time the scheduled rule is triggered and the time the target service honors the execution of the target resource might be several seconds. Your scheduled rule is triggered within that minute, but not on the precise 0th second.
That said, AWS is being a bit optimistic when they say “several seconds”. Our data tells a different story. Below are the stats on almost one million executions: the difference, in seconds, between the “event time” (when the execution should have triggered) and the actual system time our function logged.
Delay in seconds at each percentile:

Region | 1st | 25th | 50th | 75th | 95th | 99th | 99.9th | 99.99th |
---|---|---|---|---|---|---|---|---|
Virginia us-east-1 | 39 | 40 | 40 | 40 | 41 | 43 | 585 | 2537 |
Oregon us-west-2 | 29 | 29 | 30 | 30 | 31 | 31 | 60 | 852 |
Ireland eu-west-1 | 11 | 12 | 12 | 12 | 13 | 14 | 23 | 1963 |
Germany eu-central-1 | 35 | 36 | 36 | 36 | 37 | 37 | 38 | 45 |
Tokyo ap-northeast-1 | 1 | 2 | 2 | 2 | 3 | 3 | 5 | 44 |
With almost 200,000 executions per region in a little over four months, about 20 executions per region (200,000 × 0.01%) will see a delay at or beyond the 99.99th percentile. That is roughly once per week.
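For readers who want to run the same kind of analysis on their own lag data, here is a small Python sketch of a nearest-rank percentile and of the "how often will I see this" arithmetic. The nearest-rank method is an assumption; the original analysis may have used a different percentile definition. Both function names are hypothetical.

```python
import math
from typing import List


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not sorted_values:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    # Round before taking the ceiling to absorb floating-point noise
    # such as 199980.00000000003.
    rank = math.ceil(round(p / 100 * len(sorted_values), 9))
    return sorted_values[rank - 1]


def expected_exceedances(samples: int, p: float) -> float:
    """Number of samples expected to fall above the p-th percentile."""
    return samples * (1 - p / 100)


# About 20 executions per region land above the 99.99th percentile.
per_region = expected_exceedances(200_000, 99.99)
```

The input list must already be sorted in ascending order; the sketch does not sort it for you.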
A few pretty interesting observations from this data:
So the bottom line is, Lambda Cron is a great system that you can count on for very reliable cron execution with incredibly low effort. Just don’t rely on it to execute on the 0th second. As with any good system design, especially the design of a distributed system on AWS, it is critical to remember that Lambda Cron is not perfectly consistent. For that one-in-a-thousand or one-in-ten-thousand case, you should expect Lambda Cron to have major lag and build your system to respond gracefully to those delays.
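One way to respond gracefully to lag, sketched in Python under stated assumptions: derive the window of work from the scheduled event time rather than from the wall clock, so that a late invocation still covers the minute it was meant for, and flag runs that arrive later than some threshold. The threshold `MAX_ACCEPTABLE_LAG` and the function `plan_run` are hypothetical and only illustrate the idea.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold; choose a value that fits your workload.
MAX_ACCEPTABLE_LAG = timedelta(minutes=5)


def plan_run(event_time: datetime, now: datetime,
             period: timedelta = timedelta(minutes=1)):
    """Decide what an invocation should cover, however late it arrives.

    Returns (window_start, window_end, is_late). The window is derived from
    the scheduled event time, not the wall clock, so consecutive invocations
    cover adjacent, non-overlapping windows even when some of them lag.
    """
    window_start = event_time - period
    window_end = event_time
    is_late = (now - event_time) > MAX_ACCEPTABLE_LAG
    return window_start, window_end, is_late


scheduled = datetime(2016, 10, 1, 12, 0, tzinfo=timezone.utc)
# A typical run, 40 seconds after the scheduled time.
on_time = plan_run(scheduled, scheduled + timedelta(seconds=40))
# A rare outlier, 2537 seconds after the scheduled time.
delayed = plan_run(scheduled, scheduled + timedelta(seconds=2537))
```

Both calls return the same window, from 11:59 to 12:00; only the `is_late` flag differs, which the caller could use to raise an alert or skip time-sensitive work.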