Spotlight
Optimizing Your AWS Lambda Costs
A practical guide to the Lambda pricing model and the levers available for reducing your bill.
Depending on the application, Lambda costs can range from zero (AWS Lambda has a free tier) to tens of thousands of dollars per month. Anyone working with AWS Lambda should understand its pricing structure so they can design efficient systems that do not generate unnecessary costs for whichever organization pays the AWS bill. Cost optimization is, generally speaking, a rather dry topic, and the serverless nature of Lambda leads some people to assume there is nothing to worry about. In truth, Lambda has a few [mildly] interesting cost-related considerations that should not be ignored.
In this article we’ll share a few practical tips for optimizing your Lambda costs. First, though, it is worth fully understanding how Lambda is priced and billed. The billing scheme for Lambda has nine variables, each covered in turn below.
Lambda functions run on one of two architectures: x86 or Arm. Arm is the cheaper option, offering “up to 34% better price performance compared to functions running on x86 processors.” Note that Arm support for Lambda was released in fall 2021, so the majority of existing Lambda functions have likely not yet been migrated to the Arm platform. In the vast majority of cases there is no downside to using Arm; even so, I would expect x86 to continue to be chosen for new projects simply because it is perceived as a “safe” choice with a long history of working well [for AWS Lambda in particular].
This component of Lambda pricing is best understood as “Duration by Memory” or “Memory by Duration.” That is, one is not charged for either Duration or Memory in isolation; Lambda billing involves a combination of both Duration and Memory. For example, as seen on the official AWS Lambda pricing page, currently one might expect a Lambda configured with 1 GB of memory running for 60 seconds to cost 1/10th of a cent.
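As a sketch, the arithmetic behind that example looks like this (the rate below is the published first-tier x86 price per GB-second at the time of writing):

```javascript
// Sketch of the "Memory x Duration" charge: billing is the product of
// configured memory and billed duration, times the per-GB-second rate.
const PRICE_PER_GB_SECOND = 0.0000166667; // USD, x86, first pricing tier

function durationCost(memoryGb, durationSeconds) {
  return memoryGb * durationSeconds * PRICE_PER_GB_SECOND;
}

// 1 GB for 60 seconds comes to roughly $0.001 -- one tenth of a cent.
console.log(durationCost(1, 60).toFixed(6));
```

Note that because the charge is a product, 2 GB for 30 seconds costs exactly the same as 1 GB for 60 seconds.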
Take note that, as with most AWS services, unit costs decrease as total usage increases. For Lambda, “Memory by Duration” unit costs [for the x86 platform] drop by up to 20% once total usage reaches 15 billion GB-seconds within a month. The discount only applies to usage beyond a given pricing tier: usage over 15 billion GB-seconds is billed at the lower rate, while the first 15 billion is billed at the higher rates. In practice, running a Lambda configured with 1 GB of memory for 60 seconds would cost 8/100ths of a cent rather than 1/10th of a cent once 15 billion GB-seconds of total usage has been reached for a given account/org. Note that AWS Organizations offers consolidated billing, which effectively combines Lambda usage from all accounts within the organization. Pooling usage this way helps lower costs, since the 15 billion GB-second mark is reached sooner than it would be if each account were billed separately.
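A minimal sketch of that tiered calculation. The tier boundaries and per-tier rates below are assumptions drawn from the x86 on-demand pricing page at the time of writing and may change:

```javascript
// Tiered "Memory x Duration" billing for x86: each tier's rate applies only
// to the usage that falls inside that tier (assumed published rates).
const TIERS = [
  { upTo: 6e9,      rate: 0.0000166667 }, // first 6 billion GB-seconds
  { upTo: 15e9,     rate: 0.0000150000 }, // next 9 billion GB-seconds
  { upTo: Infinity, rate: 0.0000133334 }, // everything over 15 billion
];

function monthlyDurationCost(totalGbSeconds) {
  let cost = 0;
  let previousBoundary = 0;
  for (const { upTo, rate } of TIERS) {
    const usageInTier = Math.min(totalGbSeconds, upTo) - previousBoundary;
    if (usageInTier <= 0) break;
    cost += usageInTier * rate;
    previousBoundary = upTo;
  }
  return cost;
}
```

With this model, any marginal GB-second beyond the 15 billion mark is billed at the bottom rate of $0.0000133334, which is the ~20% discount described above.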
Every start of a Lambda execution counts as a single “request.” In Amazon CloudWatch Logs for a Lambda function, there is usually a log statement marking the start of each execution; each of these “starts” is billed as one request. Currently, this is a flat rate of 20 cents per million requests.
Data transfer is a rather complicated billing item. There are several different types of data transfer costs:
OUT to Internet
OUT to a different AWS region
OUT to the same region
OUT to the same AZ
data transfer IN
Internet OUT is the most expensive traffic type, whereas data transfer into AWS and transfer within the same AZ are free. See https://aws.amazon.com/ec2/pricing/on-demand/ for a full breakdown of costs. Note that by default Lambda does not run inside a VPC, so same-AZ traffic is not possible unless the Lambda has explicitly been assigned to a single subnet.
Provisioned concurrency essentially keeps a given number of Lambdas “warm” and ready to be used immediately, avoiding the cold-start penalty that would otherwise be incurred while a Lambda prepares itself for execution. What is interesting about this feature from a cost perspective is that the “Memory x Duration” cost is actually 27% cheaper for provisioned Lambdas than the cheapest pricing tier for non-provisioned Lambdas.
The cost of each Lambda “kept warm” by a provisioned concurrency configuration is significant: for a 1 GB x86 Lambda, a single unit of provisioned concurrency costs roughly $10 per month. One might think the cheaper “Memory x Duration” rate would compensate for this added charge, but that is not the case. Given that the base price of an x86 Lambda with 1 GB of memory running for one second is $0.0000133334, the savings per second for a provisioned Lambda is $0.00000360001. Dividing $10 by the savings per second shows that it takes 2,777,770 seconds of usage, about 32 days, to offset the provisioned concurrency charge. In other words, even a Lambda running every second of every day is cheaper without provisioned concurrency.
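The break-even arithmetic can be sketched directly, reusing the figures from the text:

```javascript
// Break-even for provisioned concurrency: ~$10/month for one unit on a
// 1 GB x86 function, versus a per-second duration saving of $0.00000360001.
const MONTHLY_PROVISIONED_COST = 10;       // USD per unit per month
const SAVINGS_PER_SECOND = 0.00000360001;  // USD, provisioned vs. on-demand

const breakEvenSeconds = MONTHLY_PROVISIONED_COST / SAVINGS_PER_SECOND;
const breakEvenDays = breakEvenSeconds / 86400;

// Break-even lands past a full month, so the discounted duration rate can
// never pay back the monthly provisioned-concurrency charge.
console.log(Math.round(breakEvenSeconds), breakEvenDays.toFixed(1));
```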
Lambdas receive 512 MB of free ephemeral storage; AWS Lambda now allows for each serverless function to be optionally configured with up to 10 GB of ephemeral storage. As of writing, the pricing model for this feature is very simple: any additional storage for Lambda is billed at $0.0000000309 for every GB-second.
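A small sketch of that storage charge, assuming the billable amount is the configured ephemeral storage minus the free 512 MB:

```javascript
// Ephemeral storage billing: the first 512 MB is included; only configured
// storage beyond that is billed, at $0.0000000309 per GB-second.
const STORAGE_RATE = 0.0000000309; // USD per GB-second
const FREE_GB = 0.5;               // 512 MB included at no charge

function storageCost(configuredGb, durationSeconds) {
  const billableGb = Math.max(0, configuredGb - FREE_GB);
  return billableGb * durationSeconds * STORAGE_RATE;
}

// A function configured with the 10 GB maximum, running for one hour,
// is billed for 9.5 GB over 3,600 seconds -- around a tenth of a cent.
console.log(storageCost(10, 3600));
```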
Note that Lambda@Edge has its own “requests” pricing as well as its own memory-duration pricing: requests are currently billed at $0.60 per 1 million, and memory-duration at $0.00005001 per GB-second. While I have not used this feature, others at Trek10 have, and one story in particular stands out:
“I use Lambda@Edge to stop my website from being iframed. This keeps my customers from falling for phishing and hijacking scams, which would cost me in damage control if they did. The extra security headers also improve SEO ranking.” - Mike Hanney
Mike Hanney’s edge function achieves this by adding several headers, such as “X-Frame-Options,” to the request/response headers. Be aware that Mike developed this solution prior to the release of CloudFront Functions, which can achieve the same result [of manipulating HTTP headers] at 1/6th of the cost of an edge function. The general recommendation is to use CloudFront Functions rather than Lambda@Edge whenever possible, as CloudFront Functions are vastly cheaper. The caveat is that CloudFront Functions are heavily restricted and lack basic features such as network access and filesystem access.
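For illustration, a CloudFront Function attached to the viewer-response event could add anti-framing headers like the sketch below. The specific header set is an assumption for the example, not Mike’s actual code:

```javascript
// CloudFront Functions sketch: add security headers on viewer response.
// Header keys are lowercase and values are wrapped in { value } objects,
// per the CloudFront Functions event structure.
function handler(event) {
  var response = event.response;
  var headers = response.headers;

  // Forbid embedding the site in an iframe on another origin.
  headers['x-frame-options'] = { value: 'DENY' };
  headers['content-security-policy'] = { value: "frame-ancestors 'none'" };

  return response;
}
```

Because CloudFront Functions run a restricted ES5-style JavaScript, the sketch avoids modern syntax and any network or filesystem calls.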
Since February 2020, Lambda has been one of the compute services eligible for a Compute Savings Plan. These savings plans involve committing (i.e. agreeing to pay for a particular amount of usage for 1 or 3 years) to a certain amount of Amazon EC2 / AWS Fargate / AWS Lambda usage in exchange for a discounted price on that usage. For example, if you have several EC2 instances running at a consistent cost of $10 per hour, and the discount rate for those instances is 20%, then purchasing a $10/hour savings plan drops your cost to $8 per hour.
Take into consideration that savings plans entail a commitment to paying for a certain amount of usage every hour of every day for 1 or 3 years. This means that a $10/hour plan becomes nearly $90,000 in costs per year. See the image below for a visual depiction of what this looks like in the AWS Console.
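The arithmetic behind that yearly figure is simple enough to sketch:

```javascript
// A savings-plan commitment is charged for every hour of the term,
// whether or not actual usage reaches it.
function annualCommitment(hourlyCommitmentUsd) {
  return hourlyCommitmentUsd * 24 * 365;
}

// A $10/hour commitment works out to $87,600/year -- nearly $90,000.
console.log(annualCommitment(10));
```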
The previous EC2 example, with its fixed usage per hour, is unrealistically simple: usage commonly varies throughout the day as AWS Auto Scaling groups scale up and down, and Lambda usage is likewise typically highly variable. Hourly usage can therefore have a lot of variance, and any hour in which aggregate Lambda usage drops below the savings plan’s hourly commitment results in wasted spend. Ernesto Marquez at Concurrency Labs has made a brilliant chart describing this concept:
As seen in Marquez’s graph above, aggregate usage below the Compute Savings Plan commitment results in wasted spend, since the account/org has agreed to pay for that usage regardless of whether Lambda hourly spend always hits the target.
Ideally, someone familiar with finance should be involved when purchasing a Compute Savings Plan, as finding the truly optimal hourly commitment is not a simple calculation. For example, weighing the increased savings from paying upfront or partially upfront against paying on an ongoing basis is not something technical staff should evaluate alone. It could be more beneficial from a business perspective to deploy capital into other business needs rather than allocating current funds toward future compute spend. The calculation is complicated further by the fact that the savings plan applies to EC2 first, then Fargate, then Lambda last, because Lambda has the lowest discount rate. So while Compute Savings Plans do apply to Lambda, the decision to purchase one must also take EC2 and Fargate usage into account. In short, the recommendation is to fully explore all of the relevant data before making a decision:
EC2 usage variability
Fargate usage variability
Lambda usage variability
Anticipated future growth in overall compute usage
Financial capability or desire to obtain higher discount rates via paying for compute upfront rather than on an ongoing basis
The AWS Lambda Power Tuning project (https://github.com/alexcasalboni/aws-lambda-power-tuning) offers a fantastic tool to “power tune” your Lambda memory configuration. The tuning process invokes a Lambda function numerous times while periodically adjusting its memory configuration, generating data on the relationship between memory settings and the time the function needs to complete. Because Lambda costs are a product of both duration and memory allocation, a Lambda function with 512 MB can often cost less per invocation than a 128 MB Lambda, because the higher memory setting allows the function to complete much more quickly.
When using the power tuning tool, it is important to consider what the application’s performance requirements are. Generally speaking, more memory will entail faster performance for the function, as the “memory” configuration in AWS Lambda increases available CPU, network, and memory resources for the function. If there is some requirement for the function to finish within 500ms, then of course an appropriate memory configuration should be chosen that achieves that requirement even if it is not the most efficient from a cost perspective.
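To illustrate why a larger memory setting can win on cost, here is a sketch with hypothetical durations; only the first-tier x86 rate is taken from the pricing page:

```javascript
// Per-invocation cost is memory (GB) x duration (s) x rate, so a faster
// run can more than offset a larger memory allocation.
const RATE = 0.0000166667; // USD per GB-second, x86 first tier

function invocationCost(memoryGb, durationMs) {
  return memoryGb * (durationMs / 1000) * RATE;
}

// Hypothetical tuning result: 4x the memory, but 5x faster.
const small = invocationCost(0.128, 1000); // 128 MB, 1000 ms
const large = invocationCost(0.512, 200);  // 512 MB, 200 ms

console.log(large < small); // the larger configuration costs less per call
```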
I believe that in the vast majority of circumstances there is no reason to use the x86 platform for Lambda. See this post for some very minor caveats. I have personally been involved in a migration from x86 Lambdas to the Arm architecture. The migration was highly successful despite involving two binaries included in the Lambdas’ deployment package: we simply compiled those binaries for Arm rather than x86 as part of our build process, and everything worked perfectly. This was a moderately large project with at least a dozen separate functions, which demonstrates that the Arm platform is a viable option for “real” projects.
In the words of Michael Hart, known for open source work such as alpine-node and LambCI, “the init duration [for AWS Lambda] isn’t included in the billed duration at all. The init stage is free [for the first 10 seconds].” I have observed this behavior recently, so I can confirm the quirk still exists. Should it be exploited to get nearly free Lambda usage? Probably not. Hart has noted that init code does not have access to the handler’s “event” object, which in my opinion undermines one of the core reasons to use serverless functions: the ability to easily build event-driven architectures without frustrating complications. Being forced to manually obtain inputs during the init phase is enough of a drawback to severely limit the usefulness of this trick. However, if any portions of your application fit comfortably into the init phase, such as initializing connections or instrumentation, then cost savings can be obtained simply by moving those steps outside of the function handler.
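A minimal sketch of that last point, with a hypothetical `createDbConnection` standing in for any expensive setup (a real handler would be exported via `exports.handler` in the Node.js runtime):

```javascript
// Hypothetical stand-in for expensive setup (database clients, SDK clients,
// instrumentation); stubbed here so the sketch is self-contained.
async function createDbConnection() {
  return { query: async (sql) => `ran: ${sql}` };
}

// Module scope runs once per execution environment, during the init phase,
// so this setup does not count toward billed duration.
const connectionPromise = createDbConnection();

// The handler body runs on every invocation and is billed in full.
const handler = async (event) => {
  const connection = await connectionPromise;
  return connection.query(event.sql);
};
```

Keeping only a promise in module scope (rather than awaiting there) lets the init phase start the work without blocking module load.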
As mentioned earlier in the “Lambda Pricing” section of this post, Provisioned Concurrency will never reduce Lambda costs and should therefore only be seen as a tool to increase performance.
For mature applications with meaningful data on production traffic volume, it is worth considering a Compute Savings Plan. Even if you are not comfortable determining the truly optimal commitment yourself, exploring the feature will highlight the approximate cost savings on offer. With the help of someone more experienced with (or inclined toward) the financial side, Lambda costs can in theory be reduced by up to 17%.
While there are nine different dimensions to Lambda pricing, in truth, there are not many ways to optimize Lambda costs. By far, the most beneficial approach is to use the “power tuner” as it is relatively easy to use and will offer obvious indicators as to which memory settings are optimal from a cost perspective. After this has been done, consider using Arm architecture and/or purchasing a Compute Savings Plan. Both of these two options will involve a considerable amount of discussion and planning, as there is no guarantee that the application will seamlessly migrate to an Arm platform or that a rational Compute Savings Plan can be purchased without an extensive review of past, present, and expected future usage.