Serverless
Replacing Amazon S3 Events with Amazon S3 Data Events
How to synthesize an (almost) identical payload using Amazon EventBridge rules.
Tue, 14 Feb 2017
In a quickly changing technology landscape of immutable Docker deployments and Lambda-based Serverless architectures, IPsec VPN tunnels are far from sexy… but they certainly are critical.
At Trek10, we use the IPsec VPN functionality of Cohesive Networks’ VNS3 controller to connect our customers’ AWS networks to everything from third-party payment providers to financial market feeds to the classic connection back to the corporate network. For all of our customers, any downtime for an IPsec tunnel results in significant business impact. This is why we built a Lambda function in Python that leverages the VNS3 API to check all IPsec tunnels for outages and then posts a custom Datadog metric to notify our 24/7 CloudOps team of any IPsec tunnel issues. In this blog post, we’ll provide you with all of the steps and code to implement this monitor.
AWS Lambda is a serverless compute service from AWS that executes code in response to events or time-based triggers. For this use case, we leverage Lambda to execute a function each minute to confirm the status of IPsec tunnels. At a high level, the function checks the status of each tunnel through the VNS3 API and then, using the Datadog API (api.Metric.send), posts a 1 if the tunnel is connected and a 0 if the tunnel is disconnected.

As of November 2016, Lambda supports environment variables. We use four different environment variables to configure the function across our customers’ VNS3 VPN implementations. The variables are DDAPIKEY, DDAPPKEY, VPN_, VPN (optional) and VPNENV. More details on these environment variables can be found in the README.
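The flow described above can be sketched roughly as follows. This is a minimal illustration rather than the code from our repository: the shape of the tunnel data (a mapping of tunnel name to a boolean), the helper names, and the `tunnel` tag key are assumptions, and the environment variable names are used as printed in this post (the README has the exact spelling). The only Datadog calls used are `initialize` and `api.Metric.send` from the `datadog` Python package, imported lazily so the pure helpers can be run without it.

```python
import os
import time


def load_config(env=None):
    """Read settings from the Lambda environment.

    Variable names are taken as printed in this post; confirm them
    against the README before relying on them.
    """
    env = os.environ if env is None else env
    return {
        "api_key": env.get("DDAPIKEY"),
        "app_key": env.get("DDAPPKEY"),
        "vpn_environment": env.get("VPNENV"),
    }


def build_metric_payloads(tunnel_states, vpn_environment, now=None):
    """Turn {tunnel_name: is_connected} into one metric payload per tunnel.

    The input shape is hypothetical; it stands in for whatever the VNS3
    API response is parsed into.
    """
    timestamp = int(now if now is not None else time.time())
    payloads = []
    for name, connected in sorted(tunnel_states.items()):
        payloads.append({
            "metric": "vpn.tunnel.status",
            # 1 for a connected tunnel, 0 for a disconnected one
            "points": [(timestamp, 1 if connected else 0)],
            "tags": [
                "vpn_environment:{}".format(vpn_environment),
                "tunnel:{}".format(name),  # assumed tag key
            ],
        })
    return payloads


def send_payloads(payloads, config):
    """Post each payload to Datadog (requires the datadog package and internet access)."""
    from datadog import initialize, api  # third-party; imported lazily

    initialize(api_key=config["api_key"], app_key=config["app_key"])
    for payload in payloads:
        api.Metric.send(**payload)
```

A handler built on this sketch would fetch the tunnel states from the VNS3 API, pass them to `build_metric_payloads`, and hand the result to `send_payloads`. Keeping the payload construction separate from the network calls makes the 1/0 mapping easy to test locally.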
Reaching the Datadog API requires internet access, so your Lambda function must be configured in subnets with internet access, either directly through the IGW (in a public subnet) or through a NAT Gateway (in a private subnet).
To securely allow the Lambda function to access the VNS3 VPN controller, allow inbound traffic to the controller on port 8000 from the security group you associated with your Lambda function. Once you have configured your security group, you should receive an “OK” upon testing your Lambda function. Now that the appropriate metric is being posted to Datadog, the next step is to configure the monitor.
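As an illustration of that security group rule, here is a minimal sketch using boto3's `authorize_security_group_ingress`. The function names and the security group IDs you pass in are hypothetical, and it assumes the rule is added to the security group attached to the VNS3 controller. boto3 is imported lazily so the rule itself can be inspected without AWS credentials.

```python
def vns3_api_ingress_rule(lambda_sg_id, port=8000):
    """Build an ingress rule that allows TCP on the given port from another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Referencing the Lambda function's security group instead of a CIDR
        # range keeps the rule valid even as the function's IPs change.
        "UserIdGroupPairs": [
            {
                "GroupId": lambda_sg_id,
                "Description": "Lambda tunnel monitor to VNS3 API",
            }
        ],
    }


def allow_lambda_to_vns3(vns3_sg_id, lambda_sg_id):
    """Apply the rule to the controller's security group (needs AWS credentials)."""
    import boto3  # third-party; imported lazily

    ec2 = boto3.client("ec2")
    return ec2.authorize_security_group_ingress(
        GroupId=vns3_sg_id,
        IpPermissions=[vns3_api_ingress_rule(lambda_sg_id)],
    )
```

The same rule can of course be created in the console or in a CloudFormation template; the point is that the source is a security group reference, not an IP range.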
As mentioned previously, the script uses api.Metric.send to post the custom metric to Datadog (a 1 for connected tunnels and a 0 for disconnected tunnels); that call is where most of the script’s Datadog logic resides. Follow the steps below to configure your Datadog monitor:
1. Select the vpn.tunnel.status metric. This is the custom metric being imported into Datadog (a 1 or a 0). We want the min by aggregation, which means that Datadog takes the minimum value across all of the tunnels’ data points. For example, if the controller has 30 tunnels configured, each minute has 30 different data points of 1 or 0. If any of those 30 tunnels posts a 0, Datadog uses that value when evaluating the alert conditions.
2. Scope the monitor to the vpn_environment:production tag. The Lambda function posts the custom metric to Datadog and creates the tag key vpn_environment. The value of this tag is equal to the VPNENV environment variable in the Lambda function config. If you are using multiple Lambda functions to monitor different VNS3 controllers, change this tag to the appropriate value for the controller you are monitoring.
3. Choose Simple Alert to receive one generic alert. If you would like to receive an alert for each individual tunnel (with the {{tunnel.name}} value pulled into the alert), you can choose Multi Alert instead. The downside to Multi Alert is that if the connection to your IPsec peer(s) drops, you will receive a noisy, separate alert for each tunnel that goes down instead of one generic alert.

[Screenshot: an example monitor in the Datadog console]
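For readers who prefer to define the monitor in code, the settings above map onto a Datadog monitor query roughly as sketched here. The evaluation window (`last_5m`), the `< 1` threshold, the helper names, and the `tunnel` grouping key (inferred from the {{tunnel.name}} template variable) are assumptions, not values taken from our configuration. `api.Monitor.create` is part of the `datadog` Python package and is imported lazily; it expects `datadog.initialize` to have been called with your keys first.

```python
def tunnel_monitor_query(vpn_environment, multi_alert=False, window="last_5m"):
    """Build a metric-alert query over vpn.tunnel.status.

    The second 'min' is the space aggregation: the lowest value across
    all tunnels in scope, so a single 0 is enough to trip the threshold.
    """
    scope = "vpn_environment:{}".format(vpn_environment)
    # Multi Alert groups by tunnel, producing one alert per tunnel.
    group_by = " by {tunnel}" if multi_alert else ""
    return "min({}):min:vpn.tunnel.status{{{}}}{} < 1".format(window, scope, group_by)


def would_alert(points):
    """Mirror the min aggregation locally: any 0 among the points pulls the minimum below 1."""
    return min(points) < 1


def create_tunnel_monitor(vpn_environment, multi_alert=False):
    """Create the monitor through the Datadog API (needs keys and internet access)."""
    from datadog import api  # third-party; imported lazily

    return api.Monitor.create(
        type="metric alert",
        query=tunnel_monitor_query(vpn_environment, multi_alert),
        name="IPsec tunnel down ({})".format(vpn_environment),
        message="At least one IPsec tunnel is reporting as disconnected.",
    )
```

With the 30-tunnel example from step 1, `would_alert([1] * 29 + [0])` is true while `would_alert([1] * 30)` is false, which is the behavior the min aggregation gives you in the monitor.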
Datadog also supports webhooks, which we leverage to generate support tickets for our 24/7 CloudOps team. Using Lambda is a great way to monitor infrastructure, and we use it religiously across Trek10 for many of our systems. We suggest you give it a shot as well!
You can find the code, along with the README, which explains the implementation in more detail, on our GitHub page.