Fri, 16 Nov 2018
In this post I’ll outline a generalized approach to building an event-driven, serverless architecture on AWS when you rely on data from a legacy system. If you are trying to build new, cutting-edge architectures in AWS but depend on data in a legacy database, this is for you.
In the world of serverless, it’s important to think differently about architecture: goodbye monolith, hello event-driven, or even further, stream-based architecture. Your various components of compute (e.g. Lambda), data storage (e.g. DynamoDB or S3), and routing (e.g. SNS or API Gateway) need to relate to each other through events. Event-driven thinking can also help you understand your core business processes and make it easier to extend your architecture. If you fall back on old monolithic constructs, you miss many of the benefits of serverless architectures: scalability, efficiency, and flexibility.
That said, rarely are new architectures designed in a vacuum. Almost always there is some other data in your enterprise that is important to your new application. Usually that data is sitting in a SQL database, and often that database is on-premise. At a high level, the problem is this: how do you get your on-premise data into your new serverless application in near real-time?
This is a challenge we have dealt with many times at Trek10. The approaches for addressing it are all over the map and always custom: on-premise agents querying data and pushing it into AWS, custom replication to a database in AWS, or vendor-specific solutions. These all have their flaws, but more importantly, they are always custom built from scratch, increasing project time and cost.
A generalized pattern for solving this challenge could be applied quickly and easily to the vast majority of scenarios. This would result in faster implementation and a more battle-tested solution. Here is a concept for one such pattern…
With this architecture, all changes in your legacy SQL database (with support for a wide variety of DB vendors) are pushed onto a Kinesis event stream, ready to be consumed. The only thing left to build is the custom business logic that consumes these events: for example, using Kinesis Analytics for alerting, processing the data with a Lambda function, or archiving it into an S3 data lake with Kinesis Firehose.
All of your database changes will come onto the stream with a representation of the old data (if any) and the new data, both in JSON format. The beauty of this is that if you choose to replicate all columns (*) in your tables, schema updates automatically flow through the system: it automatically identifies new columns and adds the corresponding keys to the JSON.
So at the end of the day, your row-oriented SQL data will end up in a Kinesis stream for Lambda functions and other consumers as JSON elements with “old” and “new” records for every record added, edited, and deleted.
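As a rough illustration, a downstream consumer Lambda might unpack these events as follows. This is a sketch only: the exact payload shape (the “old”/“new” key names and the per-record JSON encoding) depends on how the glue function described later serializes the records.

```python
import base64
import json

def handler(event, context):
    """Example Kinesis consumer: log every change replicated from the legacy database."""
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))

        old_row = payload.get("old")  # None for inserts
        new_row = payload.get("new")  # None for deletes

        if old_row is None:
            print(f"INSERT: {new_row}")
        elif new_row is None:
            print(f"DELETE: {old_row}")
        else:
            print(f"UPDATE: {old_row} -> {new_row}")
```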
(If you are familiar with these components, you may be wondering, why not just drive Lambda off of a DynamoDB stream? We’ll explain more below, but the short answer is: unifying multiple tables into a single stream.)
Let’s go through some of the system’s components in more detail.
The first component is the AWS Database Migration Service (DMS), which is our key to extracting data in a generalized way. As some at AWS will tell you, the service has been misnamed: it was originally intended for one-off migrations, but so many customers have begun to use it for long-term replication that it should really be called the “Database Replication Service”. AWS has embraced this, adding features needed for production-ready replication, like multi-AZ failover, and DMS is now a common choice for long-term replication.
Of course, DMS supports a diverse list of sources, including all the legacy favorites like Oracle and SQL Server. Critically, because it uses Change Data Capture (CDC), it adds very little load to the source servers, which is far more efficient than building a custom agent that issues extra reads against your database.
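As a concrete (and hedged) sketch, an ongoing-replication task can be created with a few boto3 calls. The ARNs below are placeholders for endpoints and a replication instance you would create separately, and the selection rule simply includes every table; DMS object-mapping rules can further control how rows land in DynamoDB.

```python
import json
import boto3

dms = boto3.client("dms")

# Selection rule: replicate every table in every schema ("%" is the DMS wildcard).
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "replicate-everything",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="legacy-sql-to-dynamodb",
    SourceEndpointArn="arn:aws:dms:us-east-1:111111111111:endpoint:SOURCE",  # legacy SQL database
    TargetEndpointArn="arn:aws:dms:us-east-1:111111111111:endpoint:TARGET",  # DynamoDB target
    ReplicationInstanceArn="arn:aws:dms:us-east-1:111111111111:rep:INSTANCE",
    MigrationType="full-load-and-cdc",  # initial load, then ongoing change data capture
    TableMappings=json.dumps(table_mappings),
)
```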
DynamoDB provides us a convenient waypoint for the data for a few reasons, the most important being DynamoDB Streams, which is our link to the next component of the system.
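For illustration, if a target table does not already have a stream enabled, it can be turned on with a single boto3 call (the table name here is a placeholder); the NEW_AND_OLD_IMAGES view type is what gives us both the old and new row representations.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# NEW_AND_OLD_IMAGES puts both the before and after row representations on the stream.
dynamodb.update_table(
    TableName="legacy_orders",  # placeholder: one of the tables DMS created
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)
```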
If your processing needs are simple, you could skip this step and do your data processing in a Lambda function directly off of DynamoDB Streams. In our reference architecture, however, we use a Lambda function as glue between DynamoDB Streams and Kinesis: this simple function receives events from the DynamoDB stream and drops them onto the Kinesis stream. This has a few benefits, chief among them unifying change events from many tables into a single stream (as noted above).
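A minimal sketch of what that glue function might look like in Python is below. The Kinesis stream name, the partition-key choice, and the conversion from DynamoDB’s typed attribute format to plain JSON are all assumptions for illustration; a production version would also need batching, retries, and error handling to respect Kinesis limits.

```python
import json
import os

import boto3
from boto3.dynamodb.types import TypeDeserializer

kinesis = boto3.client("kinesis")
deserializer = TypeDeserializer()

STREAM_NAME = os.environ.get("KINESIS_STREAM_NAME", "legacy-db-changes")  # placeholder


def to_plain_json(image):
    """Convert a DynamoDB-typed image ({"col": {"S": "value"}, ...}) into a plain dict."""
    if not image:
        return None
    return {name: deserializer.deserialize(value) for name, value in image.items()}


def handler(event, context):
    records = []
    for record in event["Records"]:
        ddb = record["dynamodb"]
        payload = {
            # The table name is the second segment of the stream ARN.
            "table": record["eventSourceARN"].split("/")[1],
            "old": to_plain_json(ddb.get("OldImage")),  # absent for inserts
            "new": to_plain_json(ddb.get("NewImage")),  # absent for deletes
        }
        records.append({
            "Data": json.dumps(payload, default=str).encode("utf-8"),
            # Partition by primary key so changes to the same row stay ordered.
            "PartitionKey": json.dumps(ddb["Keys"], sort_keys=True),
        })

    if records:
        # A production version would chunk to at most 500 records per call
        # and retry any failed records reported in the response.
        kinesis.put_records(StreamName=STREAM_NAME, Records=records)
```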
A few important things to keep in mind:
All of the components here support CloudFormation, so a great next step would be to template out this architecture so it could be applied repeatedly. There is some added complexity in this case because DMS dynamically creates the destination DynamoDB tables, so the automation would have to be flexible enough to adapt to dynamically created tables. But once that is complete, this is a system that could be up and running in minutes, allowing the builder to focus on source system configuration and business logic to process the incoming events. Trek10 hopes to open source a solution along these lines in the coming months.
We’d love to hear your thoughts about this proposed solution. See any holes in it? Have you tried something like this and can report on your experience? Let us know @Trek10Inc, and keep building!