I won't start this article saying there is "one true way" for building SaaS on cloud providers, specifically AWS. I will confidently say that there are many wrong ways. At Trek10, we find ourselves helping clients who have seen their AWS usage skyrocket and need to organize the chaos of an organic, home-grown AWS footprint. This article is distilled from years of working with folks at various points in their AWS journey, in an effort to steer you away from the wrong ways and toward a successful path.
In fact, Trek10 has helped enough folks build and maintain SaaS on AWS that we are pleased to be an inaugural Launch Partner for the brand new AWS SaaS Competency in both the Design Services and Builders categories.
You know the story, and you may even have a part in the story yourself. A company is pivoting to the Software-as-a-Service model to modernize its offerings. Someone at the top hears that the cloud "accelerates the pace of innovation" and proclaims, "we must get on board or get left behind; this company won't be the laggards of the adoption curve!" If you are really on the cutting edge, an engineer or two has heard about this new "serverless" thing and is just plain tired of ssh-ing in and patching their fleet of "totally-automated everything away boss" EC2 instances.
Based on what I have seen and learned over the years, and some discussion with trusted colleagues, I want to talk about how I would (and in fact have) set up new organizations (products, SaaS, what have you) from day one for future success. Here are a few guiding beacons to help you make the right long-term decisions.
Let's dive in.
This may seem silly, especially if your organization has been around for a century. However, in new products and old companies alike the approach of building as if you may sell at any time has a similar effect; it forces you to build with best practices and isolation.
To achieve this goal, I’d lean on a few well-known best practices, and perhaps some lesser-known tooling.
With the introduction of AWS Organizations, AWS made it clear that the multi-account strategy is the cut path in the deep jungles of account management. Stack on Control Tower and the various Landing Zone offerings, and you can rest assured that these days you can leverage accounts for isolation, in a practiced manner, even in your most trivial projects.
These days, one of my favorite tools for orchestrating all of this is AWS OrganizationFormation (OrgFormation), which is nothing more than a thin wrapper around native AWS tooling. OrgFormation uses a few simple tagging and logic schemes to set up accounts and deploy and maintain CloudFormation templates across them.
AWS Organizations also simplifies your technical auditing needs by centralizing AWS Config and GuardDuty.
The AWS account is also one of the most effective tools in your kit for blast radius limiting.
Finally, AWS Organizations centralizes your billing. Any sufficiently large cloud operation ends up requiring a Cloud Economist to make any sense of the madness, but with Organizations you can pretend to keep things under control for a while with the centralized billing as well as some simple alarms or dashboards.
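As one concrete example of those "simple alarms," a CloudFormation snippet along these lines can watch the estimated-charges metric in the payer account. The resource names and the $1,000 threshold are placeholders I've chosen for illustration; note that billing metrics only exist in us-east-1 and require "Receive Billing Alerts" to be enabled on the account.

```yaml
Resources:
  BillingAlertTopic:
    Type: AWS::SNS::Topic

  EstimatedChargesAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Alert when estimated monthly charges exceed $1,000
      Namespace: AWS/Billing
      MetricName: EstimatedCharges
      Dimensions:
        - Name: Currency
          Value: USD
      Statistic: Maximum
      Period: 21600          # billing metrics update roughly every 6 hours
      EvaluationPeriods: 1
      Threshold: 1000
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref BillingAlertTopic
```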
A basic organization setup looks something like this.
Following this structure, you will note that each of the “products” is actually distributed into its own organizational unit (OU), and each environment is broken up into its own AWS account. Billing, CI/CD, and even Security & Auditing are centralized, which can help maintain insight across the company.
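To make that layout concrete, here is a rough sketch of how it might be expressed in an OrgFormation template. All account names, IDs, and email addresses are placeholders, and you should consult the OrgFormation documentation for the exact syntax your version supports:

```yaml
AWSTemplateFormatVersion: '2010-09-09-OC'
Description: Sketch of a multi-account layout (all names are placeholders)

Organization:
  ManagementAccount:
    Type: OC::ORG::MasterAccount
    Properties:
      AccountName: Management
      AccountId: '111111111111'

  SecurityAccount:
    Type: OC::ORG::Account
    Properties:
      AccountName: security-auditing
      RootEmail: security@example.com

  ProductAOU:
    Type: OC::ORG::OrganizationalUnit
    Properties:
      OrganizationalUnitName: product-a
      Accounts:
        - !Ref ProductADev
        - !Ref ProductAProd

  ProductADev:
    Type: OC::ORG::Account
    Properties:
      AccountName: product-a-dev
      RootEmail: product-a-dev@example.com

  ProductAProd:
    Type: OC::ORG::Account
    Properties:
      AccountName: product-a-prod
      RootEmail: product-a-prod@example.com
```

Each product gets its own OU, each environment its own account, and the management and security accounts sit at the top, mirroring the structure described above.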
This one may seem a bit odd, but let’s take a direct quote from my friend Ben Kehoe, Cloud Robotics Research Scientist and AWS Community Hero.
Move your development environment towards the cloud, do not try to move the cloud down to your dev environment.
There are an undeniable number of tools and initiatives to bring familiar development cycles to local developer environments. Some are better than others. Some achieve more parity than others. As good as some of these are, inevitably you end up in a situation where something is not quite perfect and you spend days or weeks trying to work around that issue for your team. Even worse, you spend cycles debugging when something isn't quite right in your deployed cloud version.
These days, I personally do the majority of my development on AWS Cloud9. The fast, reliable internet on the box and what I'd call "good enough" feature set and language support are truly sufficient about 90% of the time.
In addition to Cloud9, I would also highly suggest either a shared "developer" AWS sandbox account or, for more mature organizations, a sandbox per developer. If you are feeling extremely ambitious, maybe even ephemeral AWS accounts.
I'd ask most developers to start their day in Cloud9, and only eject to their local machines if they really need to. I would expect that they are doing unit tests or simple mocks locally, getting their tests to a place where they are rapid and valuable, but that their real development and testing happens by pushing to their sandbox AWS accounts.
Simply put, there is no sufficient simulation or substitute for actual cloud resources.
This also encourages and requires good practice around integration and end-to-end testing.
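A minimal sketch of that split might look like the following. The handler is a trivial Lambda-style function; the unit test exercises its pure logic locally, while the integration test targets whatever is actually deployed in a sandbox account. `SANDBOX_API_URL` is a hypothetical environment variable your deploy tooling would export, not anything AWS provides.

```python
import json
import os
import urllib.request


def handler(event, _context):
    """A trivial Lambda-style handler: pure logic, easy to unit test locally."""
    name = event.get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"message": f"hello {name}"})}


def test_handler_unit():
    """Fast, local: exercises pure logic with no cloud dependencies."""
    resp = handler({"name": "dev"}, None)
    assert resp["statusCode"] == 200
    assert json.loads(resp["body"])["message"] == "hello dev"


def test_deployed_endpoint():
    """Slow, real: hits the version actually deployed to a sandbox account.

    SANDBOX_API_URL is a hypothetical variable set after deploying; the
    test is skipped when it is absent (e.g. on a laptop with no sandbox).
    """
    url = os.environ.get("SANDBOX_API_URL")
    if url is None:
        return  # skip: no sandbox deployment available
    with urllib.request.urlopen(url) as resp:
        assert resp.status == 200
```

The unit tests stay fast and mockable; everything past them runs against real cloud resources in the sandbox, which is exactly the point.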
This one is a bit more nuanced but let’s think about the decisions it forces us to make. We need to be a bit more introspective on our internal practices and scrutinize our codebase as if it is open to the world.
This means that security by obscurity, while never a good practice, is a definite no-go. We can’t rely on people not knowing we moved our admin endpoint to /unfindable-except-by-everyone. It also means no secrets strewn about in code, and no cert files stored in the repository.
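A minimal sketch of the "no secrets in code" rule: load secrets at runtime and fail loudly when they are missing, rather than falling back to a value baked into the repository. In a real deployment you would likely pull these from AWS Secrets Manager or SSM Parameter Store instead of plain environment variables; the function and secret names here are placeholders.

```python
import os


def get_secret(name: str) -> str:
    """Load a secret from the environment at runtime; never hard-code it.

    A hypothetical helper: production code would more likely call
    Secrets Manager or SSM Parameter Store, but the principle is the
    same -- the repository itself never contains the value.
    """
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(
            f"Secret {name!r} is not configured; refusing to fall back "
            "to a value baked into the codebase"
        )
    return value
```

Anyone reading the open(-looking) repository sees only the name of the secret, never its value.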
Your project dependencies need to be regularly audited and kept up to date. You need to be able to roll out new patched versions of your software soon after critical Common Vulnerabilities and Exposures (CVEs) are published against those dependencies. The only way this is safe and feasible is with automated, comprehensive pipelines for deploying your code.
There is also the implied "embarrassment" angle. Sure, a little bit of sloppy coding is evident even in many open source projects. But knowing (or pretending) that some external force is looking over your shoulder may help you take the extra hour or two to properly pull out modules or add those tests you know should be done.
The final bit of advice that falls under this principle is documentation. An open source project is only as good as its documentation. You could have the most elegant API in the world, but if there isn't enough documentation to communicate that point to newcomers, there won't be anyone using your project. Good documentation extends beyond just your code or API. It encompasses your infrastructure practices, your guiding principles as a project, your onboarding for new developers, and yes, your code and APIs.
Look ma’ my biases are showing! Clearly, I am an advocate for building with the tools provided by my platform of choice.
If AWS doesn't have it (or really, if CloudFormation doesn't support it) does it really exist? I get that this is a pretty aggressive, if not flippant, statement. However, any time I stray from a platform-native offering, there's a reasonably high likelihood that I will regret it.
This also means leaning heavily into all the service offerings and orchestration tooling that is afforded to you by your platform. Don't be afraid to set some boundaries for your teams, but don't dogmatically enforce them.
As Alex DeBrie put it for me...
Provide standards but allow experimentation. AWS is a broad ecosystem, and there are some holes. Your company will choose some services and patterns that you prefer and others that you don't. Help your engineers understand what the preferred and supported patterns are. Make it clear that they can go off those paths, but they're going to be more on their own. This is akin to Charity's 'Golden Path' approach, but it's not really AWS specific.
To expand, there are certainly cases where staying within the platform isn't the optimal solution. There are cases where the market is a few steps ahead of your platform provider. For instance, AWS doesn't have anything quite as tuned for fast frontend search experiences as Algolia. But the point remains that going outside the platform should be an exception, something you do only when truly needed. Going out to the market isn't my first option, but it definitely fits with how I still think about serverless in 2020.
My thinking on serverless these days, in order of consideration:
- If the platform has it, use it
- If the market has it, buy it
- If you can reconsider requirements, do it
- If you have to build it, own it
— Jared Short (@ShortJared) February 27, 2019
Now, going out to the market and buying something doesn't always work out great. But my regret is lower than if I had tried to build something in its place. To illustrate, I'll lean on my good friend Forrest Brazeal for the next thousand words.
While writing this piece, I reached out to five or so trusted folks to ask their guidance for organizations building on the cloud. Every single person explicitly stated that Infrastructure as Code (IaC) is an essential piece of the puzzle.
If you are not managing your infrastructure via CloudFormation, Terraform, or any of the myriad other ways to model and deploy your infrastructure repeatably, you will eventually come to regret it. You might pay for it by painstakingly rebuilding a new environment as your application grows, or by forgetting to check a box during a three-hour manual deployment from staging to production and inexplicably bringing down your application for a couple of harrowing hours.
If you take nothing else away from this post, please let it be that IaC is a cornerstone of a healthy product lifecycle on cloud.
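To sketch what "repeatable" means in practice, here is a deliberately tiny CloudFormation template parameterized by environment, so the exact same definition can be deployed to the dev, staging, and production accounts. The bucket and stack names are placeholders of my own invention:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Minimal sketch of an environment-parameterized stack

Parameters:
  Environment:
    Type: String
    AllowedValues: [dev, staging, prod]

Resources:
  UploadsBucket:
    Type: AWS::S3::Bucket
    Properties:
      # AccountId in the name avoids collisions across accounts
      BucketName: !Sub 'my-saas-uploads-${Environment}-${AWS::AccountId}'

Outputs:
  BucketName:
    Value: !Ref UploadsBucket
```

A pipeline (or a developer targeting a sandbox) then deploys it identically everywhere, for example with `aws cloudformation deploy --template-file template.yml --stack-name my-saas-dev --parameter-overrides Environment=dev`. No checkbox to forget, no environment to rebuild by hand.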
These 3 "guiding principles" are just that: guidelines. They are not tenets that, followed dogmatically, guarantee success. They are in fact generalized snippets boiled down from years of working with clients and on various AWS projects.
Evaluate every technology decision as a long-term decision of partnership. Sure, you are billed by the second/day/month, but really you are electing to take on that technology as a partner in the growth of your SaaS. Some decisions are more easily replaced than others; some choices you outgrow. That's natural and expected. In the same vein, don't evaluate your past choices in the current context. You will always know more now than you did then.
When it comes to those decisions, give your team authority and power to build and innovate. Don't shy away from the strengths of the cloud to augment your teams, and don't shy away from trusting your teams.
As Richard Boyd so eloquently put it for me...
Push authority down (don't push the CEO down a flight of stairs). A software team that is responsible for the operational requirements of their application will drive the ops burden down. Typically this is done at an organizational level because Dev and Ops are separate teams/orgs. By forcing dev teams to own the ops of their applications, they have the responsibility (and the authority) to make changes that make the application more stable. *Note, this is true in many places but it is much more pronounced in the cloud, where most software runs as a service.*
All said, I'd build a new SaaS product on AWS without hesitation, and others should strongly consider it, but do so with a plan of action for your tech decisions and your teams.