Data and Analytics
Building a Simple AWS Data Warehouse Solution with Data Streaming
Easy and affordable data storage and analysis on AWS.
Mon, 04 Aug 2014
You don’t hear a lot about AWS Redshift in the marketplace, and frankly I think it is the most under-appreciated service on AWS. Simply put, Redshift is a revolution in data warehousing infrastructure. Column-oriented databases, which are a type of database optimized for read-heavy or OLAP (online analytical processing) workloads, have been around for a long time, but innovation in the field has accelerated in the last 10 years with disruptive newcomers such as Vertica and ParAccel, which added a scale-out model that has allowed for petabyte-scale SQL-based data warehouses.
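To make the column-oriented idea concrete, here is a toy sketch in Python. It is only an illustration of the layout difference, not a description of how Redshift stores data internally, and the sales records and field names are invented for the example.

```python
# Hypothetical sales records; the field names are invented for illustration.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 75.5},
    {"order_id": 3, "region": "east", "amount": 42.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: each full record is touched even though only one field is needed.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: the aggregate reads a single list and ignores the other fields.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Both totals are the same. The difference is how much data has to be read to get there, which is why analytical queries that aggregate a few columns over many rows tend to favor the columnar layout.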
However, even with that disruption, the barrier to entry remained high: not only did the hardware and software for a basic solution easily approach six figures, but the complexity of installation and maintenance greatly increased the total cost of ownership. As a result, these technologies were mostly inaccessible to small and mid-sized businesses (SMBs).
Yet SMBs still have a critical need for answers from their data. And in these days of data accumulation and online activity, many SMBs have pretty “big” data that quickly outgrows the capabilities of traditional OLTP systems such as MySQL and SQL Server. Even if they are “getting by” with a few pre-built reports that run overnight and take an eternity, there is massive untapped potential. The power of analytics comes when the user can quickly and interactively explore the data, build new reports, and easily expose dashboards to decision makers. A fast-responding data warehouse is critical infrastructure for such discovery.
Enter Redshift. The underlying technology powering Redshift is not new, but the economics and service delivery are truly revolutionary. Like many AWS services, it is pay-by-the-hour… but it gets better… the lowest-cost, single-node deployment is priced at only $6/day. If you reserve for a year and pay part up front, the total annual cost is only $1,400! This is a far cry from the six-figure deployments of competitors. No one in the marketplace comes anywhere close to this cost.
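The arithmetic behind those numbers is easy to check. The sketch below uses only the two prices quoted above ($6/day on demand, $1,400 for a reserved year); the 365-day year, the function names, and the assumption that the node runs around the clock are mine.

```python
# Prices quoted in the article; everything else is derived from them.
ON_DEMAND_PER_DAY = 6.00     # USD, lowest-cost single-node deployment
RESERVED_PER_YEAR = 1400.00  # USD, one-year reservation, part paid up front
DAYS_PER_YEAR = 365          # assumption: node runs every day of the year


def annual_on_demand(per_day=ON_DEMAND_PER_DAY, days=DAYS_PER_YEAR):
    """Yearly cost of running the node continuously at the daily rate."""
    return per_day * days


def reserved_savings(reserved=RESERVED_PER_YEAR):
    """Return (absolute savings in USD, savings as a percentage of on-demand)."""
    on_demand = annual_on_demand()
    saved = on_demand - reserved
    return saved, 100.0 * saved / on_demand


hourly_rate = ON_DEMAND_PER_DAY / 24
saved, saved_pct = reserved_savings()

print(f"On-demand hourly rate: ${hourly_rate:.2f}")
print(f"On-demand annual cost: ${annual_on_demand():,.2f}")
print(f"Reserved annual cost:  ${RESERVED_PER_YEAR:,.2f}")
print(f"Savings from reserving: ${saved:,.2f} ({saved_pct:.1f}%)")
```

Under these assumptions, reserving saves a bit more than a third compared with a year at the daily rate, and either figure sits several orders of magnitude below a six-figure deployment.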
What’s more, the total cost of ownership is not much higher, thanks to Redshift’s service delivery model. It is a “Platform as a Service”: when you provision a Redshift node, you manage the details of your data warehouse through a simple interface, and connect to the database to load data, run queries, or attach a BI/analytics/reporting tool. There is no server to manage, no software to install, no patches to apply. It “just works.”
Now, all that said, I want to sound a note of caution… Redshift is not trivial. A trained DBA is important for setting up your data warehouse, and there are some Redshift-specific features that the technical team must learn in order to use it properly and efficiently.
Still, I think you will agree… when you click that button, launch a cluster, load some data, and start serving reports… it sure feels like a revolution!