A data warehouse is simply a system that receives data from multiple sources for the purpose of data analysis. It is usually a key component of business intelligence: by integrating data from across those sources, it helps the business make better decisions.
Data warehouses are optimized for low-cost storage and for querying large data sets. In AWS, you can build a simple, low-cost data warehouse by storing data in Amazon S3, querying it with Amazon Athena, and building reports with Amazon QuickSight. (S3 is also the preferred location for a larger-scale enterprise data lake.)
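For example, once data lands in S3 you can point Athena at it and query it with standard SQL. The snippet below is a minimal sketch using boto3; the database, table, and output-location names are placeholders, and the Athena table would need to be defined first (for example with a Glue crawler or a CREATE EXTERNAL TABLE statement).

import boto3

athena = boto3.client("athena")

# Placeholder names: replace with your own Athena database, table, and results bucket.
response = athena.start_query_execution(
    QueryString="SELECT COUNT(*) AS records FROM my_data_table",
    QueryExecutionContext={"Database": "my_datalake_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
print("Query execution id:", response["QueryExecutionId"])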
On the other hand, transactional databases that back user-facing applications need to be optimized for fast reads and writes of individual rows. In AWS, Amazon DynamoDB is a great choice for this use case. However, DynamoDB is very inefficient for data warehouse workloads, so you need to get your data out of DynamoDB and into a data warehouse to derive business intelligence value.
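To make the contrast concrete, the access pattern DynamoDB is built for is single-item reads and writes by key. The sketch below uses boto3; the table name and item attributes are illustrative only.

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-data-table")  # illustrative table name

# Fast single-item write keyed by the partition key ...
table.put_item(Item={"id": "user-123", "name": "Ada", "plan": "pro"})

# ... and an equally fast single-item read by the same key.
item = table.get_item(Key={"id": "user-123"}).get("Item")
print(item)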
But how do you get the data from your transactional database to your data warehouse? That’s where data streaming comes in. In this post, we’ll walk you through how to get your data from DynamoDB into a data warehouse in Amazon S3 and then build reports to derive business value.
A data warehouse solution can range from simple to advanced depending on how many sources you need to pull data from. For the purposes of this article, I will walk through a simple data warehouse that works well as a first project and keeps costs low for a beginner.
In this example, we will use Amazon DynamoDB as the database service and an AWS Lambda function, defined with the AWS Serverless Application Model (SAM) and AWS CloudFormation. Below we create a CloudFormation template that defines the resources needed for our solution. This assumes an AWS account has already been set up and that you have access to the services used here.
Here is an example of what a CloudFormation template for a simple data warehouse solution might look like:
AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Description: Simple data warehouse solution using DynamoDB and Lambda

Resources:
  # Define an S3 bucket for storing data
  DataBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: my-datalake-bucket

  # Define a DynamoDB table for storing data. This is where your data will be processed and queried.
  DataTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: my-data-table
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
      KeySchema:
        - AttributeName: id
          KeyType: HASH
      BillingMode: PAY_PER_REQUEST
      PointInTimeRecoverySpecification:
        PointInTimeRecoveryEnabled: true
      StreamSpecification:
        StreamViewType: NEW_IMAGE

  # Define a Lambda function for loading data out of the DynamoDB table into the S3 bucket
  DataLoaderFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.lambda_handler
      Runtime: python3.9
      Timeout: 60
      CodeUri: src/
      Environment:
        Variables:
          DataBucket: !Ref DataBucket
          DataTable: !Ref DataTable
          TABLE_NAME: 'my-data-table'
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref DataTable
        - S3CrudPolicy:
            BucketName: !Ref DataBucket
      Events:
        DynamoDB:
          Type: DynamoDB
          Properties:
            Stream: !GetAtt DataTable.StreamArn
            StartingPosition: LATEST
            BatchSize: 100

Outputs:
  DynamoDBArn:
    Value: !GetAtt DataTable.Arn
  FunctionArn:
    Description: 'DynamoDB handler function ARN'
    Value: !GetAtt DataLoaderFunction.Arn
  S3BucketNameArn:
    Description: Arn of the S3 bucket
    Value: !GetAtt DataBucket.Arn
This SAM template defines an S3 bucket for storing data, a DynamoDB table as the database, and a Lambda function for loading data out of the DynamoDB stream and into the S3 bucket.
To use this template, deploy it as a SAM template using the AWS Management Console, the AWS CLI, or an AWS SDK; a programmatic sketch follows below. Deploying creates all the resources defined in the template, including the Lambda function that loads data from the DynamoDB table into the S3 bucket. The code for the Lambda function itself appears after the deployment sketch.
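If you prefer to drive the deployment from Python, a minimal sketch with boto3 (the AWS SDK for Python) might look like the following. It assumes the function code has already been packaged to S3 (for example with sam package) so that CodeUri points at an S3 location, and that the packaged template has been saved locally; the stack and file names here are placeholders.

import boto3

cloudformation = boto3.client("cloudformation")

# Read the packaged SAM template from disk (placeholder file name).
with open("packaged-template.yaml") as f:
    template_body = f.read()

# The Serverless transform requires CAPABILITY_AUTO_EXPAND, and the
# generated IAM roles require CAPABILITY_IAM.
cloudformation.create_stack(
    StackName="simple-data-warehouse",  # placeholder stack name
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM", "CAPABILITY_AUTO_EXPAND"],
)

# Wait until the stack finishes creating before using the resources.
waiter = cloudformation.get_waiter("stack_create_complete")
waiter.wait(StackName="simple-data-warehouse")

In practice, sam build followed by sam deploy accomplishes the same thing with less ceremony. With the stack deployed, the Lambda function below handles each stream record.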
import json
import boto3
import datetime
import os
from decimal import Decimal
from boto3.dynamodb.types import TypeDeserializer


# Fixes the Decimal-to-float conversion error when serializing to JSON
class DecimalEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Decimal):
            return float(obj)
        return json.JSONEncoder.default(self, obj)


td = TypeDeserializer()
s3 = boto3.client('s3')


def lambda_handler(event, context):
    """Iterate over each DynamoDB stream record, pull the version,
    and set the date correctly in the S3 key."""
    bucketName = os.environ['DataBucket']
    DataSource = os.environ['DataTable']
    for record in event['Records']:
        print("Processing Record", record['dynamodb'])
        if record['dynamodb'].get("NewImage", None) is not None:
            print("Processing NewImage")
            raw = record['dynamodb']['NewImage']
            # Convert the DynamoDB attribute-value format into plain Python types
            newImage = {k: td.deserialize(v) for k, v in raw.items()}
            eventSource = record['eventSource']
            version = newImage.get("Version", "0")
            currentTs = datetime.datetime.now()
            # Key layout: <table>/<version>/<year>/<month>/<day>/<timestamp>.json
            key = "{}/{}/{}/{}/{}/{}.json".format(
                DataSource,
                version,
                currentTs.date().year,
                str(currentTs.date().month).zfill(2),
                str(currentTs.date().day).zfill(2),
                currentTs.isoformat(),
            )
            # Export the new image as JSON to the S3 bucket
            body = json.dumps(newImage, cls=DecimalEncoder)
            print(bucketName, key)
            response = s3.put_object(Bucket=bucketName, Key=key, Body=body)
Because the function is attached to the DynamoDB stream, it runs automatically whenever new items are written to the table; you can also invoke it manually with a sample stream event while testing. By following these steps, you can set up a simple data warehouse solution on AWS that uses an S3 bucket as your warehouse, DynamoDB as your database, and a Lambda function to move the data over. This can easily be adjusted if you have an outside data warehouse source.
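One quick way to verify the pipeline end to end is to write an item into the table and confirm that a JSON object appears in the bucket a few seconds later. The snippet below is a minimal sketch assuming the resource names from the template above (my-data-table and my-datalake-bucket); the item attributes are illustrative.

import boto3

TABLE_NAME = "my-data-table"        # table name from the template above
BUCKET_NAME = "my-datalake-bucket"  # bucket name from the template above

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)

# Writing an item triggers the DynamoDB stream, which invokes the loader function.
table.put_item(Item={"id": "order-1001", "Version": "1", "total": 42})

# A few seconds later, the exported JSON object should show up under the
# <table>/<version>/<year>/<month>/<day>/ prefix in the bucket.
s3 = boto3.client("s3")
listing = s3.list_objects_v2(Bucket=BUCKET_NAME, Prefix=f"{TABLE_NAME}/1/")
for obj in listing.get("Contents", []):
    print(obj["Key"])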
A quick and easy way to get value out of the S3 data is to create a report with Amazon QuickSight. QuickSight must first be granted access to the S3 bucket you want it to read from. To build a report from the S3 data, log in to your QuickSight account, click “New Analysis”, and choose “New Data Set”.
Select “Amazon S3” as the data source.
From there, point QuickSight at your S3 bucket by supplying a manifest file that lists the objects or prefixes you want to load, then follow the prompts to finish creating the data set. Once the data is loaded, you can use QuickSight’s visualizations and tools to analyze and present it, and then publish a dashboard from the report.
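QuickSight reads S3 data through a manifest that points at the objects or prefixes to load. Below is a minimal sketch of building and uploading such a manifest with boto3, assuming the bucket and key layout produced by the Lambda function above; the prefix and file names are illustrative.

import json
import boto3

BUCKET_NAME = "my-datalake-bucket"  # bucket from the template above (adjust as needed)

# Minimal QuickSight S3 manifest: load every JSON object under the table's prefix.
manifest = {
    "fileLocations": [
        {"URIPrefixes": [f"s3://{BUCKET_NAME}/my-data-table/"]}
    ],
    "globalUploadSettings": {"format": "JSON"},
}

# Upload the manifest next to the data so QuickSight can be pointed at its S3 URL.
s3 = boto3.client("s3")
s3.put_object(
    Bucket=BUCKET_NAME,
    Key="manifests/quicksight-manifest.json",
    Body=json.dumps(manifest),
)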
This is a simple way to quickly visualize the S3 data we exported above and to give business intelligence teams near-real-time insight and analysis.
This solution can easily be expanded and customized to meet the needs of your business: it delivers historical data to a location that remains easily accessible, and it can also produce metadata for business intelligence.