Pain Points in Big Data

    And

Lambda Architecture

 

DFW-Clojure Meetup

Pain Points in Big Data

Pain Point:

Data Persistence

Disk failure

Human failure

Overwritten data (design failure)

Fixing Persistence
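
A minimal sketch of one way to avoid overwritten data, assuming the fix is the append-only policy mentioned later in the deck: facts are only ever added, and the "current" value is derived by reading them back. The names `FactLog`, `append`, `current`, and `history` are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class FactLog:
    """Append-only store: facts are added, never updated or deleted."""

    _facts: List[Tuple[int, str, Any]] = field(default_factory=list)

    def append(self, timestamp: int, key: str, value: Any) -> None:
        # A new fact never replaces an old one; both stay in the log.
        self._facts.append((timestamp, key, value))

    def current(self, key: str) -> Optional[Any]:
        # The "current" value is derived: the fact with the latest timestamp.
        matches = [(ts, v) for ts, k, v in self._facts if k == key]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0])[1]

    def history(self, key: str) -> List[Tuple[int, Any]]:
        # Earlier values remain recoverable after a mistaken write.
        matches = [(ts, v) for ts, k, v in self._facts if k == key]
        return sorted(matches, key=lambda m: m[0])
```

With this shape, a bad write (human or design failure) adds a wrong fact but cannot destroy the earlier ones.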

Pain Point:

Data Size

Too much data to fit on one drive

Not helped by append-only policies

Sharding

Inconsistency after sharding/replication
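
To make the sharding bullet concrete, here is a rough sketch of hash-based sharding, which is only one of several possible schemes and not necessarily the one the talk had in mind. The function names `shard_for` and `partition` and the choice of MD5 as a stable hash are assumptions for illustration.

```python
import hashlib
from typing import Dict, List


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard index from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def partition(keys: List[str], num_shards: int) -> Dict[int, List[str]]:
    """Group keys by the shard each one hashes to."""
    shards: Dict[int, List[str]] = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards
```

Note that changing `num_shards` moves most keys to a different shard, which hints at why keeping sharded and replicated data consistent is its own pain point.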

Fixing Data Size

Pain Point:

Queries

Data lives in many places

Data is too big to process in real time

Data keeps coming in

Fixing Queries

???

Options:

Give Up?

Go back to smaller data

Press Forward

Gird your loins

Lambda Architecture

The Solution to Our Problems

 

Caveat: This is how I understand the Lambda Architecture from Nathan Marz's excellent, half-completed book Big Data.

Batch Layer

Serving Layer

Speed Layer

Data Ingestion

Answer Queries

Batch Layer

Data Format

Data Storage

Data Processing
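
As I understand it, the batch layer treats a view as a function of all the data: it is recomputed from the whole master dataset rather than patched in place. A minimal sketch, assuming a hypothetical dataset of `(timestamp, url)` page-view records and a hypothetical `compute_batch_view` function:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def compute_batch_view(master_dataset: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Recompute page-view counts from scratch over every record."""
    # No incremental state is kept: the view depends only on the raw data,
    # so a buggy view can be thrown away and rebuilt.
    return dict(Counter(url for _timestamp, url in master_dataset))
```

In a real system this step would run as a distributed batch job; the in-memory loop here only stands in for that.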

Serving Layer

Speed Layer

Data Ingestion

Answer Queries
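
A sketch of how a query might be answered, assuming the usual Lambda Architecture reading in which a result merges the precomputed batch view with a realtime view covering data that arrived since the last batch run. The page-view example and the names `update_realtime_view` and `answer_query` are hypothetical.

```python
from typing import Dict


def update_realtime_view(realtime_view: Dict[str, int], url: str) -> None:
    """Speed layer: incrementally count a page view as it arrives."""
    realtime_view[url] = realtime_view.get(url, 0) + 1


def answer_query(url: str,
                 batch_view: Dict[str, int],
                 realtime_view: Dict[str, int]) -> int:
    """Merge the batch result with the recent data the batch has not seen."""
    return batch_view.get(url, 0) + realtime_view.get(url, 0)
```

Once a new batch run has absorbed the recent records, the matching entries in the realtime view can be discarded, so the speed layer only ever holds a small window of data.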

The Plan
