Building Blocks of a Cloud Data Platform

Layers of a Data Platform

Ingestion Layer
Storage Layer
Processing Layer
Serving Layer

Ingestion Layer

The ingestion layer is all about getting data into the data platform. It connects to data sources such as relational or NoSQL databases, file storage, and internal or third-party APIs, and extracts data from them.

This layer should not modify or transform the incoming data in any way. That way the raw, unprocessed data is always available in the data lake for data lineage tracking and re-processing.
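The "land it untouched" rule can be sketched as follows. This is a minimal illustration, not a reference implementation: a local directory stands in for data lake storage, and the function name `ingest_raw`, the `raw/<source>/<date>/` layout, and the sidecar metadata file are all assumptions made for the example.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def ingest_raw(payload: bytes, source: str, lake_root: Path) -> Path:
    """Land the payload byte-for-byte in a raw zone, with sidecar lineage metadata."""
    now = datetime.now(timezone.utc)
    digest = hashlib.sha256(payload).hexdigest()
    # Hypothetical layout: raw/<source>/<ingest date>/<content hash>.bin
    target_dir = lake_root / "raw" / source / now.strftime("%Y-%m-%d")
    target_dir.mkdir(parents=True, exist_ok=True)
    data_path = target_dir / f"{digest}.bin"
    data_path.write_bytes(payload)  # no parsing, no transformation
    # Lineage details live next to the data, never inside it.
    meta = {
        "source": source,
        "ingested_at": now.isoformat(),
        "sha256": digest,
        "size_bytes": len(payload),
    }
    data_path.with_suffix(".meta.json").write_text(json.dumps(meta))
    return data_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = ingest_raw(b'{"order_id": 1, "amount": " 12.50 "}', "orders_api", Path(tmp))
        print(path.read_bytes())  # the stored bytes are identical to the input
```

Keeping the metadata in a separate file means the stored object stays an exact copy of what the source sent, so it can be re-processed later with different logic.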

Storage Layer

Once data has been acquired from the different sources, it is stored in this layer. This is where data lake storage comes in: it is scalable and inexpensive, and it places no restrictions on file types (text, CSV, JSON, binary formats such as Parquet and Avro, images, and video).

Processing Layer

Once data is stored in the cloud data lake, it is ready to be processed into something meaningful. A traditional data warehouse offers many tools for this, such as SQL and ETL tools.

A data processing framework running on cloud compute resources can handle virtually any amount of data. A few of the most notable are:

Apache Spark
Glue
Python

An important thing to keep in mind:

For batch processing, the flow shown in the diagram works well: data moves from the ingestion layer to the storage layer and then to the processing layer.

For stream processing, data goes directly from the ingestion layer to the processing layer instead of passing through the storage layer first. Here, cloud storage serves as an archive.
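The difference between the two paths can be sketched in a few lines. This is only an illustration: plain Python lists stand in for cloud storage, and `process`, `batch_path`, and `streaming_path` are hypothetical names, not part of any framework.

```python
from typing import Iterable, Iterator


def process(event: dict) -> dict:
    """Placeholder processing step: mark the event as processed."""
    return {**event, "processed": True}


def batch_path(events: Iterable[dict], storage: list) -> list:
    """Batch: land everything in storage first, then process the stored data."""
    storage.extend(events)
    return [process(e) for e in storage]


def streaming_path(events: Iterable[dict], archive: list) -> Iterator[dict]:
    """Streaming: process each event as it arrives; storage only keeps an archive copy."""
    for event in events:
        archive.append(event)  # archive copy, available for later re-processing
        yield process(event)   # the result is available immediately


if __name__ == "__main__":
    archive: list = []
    for result in streaming_path([{"id": 1}, {"id": 2}], archive):
        print(result)
```

In the streaming sketch each result is produced before the next event is read, whereas the batch sketch produces nothing until all events have landed in storage.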

Processing data in a data platform includes several distinct steps: schema management, data validation, data cleansing, and the production of data products.
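These steps can be sketched with plain Python. Everything here is an assumption made for illustration: the `SCHEMA` mapping, the sample records, and the "revenue per customer" data product are invented, and a real platform would typically run this kind of logic in a framework such as Spark.

```python
from collections import defaultdict

# Schema management (hypothetical schema): field name -> type the field is cast to.
SCHEMA = {"order_id": int, "customer": str, "amount": float}


def validate(record: dict) -> bool:
    """Data validation: every schema field is present and castable."""
    try:
        for field, cast in SCHEMA.items():
            cast(record[field])
        return True
    except (KeyError, TypeError, ValueError):
        return False


def cleanse(record: dict) -> dict:
    """Data cleansing: cast to the schema types and normalize text."""
    clean = {field: cast(record[field]) for field, cast in SCHEMA.items()}
    clean["customer"] = clean["customer"].strip().lower()
    return clean


def revenue_per_customer(records: list) -> dict:
    """Data product: total order amount per customer."""
    totals = defaultdict(float)
    for r in records:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


# Made-up raw records, as they might arrive from the ingestion layer.
raw = [
    {"order_id": "1", "customer": " Alice ", "amount": "12.50"},
    {"order_id": "2", "customer": "alice", "amount": "7.50"},
    {"order_id": "3", "customer": "Bob"},                    # missing amount
    {"order_id": "x", "customer": "Bob", "amount": "3.00"},  # order_id is not a number
]
valid = [cleanse(r) for r in raw if validate(r)]
product = revenue_per_customer(valid)
print(product)
```

The two invalid records are dropped at the validation step, so only the two cleaned orders for the same customer reach the data product.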

Serving Layer

This layer prepares data for end users or downstream systems. Power users and analysts run ad hoc SQL queries, while data scientists and developers use the programming language they are most comfortable with to prototype new data transformations or build machine learning models.

The cloud makes it easy to serve these different needs from a single architecture:

Load data from the data lake into a cloud data warehouse.
Give other applications access to data lake data: load it from the lake into a fast key/value or document store and point the application to that.
For data science and engineering teams, the cloud data lake provides an environment where they can work with the data directly in cloud storage, using a processing framework such as Spark or Flink.
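Two of these serving paths can be sketched locally. This is only a stand-in under stated assumptions: the in-memory `sqlite3` database plays the role of a cloud data warehouse, a Python dict plays the role of a key/value store, and the `orders` table and sample records are invented for the example.

```python
import sqlite3

# Made-up processed records, as they might come out of the processing layer.
processed = [
    {"order_id": 1, "customer": "alice", "amount": 12.5},
    {"order_id": 2, "customer": "alice", "amount": 7.5},
    {"order_id": 3, "customer": "bob", "amount": 3.0},
]

# Path 1: load into a SQL engine for ad hoc queries (sqlite3 stands in for a warehouse).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO orders VALUES (:order_id, :customer, :amount)", processed
)
totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)

# Path 2: load into a key/value store for application lookups (a dict stands in for it).
kv_store = {f"order:{r['order_id']}": r for r in processed}
print(kv_store["order:3"])
```

The same processed data feeds both paths: analysts query it with SQL, while an application fetches single records by key.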

Cloud Data Platform – Handling 3 V’s (Variety, Volume and Velocity)

Variety

A cloud data platform accepts data in any form or type, with no limitations or restrictions.

There are plug-and-play ingestion tools, such as Kafka Connect and Apache NiFi, for bringing data into the cloud platform.

Volume

Cloud storage is elastic and billed on a pay-per-use basis, which makes storing large data volumes inexpensive.

Velocity

When data scientists are users of the data system, the volume and variety challenges come into play all at once.

Two more V’s

Veracity

Maintain data governance to ensure accuracy in big data platforms.

Value

Data turns into value only when data users or systems get timely and effective access to it.
