This project implements a declarative ETL pipeline using Delta Live Tables (DLT). It demonstrates the Medallion Architecture by processing raw JSON data from cloud storage into high-quality Materialized Views for analytics.
This project is part of the Databricks Data Engineer Learning Plan.
- Course: Build Data Pipelines with Lakeflow / Spark Declarative Pipelines
- Source: Databricks Academy - Learning Plan
The pipeline transforms data through three stages, utilizing both Streaming Tables (for incremental processing) and Materialized Views (for final aggregations).
**Bronze Layer:** Ingests raw JSON files from cloud storage.
- `orders_bronze` (Streaming Table)
- `status_bronze` (Streaming Table)
- `customers_bronze` (Streaming Table)
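A Bronze ingestion table in DLT is typically defined with Auto Loader (`cloudFiles`). The sketch below is a hedged illustration, not the project's actual code: the storage path, schema options, and the `ingest_time` column are assumptions, and it only runs inside a Databricks DLT pipeline, where `spark` is provided globally.

```python
import dlt
from pyspark.sql.functions import current_timestamp

@dlt.table(comment="Raw orders ingested incrementally from cloud storage")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")        # Auto Loader: incremental file discovery
        .option("cloudFiles.format", "json")
        .option("cloudFiles.inferColumnTypes", "true")
        .load("/Volumes/demo/raw/orders")            # hypothetical source path
        .withColumn("ingest_time", current_timestamp())  # assumed audit column
    )
```

The `status_bronze` and `customers_bronze` tables would follow the same pattern against their own source folders.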
**Silver Layer:** Cleans data and handles history.
- `orders_silver` & `status_silver`: Cleaned streaming tables.
- Customer CDC logic:
  - `customers_bronze_clean`: Preliminary cleaning.
  - `type1_customers_silver`: Applies Change Data Capture (CDC) logic to handle inserts, updates, and deletes, so the table always reflects the current state of each customer (SCD Type 1).
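The SCD Type 1 step is usually expressed with DLT's `apply_changes` API. This is a sketch under assumptions: the key column (`customer_id`), sequencing column (`timestamp`), and CDC operation column (`operation`) are hypothetical names, and the code requires the Databricks DLT runtime.

```python
import dlt
from pyspark.sql.functions import col

@dlt.view(comment="Preliminary cleaning of raw customer records")
def customers_bronze_clean():
    # Assumed cleaning rule: drop records without a key
    return dlt.read_stream("customers_bronze").where(col("customer_id").isNotNull())

# Declare the target; apply_changes keeps it in sync with the CDC feed
dlt.create_streaming_table("type1_customers_silver")

dlt.apply_changes(
    target="type1_customers_silver",
    source="customers_bronze_clean",
    keys=["customer_id"],                             # assumed business key
    sequence_by=col("timestamp"),                     # assumed ordering column
    apply_as_deletes=col("operation") == "DELETE",    # assumed CDC op column
    stored_as_scd_type=1,                             # Type 1: keep only the latest state
)
```

With `stored_as_scd_type=1`, updates overwrite the existing row and deletes remove it, so no history is retained in the target.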
**Gold Layer:** Business-level aggregates and joins exposed as Materialized Views.
- `full_order_info_gold`: Joins orders and status to provide a complete view.
- `gold_orders_by_date`: Aggregates order volume over time.
- Filtered views:
  - `cancelled_orders`: Subset of cancelled transactions.
  - `delivered_orders`: Subset of successful deliveries.
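Gold-layer Materialized Views are plain `@dlt.table` definitions built with batch reads of upstream tables. A minimal sketch, assuming `order_id`, `order_date`, and `order_status` column names (not confirmed by the source):

```python
import dlt
from pyspark.sql.functions import col, count

@dlt.table(comment="Orders joined with their status for a complete view")
def full_order_info_gold():
    # Assumed join key: order_id
    return dlt.read("orders_silver").join(dlt.read("status_silver"), on="order_id")

@dlt.table(comment="Order volume aggregated per day")
def gold_orders_by_date():
    return (
        dlt.read("full_order_info_gold")
        .groupBy("order_date")                        # assumed date column
        .agg(count("order_id").alias("order_count"))
    )

@dlt.table(comment="Subset of cancelled transactions")
def cancelled_orders():
    return dlt.read("full_order_info_gold").where(col("order_status") == "cancelled")
```

Because these use `dlt.read` rather than `dlt.read_stream`, DLT materializes them as full recomputations (Materialized Views) instead of incremental Streaming Tables.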
- Platform: Databricks (Data Intelligence Platform)
- Orchestration: Delta Live Tables (Declarative Pipelines)
- Format: Delta Lake
- Languages: Python (PySpark) / SQL