manelmech/Databricks-Spark-Declarative-Pipelines

🚀 E-commerce Data Pipeline (Lakeflow / DLT)

This project implements a declarative ETL pipeline using Delta Live Tables (DLT). It demonstrates the Medallion Architecture (Bronze, Silver and Gold layers) by processing raw JSON data from cloud storage into cleaned, aggregated Materialized Views for analytics.

📚 Project Source

This project is part of the Databricks Data Engineer Learning Plan.

🏗️ Architecture Overview

The pipeline transforms data through three stages, using Streaming Tables for incremental processing and Materialized Views for the final aggregations.
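The difference between the two dataset types can be illustrated with a small, hypothetical sketch in plain Python (not DLT code): a streaming table remembers how far into its source it has read and only processes rows that arrived since the last update, while a materialized view's result always matches a full computation over its inputs. Databricks may refresh materialized views incrementally under the hood; this sketch simply recomputes. The class, function and column names are illustrative assumptions.

```python
from collections import Counter


class StreamingTableSketch:
    """Hypothetical streaming table: appends only source rows it has not seen yet."""

    def __init__(self):
        self.rows = []
        self.offset = 0  # position of the next unprocessed source row (like a checkpoint)

    def update(self, source):
        new_rows = source[self.offset:]  # only rows that arrived since the last update
        self.rows.extend(new_rows)
        self.offset = len(source)
        return len(new_rows)  # number of rows processed in this update


def materialized_view_sketch(source):
    """Hypothetical materialized view: recomputes the aggregate over the whole source."""
    return Counter(row["status"] for row in source)
```

Calling `update` twice on an unchanged source processes zero rows the second time, whereas `materialized_view_sketch` always scans every row it is given.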

1. ☁️ Ingestion (Bronze Layer)

Ingests raw JSON files from cloud storage.

  • orders_bronze (Streaming Table)
  • status_bronze (Streaming Table)
  • customers_bronze (Streaming Table)
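The README does not show how the Bronze tables read their files, so the following is only a plain-Python sketch of the idea: newline-delimited JSON is parsed as-is, each record is tagged with where and when it was ingested, and files that were already processed are skipped on later runs. The in-memory `files` dict stands in for cloud storage, and the `_source_file` / `_ingested_at` metadata fields are hypothetical names.

```python
import json
from datetime import datetime, timezone


def ingest_bronze(files, already_ingested):
    """Parse newline-delimited JSON from files that have not been ingested yet.

    files: dict mapping a (hypothetical) file path to its raw text content.
    already_ingested: set of paths processed in earlier runs; updated in place.
    Returns the new bronze rows: raw fields kept unchanged plus two metadata fields.
    """
    rows = []
    ingested_at = datetime.now(timezone.utc).isoformat()
    for path in sorted(files):
        if path in already_ingested:
            continue  # each file is ingested only once, like an incremental stream
        for line in files[path].splitlines():
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            record["_source_file"] = path  # hypothetical lineage column
            record["_ingested_at"] = ingested_at  # hypothetical ingestion timestamp
            rows.append(record)
        already_ingested.add(path)
    return rows
```

No cleaning happens here on purpose: in the Medallion Architecture, Bronze keeps the raw data and the Silver layer does the cleaning.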

2. ⚙️ Transformation & CDC (Silver Layer)

Cleans the data and applies change data capture (CDC) to customer records.

  • orders_silver & status_silver: Cleaned streaming tables.
  • Customer CDC Logic:
    • customers_bronze_clean: Preliminary cleaning.
    • type1_customers_silver: Applies Change Data Capture (CDC) logic to handle inserts, updates and deletes, so the table always holds only the latest state of each customer (SCD Type 1: old values are overwritten, not kept as history).
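DLT offers this pattern natively through `APPLY CHANGES INTO` in SQL (`dlt.apply_changes()` in Python), but the README does not show which form this project uses. The plain-Python sketch below only illustrates the SCD Type 1 semantics: events are keyed by customer, the newest event per key wins, stale events are ignored, deletes remove the row, and updates overwrite it in place. The column names (`customer_id`, `timestamp`, `operation`) and the `"DELETE"` value are assumptions.

```python
def apply_scd1_changes(target, changes, key="customer_id",
                       sequence="timestamp", op="operation"):
    """Merge a batch of CDC events into an SCD Type 1 table held as {key: row}.

    All column names and the "DELETE" operation value are hypothetical.
    """
    # Keep only the most recent change per key within this batch.
    latest = {}
    for change in changes:
        k = change[key]
        if k not in latest or change[sequence] > latest[k][sequence]:
            latest[k] = change

    for k, change in latest.items():
        current = target.get(k)
        if current is not None and current[sequence] >= change[sequence]:
            continue  # stale event: the target already holds newer data
        if change[op] == "DELETE":
            target.pop(k, None)
        else:
            # Overwrite in place: SCD Type 1 keeps no previous versions.
            target[k] = {c: v for c, v in change.items() if c != op}
    return target
```

A limitation of this sketch: once a key is deleted, an older update arriving in a later batch would re-insert it, because no tombstone is kept. A production CDC implementation has to handle that case.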

3. 📊 Analytics (Gold Layer)

Business-level aggregates and joins exposed as Materialized Views.

  • full_order_info_gold: Joins orders with their status records to give a complete view of each order.
  • gold_orders_by_date: Aggregates order volume over time.
  • Filtered Views:
    • cancelled_orders: Subset of cancelled transactions.
    • delivered_orders: Subset of successful deliveries.
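The Gold-layer logic can be sketched in plain Python to show what each view computes: an inner join of orders with status events on the order ID, a per-date order count, and status-filtered subsets. This is an illustration, not the project's code. The column names (`order_id`, `order_timestamp`, `order_status`, `status_timestamp`) and the status values are assumptions, and timestamps are assumed to be ISO-8601 strings.

```python
from collections import Counter


def full_order_info(orders, statuses):
    """Inner-join orders with status events on order_id (hypothetical columns)."""
    status_by_order = {}
    for s in statuses:
        status_by_order.setdefault(s["order_id"], []).append(s)
    return [
        {**o, "order_status": s["order_status"], "status_timestamp": s["status_timestamp"]}
        for o in orders
        for s in status_by_order.get(o["order_id"], [])
    ]


def orders_by_date(orders):
    """Count orders per calendar date, taken from an ISO timestamp string."""
    return dict(Counter(o["order_timestamp"][:10] for o in orders))


def orders_with_status(full_info, status):
    """Filtered view, e.g. status='cancelled' or status='delivered'."""
    return [r for r in full_info if r["order_status"] == status]
```

If an order has several status events, the join returns one row per event, and the filtered views keep only the rows for the requested status.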

🛠️ Tech Stack

  • Platform: Databricks (Data Intelligence Platform)
  • Orchestration: Delta Live Tables (Declarative Pipelines)
  • Format: Delta Lake
  • Languages: Python (PySpark) / SQL

🔄 Data Flow

[Data flow diagram]
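DLT infers the dependencies between datasets from how each one reads the others and updates them in dependency order. The sketch below models that with Python's standard `graphlib`, grouping the tables into levels that could update in parallel. The Bronze → Silver edges follow the table names in this README; the edges into `gold_orders_by_date`, `cancelled_orders` and `delivered_orders` are assumptions, marked as such in the code.

```python
from graphlib import TopologicalSorter

# Each table maps to the set of tables it reads from.
PIPELINE = {
    "orders_bronze": set(),
    "status_bronze": set(),
    "customers_bronze": set(),
    "orders_silver": {"orders_bronze"},
    "status_silver": {"status_bronze"},
    "customers_bronze_clean": {"customers_bronze"},
    "type1_customers_silver": {"customers_bronze_clean"},
    "full_order_info_gold": {"orders_silver", "status_silver"},
    "gold_orders_by_date": {"orders_silver"},  # assumed source
    "cancelled_orders": {"full_order_info_gold"},  # assumed source
    "delivered_orders": {"full_order_info_gold"},  # assumed source
}


def refresh_levels(graph):
    """Group tables into levels; every table's inputs sit in an earlier level."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tables whose inputs are all done
        levels.append(ready)
        sorter.done(*ready)
    return levels
```

`prepare()` rejects cyclic graphs. This mirrors the requirement that a declarative pipeline's datasets form a directed acyclic graph.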

About

End-to-end ETL pipeline on Databricks using Delta Live Tables (DLT) and the Medallion Architecture. Based on the Databricks Data Engineer Learning Plan.
