A practical guide to building efficient and scalable data solutions
With step-by-step instructions and examples, this book teaches you the skills needed to build and deploy complex, efficient, and scalable big data pipelines on Kubernetes.
This book covers the following exciting features:
- Install and use Docker to run containers and build lean images
- Gain a deep understanding of Kubernetes architecture and its components
- Deploy and manage Kubernetes clusters on different cloud platforms
- Implement and manage data pipelines using Apache Spark and Apache Airflow
- Deploy and configure Apache Kafka for real-time data ingestion and processing
- Build and orchestrate a complete big data pipeline using open-source tools
- Deploy Generative AI applications on a Kubernetes-based architecture
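To give a flavor of the deployment work covered above, here is a minimal sketch of a Kubernetes Pod manifest. It is illustrative only and not taken from the book: the names, labels, and image are placeholders (busybox is used simply as a small public image).

```yaml
# Illustrative Pod manifest (placeholder names; not from the book)
apiVersion: v1
kind: Pod
metadata:
  name: hello-pipeline
  labels:
    app: hello-pipeline
spec:
  containers:
    - name: hello
      image: busybox:1.36  # small public image used as a stand-in
      command: ["sh", "-c", "echo 'pipeline step complete' && sleep 3600"]
```

Applied with `kubectl apply -f pod.yaml`, this schedules a single container on the cluster — the same primitive that richer objects such as Deployments and Jobs, and the Spark, Airflow, and Kafka deployments in the chapters below, build upon.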
Part 1: Docker and Kubernetes
- ✅ Getting Started with Containers
- 📖 Kubernetes Architecture
- ✅ Getting Hands-On with Kubernetes
Part 2: Big Data Stack
- 📖 The Modern Data Stack
- ✅ Big Data Processing with Apache Spark
- ✅ Building Pipelines with Apache Airflow
- ✅ Apache Kafka for Real-Time Events and Data Ingestion
Part 3: Connecting It All Together
- ✅ Deploying the Big Data Stack on Kubernetes
- ✅ Data Consumption Layer
- ✅ Building a Big Data Pipeline on Kubernetes
- ⏸️ Generative AI on Kubernetes
- 📖 Where to Go from Here
