This is the course project folder of Group 17 for M1522.006300 Distributed Systems.
The goal of this project is to deploy and manage a prototype cloud cluster running batch processing WordLetterCount applications. There are two WordLetterCount applications, implemented in different ways: one uses the Spark API, and the other uses the WordCount API with a self-designed resource scheduler.
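For orientation, here is a minimal single-process sketch of what a word and letter count computes. It is an illustration only and is not the project's Spark or WordCount implementation. It assumes that "WordLetterCount" means counting whole words and individual letters case-insensitively; the authoritative definition is in the project specification. The function name `word_letter_count` is hypothetical.

```python
import re
from collections import Counter
from typing import Tuple


def word_letter_count(text: str) -> Tuple[Counter, Counter]:
    """Count words and letters in `text`, ignoring case.

    Assumption: a word is a run of ASCII letters; digits and
    punctuation are treated as separators and are not counted.
    """
    words = re.findall(r"[a-z]+", text.lower())
    word_counts = Counter(words)
    # Letters are counted only inside the extracted words.
    letter_counts = Counter(ch for word in words for ch in word)
    return word_counts, letter_counts


if __name__ == "__main__":
    words, letters = word_letter_count("The cat and the hat")
    print(words.most_common())
    print(letters.most_common())
```

A distributed version would split the input across workers, run the same counting per partition, and merge the partial counters.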
Refer to the docs folder for useful guides.
The project specification is in Specification.md.
Refer to the GCP guide for a detailed tutorial on how to configure, access, and use your GCP clusters.
Our project ID is peaceful-fact-294309. You can use the web-based GCP Console dashboard to view our cluster, VMs, and Pods.
- Deploy Google Dataproc on GKE (ref: Dataproc on Google Kubernetes Engine)
- Install WordCount locally to test
- Test WordCount on GKE
- Deploy Hadoop on GKE
- Tweak Hadoop deployment, integration with GCS
- Deploy