This repository contains the code developed by Martí Fabregat, Rafel Febrer, Ferran Miró-Gea and Miguel Ortiz as part of the AI with Deep Learning postgraduate course w24/25 at the UPC (Universitat Politècnica de Catalunya). Supervised by Amanda Duarte.
Our project aims to use AI to address environmental challenges, specifically waste management, by leveraging accessible tools such as cameras and deep learning. We fine-tuned and compared models for both image classification and instance segmentation, evaluated different architectures, and deployed the best-performing models in the cloud with API access.
Create a conda environment by running
conda create --name waste-management python=3.12.8
Then, activate the environment
conda activate waste-management
To install the required python packages simply type
pip3 install -r requirements_TACO.txt
The model has been trained using the TACO dataset by Pedro F. Proença and Pedro Simões. For more details, check the paper: https://arxiv.org/abs/2003.06975
To download the dataset images simply use:
python -m scripts.download
For image classification, the TACO dataset is split according to its annotations and the background is removed, so that the waste object is surrounded by black pixels. This allows the TACO dataset to be used as a waste image classification dataset.
For image segmentation, the TACO dataset is used as is.
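The background-removal step described above can be sketched as follows (a minimal sketch with a toy image and mask; the variable names are illustrative and the real masks come from the COCO annotations):

```python
import numpy as np

def remove_background(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Black out every pixel outside the annotation mask.

    image: H x W x 3 uint8 array; mask: H x W boolean array (True = waste object).
    """
    out = image.copy()
    out[~mask] = 0  # background pixels become black
    return out

# Toy 2x2 white "image" where only the top-left pixel belongs to the object.
img = np.full((2, 2, 3), 255, dtype=np.uint8)
msk = np.array([[True, False], [False, False]])
crop = remove_background(img, msk)
```

The resulting crop keeps the annotated object untouched and turns everything else black, which is what lets a classifier focus on the waste item alone.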
The categories that will be used for classification and segmentation depend on the subversion of Taco Dataset selected.
- Taco28 - Contains the complete Taco Dataset using the 28 supercategories as labels.
- Taco5 - Contains a subsample of the images of the original Taco Dataset. (5 labels only). It is an "easier" task.
- Taco1 - Contains the complete Taco Dataset using only 1 category as label ("waste").
As can be seen, TACO is an unbalanced dataset for waste image segmentation:

The Viola77 dataset (Apache 2.0 License) is also used for classification.
Viola77 is a close-to-perfectly balanced dataset:
Explore the notebook eda_taco.ipynb, a modified version of the original notebook from the TACO Repository, which inspects the dataset.
The dataset is in COCO format. It contains the source pictures, annotations and labels. For more details related to the data source, please refer to the TACO Repository.
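Since COCO annotations are plain JSON, reading the per-image labels takes only a few lines. A sketch with an inlined toy annotation dict (a real file would be loaded with json.load; the values below are made up):

```python
from collections import defaultdict

# Toy COCO-style structure; a real file would be read with json.load(open(path)).
coco = {
    "images": [{"id": 1, "file_name": "batch_1/000001.jpg"}],
    "categories": [{"id": 7, "supercategory": "Bottle", "name": "Clear plastic bottle"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 7,
         "bbox": [12.0, 30.0, 40.0, 80.0],  # x, y, width, height
         "segmentation": [[12.0, 30.0, 52.0, 30.0, 52.0, 110.0]]},
    ],
}

# Map category ids to supercategories, then collect labels per image.
supercats = {c["id"]: c["supercategory"] for c in coco["categories"]}
labels_per_image = defaultdict(list)
for ann in coco["annotations"]:
    labels_per_image[ann["image_id"]].append(supercats[ann["category_id"]])
```

This is the same traversal the dataset classes in this repository perform when pairing images with their annotations.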
Explore the notebook eda_viola.ipynb. This notebook contains an exploratory data analysis of the Viola77 dataset and its classes, labels and distribution. For more details related to the dataset, please take a look at: https://huggingface.co/datasets/viola77data/recycling-dataset.
We have trained a Residual Network 50 (ResNet50) model based on torchvision.models.resnet50, which was
pretrained on ImageNet-1k:

To split the annotations for training and evaluation on ResNet-50, use split_dataset.py as shown below. It has several optional flags.
python -m scripts.split_dataset --dataset_dir=data --dataset_type=classification [--test_percentage=0.1] [--val_percentage=0.1] [--seed=123] [--verbose=False]
- Use `--dataset_dir` to indicate the annotations directory.
- Use `--dataset_type` to indicate the dataset. For classification use `--dataset_type=classification` (to run classification on taco28, taco5 or taco30viola11).
- Use `--test_percentage` to use a test split different from the default 0.1 (10%).
- Use `--val_percentage` to use a validation split different from the default 0.1 (10%).
- Use `--seed` to change the random output. Default: 123.
- Use `--verbose` (bool) to print progress to the console during execution.
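Conceptually, the split is a seeded shuffle of the image ids, carved into test, validation and train slices. An illustrative sketch only (not the actual script, which operates on the COCO annotation file):

```python
import random

def split_ids(image_ids, test_percentage=0.1, val_percentage=0.1, seed=123):
    """Deterministically shuffle ids and slice off test/val; the rest is train."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_percentage)
    n_val = int(len(ids) * val_percentage)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test

train, val, test = split_ids(range(100))
```

With the defaults this yields an 80/10/10 split, and the fixed seed makes the split reproducible across runs.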
This repository provides an implementation of ResNet-50 for waste classification using the Viola77 dataset. It includes scripts for hyperparameter optimization using Optuna, model training, and evaluation.
Ensure you have the following dependencies installed:
pip install torch torchvision transformers optuna numpy pandas matplotlib seaborn scikit-learn tqdm datasets

- Description: Custom PyTorch dataset class for the Viola77 dataset.
- Inputs:
- Hugging Face dataset
- Outputs:
- Preprocessed dataset for training and testing
- Description: Uses Optuna to find the best hyperparameters for training the ResNet-50 model.
- Inputs:
- Search space includes learning rate, dropout rate and optimizer type.
- Outputs:
- Saves the best hyperparameters in hparams.json.
- Description: Trains a ResNet-50 model on the Viola77 dataset.
- Inputs:
- Dataset from Hugging Face (viola77data/recycling-dataset)
- If enhanced_hparams=True, executes optuna_resnet_hparams.py first to determine the best hyperparameters before training.
- Outputs:
- Saves the trained model as best_resnet50.pth
- Generates training metrics and confusion matrices
- Description: Evaluates the trained ResNet-50 model on the test dataset.
- Inputs:
- Loads best_resnet50.pth
- Uses the Viola77 test dataset
- Outputs:
- Accuracy, classification report, confusion matrix
- Train the Model
- Without optimized hyperparameters:
python -m scripts.train_resnet_classification_opt
- Use `--enhanced_hparams` to tune the hyperparameters with the Optuna library. This option executes the script optuna_resnet_hparams.py (internally: if args.enhanced_hparams: os.system("python -m utilities.optuna_resnet_hparams")).
- Use `--lr` to use a learning rate different from the default 0.00002.
- Use `--dropout` to use a dropout percentage different from the default 0.2674 (26.74%).
- Use `--epochs` to use a number of epochs different from the default 15.
- Test the Model
python -m scripts.test_resnet_classification

Plots and metrics:
- Training Loss & Accuracy:
- Confusion Matrix (Train & Test):
- Evaluation Histogram:
- Classification Report:
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| aluminium | 0.57 | 0.88 | 0.70 | 26 |
| batteries | 0.91 | 0.88 | 0.89 | 24 |
| cardboard | 0.74 | 0.93 | 0.82 | 15 |
| disposable plates | 0.88 | 0.92 | 0.90 | 24 |
| glass | 0.85 | 0.63 | 0.72 | 35 |
| hard plastic | 0.48 | 0.52 | 0.50 | 25 |
| paper | 0.79 | 0.68 | 0.73 | 34 |
| paper towel | 0.84 | 1.00 | 0.91 | 31 |
| polystyrene | 0.80 | 0.71 | 0.75 | 34 |
| soft plastics | 0.68 | 0.52 | 0.59 | 33 |
| takeaway cups | 0.90 | 0.90 | 0.90 | 30 |
| accuracy | | | 0.76 | 311 |
| macro avg | 0.77 | 0.78 | 0.76 | 311 |
| weighted avg | 0.77 | 0.76 | 0.76 | 311 |
- Overall Accuracy: 0.7621
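A report in that format comes straight from scikit-learn; a toy example with made-up labels (not the project's real predictions):

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical ground truth and predictions for three of the eleven classes.
y_true = ["glass", "glass", "paper", "aluminium", "paper"]
y_pred = ["glass", "paper", "paper", "aluminium", "paper"]

# Prints per-class precision/recall/F1/support plus macro and weighted averages.
print(classification_report(y_true, y_pred))
acc = accuracy_score(y_true, y_pred)  # 4 of 5 correct
```

The same two calls, applied to the full test set, produce the table and overall accuracy reported above.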
We have trained a Vision Transformer (ViT) model based on google/vit-base-patch16-224-in21k, which was
pretrained on ImageNet-21k:

To split the annotations for training and evaluation on ViT use split_dataset.py following the same procedure as in ResNet-50.
python -m scripts.split_dataset --dataset_dir=data --dataset_type=classification [--test_percentage=0.1] [--val_percentage=0.1] [--seed=123] [--verbose=False]
The Viola77Dataset class in custom_datasets/viola77_dataset.py provides functionality to load the Viola77 dataset for image classification.
python -m scripts.train_resnet_classification
python -m scripts.test_vit_classification --model_path [model_path] --image_path [image_path]
- model_path: path to the model checkpoint (mandatory)
- image_path: optional path to an image. If not set, an evaluation on the whole dataset is performed.
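The two evaluation modes can be sketched with argparse (flag names as documented above; the checkpoint filename is just an example):

```python
import argparse

parser = argparse.ArgumentParser(description="Evaluate the ViT classifier")
parser.add_argument("--model_path", required=True,
                    help="Path to the model checkpoint (mandatory)")
parser.add_argument("--image_path", default=None,
                    help="Single image to classify; omit to evaluate the whole dataset")

# Example invocation without --image_path: falls back to full-dataset evaluation.
args = parser.parse_args(["--model_path", "checkpoints/vit.pth"])
mode = "single-image" if args.image_path else "full-dataset"
```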
- Training Loss & Accuracy: here we can see the loss and accuracy curves during training for the train and validation datasets.

- Test Accuracy and F1-score
| Class | Accuracy (%) | F1 Score |
|---|---|---|
| Overall | 82.64% | 0.8248 |
| aluminium | 88.46% | 0.7931 |
| batteries | 91.67% | 0.9362 |
| cardboard | 93.33% | 0.8750 |
| disposable plates | 95.83% | 0.9583 |
| glass | 80.00% | 0.8485 |
| hard plastic | 44.00% | 0.4889 |
| paper | 88.24% | 0.8696 |
| paper towel | 96.77% | 0.9677 |
| polystyrene | 79.41% | 0.8182 |
| soft plastics | 69.70% | 0.6667 |
| takeaway cups | 86.67% | 0.8667 |
The comparison between ViT and ResNet for image classification was based on accuracy, measured as the proportion of correctly classified images. On average, ViT achieved 82% accuracy, outperforming ResNet's 76%:

The higher performance was achieved at a significant cost. ViT has over three times more parameters and required nine times the computational resources and training time. With a relatively small dataset, this highlights a tradeoff: while ResNet is less accurate, it is significantly more efficient in terms of storage, computation, and environmental impact:

Both models have a very similar number of parameters and a similar share of frozen backbone parameters; thus, they have similar sizes.
However, Mask2Former trains faster and has a lower computational and environmental footprint.
To split the annotations for training and evaluation for Mask R-CNN, use split_dataset.py.
python -m scripts.split_dataset --dataset_dir=data --dataset_type=taco1 [--test_percentage=0.1] [--val_percentage=0.1] [--seed=123] [--verbose=False]
- Use `--dataset_dir` to indicate the annotations directory.
- Use `--dataset_type` to indicate the dataset, depending on the task:
  - `taco28` for instance segmentation on the taco28 dataset (TACO with 28 supercategories, includes all data)
  - `taco5` for instance segmentation on the taco5 dataset (TACO with a subsample of 5 categories)
  - `taco1` for instance segmentation on the taco1 dataset (TACO segmenting only waste from background, includes all data)
- Use `--test_percentage` to use a test split different from the default 0.1 (10%).
- Use `--val_percentage` to use a validation split different from the default 0.1 (10%).
- Use `--seed` to change the random output. Default: 123.
- Use `--verbose` (bool) to print progress to the console during execution.
The TACO dataset class for Mask R-CNN in custom_datasets/taco_dataset_mask_r_cnn.py provides the functionality to load the TACO dataset for instance segmentation.
To train Mask R-CNN we use ResNet-50 as the backbone and freeze all its layers except the Feature Pyramid Network (FPN). The Region Proposal Network (RPN) and all layers in the RoI heads, which include the bounding-box and mask predictors, are also trained.
To train Mask R-CNN on any of the datasets, run:
python -m scripts.train_mask_r_cnn
Checkpoints will be saved in the results/mask_r_cnn folder.
To evaluate the model on the test set of the dataset, run:
python -m scripts.test_mask_r_cnn --checkpoint_path your_checkpoint_path
The metric we use is mAP from torchmetrics. The results are not good. Our hypothesis is that the way the model is adapted prevents it from predicting well, and since mAP is sensitive to bad predictions, the resulting scores are low.
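mAP is built on the IoU between predicted and ground-truth boxes, which is why poor localizations drag the score down. For axis-aligned boxes given as (x1, y1, x2, y2), IoU is simply:

```python
def box_iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
```

At the standard mAP@50 threshold a prediction only counts as a true positive when its IoU with a ground-truth box exceeds 0.5, so even moderately misplaced boxes score as misses.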
- Description: Train Mask R-CNN on the taco1 dataset. Overfitting begins at epoch 12, so training was stopped and the epoch-12 checkpoint evaluated.
- Outputs:
- Metrics
| Metric | Value |
|---|---|
| bbox_map | 0.21 |
| bbox_map_50 | 0.40 |
| bbox_map_75 | 0.20 |
| bbox_map_small | 0.14 |
| bbox_map_medium | 0.34 |
| bbox_map_large | 0.23 |
| bbox_mar_1 | 0.17 |
| bbox_mar_10 | 0.34 |
| bbox_mar_100 | 0.39 |
| bbox_mar_small | 0.30 |
| bbox_mar_medium | 0.51 |
| bbox_mar_large | 0.40 |
| segm_map | 0.25 |
| segm_map_50 | 0.40 |
| segm_map_75 | 0.26 |
| segm_map_small | 0.11 |
| segm_map_medium | 0.39 |
| segm_map_large | 0.34 |
| segm_mar_1 | 0.21 |
| segm_mar_10 | 0.39 |
| segm_mar_100 | 0.43 |
| segm_mar_small | 0.29 |
| segm_mar_medium | 0.57 |
| segm_mar_large | 0.50 |
| Avg. IoU (respecting labels) | 0.09 |
| Avg. IoU (ignoring labels) | 0.09 |
| Avg. false positive rate | 0.00 |
| Avg. false negative rate | 0.01 |

Predicted classes: [1]
- Description: Train Mask R-CNN on the taco5 dataset.
- Outputs:
- Metrics
| Metric | Value |
|---|---|
| bbox_map | 0.16 |
| bbox_map_50 | 0.42 |
| bbox_map_75 | 0.07 |
| bbox_map_small | 0.20 |
| bbox_map_medium | 0.19 |
| bbox_map_large | 0.19 |
| bbox_mar_1 | 0.28 |
| bbox_mar_10 | 0.31 |
| bbox_mar_100 | 0.31 |
| bbox_mar_small | 0.20 |
| bbox_mar_medium | 0.49 |
| bbox_mar_large | 0.30 |
| segm_map | 0.31 |
| segm_map_50 | 0.38 |
| segm_map_75 | 0.34 |
| segm_map_small | 0.08 |
| segm_map_medium | 0.37 |
| segm_map_large | 0.43 |
| segm_mar_1 | 0.39 |
| segm_mar_10 | 0.48 |
| segm_mar_100 | 0.48 |
| segm_mar_small | 0.15 |
| segm_mar_medium | 0.72 |
| segm_mar_large | 0.43 |
| Avg. IoU (respecting labels) | 0.27 |
| Avg. IoU (ignoring labels) | 0.31 |
| Avg. Complete IoU (respecting labels) | 0.06 |
| Avg. Complete IoU (ignoring labels) | 0.12 |
| Avg. false positive rate | 0.58 |
| Avg. false negative rate | 0.10 |

Predicted classes: [1, 2, 3, 4, 5]
- Description: Train Mask R-CNN on the taco28 dataset.
- Outputs:
- Metrics
| Metric | Value |
|---|---|
| bbox_map | 0.04 |
| bbox_map_50 | 0.09 |
| bbox_map_75 | 0.04 |
| bbox_map_small | 0.03 |
| bbox_map_medium | 0.09 |
| bbox_map_large | 0.09 |
| bbox_mar_1 | 0.10 |
| bbox_mar_10 | 0.13 |
| bbox_mar_100 | 0.14 |
| bbox_mar_small | 0.07 |
| bbox_mar_medium | 0.20 |
| bbox_mar_large | 0.21 |
| segm_map | 0.07 |
| segm_map_50 | 0.11 |
| segm_map_75 | 0.07 |
| segm_map_small | 0.03 |
| segm_map_medium | 0.14 |
| segm_map_large | 0.15 |
| segm_mar_1 | 0.14 |
| segm_mar_10 | 0.18 |
| segm_mar_100 | 0.18 |
| segm_mar_small | 0.09 |
| segm_mar_medium | 0.26 |
| segm_mar_large | 0.29 |
| Avg. IoU (respecting labels) | 0.24 |
| Avg. IoU (ignoring labels) | 0.05 |
| Avg. false positive rate | 0.83 |
| Avg. false negative rate | 0.32 |

Predicted classes: [1, 4, 5, 6, 7, 8, 9, 10, 13, 14, 15, 17, 18, 20, 21, 22, 23, 26, 27, 28]
To split the annotations for training and evaluation on Mask2Former use split_dataset.py following the same procedure as in Mask R-CNN.
python -m scripts.split_dataset --dataset_dir=data --dataset_type=taco1 [--test_percentage=0.1] [--val_percentage=0.1] [--seed=123] [--verbose=False]
The Mask2Former model has been fine-tuned from the weights of facebook/mask2former-swin-tiny-ade-semantic.

To train the Mask2Former model on any of the datasets, run:
python -m scripts.train_mask2former_segmentation --dataset_type=taco1 [--batch_size=1] [--checkpoint_path=your_checkpoint_path]
Checkpoints will be saved in the results folder.
To evaluate the model on the test set of the dataset, run:
python -m scripts.test_mask2former_segmentation --checkpoint_path=your_checkpoint_path
- `--checkpoint_path` (optional, str): checkpoint path; defaults to an empty string.
In general, Mask2Former learns to segment the background from the waste well but mostly fails at classifying the waste. Notice that, because of the dataset imbalance, some classes may never appear in the test dataset and therefore their mIoU can be 0.
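Per-class IoU over label maps makes that caveat concrete: a class absent from both prediction and ground truth can be skipped, while a class present only in the ground truth scores 0. A minimal numpy sketch:

```python
import numpy as np

def per_class_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> dict:
    """IoU per class id over integer label maps of identical shape."""
    ious = {}
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent everywhere: leave it out of the mean
        ious[c] = np.logical_and(p, t).sum() / union
    return ious

pred = np.array([[0, 0], [1, 1]])    # model output
target = np.array([[0, 0], [1, 2]])  # ground truth: class 2 never predicted
ious = per_class_iou(pred, target, num_classes=3)
```

Here class 2 exists only in the ground truth, so its IoU is 0, mirroring the zero-mIoU rows in the tables below.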
- Description: Training of Mask2Former on the taco1 dataset for instance segmentation. The best model is found at epoch 20: although the loss bottomed out at epoch 18, the validation metric (mIoU) kept improving, so epoch 20 looks slightly more promising. Further training could improve the results.
- Metrics:
| Categories | mIoU |
|---|---|
| Background | 0.9911 |
| Waste | 0.6705 |
- Description: Training of Mask2Former on the taco5 dataset for instance segmentation. In this case the best model is not so simple to determine: although the loss bottomed out at epoch 13, the validation metric (mIoU) kept improving for most categories (except class 3, Cup), so epoch 20 looks more promising. Further training could improve the results.
- Metrics:
| Categories | mIoU |
|---|---|
| Background | 0.9847 |
| Bottle | 0.0583 |
| Carton | 0.0769 |
| Cup | 0.1268 |
| Can | 0.1529 |
| Plastic film | 0.0 |
- Description: Training of Mask2Former on the taco28 dataset for instance segmentation. The best model is found at epoch 20: although the loss bottomed out at epoch 17, the validation metric (mIoU) kept improving, so epoch 20 looks slightly more promising. Further training could improve the results.
- Metrics:
| Categories | mIoU |
|---|---|
| Background | 0.9909 |
| Aluminium foil | 0.0 |
| Battery | 0.0 |
| Blister pack | 0.0 |
| Bottle | 0.0991 |
| Bottle cap | 0.0433 |
| Broken glass | 0.0 |
| Can | 0.0608 |
| Carton | 0.0510 |
| Cigarette | 0.0158 |
| Cup | 0.0359 |
| Food waste | 0.0 |
| Glass jar | 0.0 |
| Lid | 0.0122 |
| Other plastic | 0.0205 |
| Paper | 0.0165 |
| Paper bag | 0.0 |
| Plastic bag & wrapper | 0.1565 |
| Plastic container | 0.0 |
| Plastic gloves | 0.0 |
| Plastic utensils | 0.0 |
| Pop tab | 0.0 |
| Rope & strings | 0.0028 |
| Scrap metal | 0.0 |
| Shoe | 0.0 |
| Squeezable tube | 0.0 |
| Straw | 0.0090 |
| Styrofoam piece | 0.0087 |
| Unlabeled litter | 0.0053 |
Build the image with:
docker build -t waste-detection-app .
Run a specific Python file:
docker run --rm waste-detection-app <FILE_NAME.py>
This repository automates the setup of the GCP infrastructure. It contains the following Bash scripts:
- ./setup_gcp_infrastructure.sh sets up a VM, pulls a Git repository and runs the startup_script.sh.
- ./delete_gcp_infrastructure.sh deletes the infrastructure.
- ./upload_model_checkpoint.sh uploads checkpoint files to a shared Google Cloud Storage bucket.
- ./download_model_checkpoint.sh downloads checkpoint files from a shared Google Cloud Storage bucket to the local instance.
Further details on Google Cloud setup and utilities can be found in the GCP Utils Documentation.
To evaluate images outside the dataset an API with WebApp has been developed.
Before launching the API you need to create a .env file in the project root folder.
The .env file contains 3 variables, for example:
FLASK_SECRET_KEY=secret_key
MODEL_NAME=MASK2FORMER
CHECKPOINT=checkpoint_epoch_7_mask_rcnn_taco1.pt
MODEL_NAME can be MASK2FORMER or MASK_R-CNN. The API has not yet been enabled for classification models.
The checkpoints of the models to test for the app should be placed in the folder app/checkpoint.
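The three variables can be read with python-dotenv or with a few lines of stdlib code (a sketch; the app itself may load them differently):

```python
import os
import tempfile

def load_env(path: str) -> None:
    """Parse KEY=VALUE lines into os.environ, skipping blanks and comments."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()

# Demo with a temporary file standing in for the project's .env.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("MODEL_NAME=MASK2FORMER\nCHECKPOINT=checkpoint_epoch_7_mask_rcnn_taco1.pt\n")
load_env(fh.name)
```

After loading, the app can read os.environ["MODEL_NAME"] and os.environ["CHECKPOINT"] to pick the model and checkpoint file.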

To run the API locally in debug mode, run:
python -m app.app
The Flask app will be launched on your localhost.
Using the developed API, the user can detect waste either through direct API requests or through the web app (GUI). The following sections explain how to interact with each.
Make sure the API is running on your localhost. Then use the example code provided in the file test_api.py to create your own request to the API.
Just run:
python -m app.test_api
The API will process the submitted pictures and return the detections and the images with the segmentations.
Check the app/results folder after running test_api.py to see the results.
The WebApp allows processing images in a more user-friendly way. To use the user interface, open your preferred browser and go to your localhost on port 8000: http://localhost:8000.
If the API is running you should get to the home page:
Select a picture and click on Upload.
You should see your picture in the web.
Click on Predict to generate the segmentations.
The output will differ according to the model used by the API. Here is an example:
To try another picture use the Try a different image button.