This project demonstrates the capabilities of the YOLO model for real-time instance segmentation. Specifically, the YOLOv8m architecture is trained on a custom dataset to evaluate its performance in accurately detecting and segmenting objects under application-specific conditions.
The project documentation outlines the complete workflow adopted in this study, including:
- generation of segmentation annotations for ground truth labels
- preprocessing of the image and corresponding label datasets
- training of the YOLO-based segmentation model
- quantitative and qualitative evaluation of the trained model.
The objective of this project is to detect butterflies in images and generate precise instance-level segmentation masks delineating their boundaries. The dataset comprises a diverse collection of butterfly images captured under varying orientations, scales, and visual conditions.
The figures below illustrate a representative sample from the image dataset used for generating segmentation annotations.
The Python package manager (pip) provides access to the annotation tool LabelMe, which is used to create ground truth annotations and associate semantic labels with each image. The tool generates a structured JSON file that stores the polygon coordinates defining the segmentation mask, along with the corresponding class label.
The figure below illustrates the command used to install the LabelMe annotation tool via pip.
The annotation tool is then used to generate ground truth data for the image dataset. The images are loaded by specifying the paths to the input and output directories via the File menu. Polygonal segmentation masks are manually drawn around the object of interest (in this case, the butterfly) using the Create Polygon option.
Once a closed polygon is completed, a corresponding class label can be assigned to the annotated object.
For each annotated image, a JSON file is automatically generated, containing the assigned class label and the pixel coordinates of the polygon vertices that define the segmentation mask. The screenshot below illustrates an example of the generated annotation data.
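The structure of such a LabelMe file can be sketched as follows; the file name, image dimensions, and coordinate values are illustrative, and a real annotation contains many more polygon points:

```json
{
  "shapes": [
    {
      "label": "butterfly",
      "points": [[112.0, 245.5], [130.0, 238.0], [141.5, 252.0]],
      "shape_type": "polygon"
    }
  ],
  "imagePath": "butterfly_001.jpg",
  "imageHeight": 480,
  "imageWidth": 640
}
```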
It is important to note that manually generating ground truth annotations is a time-intensive process that requires significant attention to detail. Furthermore, the JSON files produced by LabelMe are not directly compatible with the YOLO training pipeline. To address this limitation, a dedicated conversion utility, labelme2yolo, is available via pip.
The labelme2yolo tool converts the annotation data into YOLO-compatible text files, enabling their use in both object detection and instance segmentation tasks. The figure below illustrates the command used to install the labelme2yolo package via pip.
The label and ground truth coordinate files are generated using the following command:
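A typical invocation looks like the following; the JSON directory path is a placeholder for the project's actual annotation folder:

```shell
# Convert LabelMe JSON annotations to YOLO-format segmentation labels
labelme2yolo --json_dir path/to/labelme/json --output_format polygon
```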
It should be noted that the --output_format=polygon argument is required to produce annotation files suitable for segmentation tasks. By default, the tool generates output formatted for object detection only and does not include polygon-based segmentation information.
This concludes the dataset preparation phase. The YOLO model can now be downloaded and trained using the custom-prepared dataset.
The YOLOv8m-seg architecture is employed in this project to detect the presence of butterflies and generate corresponding instance-level segmentation masks for each detected object.
The file paths for the training and validation image datasets are specified and managed through a YAML configuration file, ensuring a structured and reproducible training setup.
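A minimal sketch of such a configuration file, in the format expected by Ultralytics, is shown below; the file name, paths, and class index are assumptions about this project's layout:

```yaml
# butterfly.yaml — hypothetical dataset configuration for Ultralytics YOLO
path: datasets/butterfly   # dataset root directory (placeholder)
train: images/train        # training images, relative to path
val: images/val            # validation images, relative to path
names:
  0: butterfly             # single object class
```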
The table below summarizes the key software libraries installed and used for training the model.
| Library | Description |
|---|---|
| Ultralytics | Provides the YOLO model implementation, pretrained weights, and the training and inference APIs |
| PyTorch | Provides the underlying deep learning framework on which YOLO training runs |
Although the Ultralytics framework automatically installs the required PyTorch dependencies, the default installation may not include CUDA support for GPU acceleration, depending on the platform. As a result, the model may be restricted to CPU-based training unless additional configuration is performed.
To enable GPU-accelerated training, it is therefore recommended to install PyTorch directly from the official homepage. This installation provides the necessary CUDA support, allowing the model to leverage available GPU resources and significantly reduce training time.
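As an illustration, a CUDA-enabled build can be installed using an index URL generated by the selector on the PyTorch homepage; the CUDA version below (cu121) is an example and should be matched to the locally installed driver:

```shell
# Example: install PyTorch with CUDA 12.1 support (adjust cu121 to your setup)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Verify that the GPU is visible to PyTorch (prints True when CUDA is available)
python -c "import torch; print(torch.cuda.is_available())"
```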
The table below provides the information on training parameters:
| Parameter | Value |
|---|---|
| task | segment |
| mode | train |
| epochs | 100 |
| batch size | 8 |
| imgsz | 640 |
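With the parameters above, training can be launched through the Ultralytics command-line interface. The dataset file name below (butterfly.yaml) is a placeholder for the project's actual YAML configuration:

```shell
# Train YOLOv8m-seg with the parameters listed in the table above
yolo task=segment mode=train model=yolov8m-seg.pt data=butterfly.yaml epochs=100 batch=8 imgsz=640
```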
The table below presents the number of samples used for training and validation, providing insight into the dataset distribution.
| Mode | Number of samples |
|---|---|
| Training | 300 |
| Validation | 100 |
Upon successful completion of the training process, the trained model checkpoints best.pt and last.pt are generated in the runs directory, along with training logs and performance metrics. Model evaluation is conducted using the best.pt checkpoint, which corresponds to the model state that achieved the highest validation performance.
This section presents the inference results produced by the model trained on the custom dataset. The trained YOLOv8 segmentation model is evaluated using both still images and video sequences to assess its detection and segmentation performance under different input modalities.
The image shown below is provided as input to the YOLO model during the inference phase.
The corresponding prediction generated by the network is illustrated below. The model successfully identifies the object of interest by producing a bounding box and an associated instance segmentation mask, along with the predicted confidence score.
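Inference of this kind can be run through the same command-line interface; the checkpoint path and input file below are placeholders for the project's actual locations:

```shell
# Run segmentation inference with the best checkpoint
# (source may also be a video file or a directory of images)
yolo task=segment mode=predict model=runs/segment/train/weights/best.pt source=test_butterfly.jpg
```

Passing a video file as the source produces an annotated output video rather than a single image.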
In addition to image-based evaluation, the detector is assessed using a video sequence. The output video generated by the YOLO model is shown below.
butterfly_video_output_screenrecording.mp4
The model demonstrates the ability to detect and segment multiple butterflies within a single frame. The resulting video output highlights the efficiency of the detector in handling multiple instances simultaneously.
butterfly_video2_output_screenrecording.mp4
- Dataset: https://github.com/ayoolaolafenwa/PixelLib/releases
- YouTube tutorial: https://youtu.be/DMRlOWfRBKU?si=pDfLNM-utDjEmnFV




