As the task does not include a dataset, I suggest using the Kaggle water meter dataset.
P.S. Some packages are not simple to install, so there are no .py files. The program can be tested step by step with the notebooks. The .zip archive contains requirements.txt, but pipreqs cannot detect detectron2, and detectron2 may not install with pip either.
The solution has three parts:
- A semantic segmentation model that locates the digit area of the water meter so it can be cropped.
- An OCR model that predicts the exact numbers on the cropped image.
- A pipeline that combines both models.
The data is not ready for segmentation as provided. The images differ in shape and are not normalized. It is also worth checking for empty masks and for images whose id does not match any mask id. Seg_train.ipynb contains visualizations of the provided data at each step. After these checks, we resize everything to (256, 256) with OpenCV and, as usual, apply augmentations with albumentations.
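Below is a minimal sketch of the sanity checks. It assumes images and masks are loaded as numpy arrays and that masks are kept in a dict keyed by id. The function names are hypothetical and are not taken from Seg_train.ipynb. Resizing and the albumentations pipeline are left out so the sketch only needs numpy. For resizing, `cv2.resize(image, (256, 256))` would be used, with `interpolation=cv2.INTER_NEAREST` for masks so they stay binary.

```python
import numpy as np


def check_pairs(image_ids, mask_ids):
    """Return (ids with an image but no mask, ids with a mask but no image)."""
    images, masks = set(image_ids), set(mask_ids)
    # Unpaired images or masks cannot be used for training, so report them.
    return sorted(images - masks), sorted(masks - images)


def find_empty_masks(masks):
    """Return the ids of masks (dict: id -> array) with no foreground pixels."""
    return [mask_id for mask_id, mask in masks.items() if not np.any(mask)]


def normalize(image):
    """Scale uint8 pixel values from [0, 255] to float32 values in [0, 1]."""
    return image.astype(np.float32) / 255.0
```

Empty masks and unpaired ids found this way can then be dropped before resizing.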
Next, we split the dataset into training and validation parts, and we are ready to train the model.
- Architecture: UNet & EfficientNetB0
- Loss function: FocalLoss (DiceLoss or DiceBCELoss might be better)
- Metric: Dice coefficient
- Optimizer: Adam (lr=1e-3, decay=1e-6)
- Learning rate scheduler: ReduceLROnPlateau(factor=0.5, patience=5)
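For reference, here are numpy versions of the metric and the loss. The Dice coefficient is 2|A∩B| / (|A| + |B|). The smoothing term `smooth=1.0` and the focal loss parameters `gamma=2.0` and `alpha=0.25` (commonly used defaults) are assumptions, not values read from the notebook. In the Keras training code, the same formulas would be written with TensorFlow ops so they stay differentiable.

```python
import numpy as np


def dice_coef(y_true, y_pred, smooth=1.0):
    """Soft Dice coefficient 2|A∩B| / (|A| + |B|), smoothed to avoid 0/0."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)


def binary_focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss; gamma=0 and alpha=0.5 give half the usual BCE."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip probabilities so log() never sees exactly 0.
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    # p_t is the predicted probability of the true class for every pixel.
    p_t = np.where(y_true == 1, y_pred, 1.0 - y_pred)
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    # (1 - p_t) ** gamma down-weights pixels that are already classified well.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

The focal term explains why it suits this task: most pixels are background and easy to classify, so their contribution to the loss is reduced.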
| Architecture | dice_coef | Input & mask resolution | Epochs | Steps per epoch |
|---|---|---|---|---|
| UNet & EfficientNetB0 | 0.6283 | (256x256) | 50 | 200 |
Before cropping the dataset, note that I use the masks from the dataset rather than the output of the segmentation model, because the OCR model should be trained on the best available data. The prediction program for a single image, naturally, uses the mask produced by the segmentation model.
After applying an OpenCV bitwise operation to each image with its mask, we also need to correct the rotation of the newly cropped image so that the row of digits is horizontal.
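The notebook's exact rotation method is not described here, so the sketch below shows only one possible approach, and all three function names are hypothetical:

- `apply_mask` mirrors `cv2.bitwise_and(image, image, mask=mask)` in plain numpy.
- `crop_to_mask` cuts out the bounding box of the mask.
- `mask_angle` estimates the tilt of the digit strip from the principal axis (PCA) of the mask pixels, a numpy stand-in for something like `cv2.minAreaRect`.

```python
import numpy as np


def apply_mask(image, mask):
    """Keep pixels where mask > 0 and zero the rest (like cv2.bitwise_and with a mask)."""
    keep = mask > 0
    if image.ndim == 3:
        keep = keep[..., None]  # broadcast the 2-D mask over the color channels
    return np.where(keep, image, 0).astype(image.dtype)


def crop_to_mask(image, mask):
    """Apply the mask, then crop to the bounding box of its foreground."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask is empty")
    masked = apply_mask(image, mask)
    return masked[rows.min():rows.max() + 1, cols.min():cols.max() + 1]


def mask_angle(mask):
    """Angle in degrees (counter-clockwise positive) of the mask's long axis vs. the x-axis."""
    rows, cols = np.nonzero(mask)
    coords = np.stack([cols, -rows]).astype(np.float64)  # flip rows so the y axis points up
    coords -= coords.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(coords))
    vx, vy = eigvecs[:, np.argmax(eigvals)]  # direction of largest variance
    angle = np.degrees(np.arctan2(vy, vx))
    # An axis has no direction, so fold the result into [-90, 90).
    return (angle + 90.0) % 180.0 - 90.0
```

OpenCV treats positive angles as counter-clockwise, so the crop could be straightened with `cv2.warpAffine(crop, cv2.getRotationMatrix2D(center, -angle, 1.0), (w, h))`, where `(w, h)` is the crop size. PCA only recovers the axis of the strip. It cannot detect an upside-down meter (a 180° turn).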
Our objective is to feed the cropped photos, produced from the masks and their associated images, into our OCR model. Using a sample of 200 manually labeled meter images, we will train a Faster RCNN model that identifies the meter's digits accurately and predicts their values. We then parse the model's output on the test photos and reorder the predictions from left to right. This lets us combine the digits correctly into the final meter reading.
The cropped dataset, already prepared for the OCR model, can be downloaded via this link: download
Our photos are split, and each digit is labeled with the appropriate class, as shown above. The dataset is divided into training (70%), validation (20%) and testing (10%) sets to train a custom Detectron2 Faster RCNN model, following the article.
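Here is a simple way to reproduce a 70/20/10 split. The seeded shuffle and the function name are assumptions, not necessarily how the article does it. Each resulting list would then be registered with Detectron2 under its own name, for example through `DatasetCatalog.register`.

```python
import random


def split_dataset(items, train=0.7, val=0.2, seed=42):
    """Shuffle items reproducibly and split them into train/val/test lists.

    The test set receives whatever remains after train and val (10% by default).
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed -> the same split every run
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )
```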
Specifically, we use faster_rcnn_X_101_32x8d_FPN, but Detectron2 offers many other architectures, which you can find in the Detectron2 model zoo.
After training, the model produces raw output. To extract the predictions:
- Read the image and get the model's output.
- Find the predicted boxes and labels.
- Build a list of all predictions together with the leftmost x-coordinate of each bounding box.
- Sort the list by x-coordinate to get the digits in the proper reading order.
- Take the final order of the identified classes and map them to digit values.
- Add a decimal point to the list of digits depending on the number of bounding boxes.
- Combine the digits and convert them into a float.
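These steps can be written as a plain Python function. It assumes the boxes are (x1, y1, x2, y2) tuples and the class ids come from a Detectron2 predictor, e.g. `outputs["instances"].pred_boxes.tensor.tolist()` and `outputs["instances"].pred_classes.tolist()`. The class-to-digit mapping and the decimal rule (the last `n_decimals=3` digits form the fractional part) are hypothetical, because the document does not state the exact rule. Adjust them to your label map.

```python
def boxes_to_reading(boxes, classes, class_to_digit=None, n_decimals=3):
    """Turn detected digit boxes into a meter reading.

    boxes: list of (x1, y1, x2, y2); classes: list of predicted class ids.
    """
    if not boxes:
        raise ValueError("no digits detected")
    if class_to_digit is None:
        # Hypothetical mapping: class id i stands for the digit i.
        class_to_digit = {i: str(i) for i in range(10)}
    # Pair each prediction with the leftmost x-coordinate of its box.
    predictions = [(box[0], cls) for box, cls in zip(boxes, classes)]
    # Sorting by x1 puts the digits in left-to-right reading order.
    predictions.sort(key=lambda p: p[0])
    digits = [class_to_digit[cls] for _, cls in predictions]
    # Hypothetical rule: the last n_decimals digits are the fractional part.
    if n_decimals > 0 and len(digits) > n_decimals:
        text = "".join(digits[:-n_decimals]) + "." + "".join(digits[-n_decimals:])
    else:
        text = "".join(digits)
    return float(text)
```

Building the reading as a string and converting it once avoids accumulating floating-point error from summing digit-by-power terms.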
To try the whole model yourself (UNet & EfficientNetB0 + faster_rcnn_X_101_32x8d_FPN + post-processing logic), we need to write a program. It requires three files: water_meter.h5, model_final.pth and Water counter.jpg. The program performs these steps:
- Resize the image to (256, 256).
- Normalize the data (divide by 255.).
- Load the segmentation model (do not forget the custom loss and metrics).
- Each output pixel is a probability in [0, 1] of belonging to the mask, so we binarize it with pixel value > 0.5.
- Crop the image using a bitwise operation.
- Load model_final.pth and repeat the prediction-extraction steps described above.
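This sketch covers the numeric parts of these steps. It assumes the segmentation model returns a (1, 256, 256, 1) probability map, which is squeezed to (256, 256) before thresholding. It also assumes the crop is taken from the original-resolution photo, so the binary mask is resized back to the photo's size. A simple index-based nearest-neighbour resize replaces `cv2.resize` so the sketch needs only numpy. Loading the model itself (`tf.keras.models.load_model("water_meter.h5", ...)`, passing the custom loss and metric functions through `custom_objects`) is left out.

```python
import numpy as np


def preprocess(image_256):
    """Normalize a (256, 256, 3) uint8 image to [0, 1] and add a batch axis."""
    x = image_256.astype(np.float32) / 255.0
    return x[np.newaxis, ...]  # shape (1, 256, 256, 3): Keras predict expects a batch


def probabilities_to_mask(prob, out_shape, threshold=0.5):
    """Binarize a (256, 256) probability map and resize it to out_shape (height, width)."""
    binary = (prob > threshold).astype(np.uint8) * 255  # 0/255, usable as an OpenCV mask
    h, w = out_shape
    # Map every output pixel to a source pixel by integer scaling (nearest neighbour),
    # so the resized mask still contains only 0 and 255.
    rows = np.arange(h) * binary.shape[0] // h
    cols = np.arange(w) * binary.shape[1] // w
    return binary[rows[:, None], cols[None, :]]
```

If the crop is taken from the 256×256 image instead, the resize back is unnecessary and the thresholded mask can be used directly.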
Example:
Meter reading: 1137.075
Task 1.zip contains:
- Seg_train.ipynb: notebook for training the segmentation model (part 1).
- Colab_links.txt: Colab links for parts 2 and 3.


