Traffic Sign Detection using YOLO
We will walk through this problem end to end, from exploratory data analysis to pipeline deployment.
Table of Contents:-
- Introduction
- Business problem
- Mapping to DL problem
- Understanding the Data
- EDA
- Data Preprocessing
- Understanding YOLO
- Modelling
- Results
- Post-training Quantization
- Deployment of Model
- Future work
- Profile
- References
1.Introduction:-
Traffic sign detection is a challenging real-world problem of high industrial relevance. Autonomous vehicle companies are actively working on improving their traffic-light and stop-sign detection techniques. The objective of this case study is to detect traffic signs in images and classify them.
2.Business problem:-
The objective is to detect traffic signs that are present inside an image. There is a latency requirement for this problem: we need to build a solution such that the model can detect the signs within seconds.
3.Mapping to DL problem:-
We will use object detection models to detect the traffic signs in this problem statement. We will use IoU (Intersection over Union) as the evaluation metric here. It helps us benchmark the accuracy of our model's predictions: using it, we can figure out how well our predicted bounding box overlaps with the ground truth bounding box. The higher the IoU, the better the performance.
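To make the metric concrete, here is a minimal sketch of computing IoU for two axis-aligned boxes given as (left, top, right, bottom) tuples; the function name and box format are illustrative choices, not from the original code.

```python
def iou(box_a, box_b):
    """Intersection over Union for two (left, top, right, bottom) boxes."""
    # Coordinates of the intersection rectangle
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # ~0.14 for this partial overlap
```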
4.Understanding The Data:-
The dataset consists of 900 images in PPM format. The dataset also contains a text file in CSV format that holds the ground truth for all traffic signs in the images: the image filename, the bounding box coordinates (left, top, right, and bottom) of each traffic sign present in the image, and the class ID of the traffic sign.
5.EDA:-
We will convert the txt file into a pandas DataFrame.
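A minimal sketch of this step, assuming a GTSDB-style ground-truth file that is semicolon-separated with no header row (the file name and column names here are my assumptions):

```python
import pandas as pd

# Assumed layout: filename;left;top;right;bottom;class_id with no header row
cols = ['filename', 'left', 'top', 'right', 'bottom', 'class_id']
df = pd.read_csv('gt.txt', sep=';', names=cols)
print(df.shape)
print(df.head())
```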
There are a total of 600 images in the training data and a total of 852 rows in the dataset. Multiple rows for an image mean multiple traffic signs in the same image.
Let's visualize a PPM image file and check its size.
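PIL can read PPM files directly, so a quick look at one image might be (the file name is a placeholder):

```python
from PIL import Image
import matplotlib.pyplot as plt

img = Image.open('00000.ppm')  # placeholder file name
print(img.size)                # (width, height)
plt.imshow(img)
plt.axis('off')
plt.show()
```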
Now we will check the distribution of different class IDs in the dataset.
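A short sketch of the distribution check, reusing the `df` DataFrame from the loading step above:

```python
import matplotlib.pyplot as plt

# Number of ground-truth boxes per class ID across all 43 classes
counts = df['class_id'].value_counts().sort_index()
counts.plot(kind='bar', figsize=(12, 4))
plt.xlabel('class id')
plt.ylabel('number of boxes')
plt.show()
```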
Class IDs 1, 2, 10, 12, 13, and 38 are present in high numbers, whereas 0, 19, 24, 27, 31, and 37 are fewer in number.
Now we will group the 43 classes into 4 broader categories (prohibitory, mandatory, danger, other).
Then we will check the distribution of these newly created categories.
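A sketch of both steps; the exact ID-to-category mapping below is an assumption based on the grouping commonly used for GTSDB:

```python
# One common grouping of the 43 GTSDB class IDs into 4 broader categories;
# the exact ID-to-category mapping is an assumption, not from the original post.
prohibitory = [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 15, 16]
mandatory = [33, 34, 35, 36, 37, 38, 39, 40]
danger = [11, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]

def to_category(class_id):
    if class_id in prohibitory:
        return 'prohibitory'
    if class_id in mandatory:
        return 'mandatory'
    if class_id in danger:
        return 'danger'
    return 'other'

df['category'] = df['class_id'].apply(to_category)
print(df['category'].value_counts())
```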
6.Data Preprocessing:-
We will create a folder where we will store PNG versions of the PPM images.
Here we also remove the images that don't contain any traffic sign.
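A minimal sketch of the conversion and filtering, with assumed folder names images_ppm and images_png:

```python
import os
from PIL import Image

os.makedirs('images_png', exist_ok=True)
annotated = set(df['filename'])  # images that have at least one ground-truth box

for name in os.listdir('images_ppm'):
    if not name.endswith('.ppm') or name not in annotated:
        continue  # skip non-PPM files and images with no traffic sign
    img = Image.open(os.path.join('images_ppm', name))
    img.save(os.path.join('images_png', name.replace('.ppm', '.png')))
```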
We will now create an annotation file for each image, stored in XML format.
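A sketch of writing one annotation file, assuming a Pascal VOC-style XML layout (the helper name and its box format are illustrative):

```python
import xml.etree.ElementTree as ET

def write_annotation(filename, boxes, out_path):
    """Write one Pascal VOC-style XML file; `boxes` holds
    (category, left, top, right, bottom) tuples for that image."""
    root = ET.Element('annotation')
    ET.SubElement(root, 'filename').text = filename
    for cat, left, top, right, bottom in boxes:
        obj = ET.SubElement(root, 'object')
        ET.SubElement(obj, 'name').text = cat
        bndbox = ET.SubElement(obj, 'bndbox')
        ET.SubElement(bndbox, 'xmin').text = str(left)
        ET.SubElement(bndbox, 'ymin').text = str(top)
        ET.SubElement(bndbox, 'xmax').text = str(right)
        ET.SubElement(bndbox, 'ymax').text = str(bottom)
    ET.ElementTree(root).write(out_path)
```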
Now we will create TFRecord files, which will be used while training our YOLO model. The TFRecord format is a simple format for storing a sequence of binary records. We will use the XML files to generate the TFRecord files, splitting the images into train and test TFRecord files.
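A sketch of serializing one image into a TFRecord example; the feature keys follow the convention expected by the yolov3-tf2 repo linked in the references, and `samples` is a hypothetical list of (image_path, boxes) pairs with box coordinates already normalized to [0, 1]:

```python
import tensorflow as tf

def make_example(image_path, boxes):
    """Serialize one image and its (category, xmin, ymin, xmax, ymax) boxes,
    with coordinates normalized to [0, 1], into a tf.train.Example."""
    with open(image_path, 'rb') as f:
        encoded = f.read()
    feature = {
        'image/encoded': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[encoded])),
        'image/object/bbox/xmin': tf.train.Feature(
            float_list=tf.train.FloatList(value=[b[1] for b in boxes])),
        'image/object/bbox/ymin': tf.train.Feature(
            float_list=tf.train.FloatList(value=[b[2] for b in boxes])),
        'image/object/bbox/xmax': tf.train.Feature(
            float_list=tf.train.FloatList(value=[b[3] for b in boxes])),
        'image/object/bbox/ymax': tf.train.Feature(
            float_list=tf.train.FloatList(value=[b[4] for b in boxes])),
        'image/object/class/text': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[b[0].encode() for b in boxes])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

with tf.io.TFRecordWriter('train.tfrecord') as writer:
    for image_path, boxes in samples:  # `samples` is a hypothetical list
        writer.write(make_example(image_path, boxes).SerializeToString())
```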
7.Understanding YOLO:-
We will use YOLOV3 for training our custom object detection model. It is a single-shot detector that also runs quite fast and predicts the output in real-time.
Let's understand some terminology related to YOLO.
1.Grid cell:-
YOLOv3 divides the image into grid cells at three granularity levels: 52×52, 26×26, and 13×13.
2.Anchor box or bounding box:-
YOLO works well for multiple objects when each object is associated with one grid cell. But when one grid cell contains the center points of two different objects, anchor boxes allow one grid cell to detect multiple objects. For each anchor box, three kinds of values are predicted: the location of the bounding box (x, y, w, h), the objectness score, and the class probabilities, a vector whose size equals the number of classes. Since we have 4 classes here, we predict a vector of size 4 + 1 + 4 = 9 for each anchor box.
3.Non-maximum suppression:-
YOLO uses NMS to keep only the best bounding box for each object. The first step in NMS is to remove all predicted bounding boxes whose detection probability is below a given threshold. Then, among the remaining boxes, the highest-scoring box is kept and every box that overlaps it with an IoU above the NMS threshold is suppressed; this repeats until no boxes remain.
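A minimal greedy NMS sketch, reusing the iou helper from section 3; the threshold values are illustrative:

```python
def nms(boxes, scores, score_threshold=0.5, iou_threshold=0.4):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    # Step 1: drop boxes below the detection-probability threshold
    idxs = [i for i in range(len(boxes)) if scores[i] >= score_threshold]
    # Step 2: keep the highest-scoring box, suppress heavy overlaps, repeat
    idxs.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    while idxs:
        best = idxs.pop(0)
        keep.append(best)
        idxs = [i for i in idxs if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```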
YOLO architecture:-
YOLOv3 uses a variant of Darknet, which originally has a 53-layer network trained on ImageNet. For the task of detection, 53 more layers are stacked onto it, giving a 106-layer fully convolutional underlying architecture for YOLOv3. In YOLOv3, detection is done by applying 1 × 1 detection kernels on feature maps of three different sizes at three different places in the network. The shape of the detection kernel is 1 × 1 × (B × (5 + C)), where B is the number of bounding boxes a cell on the feature map can predict, '5' stands for the 4 bounding box attributes plus one object confidence, and C is the number of classes. For example, with B = 3 and C = 4 (our categories), the kernel shape is 1 × 1 × 27. YOLOv3 uses binary cross-entropy for calculating the classification loss for each label, while object confidence and class predictions are made through logistic regression.
YOLO loss:-
The YOLO loss for each box prediction is comprised of the following terms-
- Coordinate loss — due to a box prediction not exactly covering an object,
- Objectness loss — due to a wrong box-object IoU prediction,
- Classification loss — due to deviations from predicting ‘1’ for the correct classes and ‘0’ for all the other classes for the object in that box.
More can be read about YOLO here.
8.Modelling:-
The training is done using the train TFRecord file, and the validation TFRecord file is used for validation.
The following hyperparameters need to be passed for training:
- learning rate: 0.001
- threshold value: 0.4
- image size: 416×416×3
The training was run until the validation loss reached 15.76. The training code can be found in my GitHub repository.
9.Results:-
Now we will test our model’s output.
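A sketch of running inference with the yolov3-tf2 repo from the references (the file paths and class count are placeholders; check the repo version you use):

```python
import tensorflow as tf
from yolov3_tf2.models import YoloV3
from yolov3_tf2.dataset import transform_images

# Build the model for our 4 sign categories and load the trained weights
yolo = YoloV3(classes=4)
yolo.load_weights('./checkpoints/yolov3_train.tf').expect_partial()

# Read a test image and resize it to the 416x416 input size
img_raw = tf.image.decode_image(open('test.png', 'rb').read(), channels=3)
img = transform_images(tf.expand_dims(img_raw, 0), 416)

boxes, scores, classes, nums = yolo(img)
print('number of detections:', int(nums[0]))
```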
10.Post-training Quantization:-
Post-training quantization is a conversion technique that can reduce model size while also improving CPU and hardware accelerator latency, with little degradation in model accuracy. You can quantize an already-trained float TensorFlow model when you convert it to TensorFlow Lite format using the TensorFlow Lite Converter.
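A minimal sketch of the standard dynamic-range quantization flow with the TensorFlow Lite Converter (converting a full YOLOv3 Keras model may need extra care, so treat this as the general pattern rather than the exact script):

```python
import tensorflow as tf

# `yolo` is the trained Keras model from the previous section
converter = tf.lite.TFLiteConverter.from_keras_model(yolo)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
tflite_model = converter.convert()

with open('yolov3_quant.tflite', 'wb') as f:
    f.write(tflite_model)
```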
The above code converts the model to the TFLite format. After quantization, the model size is reduced from 240 MB to 50 MB, about a 4x size reduction.
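For the inference side, the usual TFLite Interpreter pattern looks like this; the input here is assumed to be a preprocessed float32 array of shape (1, 416, 416, 3):

```python
import tensorflow as tf

# Load the quantized model and prepare its tensors
interpreter = tf.lite.Interpreter(model_path='yolov3_quant.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# `img` is assumed to be a float32 array of shape (1, 416, 416, 3)
interpreter.set_tensor(input_details[0]['index'], img)
interpreter.invoke()
outputs = [interpreter.get_tensor(d['index']) for d in output_details]
```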
The above piece of code loads the TFLite weights and predicts the output.
As we can see, there is no major difference between the outputs of the TFLite model and the original model.
11.Deployment of model:-
We will use Streamlit for deploying our DL model. Here we will display the predicted image with bounding boxes and then crop the detected signs from the predicted image and display them separately.
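A minimal sketch of the cropping step, assuming the predicted boxes have already been scaled to pixel coordinates as (left, top, right, bottom):

```python
def crop_detections(img, boxes):
    """Crop each detected (left, top, right, bottom) region from a PIL image."""
    return [img.crop(tuple(box)) for box in boxes]
```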
The above code will crop the detected sign from the image.
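And a skeleton of the Streamlit page itself, reusing crop_detections from above; run_model is a hypothetical wrapper around the inference code from the Results section:

```python
import streamlit as st
from PIL import Image, ImageDraw

st.title('Traffic Sign Detection')

uploaded = st.file_uploader('Upload an image', type=['png', 'jpg', 'jpeg'])
if uploaded is not None:
    img = Image.open(uploaded).convert('RGB')

    # `run_model` is a hypothetical wrapper returning pixel-space
    # (left, top, right, bottom) boxes from the trained model
    boxes = run_model(img)

    # Draw the predicted boxes on a copy of the uploaded image
    preview = img.copy()
    draw = ImageDraw.Draw(preview)
    for box in boxes:
        draw.rectangle(box, outline='red', width=3)
    st.image(preview, caption='Detected traffic signs')

    # Show each cropped sign separately
    for sign in crop_detections(img, boxes):
        st.image(sign)
```

The page can then be served with Streamlit's standard `streamlit run` command.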
The above code snippet is a sample used for the deployment of the model. The output of the webpage looks like this.
Some images of deployment:-
12.Future Work:-
- We can increase the dataset size using augmentation techniques and retrain the model.
- We can analyze, after training, which images the model performed poorly on and retrain with more data points for those cases.
13.Profile:-
If you have any queries, feel free to contact me on LinkedIn. You can find my full project here.
14.References:-
- https://www.appliedaicourse.com/
- https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/
- https://github.com/zzh8829/yolov3-tf2
- https://medium.com/@anirudhsr97/german-traffic-signs-detection-using-yolov3-ab7b974fca4e
- https://afrozchakure.medium.com/yolov3-you-only-look-once-12de76ad74d5