Bounding Box Formats for Models like YOLO/SSD/RCNN/Fast RCNN/ Faster RCNN

Jun 2, 2021

What are Bounding Boxes?

A bounding box is the rectangular border that fully encloses an object, such as a digital image placed over a page, a canvas, a screen or another two-dimensional background. It is described by the coordinates of that rectangle.

One of the most prominent uses of bounding boxes is in tasks like object detection, where we need to detect an object and point to where it lies in the image.

Look at the following example. We initially have an image of a Dog and a Cat and we want to perform object detection on it.

For this, we will have to draw bounding boxes on the images to train an object detection model such as YOLO, RCNN or SSD.

This is the result after drawing the bounding boxes on the above image.

But to draw the bounding boxes, we need to know the format of their coordinates, which will help us create and debug custom datasets for any model.

SSD/ RCNN/ Fast RCNN/ Faster RCNN

SSD/ RCNN/ Fast RCNN/ Faster RCNN use the same format while training an object detection model. They use the Pascal VOC dataset format.

In this format, the bounding box is represented as follows:

[x_min, y_min, x_max, y_max]

where x_min (x-minimum) and y_min (y-minimum) are the coordinates of the top-left corner, and x_max (x-maximum) and y_max (y-maximum) are the coordinates of the bottom-right corner of the bounding box.

In the following image, the coordinates of the bounding box in the Pascal VOC format will be [30, 15, 395, 410], as these are the x_min, y_min, x_max and y_max coordinates respectively.
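A corner-style box is easy to sanity-check in code before it goes into a dataset. The sketch below is only an illustration: the names VocBox and is_valid are hypothetical helpers, not part of any library, and the box values and the 580 x 440 image size are the ones used in the worked calculations in this article.

```python
from typing import NamedTuple


class VocBox(NamedTuple):
    """Corner-style box [x_min, y_min, x_max, y_max] in pixels (hypothetical helper)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def width(self) -> float:
        # Horizontal extent of the box in pixels.
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        # Vertical extent of the box in pixels.
        return self.y_max - self.y_min


def is_valid(box: VocBox, image_width: float, image_height: float) -> bool:
    """True if the corners are correctly ordered and the box lies inside the image."""
    return (0 <= box.x_min < box.x_max <= image_width
            and 0 <= box.y_min < box.y_max <= image_height)


box = VocBox(30, 15, 395, 410)
print(box.width, box.height)    # 365 395
print(is_valid(box, 580, 440))  # True
```

A check like this can catch swapped corners or boxes that spill outside the image, which are common mistakes when building a custom dataset by hand.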

YOLO

In YOLO, the bounding box is represented as

[x_center, y_center, width, height]

where x_center and y_center are the normalized coordinates of the center of the bounding box, and width and height are the normalized width and height of the bounding box.

The normalization is as follows:

For x_center, y_center : Find the center of the bounding box and divide by the width of the image for x_center and height of the image for y_center.
For width, height : Find the width and height of the bounding box and divide by the width and height of the image.

For example, for the image below, which is 580 pixels wide and 440 pixels high, the bounding box in this format is

[ ((30 + 395) / 2) / 580, ((410 + 15) / 2) / 440, (395 - 30) / 580, (410 - 15) / 440 ]

which are

[0.366379, 0.482955, 0.629310, 0.897727]
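The normalization steps above can be sketched as a small conversion function. This is a minimal illustration, not the code of any YOLO implementation: voc_to_yolo and yolo_to_voc are hypothetical helper names, and the input is assumed to be a Pascal VOC style [x_min, y_min, x_max, y_max] box in pixels.

```python
def voc_to_yolo(box, image_width, image_height):
    """Convert [x_min, y_min, x_max, y_max] in pixels to normalized
    [x_center, y_center, width, height]."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return (x_center, y_center, width, height)


def yolo_to_voc(box, image_width, image_height):
    """Inverse conversion, handy for drawing a YOLO label back onto the image."""
    x_center, y_center, width, height = box
    x_min = (x_center - width / 2) * image_width
    y_min = (y_center - height / 2) * image_height
    x_max = (x_center + width / 2) * image_width
    y_max = (y_center + height / 2) * image_height
    return (x_min, y_min, x_max, y_max)


yolo_box = voc_to_yolo((30, 15, 395, 410), 580, 440)
print([round(v, 4) for v in yolo_box])
# [0.3664, 0.483, 0.6293, 0.8977]

print([round(v, 6) for v in yolo_to_voc(yolo_box, 580, 440)])
# [30.0, 15.0, 395.0, 410.0]
```

Converting a label back to pixel corners and drawing it is a quick way to debug a dataset: if the box lands in the wrong place, the width and height divisors were probably swapped.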

CreateML

For CreateML, the bounding box is represented as

[x_center, y_center, width, height]

where x_center and y_center are the coordinates of the center of the bounding box, and width and height are the width and height of the bounding box, all in pixels.

The only difference between CreateML and YOLO is that YOLO uses normalized values for bounding boxes, whereas CreateML uses the values without normalizing them.

In the above example, the bounding box would be as follows:

[ ((30 + 395) / 2), ((410 + 15) / 2), (395 - 30), (410 - 15) ]

which are

[212.5, 212.5, 365, 395]
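The same arithmetic can be written as a short sketch. The helper names voc_to_createml and createml_to_yolo are hypothetical and only cover the coordinate values described here, not the full annotation file that CreateML reads.

```python
def voc_to_createml(box):
    """Convert [x_min, y_min, x_max, y_max] to pixel-valued
    [x_center, y_center, width, height]."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2,
            (y_min + y_max) / 2,
            x_max - x_min,
            y_max - y_min)


def createml_to_yolo(box, image_width, image_height):
    """Normalize a center-style pixel box by the image size to get YOLO values."""
    x_center, y_center, width, height = box
    return (x_center / image_width,
            y_center / image_height,
            width / image_width,
            height / image_height)


createml_box = voc_to_createml((30, 15, 395, 410))
print(createml_box)  # (212.5, 212.5, 365, 395)

# Dividing by the image size (assumed here to be 580 x 440) gives YOLO values.
print([round(v, 4) for v in createml_to_yolo(createml_box, 580, 440)])
# [0.3664, 0.483, 0.6293, 0.8977]
```

Since the two formats differ only by the division step, moving between them needs nothing more than the image width and height.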

In this way, different formats can be used for creating custom bounding boxes on images for object detection.

In the next article, we will look at how we can create these bounding boxes using labelImg.