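Given a convolutional feature map, the prediction layer outputs a feature map of the same spatial size but with a different number of channels: 6 * (num_classes + 4). Here 6 is the number of bounding boxes predicted at each location, and each box gets num_classes class scores plus 4 box coordinates.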
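The channel arithmetic can be checked with a small sketch. The function name `head_output_shape`, the (channels, height, width) layout, and the example values (a 512-channel 19x19 input, 21 classes) are assumptions chosen for illustration, not taken from a particular implementation.

```python
def head_output_shape(feature_shape, num_classes, boxes_per_location=6):
    """Return the (channels, height, width) shape of the prediction map.

    feature_shape is (channels, height, width) of the input feature map.
    The spatial size is kept; only the channel count changes.
    """
    _, height, width = feature_shape
    # Each box needs num_classes class scores plus 4 box coordinates.
    out_channels = boxes_per_location * (num_classes + 4)
    return (out_channels, height, width)


# Hypothetical example: 512-channel 19x19 feature map, 21 classes.
print(head_output_shape((512, 19, 19), num_classes=21))  # (150, 19, 19)
```

The input channel count does not affect the output shape; it only determines how many weights the prediction layer needs.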
If we perform this operation on feature maps of different sizes, we obtain bounding boxes for both larger and smaller objects: the higher-resolution feature maps cover small objects, and the lower-resolution ones cover large objects.
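As a rough sketch of what predicting at several scales means in numbers, the snippet below counts the boxes produced when every location of every feature map predicts 6 boxes. The helper `total_boxes` and the list of square feature map sizes are assumptions for illustration; a real network may use different sizes or a different number of boxes on some layers.

```python
def total_boxes(feature_sizes, boxes_per_location=6):
    """Count predicted boxes over square feature maps of the given side lengths."""
    return sum(size * size * boxes_per_location for size in feature_sizes)


# Assumed side lengths, from the finest map (small objects)
# to the coarsest map (large objects).
sizes = [38, 19, 10, 5, 3, 1]
print(total_boxes(sizes))  # 11640
```

Most of the boxes come from the largest feature map, which is the one responsible for small objects.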
This gives higher accuracy than YOLO and higher FPS than Faster R-CNN, which makes SSD a good trade-off between the two.