Review: PR-284-End-to-End Object Detection with Transformers (DETR)

Mar 29, 2022

DETR is the first attempt to apply a transformer to object detection.

Object detection has traditionally been done with convolutional layers, and major architectures such as Faster R-CNN and YOLO are not fully differentiable, mainly because of the non-maximum suppression (NMS) step.
DETR uses a transformer, which improves detection of large objects, and it is fully differentiable because it removes the NMS step.
DETR trains with a Hungarian loss: a one-to-one bipartite matching is first found between the ground-truth boxes and the predicted boxes, and the loss is then computed on the matched pairs. DETR can also be extended naturally to instance segmentation by adding a small mask head on top of the decoder output.
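
The bipartite matching step can be illustrated with a minimal sketch. This is not DETR's implementation: the cost here is a simplified assumption (negative class probability plus L1 box distance, with no GIoU term or weighting), the names `match_cost` and `best_matching` are hypothetical, and the search is brute force over all assignments rather than the Hungarian algorithm, which finds the same optimum in polynomial time.

```python
from itertools import permutations

def l1_distance(box_a, box_b):
    # Sum of absolute coordinate differences between two (cx, cy, w, h) boxes.
    return sum(abs(a - b) for a, b in zip(box_a, box_b))

def match_cost(pred, target):
    # Assumed, simplified cost: a high probability for the target class and
    # a small box distance both make the pair cheaper to match.
    return -pred["probs"][target["label"]] + l1_distance(pred["box"], target["box"])

def best_matching(preds, targets):
    # Try every way of giving each target its own distinct prediction and keep
    # the cheapest. Brute force stands in for the Hungarian algorithm here and
    # is only practical for a handful of boxes.
    best_cost, best_assignment = float("inf"), None
    for pred_indices in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[p], targets[t])
                   for t, p in enumerate(pred_indices))
        if cost < best_cost:
            best_cost, best_assignment = cost, pred_indices
    # Return (target_index, prediction_index) pairs and the total cost.
    return list(enumerate(best_assignment)), best_cost

# Toy data: three predictions, two ground-truth objects.
# The last class index plays the role of "no object".
preds = [
    {"probs": [0.1, 0.8, 0.1], "box": (0.5, 0.5, 0.2, 0.2)},
    {"probs": [0.7, 0.2, 0.1], "box": (0.2, 0.3, 0.1, 0.1)},
    {"probs": [0.05, 0.05, 0.9], "box": (0.8, 0.8, 0.1, 0.1)},
]
targets = [
    {"label": 0, "box": (0.2, 0.3, 0.1, 0.1)},
    {"label": 1, "box": (0.5, 0.5, 0.2, 0.2)},
]

pairs, total_cost = best_matching(preds, targets)
print(pairs)  # target 0 pairs with prediction 1, target 1 with prediction 0
```

Because every ground-truth object is assigned exactly one prediction, duplicate detections of the same object are penalized during training, which is why no NMS step is needed afterwards. The unmatched prediction (index 2 in the toy data) would be trained toward the "no object" class.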

Written by Joonsu Oh