Review: PR-297-DeiT

Joonsu Oh
Mar 30, 2022
  1. ViTs are not as parameter-efficient as EfficientNet: they need more parameters to reach comparable accuracy.
  2. DeiT proposes a simple architectural change: add an extra distillation token on top of ViT and train it with knowledge distillation. The boost is significant enough that DeiT becomes competitive with EfficientNet in parameter efficiency.
  3. One interesting finding: CNNs work better than transformers as teacher models.
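
The distillation setup in point 2 can be sketched as follows. This is a minimal NumPy stand-in for DeiT's hard-label distillation objective, not the authors' code: the class token's logits are trained against the ground-truth label, while the distillation token's logits are trained against the teacher's argmax prediction, with the two cross-entropy terms averaged. All array values here are illustrative, not real model outputs.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # mean negative log-likelihood of the target class
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels]))

def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, labels):
    # class token matches ground truth; distillation token matches
    # the teacher's hard (argmax) prediction; terms weighted equally
    teacher_labels = teacher_logits.argmax(axis=-1)
    return 0.5 * cross_entropy(cls_logits, labels) \
         + 0.5 * cross_entropy(dist_logits, teacher_labels)

# Toy batch of 2 samples, 3 classes (illustrative numbers)
cls_logits = np.array([[2.0, 0.1, 0.1], [0.1, 2.0, 0.1]])
dist_logits = np.array([[2.0, 0.1, 0.1], [0.1, 0.1, 2.0]])
teacher_logits = np.array([[3.0, 0.0, 0.0], [0.0, 0.0, 3.0]])
labels = np.array([0, 1])

loss = hard_distillation_loss(cls_logits, dist_logits, teacher_logits, labels)
```

Note that the teacher supplies hard labels here; the DeiT paper also reports a soft variant (KL divergence against the teacher's softened distribution), but the hard version is what made the distillation token effective.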
