Review: PR-021 - Batch Normalization

Joonsu Oh
May 31, 2021

Introduction

  • Training a DNN is difficult: an overparametrized model is sensitive to many factors.
  • One notable problem in training a DNN is Internal Covariate Shift (ICS). (A more recent paper argues that the success of BN is not actually due to reducing ICS, but that is not the topic of discussion here.)
  • The authors’ precise definition of ICS is:

We define Internal Covariate Shift as the change in the distribution of network activations due to the change in network parameters during training.

Mechanism

  • We compute the sample mean and sample variance along the batch dimension.
  • Then, we normalize the given batch using that sample mean and variance.
  • Finally, using the learnable parameters γ and β, we scale and shift the normalized activations into some “desired” distribution (this is why they are learnable) that is easier for the model to learn; see the sketch after this list.
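
A minimal NumPy sketch of this forward pass (the function name batch_norm_forward, the epsilon value, and the toy shapes below are my own illustration choices, not from the paper):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch normalization for x of shape [N, D], along the batch axis."""
    # 1. Sample mean and sample variance per feature, over the batch dimension.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    # 2. Normalize the batch to roughly zero mean and unit variance.
    x_hat = (x - mu) / np.sqrt(var + eps)
    # 3. Scale and shift with the learnable parameters gamma and beta.
    return gamma * x_hat + beta

# Toy usage: a batch of 4 examples with 3 features each.
x = np.random.randn(4, 3) * 10 + 5      # arbitrary shifted/scaled inputs
gamma, beta = np.ones(3), np.zeros(3)   # identity transform to start with
y = batch_norm_forward(x, gamma, beta)
print(y.mean(axis=0))                   # ~0 per feature
print(y.std(axis=0))                    # ~1 per feature
```

During training, gradient descent updates γ and β along with the other weights, so the network can even undo the normalization entirely (γ = σ, β = μ) if that happens to be optimal.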

Effects

  • Makes training converge faster by smoothing the loss surface. This smoothing effect is desirable because it mitigates the harmful influence of pathological regions of the surface, such as saddle points.
  • We can also be less careful about weight initialization.
  • It also gives a slight regularization effect (believe it or not!), because each example is normalized with the statistics of whatever mini-batch it lands in; the sketch below illustrates this.
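
Where does that regularization come from? The same input produces slightly different activations depending on which mini-batch it is sampled into. A small self-contained sketch of this (shapes and seed are arbitrary):

```python
import numpy as np

def bn(x, eps=1e-5):
    # Normalize along the batch axis (gamma = 1, beta = 0 for simplicity).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(0)
example = rng.normal(size=3)                 # one fixed input example

# Put the identical example into two different mini-batches.
batch_a = np.vstack([example, rng.normal(size=(3, 3))])
batch_b = np.vstack([example, rng.normal(size=(3, 3))])

# Its normalized output differs across batches because the batch
# statistics differ; this per-batch noise acts as a mild regularizer.
print(bn(batch_a)[0])
print(bn(batch_b)[0])
```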
