Introduction
- Training a DNN is difficult: an overparametrized model is sensitive to many factors.
- The paper identifies one particular problem in training DNNs: Internal Covariate Shift (ICS). (A more recent paper argues that BN's success is not actually due to reducing ICS, but that is not the topic of discussion here.)
- The authors' precise definition of ICS:
"We define Internal Covariate Shift as the change in the distribution of network activations due to the change in network parameters during training."
Mechanism
- We compute the sample mean and sample variance along the batch dimension.
- We then normalize the batch using this sample mean and variance.
- Finally, using the learnable parameters γ and β, we scale and shift the normalized activations into a "desired" distribution that is easier for the model to learn (this is why γ and β are learnable: the network itself chooses that distribution).
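The three steps above can be sketched in NumPy (function name, shapes, and the `eps` value are my own choices, not from the paper; `eps` is the small constant commonly added for numerical stability):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize along the batch dimension, then apply a learned scale and shift.

    x: (N, D) activations; gamma, beta: (D,) learnable parameters.
    """
    mu = x.mean(axis=0)                     # sample mean per feature
    var = x.var(axis=0)                     # sample variance per feature
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalized activations
    return gamma * x_hat + beta             # learned scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
# With gamma = 1 and beta = 0, each feature of `out` has roughly
# zero mean and unit variance, regardless of x's original statistics.
```

Note that γ and β can in principle undo the normalization entirely (γ = σ, β = μ), so the layer does not restrict what the network can represent.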
Effects
- Makes training converge faster by smoothing the loss surface. This smoothing is desirable because it lessens the harmful effects of problematic regions such as saddle points.
- It also makes training less sensitive to weight initialization.
- It even provides a slight regularization effect. (Believe it or not!)
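The regularization effect comes from batch noise: an example's normalized value depends on which other examples happen to share its mini-batch. A small demonstration (helper name and batch sizes are my own, with γ and β fixed to the identity for clarity):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalization only; gamma = 1, beta = 0.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(1)
sample = rng.normal(size=(1, 3))                    # one fixed example
batch_a = np.vstack([sample, rng.normal(size=(15, 3))])
batch_b = np.vstack([sample, rng.normal(size=(15, 3))])

out_a = batch_norm(batch_a)[0]  # the fixed example, normalized with batch A's stats
out_b = batch_norm(batch_b)[0]  # the same example, normalized with batch B's stats
# out_a != out_b: the example's representation varies with its batch-mates,
# injecting noise during training much like a weak form of data augmentation.
```

This is also why BN behaves differently at inference time, where running averages of the statistics replace per-batch estimates.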