Review: PR-116-Glow: Generative Flow with Invertible 1x1 Convolutions

Joonsu Oh
4 min read · Aug 22, 2021

What are flow-based generative models?

Figure 1
  • An auto-encoder tries to find two functions f and g that satisfy x ≈ g(f(x)). A VAE adds the condition that the latent z is approximated by a Gaussian distribution.
  • Flow-based models take a different approach: they learn a sequence of invertible transformations and train them directly with a simple negative log-likelihood objective.

How do flow-based generative models work?

Figure 2

According to Figure 2:

Figure 3

Continuing from Figure 3:

Figure 4

Figure 4 can be explained by the following equation:

Figure 5
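For reference, the change-of-variable rule that this figure most likely shows, for an invertible mapping z = f(x) with a simple (e.g. Gaussian) prior on z, is:

$$ p(x) = p(z)\left|\det\frac{\partial z}{\partial x}\right| \quad\Longleftrightarrow\quad \log p(x) = \log p(z) + \log\left|\det\frac{\partial z}{\partial x}\right| $$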

Using this fact, we can express a sequence of transformations as follows:

Figure 6
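Assuming the usual notation z_0 = x and z_i = f_i(z_{i-1}) for i = 1, …, K, with z_K following the simple prior, applying the change-of-variable rule once per step gives:

$$ \log p(x) = \log p(z_K) + \sum_{i=1}^{K}\log\left|\det\frac{\partial z_i}{\partial z_{i-1}}\right| $$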

Finally, we need a criterion in order to train a deep learning model. This can be achieved simply by defining a negative log-likelihood (NLL) loss, since we now have a tractable expression for log p(x).

Figure 7
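As a concrete toy sketch of that criterion (a hypothetical 1-D flow of my own, not from the talk or the paper), the NLL of a single invertible affine map z = a·x + b under a standard-Gaussian prior can be written in a few lines of NumPy:

import numpy as np

# Toy 1-D flow: z = a * x + b, with a standard Gaussian prior on z.
# log p(x) = log N(a*x + b; 0, 1) + log|dz/dx| = log N(a*x + b; 0, 1) + log|a|
def nll(x, a, b):
    z = a * x + b
    log_prior = -0.5 * (z ** 2 + np.log(2.0 * np.pi))
    log_det = np.log(np.abs(a))
    return -(log_prior + log_det).mean()

x = np.random.randn(1000) * 3.0 + 1.0      # synthetic data with mean 1, std 3
print(nll(x, a=1.0, b=0.0))                # identity map: high NLL
print(nll(x, a=1.0 / 3.0, b=-1.0 / 3.0))   # whitening map: lower NLL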

Glow

The Glow (Kingma and Dhariwal, 2018) model extends the previous reversible generative models, NICE and RealNVP, and simplifies the architecture by replacing the reverse permutation operation on the channel ordering with invertible 1x1 convolutions.

Figure 8. One step of flow in the Glow model (image source: Kingma and Dhariwal, 2018)

There are three substeps in one step of flow in Glow.

Substep 1: Activation normalization (short for “actnorm”)

It performs an affine transformation using a scale and bias parameter per channel, similar to batch normalization, but it works even for a mini-batch size of 1. The parameters are trainable but initialized so that the first mini-batch of data has mean 0 and standard deviation 1 after actnorm.
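A minimal NumPy sketch of that data-dependent initialization and the per-channel affine transform (variable names and the (N, H, W, C) layout are my assumptions, not the authors' code):

import numpy as np

def actnorm_init(first_batch):
    """first_batch: array of shape (N, H, W, C).
    Initialize per-channel scale and bias so that the first mini-batch
    has zero mean and unit variance per channel after actnorm."""
    mean = first_batch.mean(axis=(0, 1, 2))        # per-channel mean
    std = first_batch.std(axis=(0, 1, 2)) + 1e-6   # per-channel std
    scale = 1.0 / std
    bias = -mean / std
    return scale, bias

def actnorm_forward(x, scale, bias):
    # Per-channel affine transform; the log-determinant of this map is
    # H * W * sum_c log|scale_c| for an (N, H, W, C) input.
    y = x * scale + bias
    h, w = x.shape[1], x.shape[2]
    log_det = h * w * np.sum(np.log(np.abs(scale)))
    return y, log_det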

Substep 2: Invertible 1x1 conv

Between layers of the RealNVP flow, the ordering of channels is reversed so that all the data dimensions have a chance to be altered. A 1×1 convolution with an equal number of input and output channels is a generalization of any permutation of the channel ordering.
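As a quick illustration (a NumPy toy of my own), plugging a permutation matrix in as the 1×1 convolution weight reproduces exactly this channel reversal, which is why a learned invertible W generalizes it:

import numpy as np

h, w, c = 4, 4, 3
x = np.random.randn(h, w, c)

# A 1x1 convolution is just a matrix multiply applied at every pixel:
# y[i, j, :] = x[i, j, :] @ W
def conv1x1(x, W):
    return x @ W

# Channel-reversal permutation as a special case of the weight matrix.
P = np.eye(c)[::-1]
y = conv1x1(x, P)
assert np.allclose(y, x[:, :, ::-1])   # identical to reversing the channels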

Say we have an invertible 1×1 convolution of an input h×w×c tensor h with a weight matrix W of size c×c. The output is an h×w×c tensor, labeled as f = conv2d(h; W). In order to apply the change-of-variable rule, we need to compute the Jacobian determinant |det ∂f/∂h|.

Both the input and output of the 1×1 convolution here can be viewed as a matrix of size h×w. Each entry x_ij (i = 1, …, h, j = 1, …, w) in h is a vector of c channels, and each entry is multiplied by the weight matrix W to obtain the corresponding entry y_ij in the output matrix. The derivative of each entry is ∂(x_ij W)/∂x_ij = W, and there are h×w such entries in total:

$$ \log\left|\det\frac{\partial\,\mathrm{conv2d}(h; W)}{\partial h}\right| = \log\left(|\det W|^{h\cdot w}\right) = h\cdot w\cdot\log|\det W| $$
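A small numerical sanity check of this identity (toy sizes chosen arbitrarily), building the full Jacobian of the per-pixel matrix multiply and comparing its log-determinant against h·w·log|det W|:

import numpy as np

h, w, c = 2, 3, 4
W = np.random.randn(c, c)

# The 1x1 conv applies W to the channel vector at every pixel, so the full
# Jacobian over the flattened h*w*c input is block-diagonal with h*w copies
# of W^T, one block per spatial position.
J = np.kron(np.eye(h * w), W.T)

lhs = np.log(np.abs(np.linalg.det(J)))
rhs = h * w * np.log(np.abs(np.linalg.det(W)))
assert np.isclose(lhs, rhs)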

The inverse 1×1 convolution depends on the inverse matrix W⁻¹. Since the weight matrix is relatively small, the amount of computation for the matrix determinant (tf.linalg.det) and inversion (tf.linalg.inv) is still under control.
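A sketch of how the forward log-det term and the inverse pass might look with the TensorFlow ops mentioned above; this is a minimal version of my own, not the authors' implementation:

import tensorflow as tf

def invertible_1x1_conv(x, W):
    """x: (N, H, W, C) tensor with static shape, W: (C, C) weight matrix.
    Returns the convolved tensor and the log-determinant term."""
    n, h, w, c = x.shape
    kernel = tf.reshape(W, [1, 1, c, c])
    y = tf.nn.conv2d(x, kernel, strides=1, padding="SAME")
    log_det = h * w * tf.math.log(tf.abs(tf.linalg.det(W)))
    return y, log_det

def invertible_1x1_conv_inverse(y, W):
    # The inverse pass simply convolves with W^{-1}.
    c = W.shape[0]
    kernel = tf.reshape(tf.linalg.inv(W), [1, 1, c, c])
    return tf.nn.conv2d(y, kernel, strides=1, padding="SAME")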

Substep 3: Affine coupling layer

The design is the same as in RealNVP.

Figure 9. Three substeps in one step of flow in Glow (image source: Kingma and Dhariwal, 2018)
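Since the post only points back to RealNVP here, the following rough NumPy sketch shows what an affine coupling layer does: split the channels, leave one half unchanged, and let it predict a log-scale and a shift for the other half. The tiny placeholder "net" and the shapes are mine, purely for illustration:

import numpy as np

def affine_coupling_forward(x, net):
    """x: (..., C) array; net maps the first half of the channels to
    (log_scale, shift) for the second half."""
    c = x.shape[-1]
    x_a, x_b = x[..., : c // 2], x[..., c // 2 :]
    log_scale, shift = net(x_a)
    y_b = x_b * np.exp(log_scale) + shift        # only this half is changed
    log_det = log_scale.sum(axis=-1)             # Jacobian is triangular
    return np.concatenate([x_a, y_b], axis=-1), log_det

def affine_coupling_inverse(y, net):
    c = y.shape[-1]
    y_a, y_b = y[..., : c // 2], y[..., c // 2 :]
    log_scale, shift = net(y_a)                  # same net input, so exact
    x_b = (y_b - shift) * np.exp(-log_scale)
    return np.concatenate([y_a, x_b], axis=-1)

# A placeholder "network": any function of x_a works for the demo.
net = lambda x_a: (0.1 * x_a, 2.0 * x_a)
x = np.random.randn(5, 8)
y, log_det = affine_coupling_forward(x, net)
assert np.allclose(affine_coupling_inverse(y, net), x)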
