Review: PR-024 - Pixel Recurrent Neural Networks

Joonsu Oh
2 min read · Jun 14, 2021

Intuition

Figure 1
  • Auto-regressive model: each pixel is generated conditioned on all previously generated pixels
  • Simple and stable training process
  • Tractable likelihood (the factorization is written out below)
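For reference, the tractable likelihood comes from the autoregressive factorization used in the paper: the joint distribution over the n × n pixels is a product of per-pixel conditionals, and each pixel is further factored over its R, G, B channels.

```latex
p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1}),
\qquad
p(x_i \mid \mathbf{x}_{<i}) =
  p(x_{i,R} \mid \mathbf{x}_{<i})\,
  p(x_{i,G} \mid \mathbf{x}_{<i}, x_{i,R})\,
  p(x_{i,B} \mid \mathbf{x}_{<i}, x_{i,R}, x_{i,G})
```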

Masked Convolution

  • To compute the center pixel from only the already-generated context, the convolution kernel is masked as follows:
Figure 2
Figure 3
  • In the first layer (mask type A), a channel cannot be connected to itself at the current pixel, because that value has not been generated yet; later layers (mask type B) allow the self-connection. A minimal sketch is given below.
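As a concrete illustration, here is a minimal sketch of a masked convolution (my own sketch in PyTorch, not the authors' code). It implements only the spatial mask and ignores the per-channel R→G→B ordering: mask type 'A' (first layer) also blocks the center position, while type 'B' (later layers) keeps it.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Conv2d whose kernel only sees pixels above and to the left."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        self.register_buffer('mask', torch.ones_like(self.weight))
        _, _, kH, kW = self.weight.shape
        # Block all rows below the center row.
        self.mask[:, :, kH // 2 + 1:, :] = 0
        # In the center row, block everything to the right of the center;
        # mask 'A' blocks the center position itself as well.
        self.mask[:, :, kH // 2, kW // 2 + (mask_type == 'B'):] = 0

    def forward(self, x):
        self.weight.data *= self.mask  # re-apply the mask before every call
        return super().forward(x)

# Usage: mask 'A' in the first layer, mask 'B' afterwards.
layer = MaskedConv2d('A', in_channels=3, out_channels=64, kernel_size=7, padding=3)
out = layer(torch.randn(1, 3, 32, 32))  # -> (1, 64, 32, 32)
```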

Architecture

Figure 4

Row LSTM

Figure 5
  • One thing to note here is c_{i-1}, the cell state carried over from the previous step.
  • Since the receptive field looks like an inverted triangle, if we did not make use of c_{i-1}, we could not propagate any information about the pixels to the left of the current pixel.
  • The same goes for the Diagonal LSTM, which follows shortly. (The Row LSTM recurrence is written out below.)
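For reference, the Row LSTM recurrence in the paper has roughly this form, where K^{ss} and K^{is} are the state-to-state and input-to-state convolutions, ⊛ denotes convolution along the row, and ⊙ is elementwise multiplication; the previous cell state c_{i-1} enters through the second line (σ is the sigmoid for the o, f, i gates and tanh for g).

```latex
[o_i, f_i, i_i, g_i] = \sigma\left(K^{ss} \circledast h_{i-1} + K^{is} \circledast x_i\right) \\
c_i = f_i \odot c_{i-1} + i_i \odot g_i \\
h_i = o_i \odot \tanh(c_i)
```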

Diagonal LSTM

Figure 6
Figure 7
  • Note that we skew the feature maps so that each diagonal becomes a column and the computation can be parallelized; a small sketch of the skew operation follows.
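To make the skewing concrete, here is a small sketch (my own, PyTorch assumed): each row r is offset by r columns, so every diagonal of the original map becomes a column of the skewed map and the diagonal recurrence turns into an ordinary column-by-column scan.

```python
import torch

def skew(x):
    # x: (B, C, H, W) -> (B, C, H, H + W - 1), row r shifted right by r
    B, C, H, W = x.shape
    out = x.new_zeros(B, C, H, H + W - 1)
    for r in range(H):
        out[:, :, r, r:r + W] = x[:, :, r, :]
    return out

def unskew(x, W):
    # Inverse of skew: (B, C, H, H + W - 1) -> (B, C, H, W)
    B, C, H, _ = x.shape
    out = x.new_zeros(B, C, H, W)
    for r in range(H):
        out[:, :, r, :] = x[:, :, r, r:r + W]
    return out

x = torch.arange(12.).reshape(1, 1, 3, 4)
assert torch.equal(unskew(skew(x), W=4), x)
```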

Pixel CNN

  • LSTMs are slow, because the recurrent computation has to be carried out sequentially.
  • Why not use only convolutions? With masked convolutions, all pixel positions can be processed in parallel during training; the receptive field is bounded but grows with depth (see the sketch after Figure 8).
Figure 8
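A minimal PixelCNN-style stack might look like the sketch below (my own illustration, reusing the MaskedConv2d class from the Masked Convolution section and assuming greyscale input): one type-'A' layer, several type-'B' layers, and a 1x1 convolution producing 256 logits per pixel. Training runs in parallel over all positions; only sampling remains pixel-by-pixel.

```python
import torch.nn as nn
# MaskedConv2d: see the sketch in the Masked Convolution section above.

def build_pixelcnn(hidden=64, n_layers=5):
    layers = [MaskedConv2d('A', 1, hidden, kernel_size=7, padding=3), nn.ReLU()]
    for _ in range(n_layers):
        layers += [MaskedConv2d('B', hidden, hidden, kernel_size=3, padding=1), nn.ReLU()]
    layers += [nn.Conv2d(hidden, 256, kernel_size=1)]  # per-pixel 256-way logits
    return nn.Sequential(*layers)
```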

Some Details

  • Treat pixels as discrete variables: to estimate a pixel value, perform a 256-way classification per channel (classes corresponding to the pixel values 0–255)
  • This is implemented with a final softmax layer (a sketch of the training loss follows).
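As a sketch of how that softmax is trained (greyscale case, my own illustration with hypothetical shapes): the network outputs 256 logits per pixel, and the loss is the cross-entropy against the integer pixel values, which is exactly the negative log-likelihood.

```python
import torch
import torch.nn.functional as F

def pixel_nll(logits, images):
    # logits: (B, 256, H, W); images: (B, 1, H, W) with floats in [0, 1]
    targets = (images * 255).round().long().squeeze(1)  # integer classes 0..255
    return F.cross_entropy(logits, targets)             # softmax + NLL, averaged over pixels

# Hypothetical usage with random tensors:
logits = torch.randn(2, 256, 32, 32)
images = torch.rand(2, 1, 32, 32)
loss = pixel_nll(logits, images)
```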
