Review: PR-024-Pixel Recurrent Neural Network
Intuition
- Auto-regressive model!
- Simple and stable training process
- Tractable likelihood
Masked Convolution
- To compute the center pixel, the convolution kernel is masked so it only sees already-generated context: pixels above, and pixels to the left in the same row.
- In the first layer (mask A), a channel cannot be connected to itself, because its own value has not been generated yet; later layers (mask B) allow the self-connection.
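The two mask types can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; `conv_mask` is a hypothetical helper name, and the mask here is 2D only (the per-channel R/G/B ordering inside the mask is omitted for brevity).

```python
import numpy as np

def conv_mask(kernel_size, mask_type):
    """Build a 2D mask for a masked convolution kernel.

    Allowed positions: all rows above the centre, and the centre row
    strictly left of centre. Mask 'A' (first layer) zeros the centre
    position too, so a pixel never sees itself; mask 'B' keeps it.
    """
    k, c = kernel_size, kernel_size // 2
    mask = np.zeros((k, k), dtype=np.float32)
    mask[:c, :] = 1.0          # rows above the centre pixel
    mask[c, :c] = 1.0          # left of centre in the centre row
    if mask_type == 'B':
        mask[c, c] = 1.0       # self-connection allowed after layer 1
    return mask
```

Multiplying the kernel weights by this mask before every convolution keeps the autoregressive ordering intact at each layer.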
Architecture
Row LSTM
- One thing to note here is the previous cell state c_{i-1}.
- Since the receptive field looks like an inverted triangle, if we don't make use of c_{i-1}, we cannot provide any information about the pixel's left side.
- The same holds for the Diagonal LSTM, which comes next.
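The role of c_{i-1} in the recurrence can be seen in a stripped-down step function. This is a hedged sketch, not the paper's code: `row_lstm_step` is a hypothetical name, and plain matrix multiplies stand in for the 1D convolutions the Row LSTM actually applies along each row, to keep the recurrence c_i = f ⊙ c_{i-1} + i ⊙ g visible.

```python
import numpy as np

def row_lstm_step(x_row, h_prev, c_prev, Wx, Wh):
    """One simplified Row LSTM step over a row of width w.

    x_row, h_prev, c_prev: (w, d) arrays for the current input row and
    the previous row's hidden/cell state. Wx, Wh: (d, 4d) weights.
    The previous cell state c_prev is how context flows forward; drop
    it and each output sees only the triangular input-to-state field.
    """
    gates = x_row @ Wx + h_prev @ Wh           # (w, 4d): i, f, o, g stacked
    i, f, o, g = np.split(gates, 4, axis=1)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    c = sig(f) * c_prev + sig(i) * np.tanh(g)  # c_{i-1} carries prior context
    h = sig(o) * np.tanh(c)
    return h, c
```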
Diagonal LSTM
- Note that we skew the feature maps so that each diagonal lines up as a column, and the computation along diagonals can be parallelized.
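The skew operation itself is simple: shift row r to the right by r positions. A minimal NumPy sketch (the function name `skew` is an assumption, and batch/channel dimensions are omitted):

```python
import numpy as np

def skew(x):
    """Shift row r of an (h, w) map right by r columns, zero-padded.

    Output shape is (h, w + h - 1). Column c of the skewed map holds
    x[r, c - r], i.e. exactly one anti-ordered diagonal of the input,
    so a left-to-right scan over columns processes one diagonal per
    step -- this is what lets the Diagonal LSTM parallelize.
    """
    h, w = x.shape
    out = np.zeros((h, w + h - 1), dtype=x.dtype)
    for r in range(h):
        out[r, r:r + w] = x[r]
    return out
```

The inverse (un-skew) just slices each row back out after the LSTM pass.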
Pixel CNN
- LSTMs are too slow because of the sequential recurrence.
- Why not use only masked convolutions? Training then parallelizes over all pixels, at the cost of a bounded receptive field.
Some Details
- Treat pixel values as discrete variables: estimating a pixel value becomes a 256-way classification per channel (classes 0–255).
- This is implemented by a final softmax layer.
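Sampling a pixel value from the 256-way softmax can be sketched as follows. This is an illustrative helper (`sample_pixel` is a hypothetical name), assuming a length-256 logit vector for one channel:

```python
import numpy as np

def sample_pixel(logits, rng):
    """Sample one pixel value in 0..255 from 256-way softmax logits."""
    z = logits - logits.max()          # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()    # softmax over the 256 classes
    return rng.choice(256, p=p)        # draw a discrete intensity
```

Because the output is a full categorical distribution rather than a single regressed value, the model can express multimodal beliefs about a pixel (e.g. "dark or bright, but not grey").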