Review: PR-024-Pixel Recurrent Neural Network
Intuition
- Auto-regressive model!
- Simple and stable training process
- Tractable likelihood
Masked Convolution
- To compute the center pixel, the convolution kernel is masked so it only sees already-generated context: pixels above, and pixels to the left in the same row.
- In the first layer (mask A), a channel cannot be connected to itself, because its own value has not been generated yet; later layers (mask B) allow the self-connection.
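The two mask types can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; `conv_mask` is a hypothetical helper name, and the mask here is 2D only (the per-channel R/G/B ordering inside the mask is omitted for brevity).

```python
import numpy as np

def conv_mask(kernel_size, mask_type):
    """Build a 2D mask for a masked convolution kernel.

    Allowed positions: all rows above the centre, and the centre row
    strictly left of centre. Mask 'A' (first layer) zeros the centre
    position too, so a pixel never sees itself; mask 'B' keeps it.
    """
    k, c = kernel_size, kernel_size // 2
    mask = np.zeros((k, k), dtype=np.float32)
    mask[:c, :] = 1.0          # rows above the centre pixel
    mask[c, :c] = 1.0          # left of centre in the centre row
    if mask_type == 'B':
        mask[c, c] = 1.0       # self-connection allowed after layer 1
    return mask
```

Multiplying the kernel weights by this mask before every convolution keeps the autoregressive ordering intact at each layer.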
Architecture
Row LSTM
- One thing to note here is the previous cell state c_{i-1}.
- Since the receptive field looks like an inverted triangle, if we don't make use of c_{i-1}, we cannot provide any information about the pixel's left side.
- The same holds for the Diagonal LSTM, which comes next.
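The role of c_{i-1} in the recurrence can be seen in a stripped-down step function. This is a hedged sketch, not the paper's code: `row_lstm_step` is a hypothetical name, and plain matrix multiplies stand in for the 1D convolutions the Row LSTM actually applies along each row, to keep the recurrence c_i = f ⊙ c_{i-1} + i ⊙ g visible.

```python
import numpy as np

def row_lstm_step(x_row, h_prev, c_prev, Wx, Wh):
    """One simplified Row LSTM step over a row of width w.

    x_row, h_prev, c_prev: (w, d) arrays for the current input row and
    the previous row's hidden/cell state. Wx, Wh: (d, 4d) weights.
    The previous cell state c_prev is how context flows forward; drop
    it and each output sees only the triangular input-to-state field.
    """
    gates = x_row @ Wx + h_prev @ Wh           # (w, 4d): i, f, o, g stacked
    i, f, o, g = np.split(gates, 4, axis=1)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    c = sig(f) * c_prev + sig(i) * np.tanh(g)  # c_{i-1} carries prior context
    h = sig(o) * np.tanh(c)
    return h, c
```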
Diagonal LSTM
- Note that we skew the feature maps so that each diagonal lines up as a column, and the computation along diagonals can be parallelized.
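The skew operation itself is simple: shift row r to the right by r positions. A minimal NumPy sketch (the function name `skew` is an assumption, and batch/channel dimensions are omitted):

```python
import numpy as np

def skew(x):
    """Shift row r of an (h, w) map right by r columns, zero-padded.

    Output shape is (h, w + h - 1). Column c of the skewed map holds
    x[r, c - r], i.e. exactly one anti-ordered diagonal of the input,
    so a left-to-right scan over columns processes one diagonal per
    step -- this is what lets the Diagonal LSTM parallelize.
    """
    h, w = x.shape
    out = np.zeros((h, w + h - 1), dtype=x.dtype)
    for r in range(h):
        out[r, r:r + w] = x[r]
    return out
```

The inverse (un-skew) just slices each row back out after the LSTM pass.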
Pixel CNN
- LSTMs are too slow because of the sequential recurrence.
- Why not use only masked convolutions? Training then parallelizes over all pixels, at the cost of a bounded receptive field.
Some Details
- Treat pixel values as discrete variables: estimating a pixel value becomes a 256-way classification per channel (classes 0–255).
- This is implemented by a final softmax layer.
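Sampling a pixel value from the 256-way softmax can be sketched as follows. This is an illustrative helper (`sample_pixel` is a hypothetical name), assuming a length-256 logit vector for one channel:

```python
import numpy as np

def sample_pixel(logits, rng):
    """Sample one pixel value in 0..255 from 256-way softmax logits."""
    z = logits - logits.max()          # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()    # softmax over the 256 classes
    return rng.choice(256, p=p)        # draw a discrete intensity
```

Because the output is a full categorical distribution rather than a single regressed value, the model can express multimodal beliefs about a pixel (e.g. "dark or bright, but not grey").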