1. Train a DCGAN to approximate the distribution of normal (in the sense of non-anomalous, not Gaussian!) images of interest. With this, we have a generator that maps a latent space to normal images and a discriminator that distinguishes real from fake images.
  2. Now we have a trained DCGAN. Say we are given a new image, and we do not know whether it is normal or abnormal. How can we tell using the trained GAN? We can do the following.
  3. Find the z whose G(z) is most visually similar to the given image. How? Set up a…
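The latent search in step 3 can be sketched like this. Note this is a toy version under my own assumptions: the "generator" is a small linear map so the gradient of the residual loss is analytic; with a real trained DCGAN you would run the same gradient descent on z via autodiff (e.g. PyTorch).

```python
import numpy as np

# Toy sketch of the latent search: given a fixed generator G, descend on z
# to minimize the residual loss ||G(z) - x||^2. (Linear G is an assumption
# for illustration only, not part of the original method.)
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 8))       # toy generator weights: 8-d latent -> 64-d "image"

def G(z):
    return W @ z                       # stand-in for a trained DCGAN generator

x = G(rng.standard_normal(8))          # query image (drawn from G here, so it IS normal)

z = np.zeros(8)                        # initial latent guess
lr = 0.005
for _ in range(500):
    residual = G(z) - x
    z -= lr * 2 * W.T @ residual       # gradient of ||G(z) - x||^2 w.r.t. z

anomaly_score = np.linalg.norm(G(z) - x)  # low for normal images, high for abnormal ones
```

Because the query image really lies on the generator's manifold, the search drives the residual (and hence the anomaly score) to essentially zero; an abnormal image would leave a large residual.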

What are flow-based generative models?

Figure 1
  • An auto-encoder tries to find two functions f and g that satisfy x = g(f(x)). A VAE adds a further condition: z is approximated with a Gaussian distribution.
  • Flow-based models are different from both. They learn a sequence of invertible transformations and train it directly by minimizing a simple negative log-likelihood.
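The invertibility plus exact likelihood idea can be shown in a few lines. This is a minimal 1-D sketch with a single affine flow (my illustrative assumption, not a model from the article): the change-of-variables formula gives the exact log-likelihood that the negative log-likelihood training would minimize.

```python
import numpy as np

# One affine flow step: z = f(x) = (x - b) / s, with exact inverse g.
# Change of variables: log p_x(x) = log p_z(f(x)) + log |df/dx|.
s, b = 2.0, 1.0                        # flow parameters (would be learned by NLL)

def f(x):                              # forward: data -> latent
    return (x - b) / s

def g(z):                              # exact inverse: latent -> data
    return s * z + b

def log_prob(x):
    z = f(x)
    log_pz = -0.5 * (z**2 + np.log(2 * np.pi))  # standard-normal base density
    log_det = -np.log(s)                         # log |df/dx| = log(1/s)
    return log_pz + log_det

x = 3.0
assert np.isclose(g(f(x)), x)          # invertibility: x = g(f(x)), as in the AE view above
```

Here `log_prob(3.0)` matches the density of a Normal(mean=1, std=2), which is exactly the distribution this affine flow pushes the base Gaussian onto.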

How do flow-based generative models work?

One of the challenges in the study of generative adversarial networks is the instability of their training.

  • Spectral Normalization is a novel weight normalization technique that stabilizes the training of the discriminator in GANs by enforcing a Lipschitz constraint on it. It only requires tuning the Lipschitz constant to produce satisfactory performance.
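The core mechanism can be sketched with NumPy: estimate the largest singular value sigma(W) of a weight matrix by power iteration, then divide W by it so the layer becomes 1-Lipschitz. (Sizes and iteration count below are illustrative assumptions; the paper reuses a single power-iteration step per SGD update.)

```python
import numpy as np

# Power-iteration estimate of the spectral norm sigma(W), then normalize.
rng = np.random.default_rng(0)
W = rng.standard_normal((32, 16))      # a toy discriminator weight matrix

u = rng.standard_normal(32)
for _ in range(100):                   # power iteration on W (left/right singular vectors)
    v = W.T @ u; v /= np.linalg.norm(v)
    u = W @ v;  u /= np.linalg.norm(u)

sigma = u @ W @ v                      # estimated largest singular value
W_sn = W / sigma                       # spectrally normalized weights

top_sv = np.linalg.norm(W_sn, 2)       # largest singular value after normalization, ~1
```

Dividing by sigma caps the layer's Lipschitz constant at 1, which is exactly the constraint enforced on the discriminator.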


  • WGAN and WGAN-GP tried to address the same problem, the instability of GAN training:
  • WGAN: by clipping the weights of the discriminator
  • WGAN-GP: by adding a gradient penalty to its loss function
  • However, even WGAN-GP cannot impose regularization on the entire function (discriminator) space outside of the supports of…

Depthwise Separable Convolution

  • MobileNetV1
Figure 1
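A quick way to see why MobileNetV1's depthwise separable convolution is cheap is to count parameters: a standard k×k conv costs k·k·c_in·c_out, while the depthwise + pointwise factorization costs k·k·c_in + c_in·c_out, a reduction of roughly 1/c_out + 1/k². The channel sizes below are my own illustrative choices.

```python
# Parameter count: standard conv vs. depthwise separable factorization.
k, c_in, c_out = 3, 64, 128            # kernel size and channel counts (illustrative)

standard = k * k * c_in * c_out        # one dense 3x3 conv over all channels
depthwise = k * k * c_in               # one 3x3 filter per input channel
pointwise = c_in * c_out               # 1x1 conv mixes channels
separable = depthwise + pointwise

ratio = separable / standard           # ~= 1/c_out + 1/k^2, about 8-9x fewer parameters
```

With these sizes: 73,728 parameters for the standard conv versus 8,768 for the separable version.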

Linear Bottlenecks

  • For an input set of real images, we say that the set of layer activations forms a “manifold of interest”.
  • It has long been assumed that manifolds of interest in neural networks can be embedded in low-dimensional subspaces.
  • The authors highlight two properties that indicate when the manifold of interest lies in a low-dimensional subspace of the higher-dimensional activation space:
  1. If the manifold of interest keeps non-zero volume after the ReLU transformation, then ReLU acts on it as a linear transformation.
  2. ReLU is capable of preserving complete information about the manifold only if the input manifold lies…
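Property 1 is easy to see concretely (toy values of my own, not from the paper): wherever activations stay positive, ReLU is the identity, so on that region the layer is a purely linear transformation; a negative coordinate, by contrast, is collapsed to zero and its information is lost.

```python
import numpy as np

relu = lambda v: np.maximum(v, 0.0)

x_pos = np.array([0.5, 1.2, 3.0])      # all coordinates positive
same = np.allclose(relu(x_pos), x_pos) # ReLU == identity here: the linear regime

x_mix = np.array([-1.0, 2.0])          # one negative coordinate
y = relu(x_mix)                        # -> [0., 2.]: sign and magnitude of the
                                       # negative coordinate are destroyed
```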

Non-local Layer

Figure 1
  • Here, the similarity function could be one of four options (as in Figure 2).
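One of those similarity options, the embedded Gaussian, makes the non-local layer look like self-attention: y_i = (1/C(x)) Σ_j f(x_i, x_j) g(x_j), with softmax as the normalization C(x). Below is a toy NumPy sketch (sizes and random projections are my own assumptions).

```python
import numpy as np

# Non-local operation with embedded-Gaussian similarity on a toy input.
rng = np.random.default_rng(0)
N, d = 5, 4                            # number of positions x feature dim (toy sizes)
x = rng.standard_normal((N, d))
W_theta = rng.standard_normal((d, d))  # embeddings theta, phi and transform g
W_phi = rng.standard_normal((d, d))
W_g = rng.standard_normal((d, d))

theta, phi, g = x @ W_theta, x @ W_phi, x @ W_g
sim = theta @ phi.T                    # pairwise similarity f(x_i, x_j) for ALL j
attn = np.exp(sim - sim.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)   # softmax normalization C(x)
y = attn @ g                           # each output position aggregates every input position
```

The "non-local" part is that every output position i depends on all positions j at once, not just a local neighborhood as in convolution.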

Hyperparameter Optimization in NN

  • Hyperparameter tuning often yields large performance gains
  • The search space is too big to optimize by hand
  • How should we go about automating hyperparameter optimization?

Grid Search vs. Random Search

Figure 1
  • Try various hyperparameter settings and take the best one!
  • Can we do better than this? → Yes: apply Bayesian optimization.
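"Try various settings and take the best one" is a few lines of code. This sketch uses random search (which tends to beat grid search when only some hyperparameters matter); the objective below is a stand-in assumption — in practice each call would train a model and return validation accuracy.

```python
import random

# Toy random search over (learning rate, dropout).
def val_accuracy(lr, dropout):
    # Stand-in for "train a model and evaluate it"; peaks at lr=0.01, dropout=0.3.
    return 1.0 - 1e3 * (lr - 0.01) ** 2 - (dropout - 0.3) ** 2

random.seed(0)
trials = [(random.uniform(1e-4, 0.1), random.uniform(0.0, 0.5))
          for _ in range(50)]          # sample settings at random, not on a grid
best = max(trials, key=lambda p: val_accuracy(*p))
```

Note that every trial is independent of the others — which is exactly the inefficiency Bayesian optimization removes by modelling past results.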

Bayesian Optimization

  • We need to make two design choices:
  • Surrogate modelling function: Gaussian Processes
  • Acquisition function: Probability of Improvement, Expected Improvement
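The two design choices fit in a short loop. This is a minimal 1-D sketch under my own assumptions (RBF kernel, toy objective, Expected Improvement): fit the GP surrogate to past evaluations, pick the point maximizing EI, evaluate, repeat.

```python
import numpy as np
from scipy.stats import norm

def objective(x):
    return -(x - 0.6) ** 2             # the "expensive" function; its max is at x = 0.6

def rbf(a, b, ls=0.2):                 # RBF kernel for the GP surrogate
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-6):
    K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ K_inv @ y
    var = 1.0 - np.sum(Ks * (K_inv @ Ks), axis=0)   # diagonal of posterior covariance
    return mu, np.sqrt(var.clip(min=1e-12))

def expected_improvement(mu, sigma, best):
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

X = np.array([0.1, 0.9])               # two cheap initial evaluations
y = objective(X)
grid = np.linspace(0.0, 1.0, 201)
for _ in range(10):                    # BO loop: fit surrogate, maximize EI, evaluate
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X, y = np.append(X, x_next), np.append(y, objective(x_next))

best_x = X[np.argmax(y)]               # homes in on x near 0.6 in a handful of evaluations
```

The surrogate's uncertainty is what lets EI trade off exploring unseen regions against exploiting promising ones — the advantage over grid/random search above.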


  • How can we generate a lip-sync video from text?
  • This is challenging. Why? Because we are mapping from a much lower-dimensional input (text) to a much higher-dimensional output (video)
  • To solve this issue:

→ Focus on synthesizing the parts of the face that are most correlated with speech (around the mouth)


Figure 1

Keypoints Generation

  • Storing NN weights takes a lot of space, and running inference with them takes a lot of memory.
  • How can we make this situation better? → Model compression
Figure 1
  • Exactly what kind of model compression technique should we use?
  • This paper offers one possible answer.


The First Neural Architecture Search (NAS)

Figure 1

So, How Does That Controller RNN Work?

Joonsu Oh
