1. Train a DCGAN to approximate the distribution of normal images of interest ("normal" here means typical, not Gaussian!). With this, we have a generator that maps a latent space to normal images and a discriminator that distinguishes real from fake images.
  2. Now, we have a trained DCGAN. Say, we have a…

What are flow-based generative models?

Figure 1
  • An auto-encoder tries to find two functions f and g that satisfy x ≈ g(f(x)). A VAE adds a condition by which the latent z is constrained to approximately follow a Gaussian distribution.
  • Flow-based models are different from them. …
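The contrast can be made concrete with a toy example. Below is a one-dimensional affine flow (my own illustrative sketch, not from any particular paper): the mapping is exactly invertible, and the exact log-likelihood follows from the change-of-variables formula, with no approximation of the posterior as in a VAE.

```python
import numpy as np

# Toy invertible affine flow: z = (x - b) / s, so x = s * z + b.
# Exact log-likelihood from change of variables:
#   log p(x) = log p_z(f(x)) + log |det df/dx|
s, b = 2.0, 1.0          # flow parameters (illustrative scalars)

def forward(x):          # f: data -> latent
    return (x - b) / s

def inverse(z):          # g = f^{-1}: latent -> data (exact, not learned separately)
    return s * z + b

def log_prob(x):
    z = forward(x)
    log_pz = -0.5 * (z**2 + np.log(2 * np.pi))   # standard normal prior on z
    log_det = -np.log(abs(s))                    # df/dx = 1/s
    return log_pz + log_det

x = 3.0
assert np.isclose(inverse(forward(x)), x)        # exact invertibility
```

Because f is invertible with a tractable Jacobian, `log_prob` is the exact density of a Normal(b, s²) here, rather than a lower bound.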

One of the challenges in the study of generative adversarial networks is the instability of their training.

  • Spectral Normalization is a novel weight normalization technique that stabilizes the training of the discriminator in GANs by enforcing a Lipschitz constraint on it. …
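A minimal sketch of the idea, assuming the usual power-iteration estimate of the largest singular value (the shapes and iteration count below are illustrative, not from the paper):

```python
import numpy as np

# Spectral normalization sketch: estimate the largest singular value
# sigma(W) by power iteration, then divide W by it so the linear layer
# has spectral norm ~1 (i.e. it is 1-Lipschitz).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
u = rng.normal(size=4)           # persistent left-singular-vector estimate

def spectral_normalize(W, u, n_iters=1):
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v            # estimated largest singular value
    return W / sigma, u

# Many iterations here for a tight estimate; in practice one step per
# training update is reused because W changes slowly.
W_sn, u = spectral_normalize(W, u, n_iters=50)
print(np.linalg.norm(W_sn, 2))   # spectral norm of normalized W, ~1.0
```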

Depthwise Separable Convolution

  • MobileNetV1
Figure 1
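The saving from a depthwise separable convolution is easiest to see in the parameter count. A sketch with illustrative layer sizes (a 3×3 kernel, 64 input channels, 128 output channels):

```python
# Standard conv learns one k x k x c_in filter per output channel.
def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

# Depthwise separable conv splits this into a per-channel k x k filter
# (depthwise) followed by a 1 x 1 conv that mixes channels (pointwise).
def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

k, c_in, c_out = 3, 64, 128
print(standard_conv_params(k, c_in, c_out))        # 73728
print(depthwise_separable_params(k, c_in, c_out))  # 8768
```

For this layer the separable version uses roughly 8× fewer parameters, which is the core of MobileNetV1's efficiency.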

Linear Bottlenecks

  • For an input set of real images, we say that the set of layer activations forms a “manifold of interest”.
  • It has long been assumed that manifolds of interest in neural networks can be embedded in low-dimensional subspaces.
  • The authors have highlighted two properties that are indicative of the…

Non-local Layer

Figure 1
  • Where similarity could be one of four options (as in Figure 2):
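As a sketch, here is a non-local operation using dot-product similarity (one of those options). The embedding matrices and shapes below are illustrative stand-ins, not taken from the paper:

```python
import numpy as np

# Non-local operation sketch: every output position attends to every
# input position, weighted by pairwise similarity of embedded features.
rng = np.random.default_rng(0)
N, C = 5, 4                        # N positions, C channels (illustrative)
x = rng.normal(size=(N, C))
theta, phi, g = (rng.normal(size=(C, C)) for _ in range(3))

sim = (x @ theta) @ (x @ phi).T    # f(x_i, x_j): dot-product similarity
attn = sim / N                     # normalize by the number of positions
y = attn @ (x @ g)                 # weighted sum of transformed features
print(y.shape)                     # same positions and channels as x
```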

Hyperparameter Optimization in NN

  • Hyperparameter tuning often leads to huge performance gains
  • The search space is too big to optimize by hand
  • How should we go about automating hyperparameter optimization?

Grid Search vs. Random Search

Figure 1
  • Try various hyperparameter settings and take the best one!
  • Can we do better than this? → Yes, apply Bayesian optimization.
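The "try settings, keep the best" recipe can be sketched as a toy random search. The objective below is a made-up stand-in for validation accuracy, and the hyperparameter ranges are illustrative:

```python
import random

# Random search over a 2-D hyperparameter space (learning rate, dropout).
random.seed(0)

def score(lr, dropout):
    # Hypothetical objective peaking at lr=0.01, dropout=0.3.
    return -(lr - 0.01) ** 2 * 1e4 - (dropout - 0.3) ** 2

best = max(
    ((random.uniform(1e-4, 0.1), random.uniform(0.0, 0.9)) for _ in range(200)),
    key=lambda hp: score(*hp),
)
print(best)   # best (lr, dropout) found among 200 random trials
```

Unlike grid search, random search does not waste trials on repeated values of an unimportant hyperparameter, which is why it tends to win in high dimensions.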

Bayesian Optimization

  • We need to make two design choices:
  • Surrogate Modelling Function — Gaussian Processes
  • Acquisition Function — Probability of Improvement, Expected Improvement
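A hedged sketch of one Bayesian-optimization step putting those two choices together: a Gaussian-process surrogate with an RBF kernel, and Expected Improvement as the acquisition function. The toy objective and all constants below are my own illustrative picks:

```python
from math import erf
import numpy as np

def objective(x):                  # hypothetical black-box to minimize
    return np.sin(3 * x) + 0.1 * x ** 2

def rbf(a, b, length=0.5):         # GP surrogate's covariance kernel
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

X = np.array([-2.0, 0.0, 2.0])     # points evaluated so far
y = objective(X)
noise = 1e-6

def gp_posterior(Xq):
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xq)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = 1.0 - np.einsum('ij,ik,kj->j', Ks, Kinv, Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(Xq):
    mu, sigma = gp_posterior(Xq)
    best = y.min()
    z = (best - mu) / sigma
    Phi = 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))   # normal CDF
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)      # normal PDF
    return (best - mu) * Phi + sigma * phi

grid = np.linspace(-3, 3, 200)
x_next = grid[np.argmax(expected_improvement(grid))]
print(x_next)   # next point BO proposes to evaluate
```

Expected Improvement balances exploiting regions where the surrogate mean is low against exploring regions where its uncertainty is high.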


  • How can we generate a lip-sync video from text?
  • This is challenging. Why? Basically, we are mapping from a much lower-dimensional space (text) to a much higher-dimensional one (video)
  • To solve this issue:

→ Focus on synthesizing the parts of the face that are most correlated to speech (around the mouth)


Figure 1

Keypoints Generation

  • It takes a lot of space to store NN weights and a lot of memory to run inference with them.
  • How can we make this situation better? → Model Compression
Figure 1
  • Exactly what kind of model compression technique should we use?
  • This paper offers one answer.
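Since the paper itself isn't named above, here is a generic illustration of one common compression technique, magnitude pruning: zero out the smallest-magnitude weights so the layer can be stored sparsely. The layer size and sparsity level are illustrative:

```python
import numpy as np

# Magnitude pruning sketch: keep only the largest-magnitude weights.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))    # toy weight matrix

def prune(W, sparsity=0.9):
    k = int(W.size * sparsity)                      # how many to drop
    threshold = np.sort(np.abs(W), axis=None)[k]    # magnitude cutoff
    return np.where(np.abs(W) >= threshold, W, 0.0)

W_pruned = prune(W)
kept = np.count_nonzero(W_pruned) / W.size
print(f"weights kept: {kept:.1%}")   # ~10% of the original weights
```

In practice pruning is usually followed by fine-tuning to recover accuracy, and the sparse matrix is stored in a compressed format to realize the space savings.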


The First Neural Architecture Search (NAS)

Figure 1

So, How Does That Controller RNN Work?
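As a rough sketch of the idea: in Zoph & Le's formulation, the controller RNN emits architecture decisions one token at a time; the sampled network is trained, and its validation accuracy becomes the reward used to update the controller with REINFORCE. The toy below only shows the sampling interface, with independent decisions instead of an actual RNN, and the decision space is made up:

```python
import random

# Toy stand-in for the NAS controller's sampling step. A real controller
# conditions each decision on the previous tokens via an RNN; here each
# decision is drawn independently for brevity.
random.seed(0)
CHOICES = {
    "filter_size": [3, 5, 7],
    "num_filters": [32, 64, 128],
    "stride": [1, 2],
}

def sample_architecture():
    return {name: random.choice(opts) for name, opts in CHOICES.items()}

arch = sample_architecture()
print(arch)   # one sampled architecture, to be trained and scored
```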

Joonsu Oh
