- Long time no see! Had a bit of a break and came back finally.
- I recently started working on training Korean version of GPT-3 so got myself into NLP. …

- Train a DCGAN to approximate the distribution of normal (as in non-anomalous, not Gaussian!) images of interest. With this, we get a generator that maps a latent space to normal images and a discriminator that distinguishes real from fake images.
- Now, we have a trained DCGAN. Say, we have a…
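As a toy illustration of the adversarial objective above (my own sketch, not the paper's code), the standard discriminator and non-saturating generator losses can be written on scalar discriminator scores:

```python
import math

def d_loss(d_real, d_fake):
    """Discriminator loss: push D(x_real) toward 1 and D(G(z)) toward 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss(d_fake):
    """Non-saturating generator loss: push D(G(z)) toward 1."""
    return -math.log(d_fake)

# A discriminator that scores real images high and fakes low has a small loss,
# while the (still weak) generator has a large loss:
print(d_loss(0.9, 0.1))  # ~0.211
print(g_loss(0.1))       # ~2.303
```

In a real DCGAN these scalars come from the discriminator network, and both losses are minimized alternately by gradient descent.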

- An auto-encoder tries to find two functions f and g such that x ≈ g(f(x)). A VAE adds the constraint that the latent code z = f(x) is pushed toward a Gaussian distribution.
- Flow-based models are different from them. …
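A toy illustration of the x = g(f(x)) idea (a hand-built example of mine, not from any of the papers): if the data lies on a 1-D manifold inside 2-D space, an encoder/decoder pair can reconstruct it exactly.

```python
def f(point):
    """Encoder: compress a 2-D point on the line y = 2x to its 1-D code."""
    x, _ = point
    return x

def g(code):
    """Decoder: map the 1-D code back onto the 2-D manifold."""
    return (code, 2 * code)

data = [(0.0, 0.0), (1.0, 2.0), (-3.5, -7.0)]  # points on the manifold y = 2x
for p in data:
    assert g(f(p)) == p  # perfect reconstruction on the manifold
```

A learned auto-encoder does the same thing with neural networks for f and g, trained to minimize the reconstruction error instead of being hand-designed.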

One of the challenges in the study of generative adversarial networks is the instability of their training.

- Spectral Normalization is a novel weight normalization technique that stabilizes GAN training by enforcing a Lipschitz constraint on the discriminator. …
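The core idea is to divide each weight matrix by its spectral norm (largest singular value), which can be estimated cheaply with power iteration. A minimal pure-Python sketch of my own, not the paper's implementation:

```python
def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def norm(v):
    return sum(x * x for x in v) ** 0.5

def spectral_norm(w, iters=50):
    """Estimate the largest singular value of w via power iteration."""
    u = [1.0] * len(w[0])
    for _ in range(iters):
        v = matvec(w, u)              # u <- normalized W^T W u
        u = matvec(transpose(w), v)
        u = [x / norm(u) for x in u]
    return norm(matvec(w, u))

W = [[3.0, 0.0], [0.0, 1.0]]          # singular values are 3 and 1
sigma = spectral_norm(W)              # ~3.0
W_sn = [[x / sigma for x in row] for row in W]  # spectral norm of W_sn is ~1
```

Dividing W by σ(W) caps its largest singular value at 1, which bounds the Lipschitz constant of the layer; doing this for every discriminator layer gives the constraint the paper enforces.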

- MobileNetV1

- For an input set of real images, we say that the set of layer activations forms a “manifold of interest”.
- It has long been assumed that manifolds of interest in neural networks can be embedded in low-dimensional subspaces.
- The authors have highlighted two properties that are indicative of the…

- Where similarity could be one of four options (as in Figure 2):

- Hyperparameter tuning often leads to huge performance gains
- The search space is too big to optimize by hand
- How should we go about automating hyperparameter optimization?

- Try various hyperparameter settings and take the best one!
- Can we do better than this? → Yes, apply Bayesian optimization.

- We need to make two design choices:
- Surrogate Modelling Function — Gaussian Processes
- Acquisition Function — Probability of Improvement, Expected Improvement
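Given a posterior mean and standard deviation from the surrogate at each candidate point (here just assumed numbers, not an actually fitted Gaussian Process), the Expected Improvement acquisition has a closed form. A hedged sketch:

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for maximization: E[max(f(x) - best - xi, 0)] under N(mu, sigma^2)."""
    if sigma == 0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * normal_cdf(z) + sigma * normal_pdf(z)

# Hypothetical posterior from the surrogate at three candidate points:
candidates = [(0.2, 0.8, 0.05), (0.5, 0.9, 0.30), (0.8, 0.7, 0.60)]  # (x, mu, sigma)
best_seen = 0.85
scores = [expected_improvement(mu, s, best_seen) for _, mu, s in candidates]
next_x = candidates[max(range(3), key=scores.__getitem__)][0]
```

Note how the highest-variance candidate wins here even though its mean is lowest: EI trades off exploitation (high mean) against exploration (high uncertainty), which is exactly what random search cannot do.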

- How can we generate a lip-sync video from text?
- This is challenging. Why? Because we are going from a much lower-dimensional input (text) to a much higher-dimensional output (video)
- To solve this issue:

→ Focus on synthesizing the parts of the face that are most correlated with speech (the region around the mouth)

- Neural network weights take a lot of disk space to store and a lot of memory to run inference with.
- How can we make this situation better? → Model Compression
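Two common compression tricks are magnitude pruning (zero out small weights) and low-bit quantization. A toy sketch of both, assuming nothing about which technique this particular paper uses:

```python
def prune(weights, threshold):
    """Magnitude pruning: zero out weights with small absolute value."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def quantize(weights, bits=8):
    """Uniform quantization to signed integers, plus the scale to undo it."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for int8
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) for w in weights], scale

w = [0.91, -0.02, 0.40, 0.003, -0.75]
pruned = prune(w, threshold=0.05)         # small weights become exact zeros
q, scale = quantize(pruned)               # small ints plus one float scale
restored = [qi * scale for qi in q]       # approximate original weights
```

Pruned zeros can be stored sparsely, and 8-bit integers take a quarter of the space of 32-bit floats, at the cost of a small approximation error visible in `restored`.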

- Exactly what kind of model compression technique should we use?
- This paper offers one possible answer.