[INTELLIGENT MACHINES]

Generative Networks (generating superheroes using deep learning).

Description

Through this project, I try to use a classic GAN and a Wasserstein GAN to generate superhero images.

Every blog post I have come across on this topic, be it from OpenAI or a Medium post, starts with this quote, so I figured I might as well too:
“What I cannot create, I do not understand.” —Richard Feynman

Very recently, researchers at UC Berkeley came up with a paper, appropriately titled "Everybody Dance Now", in which a machine can look at any dance style and make a target person perform it, which you can see in this video.

Generative adversarial networks were something I didn't even know existed until I did, and then they were something I saw literally everywhere in the deep learning community. Yann LeCun, the director of Facebook AI Research, described GANs as "the most interesting idea in machine learning in the last 10 years", and if we look at how they work, they seem so simple yet so complex and filled with intricacies. We have two networks, a generator and a discriminator, and we keep training both of them until we reach the point where the generator starts generating images that seem real to the discriminator.

As opposed to other architectures, I found this one quite easy to implement. After watching this video of Ian Goodfellow and taking a look at the model, I decided to implement my own version of a GAN using TensorFlow eager execution, which was recently introduced by Google and is "an imperative programming environment that evaluates operations immediately, without building graphs: operations return concrete values instead of constructing a computational graph to run later".
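To make the quoted description concrete, here is a minimal sketch of what eager execution looks like (eager is the default in TensorFlow 2.x; in the 1.x versions from this period you would call tf.enable_eager_execution() first):

```python
import tensorflow as tf

print(tf.executing_eagerly())  # True in TF 2.x

# Operations return concrete values immediately; no graph, no Session.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)
print(b.numpy())  # a plain NumPy array, available right away
```

This immediacy is what makes debugging a GAN training loop much easier than with graph-mode TensorFlow.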



Original or vanilla DCGAN (the prodigal son):


After looking through a few implementations of DCGAN, it became evident to me that MNIST had become the standard dataset for evaluating a GAN model, so I wanted to try a different one, and lo and behold, I found one. Being a DC fan, it was clear in my mind that I wanted to make a superhero generator, but one quick look through the DC Comics website quickly proved it wasn't possible for DC superheroes. Marvel, however, had a bunch of characters on their website (around 2000), so that became the dataset I used in my experiments.



The DCGAN architecture is quite a simple one, and using the tf.keras.layers API I was able to construct models for both the generator and the discriminator quite easily, then trained both networks on the Marvel dataset.
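A sketch of what such a tf.keras.layers model might look like. The exact sizes used in the project aren't given, so this assumes the common DCGAN defaults: a 100-dimensional noise vector and 64x64 RGB output; the function names are my own.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=100):
    # Upsample a noise vector to a 64x64 RGB image with transposed convolutions.
    return tf.keras.Sequential([
        layers.Dense(4 * 4 * 512, use_bias=False, input_shape=(latent_dim,)),
        layers.BatchNormalization(), layers.ReLU(),
        layers.Reshape((4, 4, 512)),
        layers.Conv2DTranspose(256, 5, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(), layers.ReLU(),   # 8x8
        layers.Conv2DTranspose(128, 5, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(), layers.ReLU(),   # 16x16
        layers.Conv2DTranspose(64, 5, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(), layers.ReLU(),   # 32x32
        layers.Conv2DTranspose(3, 5, strides=2, padding="same",
                               activation="tanh"),    # 64x64x3, values in [-1, 1]
    ])

def build_discriminator():
    # Mirror image: strided convolutions down to a single real/fake logit.
    return tf.keras.Sequential([
        layers.Conv2D(64, 5, strides=2, padding="same", input_shape=(64, 64, 3)),
        layers.LeakyReLU(0.2), layers.Dropout(0.3),
        layers.Conv2D(128, 5, strides=2, padding="same"),
        layers.LeakyReLU(0.2), layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(1),  # raw logit; sigmoid is applied inside the loss
    ])

noise = tf.random.normal([1, 100])
fake = build_generator()(noise)
print(fake.shape)  # (1, 64, 64, 3)
```

Each Conv2DTranspose with stride 2 doubles the spatial resolution (4 → 8 → 16 → 32 → 64), which is the standard DCGAN upsampling pattern.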




The DCGAN architecture uses a sigmoid cross-entropy loss for both the generator and the discriminator; minimizing this adversarial loss corresponds to minimizing the Jensen-Shannon divergence between the distribution of generated examples and the distribution of real examples.
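To see what the sigmoid cross-entropy loss does with concrete numbers, here is a small NumPy illustration (the probability values are made up for the example):

```python
import numpy as np

def bce(p, label):
    # Binary cross-entropy of a single probability p against label 0 or 1.
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

# Suppose the discriminator outputs these probabilities of "real":
d_real = 0.9   # on a real Marvel image
d_fake = 0.2   # on a generated image

# Discriminator loss: call real images real (label 1) and fakes fake (label 0).
d_loss = bce(d_real, 1) + bce(d_fake, 0)

# Generator loss (non-saturating form): make the fake look real (label 1).
g_loss = bce(d_fake, 1)

print(d_loss, g_loss)
```

Note that the generator's loss is large exactly when the discriminator confidently rejects its samples, which is what drives the adversarial game forward.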

The main goal of the generator in a GAN is to keep generating fake examples and passing them to the discriminator until it starts classifying them as real. The main goal of the discriminator is to not get fooled by the generator and to keep correctly predicting whether the image it receives is real or fake.
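This back-and-forth can be sketched as a single alternating training step. The tiny Dense "networks" below are stand-ins so the sketch runs; in the project they would be the DCGAN generator and discriminator, and the name train_step is my own:

```python
import tensorflow as tf

# Stand-in models just to make the step runnable.
generator = tf.keras.Sequential([tf.keras.layers.Dense(4)])
discriminator = tf.keras.Sequential([tf.keras.layers.Dense(1)])
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(real_batch, latent_dim=8):
    noise = tf.random.normal([tf.shape(real_batch)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_batch = generator(noise, training=True)
        real_logits = discriminator(real_batch, training=True)
        fake_logits = discriminator(fake_batch, training=True)
        # Discriminator: real -> 1, fake -> 0.  Generator: fool it (fake -> 1).
        d_loss = (cross_entropy(tf.ones_like(real_logits), real_logits) +
                  cross_entropy(tf.zeros_like(fake_logits), fake_logits))
        g_loss = cross_entropy(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return g_loss, d_loss

g_loss, d_loss = train_step(tf.random.normal([16, 4]))
```

Each call updates both players once; training is just this step repeated over the dataset for many epochs.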

During training I also implemented some of the techniques for improving GAN results described by OpenAI, which further improved my results.

The training process was quite fast on a Google Colab notebook: each epoch took around 30 seconds, and I was able to see results quite quickly, at around 600 epochs.



Wasserstein GAN (the stable one):


When I read the algorithm for WGAN, it seemed more natural to me than the vanilla DCGAN, perhaps because I understood its loss function, the earth mover's distance, more quickly than the divergence-based loss of the original GAN. Training was somewhat more stable than with DCGAN, as it was less sensitive to changes in the learning rate; however, the process took quite a bit longer than I had expected and needed a lot of iterations before giving me observable results. I used the same architecture as DCGAN but halved the number of kernels in every convolutional layer, and the model still took a while (around 600 iterations) to produce observable results. I preferred WGAN over vanilla DCGAN for three main reasons:

  • The earth mover's distance loss function provided a much more stable training curve than the original implementation's.
  • Clipping the critic's weights to between -0.01 and 0.01 shrank the space of possible weight values, so the weights were more tightly constrained and less prone to radical changes.
  • The critic was trained for more iterations per pass than the generator, because the critic should be trained close to optimality at each step.
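The first two points can be shown with a few lines of NumPy (the score and weight values are made up for illustration):

```python
import numpy as np

# Critic scores are unbounded real numbers, not probabilities.
c_real = np.array([1.3, 0.8, 1.1])    # scores on real images
c_fake = np.array([-0.4, 0.1, -0.2])  # scores on generated images

# WGAN critic loss: push real scores up and fake scores down.
critic_loss = np.mean(c_fake) - np.mean(c_real)

# WGAN generator loss: push the critic's fake scores up.
gen_loss = -np.mean(c_fake)

# Weight clipping keeps the critic (roughly) Lipschitz-constrained.
weights = np.array([0.5, -0.03, 0.002, -0.8])
clipped = np.clip(weights, -0.01, 0.01)
print(critic_loss, clipped)
```

Because the scores are not squeezed through a sigmoid, the loss keeps providing a useful gradient even when the critic easily separates real from fake, which is where the extra training stability comes from.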

Present and future work on GANs:

  • Implementing WGAN-GP, which penalizes the model if the gradient norm moves away from its target value of 1 (the prodigal son, part 2).
  • Generating drugs for diseases using a generative approach (currently working on this problem).
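The gradient penalty that makes WGAN-GP different from weight clipping can be sketched as follows; the toy Flatten + Dense critic and the function name gradient_penalty are my own stand-ins, and gp_weight=10 is the value used in the WGAN-GP paper:

```python
import tensorflow as tf

def gradient_penalty(critic, real, fake, gp_weight=10.0):
    # Sample points on straight lines between real and fake images and
    # penalize the critic where its gradient norm drifts away from 1.
    eps = tf.random.uniform([tf.shape(real)[0], 1, 1, 1], 0.0, 1.0)
    interp = eps * real + (1.0 - eps) * fake
    with tf.GradientTape() as tape:
        tape.watch(interp)
        scores = critic(interp, training=True)
    grads = tape.gradient(scores, interp)
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    return gp_weight * tf.reduce_mean(tf.square(norms - 1.0))

# Toy critic just to exercise the function.
critic = tf.keras.Sequential([tf.keras.layers.Flatten(),
                              tf.keras.layers.Dense(1)])
gp = gradient_penalty(critic,
                      tf.random.normal([4, 8, 8, 3]),
                      tf.random.normal([4, 8, 8, 3]))
```

This penalty term is simply added to the critic loss from the WGAN section, replacing the hard clipping of weights with a soft constraint on gradients.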