Hello Muppets! I hope you enjoyed the transformers from the last post. With this blog, we go a little further back in history, to 2014, the year of Ebola, Comet Landings, Germany… and GANs. Ian Goodfellow and his team came up with a new deep learning architecture called the Generative Adversarial Network (GAN). What is a GAN, you ask? Do you remember the weird cats and people who didn’t exist? You must have wondered how it was actually done. This post will answer all those burning questions, by simplifying Mr Goodfellow’s groundbreaking work, which has inspired a huge number of applications.
note: The Generative Adversarial Networks paper has been cited 30843 times at the time of writing
A Generation Gap
Till 2014, the most common work in deep learning was on discriminative models, which predict the correct labels of high-dimensional data. These models were inadequate for discovering new information in the available data and generating new examples instead of classifying existing ones. A new ruler emerged from the shadows.
A generative model tackles an unsupervised machine learning problem: the model discovers patterns in the data so that it can generate completely new examples, ones not already present in the dataset. GANs do this by pitting two models against each other. The first model tries to create fake data that resembles the existing data, while the second model tries to expose the fakes. The two battle until the first model vanquishes the second, that is, until the second can no longer distinguish the real data from the made-up data. The name of the two combatants? The generator and the discriminator.
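To make the battle concrete, here is a minimal sketch of the two-player game in plain NumPy. Everything here is illustrative and not from the paper's code: the "dataset" is a 1-D Gaussian centred at 4, the generator is a two-parameter line G(z) = a·z + b fed with Gaussian noise, the discriminator is a logistic classifier, and both are trained with hand-derived gradient steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Real data: samples from N(4, 0.5). Generator: G(z) = a*z + b, z ~ N(0, 1).
# Discriminator: D(x) = sigmoid(w*x + c). All four scalars are learned by
# plain gradient ascent on each player's objective.
a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
lr = 0.02

def fake_mean():
    z = rng.standard_normal(500)
    return float(np.mean(a * z + b))

start_gap = abs(fake_mean() - 4.0)   # how far the fakes start from the real mean

for step in range(3000):
    x_real = rng.normal(4.0, 0.5, size=64)
    z = rng.standard_normal(64)
    x_fake = a * z + b

    # Discriminator step: maximise log D(x_real) + log(1 - D(x_fake)).
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    grad_w = np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake)
    grad_c = np.mean(1 - d_real) - np.mean(d_fake)
    w += lr * grad_w     # ascent: D gets better at telling real from fake
    c += lr * grad_c

    # Generator step: maximise log D(G(z)), i.e. make D call its samples real.
    x_fake = a * z + b
    d_fake = sigmoid(w * x_fake + c)
    grad_a = np.mean((1 - d_fake) * w * z)
    grad_b = np.mean((1 - d_fake) * w)
    a += lr * grad_a
    b += lr * grad_b

end_gap = abs(fake_mean() - 4.0)
print(f"fake mean moved from ~0 to ~{fake_mean():.2f} (real mean is 4)")
```

After training, the generated samples sit much closer to the real distribution than they started, which is exactly the "vanquishing" described above: the discriminator's gradient is what drags the generator towards the data.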
Breaking it Down
If you remember the CNN post, the discriminator model took images and predicted what they were: we classified images as dogs or cats according to their features. With GANs, we go one step further. Now that we have the cat and dog labels, we also try to recreate the cat and dog images themselves. These new images are not from the original dataset but are created by the model. The GAN is made of two models, a generator and a discriminator. The generator will learn from all the images in the dataset and try to create new images of its own. Let’s say we have the cats and dogs dataset again, but we are taking only cats for now. The generator will learn all the features from the cat images and try to create a new cat on its own, like this:
Does this look like a real cat to you? Not quite. The discriminator will agree with you. Its job is to take the images made by the generator and predict whether each one really is a cat or not. With this image, it would be a hard NO. The two models are trained further until the generator becomes so good at creating new images that the discriminator cannot tell whether a generated image is real or fake.
Is this cat real or fake? It is fake, generated by the generator model. The discriminator will have a hard time guessing whether it is real or not. At that point, we can safely say that the GAN is working well.
If you read the previous blogs, you will notice some similarities. Yes, the discriminator is really a standard convolutional network of the kind we already studied. The generator is essentially its inverse: a network of transposed (sometimes called “de-”) convolution layers that upsample a small input into a full image.
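The “inverse” relationship shows up directly in the shape arithmetic: where a (no-padding) convolution shrinks a feature map, the matching transposed convolution grows it back. A quick sanity check of the standard formulas (the sizes below are illustrative, not from any particular GAN):

```python
def conv_out(size, kernel, stride):
    # Output size of a valid (no-padding) convolution.
    return (size - kernel) // stride + 1

def tconv_out(size, kernel, stride):
    # Output size of the corresponding transposed convolution:
    # it inverts the shape mapping of conv_out.
    return (size - 1) * stride + kernel

# A 28x28 image through a 5x5 conv with stride 1 shrinks to 24x24;
# the transposed conv with the same kernel and stride maps 24 back to 28.
print(conv_out(28, 5, 1))   # 24
print(tconv_out(24, 5, 1))  # 28
```

This is why the generator and discriminator in the figure below look like mirror images of each other: one funnels an image down to a scalar, the other funnels a noise vector up to an image.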
Fig. A generator and discriminator
As a word of caution, if you are a new reader, this might be too heavy for you; feel free to skip to the Experiments section and avoid the maths. Now we get into the maths, as presented in the original paper. Both models are multilayer perceptrons. For the generator to learn the distribution of the data, a prior is defined on the input noise variables. A second multilayer perceptron, the discriminator, outputs a single scalar: the probability that its input came from the data rather than from the generator. The discriminator is trained to maximise the probability of assigning the correct label to both training samples and samples from the generator. Simultaneously, the generator is trained to minimise its own loss, defined below.
G(z; θg), where G is a differentiable function represented by a multilayer perceptron with parameters θg.
D(x; θd) is a second multilayer perceptron, where D(x) represents the probability that x came from the data rather than from the generator’s distribution pg.
A prior on the input noise variable is defined as pz(z).
G is trained to minimise log(1 − D(G(z))).
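One practical wrinkle the paper itself points out: early in training, when the generator is still poor, the discriminator rejects its samples with high confidence and log(1 − D(G(z))) saturates, providing almost no gradient. The paper’s fix is to train G to maximise log D(G(z)) instead. A tiny numeric check of the two gradients, where u is a hypothetical discriminator logit (D = sigmoid(u)) and u = −6 models a confident “fake” verdict:

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

u = -6.0                 # discriminator logit: confidently "fake"
d = sigmoid(u)           # D(G(z)) is close to 0

# d/du of log(1 - sigmoid(u)) = -sigmoid(u): vanishes as d -> 0.
grad_saturating = -d
# d/du of log(sigmoid(u)) = 1 - sigmoid(u): stays near 1 as d -> 0.
grad_nonsat = 1.0 - d

print(grad_saturating, grad_nonsat)
```

So the original minimax loss barely moves the generator when it needs guidance most, while the alternative delivers a strong gradient with the same fixed point.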
The training is done by simultaneously updating the discriminative distribution so that it discriminates between samples from the data distribution (px) and those from the generator (pg).
The image above shows GAN training. The dashed blue line is the discriminative distribution, which is constantly updated. The black dotted line is the actual data distribution (which does not change). The green solid line is the generative distribution. The bottom line represents the domain from which the noise z is sampled, while the line above represents the domain of the actual data x; the arrows show how the mapping x = G(z) spreads the noise over the data space. As training continues, the generator keeps updating so as to converge towards the actual distribution. The discriminator is trained to discriminate the generator’s samples from the data. After each discriminator update, its gradient guides the generator towards regions that are more likely to be classified as data. After several steps of training, the generative distribution and the actual data distribution converge to the point where the discriminator is unable to differentiate between the two; ideally, training reaches the global optimum of pg = pdata.
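The paper derives this optimum in closed form: for a fixed G, the best discriminator is D*(x) = pdata(x) / (pdata(x) + pg(x)), and when pg = pdata this collapses to D* = 1/2 everywhere, with the value of the game equal to −log 4. A quick numeric check on a made-up discrete distribution (the probabilities are arbitrary, chosen only for illustration):

```python
import numpy as np

# Toy discrete distributions over 5 outcomes.
p_data = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
p_g = p_data.copy()                 # generator has converged: pg = pdata

# Optimal discriminator from the paper: D*(x) = p_data / (p_data + p_g).
d_star = p_data / (p_data + p_g)    # equals 0.5 everywhere at convergence

# Value of the minimax game at the optimum:
# V = E_data[log D*(x)] + E_g[log(1 - D*(x))] = -log 4.
value = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1 - d_star))
print(value, -np.log(4))
```

A discriminator output of 1/2 everywhere is exactly the “coin flip” state described above: the best it can do is guess.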
These networks were trained on three datasets, to verify the range of applications of the network: the MNIST handwritten digits dataset, the Toronto Face Database and the CIFAR-10 dataset. The images produced by the generator were extremely close to the original images and fooled the discriminator. The images highlighted with yellow are those generated by the generator, while the rest are from the original dataset. The GAN architecture is recreated on GitHub.
GAN: One model to rule them all?
GANs have a wide range of applications, mainly in image generation and video synthesis. In the years since this paper was first published, many applications of GANs have been developed: from generating completely fictional people by learning facial attributes, to producing realistic fashion models for testing clothing lines, to deepfakes, to generating 3D model shapes of real objects like cars and furniture. Now, GANs can generate original videos by learning from YouTube videos, generate paintings brilliant enough to be auctioned at Christie’s, and even generate music and speech. The following figure shows just how rapid the development of GANs has been.
Improvements in GANs: Taken from Malicious Use of Artificial Intelligence: Forecasting, Prevention and Mitigation 
It seems that GANs are the future of deep learning, pervading every application space and hoodwinking even the most trained experts into thinking that GAN-generated content is real. There are a few issues with GANs, though, and most of them centre on the generated content itself. One of the most important is the ethical concern around people’s identities. Deepfakes are fast becoming a menace: one’s face can be morphed into a video to show them speaking words they never actually uttered. With GANs becoming increasingly powerful, real and fake videos could become nearly indistinguishable. The consequences are potentially terrifying, especially when thinking in terms of political leaders or influencers. There is also worry about false content generated by AI more broadly. With AI content being very hard to distinguish, it would become easy to push out entirely AI-generated content suiting private agendas. A framework still needs to be developed to ensure the protection of an unassuming populace.
In the battle between generators and discriminators, we need to ensure that humans don’t suffer the same fate as that of the citizens of King’s Landing.