Life’s OrigAImi: demystifying AlphaFold’s protein folding

A deep neural network developed by Google’s DeepMind has made a huge leap in the quest to solve the protein folding problem, one of the greatest challenges in biology. The results of the algorithm, called AlphaFold, were revealed on 30 November during the Critical Assessment of protein Structure Prediction (CASP) conference. To explain what AlphaFold does and what makes it unique, we will first dive into the protein folding problem and see why scientists are so eager to solve it. 

Building a protein

First, what do proteins do? Proteins are the building blocks of life. They are big, complex molecules involved in virtually every biological process that takes place in a living organism. The instructions for building proteins are coded in DNA. How exactly this works is out of the scope of this article, but if you are interested, I have included some nice articles at the end that explain how DNA is used to build proteins. The important point here is that proteins are made up of amino acids, and the sequence of those amino acids is stored in DNA. 


When a new protein is formed, it is unfolded: just a long, unstructured chain of amino acids. After folding, the protein has a 3D structure, and it can perform its function in the cell. What the protein does depends on that 3D structure, and the 3D structure in turn depends on the amino acid sequence encoded in DNA. However, knowing the DNA sequence of a protein is not enough to determine its eventual 3D shape and functionality: DNA only gives us the order of the amino acids, not how the chain will fold. Modelling all the possible ways the chain could fold would take longer than the age of the universe, yet somehow proteins manage to fold themselves within milliseconds. Predicting the 3D structure of a protein from its DNA and amino acid sequence is known as the ‘protein folding problem’. 

Importance of protein folding

You might wonder: ‘Why do we care?’ If proteins manage to fold themselves just fine, why bother modelling them? The reason is twofold: to understand protein diseases and to aid drug discovery. A mutation in DNA can lead to the wrong amino acid being placed in the protein, which can result in malfunctions in protein synthesis and folding. Some well-known diseases that involve misfolded proteins are Alzheimer’s disease, sickle cell disease, and cystic fibrosis. Modelling protein folding could help identify which part of the protein is causing the folding malfunction. 

The second reason is that the shape of a protein dictates which molecules it can bind to. If we can model the 3D structure of a protein, we can design a molecule that binds to it. This enables drugs to be developed based on the structure of the protein, which is much faster and cheaper than the current method of drug discovery: often, it consists of screening huge numbers of compounds and hoping that one of them does something useful as a medicine. 


Now that we have the background covered, we can return to the real topic of this article: AlphaFold. As mentioned before, AlphaFold aims to solve the protein folding problem. Like all neural networks, AlphaFold needs a lot of data to train on. The data used by AlphaFold is genomic data: data obtained by sequencing DNA. The amount of genomic data has increased considerably in the past few years due to the rapid decrease in the cost of genetic sequencing. AlphaFold uses this vast amount of data to train its deep neural network and predict properties of proteins. 

AlphaFold predicts two properties of proteins: the distances between pairs of amino acids, and the angles of the chemical bonds between those amino acids. A first neural network, trained on the genomic dataset, predicts a probability distribution over the possible distances between each pair of amino acids. A second neural network uses the probabilities produced by the first network to assess how close a proposed protein structure is to the correct structure. 
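As a sketch of the idea, here is what the first network’s output for a single amino acid pair might look like, reduced to a handful of distance bins. The bin centres and scores below are invented for illustration (the real network uses many more bins and learns its scores from data):

```python
import numpy as np

# Hypothetical output of the first network for ONE amino acid pair:
# unnormalised scores ("logits") over a handful of distance bins.
# These numbers are invented purely for illustration.
bin_centres = np.array([4.0, 6.0, 8.0, 10.0, 12.0])  # angstroms
logits = np.array([0.2, 2.1, 1.3, 0.4, -0.5])

# Softmax turns the scores into a probability distribution over bins.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# The expected distance under that distribution is one simple summary
# a scoring network could use to judge a proposed structure.
expected_distance = float(np.sum(probs * bin_centres))
print(round(expected_distance, 2))
```

Repeating this for every pair of amino acids gives a full map of predicted distances that a candidate structure can be scored against.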

Improving the score

AlphaFold also uses two further methods that work together to improve the accuracy of the predictions. The first is a generative neural network that replaces pieces of the proposed structure with other protein fragments, creating a new candidate structure. If this new candidate scores better than the original, it becomes the new proposed protein structure. 

The second method uses gradient descent to improve the protein structure. This is a common machine learning technique that makes small, stepwise improvements to gradually ‘walk’ towards the optimal solution. It was applied to the entire protein chain rather than to subunits of the protein, which simplified the prediction process. 
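The principle can be shown on a toy one-dimensional ‘energy’ function with a known minimum; this is only an illustration of gradient descent itself, not AlphaFold’s actual learned potential:

```python
# Toy stand-in for AlphaFold's score: a simple 1D "energy" with a
# minimum at x = 3. AlphaFold optimises the torsion angles of the
# whole chain against its learned potential; the principle is the same.
def energy(x):
    return (x - 3.0) ** 2

def gradient(x):
    return 2.0 * (x - 3.0)

x = 10.0            # initial guess (a random starting "structure")
learning_rate = 0.1
for _ in range(200):              # small, stepwise improvements
    x -= learning_rate * gradient(x)

print(round(x, 4))  # the walk ends near the minimum at 3.0
```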

High accuracy

AlphaFold easily beat the competition and took first place at CASP. The competition was founded in 1994 to catalyse research in the field of protein folding, and it is now the gold standard for assessing folding prediction techniques. Some of AlphaFold’s predictions were close to the quality of experimental results, something no other submission to CASP achieved. It is possible that the AlphaFold team got lucky and the CASP challenge only presented problems that happened to suit AlphaFold. But even then, the fact that AlphaFold beat the competition by such a large margin is remarkable. 

AlphaFold is a big step for the field of protein biology, but it did not solve the protein folding problem. While researching AlphaFold I found quite a lot of blog posts claiming that AlphaFold had ‘solved protein folding’, but this is not true! There is still much unknown in the field, and more progress is needed before we can claim the problem is solved. Even though AlphaFold’s accuracy is higher than ever, it is still too low to predict novel protein structures reliably enough to be useful in drug discovery.

Also, the CASP competition only uses standalone proteins that have little interaction with other proteins. Most proteins are part of a big web of interacting proteins and depend on each other for chemical stability and functioning. AlphaFold’s predictions were accurate for simple, standalone proteins but not for interacting ones. As most proteins belong to the latter category, AlphaFold will need to improve its accuracy on complex proteins to perform well on new data outside of the CASP competition. Furthermore, what AlphaFold solves is a protein prediction problem: it helps in understanding the protein folding problem, but it does not solve it. Even if AlphaFold can accurately predict what a protein looks like given its DNA sequence, we still do not understand how protein folding occurs in nature or why misfolding occurs. 

AlphaFold is the first big step towards understanding the protein folding problem, as it shows that there is underlying structure in genomic data that AlphaFold is able to find. This has given researchers hope that the protein folding problem can be solved within their lifetime, something that seemed impossible just twenty years ago. But AlphaFold alone is not enough: to solve protein folding, we need to understand its mechanisms, and that is something AlphaFold cannot yet do. 

Nice articles: 

DNA seen through the eyes of a coder (or, If you are a hammer, everything looks like a nail)

DNA and RNA Basics: Replication, Transcription, and Translation 


AlphaFold GitHub:

It’s the attention, stupid!

For more than 50 years, the Muppets from Sesame Street have taught children their language. Now it’s time for Elmo, Bert and their friends to teach their wisdom to adults. When Sesame Street ended, they had to find a new job. With more than five decades of experience on a CV that many starters would drool over, the obvious step was staying in the education sector. ELMo [1], BERT [2], Big Bird [3], ERNIE [4], and KERMIT [5] have all found their place in Natural Language Processing (NLP). Unfortunately, Cookie Monster had a different passion.

These are not just the names of Muppets, but also names for a new type of deep learning model: the transformer [6]. Transformers have taken the field by storm; since 2017, they have been the go-to architecture for language models, and researchers like to have a little fun by naming their models after Muppets. Transformers are interesting because they use techniques that soften the issues previous language models suffered from. In this blog, we’ll go over how we got to this point, how transformers work, and why they are hyped up so much.


There are plenty of resources for a full history rundown of how we got to this point. But because it is important to know why the steps that led to transformers were taken, I will give a little context. Let’s go through a sentence together:

“Big Bird is a character from Sesame Street. He has yellow fur and is quite big”

What does “He” in the second sentence refer to? For us humans it’s obviously Big Bird, but it’s more difficult for a computer. Recurrent neural networks (RNNs) are very attractive for NLP compared to convolutional neural networks (CNNs) because of their sequential nature; they improved performance so much that Count von Count lost track of counting. Recurrent neural networks propagate information in a sequential fashion, but also have loops in them to retain information during a computation. By retaining the information from the previous sentence, we can make a link between ‘Big Bird’ and ‘He’. To show how a recurrent neural network handles text, we can go through an autocompletion task. Let’s have another example:

“Big bird is a character from Sesame ______”.

We want to predict the word “Street” here. We start with the first word in the sentence, put it into the RNN, and compute a value together with the hidden state. The network then propagates this value to the next step and combines it with the word “bird”. It does this until it arrives at “Sesame”, at which point we have information from all the preceding words. In the above figure, the coloured blocks represent that information. The previous information we have is ‘Big bird’ and ‘Sesame’, so we can presume the sentence is about the show, and thus ‘Sesame’ will be followed by ‘Street’. RNNs are great for such tasks where dependencies are important, but there is a catch: RNNs have a hard time dealing with longer sentences. Let’s change the first example sentence to:
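The propagation described above can be sketched in a few lines of numpy. The weights here are random, so this network ‘knows’ nothing; the point is only to show how one hidden state is threaded through the sequence:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary with one-hot embeddings; all weights are random,
# purely to show the mechanics (a real model would learn them).
words = ["Big", "bird", "is", "a", "character", "from", "Sesame"]
dim = len(words)
W_in = rng.normal(size=(dim, dim)) * 0.1   # input -> hidden
W_h  = rng.normal(size=(dim, dim)) * 0.1   # hidden -> hidden (the loop)

h = np.zeros(dim)                          # hidden state starts empty
for i, word in enumerate(words):
    x = np.zeros(dim)
    x[i] = 1.0                             # one-hot vector for this word
    # Each step mixes the current word with everything seen so far:
    h = np.tanh(W_in @ x + W_h @ h)

# After the loop, h summarises the whole prefix "Big bird is a
# character from Sesame"; a trained output layer would map it to a
# probability for "Street".
print(h.shape)
```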

“Big Bird is a character from Sesame Street. Sesame Street is a TV show for children. He has yellow fur and is quite big”

If we now want to find out that “He” refers to Big Bird, we must go all the way back to the first two words. As we go back further into a sequence, the earlier words carry less weight, while more recent words carry more. You can see that in the previous figure: the yellow block that represents the word “Big” is hard to see in the layer for “Sesame”. This phenomenon becomes even worse for longer sentences. When the weights are backpropagated, the gradients that update the information ‘vanish’, and the information from the beginning is lost. This is called the vanishing gradient problem. So the RNN’s greatest advantage also becomes its biggest enemy: it is hard for RNNs to retain information, even more so with longer sentences. They mostly remember the near past, and much less of the far past. 

To mitigate this problem, long short-term memory (LSTM) networks came bursting onto the scene. LSTMs are a type of RNN that solve the vanishing gradient problem, to an extent: they forget what they deem unimportant and remember what they deem important. This way, the network can remember earlier words more vividly. While LSTMs can be great and remain viable for some tasks, they do not fully solve the vanishing gradient problem. Another problem is that parallel computation is not possible: with RNNs, you must process the input sequentially. But I just told you that the advantage of RNNs is their sequential nature, so why is that bad? If you want to calculate the probability of a word in a sentence, you have to process every word before it, because each value depends on the one before. RNNs and LSTMs could not live with their own failure, and where did that bring us? Back to convolutional networks. 

You just want attention

In comes the transformer, putting us out of our misery by offering a solution to these issues. Transformers combine the parallelization of CNNs with the forgetting-and-remembering idea of LSTMs. A transformer consists of an encoder-decoder architecture. The encoder receives an input string and creates word embeddings from it; word embeddings are numeric vector representations of words. The decoder generates an output string. When opening the bonnet of the encoder, we can see the engine of the transformer: multi-headed attention. Before we get to what multi-headed attention is and visualise the transformer architecture, let’s first take a step back and look at attention in general.

One of the crucial ingredients that makes transformers so good, an ingredient the Swedish Chef approves of, is attention. The main takeaway from the seminal paper “Attention is all you need” by Vaswani et al. [6], which introduced the transformer model, is that we don’t need to focus on everything. This sounds similar to the idea behind LSTMs. A transformer is like Sherlock Holmes: it knows where it should look. This solves the problem with longer sentences, because loss of information from the far past is no longer an issue. Attention is also the main component of a transformer.

We should take our big furry friend again as an example:

“BigBird is friends with Snuffleupagus”

What the attention mechanism can do for us here is give more context to “BigBird”. On its own, Big Bird could just mean a bird that is big. But with a little more context, the embedding can change so that the model knows we are talking about the Muppet from Sesame Street, since Snuffleupagus is also mentioned in the sentence. So how is this done? Through the magic of self-attention. Remember that the first step in a transformer is to convert strings of words to vectors (i.e. word embeddings). These vectors have all kinds of information in them and together span a vector space. While the individual numbers in these vectors have no direct meaning, we do know that vectors that cluster together in the space can have similar meanings:

words close together in the vector space have something in common | credits: google

So we can expect Big Bird and Snuffleupagus to be close to each other in the vector space. But now we want to calculate new vectors for these words by including context. Although Big Bird consists of two words, for the sake of simplicity I will treat BigBird as one vector:

To recalculate the vector for BigBird, we take the dot product of its vector with each of the other vectors in the sentence. We can call the outcomes dp, short for dot-product. These dp numbers are then normalised (softmax), multiplied with the original vectors, and summed up to calculate the new vector VBigBird. The same happens for the other vectors in the sentence. With this self-attention mechanism, the sequential nature of previous models is gone: how close words are to each other in the sentence no longer matters. 
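This recipe (dot products, softmax, weighted sum) is short enough to write out in numpy. The 4-dimensional embeddings below are made up purely for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Invented embeddings for "BigBird is friends with Snuffleupagus",
# with BigBird treated as one vector. The two Muppet names are given
# deliberately similar vectors.
embeddings = {
    "BigBird":       np.array([1.0, 0.9, 0.1, 0.0]),
    "is":            np.array([0.1, 0.0, 0.8, 0.2]),
    "friends":       np.array([0.3, 0.2, 0.5, 0.7]),
    "with":          np.array([0.1, 0.1, 0.7, 0.3]),
    "Snuffleupagus": np.array([0.9, 1.0, 0.0, 0.1]),
}
vectors = np.stack(list(embeddings.values()))

# Step 1: dot products ("dp") of BigBird with every word in the sentence.
dp = vectors @ embeddings["BigBird"]
# Step 2: normalise with softmax; step 3: weighted sum of the originals.
weights = softmax(dp)
v_bigbird = weights @ vectors   # the new, context-aware vector

# After BigBird itself, Snuffleupagus gets the most weight, so it
# contributes most context to the new vector.
print(weights.round(2))
```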

In RNNs, you have weights that are calculated in the hidden state; these weights contribute to how important a word can be. As it currently stands in our journey with transformers, there are no weights to play with. Why would we even want parameters? If we add weights to the vectors, the model can learn to bring out their meaning better and to find patterns more easily. Looking at picture 3, there are three points where vectors play a role, so we add three sets of parameters, called queries, keys, and values. The query is the word vector that you want more context for (vector 1). The keys are all the word vectors in the sentence (vector 2). The values can differ depending on the task; for this task, the values are the same as the keys. Now that we are adding parameters, it’s starting to look like a neural network, and we can see the similarities from before. This also means we can use backpropagation to let the attention mechanism learn.

Now that we have an understanding of self-attention, we can move on to the final piece of the puzzle: multi-headed attention. In the figure below, you can see the original illustration of multi-head attention from Vaswani et al. [6]:

But first, another example:

“Bert gifted Ernie a jumper for his birthday”

We want to calculate the attention for the word ‘gifted’. Which words should receive some attention? Bert does the gifting, Ernie is the receiver, and the jumper is the gift, so those three words should have high importance, with perhaps ‘birthday’, the reason for the gift, to a lesser extent. This means we have to split up the attention. How, when there is no sequential structure in transformers? Behold! The power of parallelization! Instead of splitting up one attention mechanism, we simply add more self-attention mechanisms in parallel. These parallel attention layers are called ‘heads’, which is where the name multi-headed attention comes from. This multi-headed attention mechanism is the main engine of a transformer. With the engine good to go, we can close the bonnet and look at the encoder structure:
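A rough sketch of this wiring in numpy, with random (untrained) projection matrices, might look like this; the token count, model dimension, and head count below are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention as in Vaswani et al.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

# 8 tokens ("Bert gifted Ernie a jumper for his birthday"),
# model dimension 16, split over 4 heads of dimension 4.
n_tokens, d_model, n_heads = 8, 16, 4
d_head = d_model // n_heads
X = rng.normal(size=(n_tokens, d_model))   # fake token embeddings

heads = []
for _ in range(n_heads):
    W_q = rng.normal(size=(d_model, d_head))  # query projection
    W_k = rng.normal(size=(d_model, d_head))  # key projection
    W_v = rng.normal(size=(d_model, d_head))  # value projection
    heads.append(attention(X @ W_q, X @ W_k, X @ W_v))

# Concatenate the heads and project back to the model dimension.
W_o = rng.normal(size=(d_model, d_model))
output = np.concatenate(heads, axis=-1) @ W_o
print(output.shape)
```

Each head runs independently, which is exactly what makes this step easy to parallelize.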

This figure comes from the paper of Vaswani et al [6]. 

We can now see how all these parts interact. In the original picture, the decoder is also shown; the decoder is mostly used for translating sentences, so to keep it simple I’m only showing the encoder. One thing that is new in this picture is the positional encoding. I mentioned that an advantage of transformers is that the order of the words doesn’t matter for the computation, which makes them parallelizable. But sometimes the position a word is in can be important. For this reason, we add a positional encoding to make sure the transformer doesn’t forget the position each word occupies in the sequence.
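The positional encoding used by Vaswani et al. is a fixed pattern of sines and cosines at different frequencies; the dimensions below are chosen arbitrarily for the example:

```python
import numpy as np

def positional_encoding(n_positions, d_model):
    # Sinusoidal encoding from Vaswani et al.: even dimensions use
    # sine, odd dimensions cosine, each pair at its own frequency.
    positions = np.arange(n_positions)[:, None]
    dims = np.arange(0, d_model, 2)[None, :]
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(n_positions=8, d_model=16)
# The encoding is simply added to the word embeddings, so every token
# carries a trace of where it sits in the sentence.
print(pe.shape)
```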

Highway to Sesame Street

Before you set your destination to Sesame Street and plunge yourself into transformers, there are some considerations to think about before you take this exit. Transformers need a lot of memory, time, and power to train. 

Furthermore, there are bias issues with transformers [9]. While gender and racial bias in older language models is a known issue (e.g. word2vec [7,8]), this bias is amplified even more in transformers. If the data you’re working on is sensitive and likely to suffer from such problems, think very carefully about whether you want to use this. While there are techniques that try to remove bias, they merely hide it rather than remove it.

Nonetheless, the hype around transformers is still justified, as they make many NLP tasks easier and more accurate. Bert might nag, but the station at Sesame Street will only become busier.


[1]M. Peters et al., “Deep Contextualized Word Representations”, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018. Available: 10.18653/v1/n18-1202 [Accessed 21 April 2021].

[2]J. Devlin, M. Chang, K. Lee and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019. Available: 10.18653/v1/n19-1423 [Accessed 21 April 2021].

[3]M. Zaheer et al., “Big Bird: Transformers for Longer Sequences”, Advances in Neural Information Processing Systems, vol. 33, pp. 17283–17297, 2020. [Accessed 21 April 2021].

[4]Z. Zhang, X. Han, Z. Liu, X. Jiang, M. Sun and Q. Liu, “ERNIE: Enhanced Language Representation with Informative Entities”, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019. Available: 10.18653/v1/p19-1139 [Accessed 21 April 2021].

[5]W. Chan, N. Kitaev, K. Guu, M. Stern and J. Uszkoreit, “KERMIT: Generative Insertion-Based Modeling for Sequences”, arXiv preprint arXiv:1906.01604, 2019. [Accessed 21 April 2021].

[6]A. Vaswani et al., “Attention is All You Need”, in Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 6000–6010.

[7]Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. Proceedings of Workshop at ICLR, 2013.

[8]Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. Proceedings of NIPS, 2013.

[9]K. Kurita, N. Vyas, A. Pareek, A. Black and Y. Tsvetkov, “Measuring Bias in Contextualized Word Representations”, Proceedings of the First Workshop on Gender Bias in Natural Language Processing, 2019. Available: 10.18653/v1/w19-3823 [Accessed 26 April 2021].

Unboxing the Black Box

We have all experienced neural networks as black boxes. With artificial intelligence and machine learning becoming so prevalent in seemingly every new technology, it is hard to keep track of how they actually work. With image recognition being used in a wide variety of applications, from face tracking to disease detection to climate prediction, let’s dive into one of the most widely used models: the convolutional neural network (CNN). 

* This article is meant for readers who have some experience with neural networks and are starting out with CNNs

I am not going to dive too much into the history of CNNs, as there are plenty of resources on the internet to get acquainted with them. CNNs were first developed for recognising handwriting; they have since become more advanced and are now used in facial recognition, medical imaging and augmented reality. But what does a CNN do exactly? Briefly, convolutional neural networks are used for image data: they take input images and assign importance to various features of the image. These features can be a particular shape, edges, or objects in the image. The image itself is just a series of light intensities (represented by pixels) arranged in a grid-like fashion. Similar to the human eye, a CNN uses multiple layers to detect shapes in the images: simpler patterns first, more complex patterns later. It then tries to guess what a given input image actually is. The model used in the attached notebook will guess whether we are giving it a cat or a dog. 

The image above shows our model cat; the image is divided into a grid-like structure of varying intensities. 

A CNN is typically made up of three building blocks: convolution layers, pooling layers, and fully connected layers. Let’s go through them one by one. I am not going into the maths of each layer, just the intuition behind them; interested readers can find the maths in the linked resources at the end of the article. 

The convolution layer: downsampling the data

Immediately after saying that we won’t delve into mathematics too much, it rears its head in the next sentence. The convolution layer is a series of matrix multiplication operations. A small matrix of weights, called a kernel, slides over the image grid; the step size with which it slides is called the stride. At each position, the kernel’s weights are multiplied with the image pixel intensities under it and summed (remember the neural network formula output = input*weight + bias? That is exactly what is happening here, but with image data). The figure below shows how the strides form the matrices for multiplication; note that each square corresponds to a pixel with some intensity value. Did you notice the empty squares around the output we created from the matrix multiplication? That happens because the operation effectively reduces the dimensions of the matrix: this is what convolution does. If we want to preserve the shape of the matrix, we fill the border values with zeros; this operation is called padding. We have shown a 2D array; for the RGB colour system the same operation is done, but in 3 dimensions. 
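A minimal, loop-based version of this operation might look like the sketch below; the 5x5 ‘image’ and the blur kernel are invented, and real frameworks implement convolution far more efficiently:

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    # Bare-bones 2D convolution (single channel) for illustration only.
    if padding:
        image = np.pad(image, padding)  # fill the border with zeros
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)  # input * weight, summed
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # a fake 5x5 "image"
kernel = np.ones((3, 3)) / 9.0                    # a simple averaging kernel

print(conv2d(image, kernel).shape)             # (3, 3): the shape shrinks
print(conv2d(image, kernel, padding=1).shape)  # (5, 5): padding preserves it
```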

The pooling layer: Survival of the fittest

Imagine you have a 7-course meal laid out in front of you. You want to taste it all without getting full. Instead of finishing one course before moving on to the others, you take a small bite of each course, with sips of water or wine in between the bites to keep a fresh taste in your mouth. You are basically sampling all the courses: that small bite from each dish is representative of the entire dish, since different bites of the same dish will taste the same. You can judge the food just by those representative bites, tasting all the dishes without getting too full and leaving some out. This is exactly what the pooling layer does, but with image pixel intensities: it moves over the image region by region and keeps a few representative values (either the maximum or the average of each region) for the final matrix. 
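A minimal max-pooling sketch in numpy, with an invented 4x4 ‘image’, keeps just the strongest value (the most representative ‘bite’) from each 2x2 region:

```python
import numpy as np

def max_pool(image, size=2, stride=2):
    # Keep the single strongest value from each region of the image.
    out_h = (image.shape[0] - size) // stride + 1
    out_w = (image.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i * stride:i * stride + size,
                           j * stride:j * stride + size]
            out[i, j] = region.max()
    return out

image = np.array([[1., 3., 2., 4.],
                  [5., 6., 1., 2.],
                  [7., 2., 9., 0.],
                  [1., 8., 3., 4.]])

print(max_pool(image))  # a 2x2 summary of the 4x4 input
```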

Fully Connected Layer: It’s all coming together

We have downsampled the data and applied our operations so that the neurons in the network contain real information about the image, but they are all still individuals, right? We need to bring them together to produce outputs. Just like in an ordinary neural network, we connect everything to the final layer; this is the job of the fully connected layer. The output from the convolution and pooling layers is flattened into a vector of values, and the fully connected layer maps this vector to a score for each possible label. Remember the 7-course menu? The fully connected layer might represent an apple pie through features like ‘apple flavour’, ‘sweet’, and ‘cinnamon’. If you eat another dessert without knowing what it is and taste those same features, you can safely say that what you’re eating is probably an apple pie. Good job, you’ve connected it all together!

But what about the activation functions?

The activation functions turn the linear convolution operation into a nonlinear one. This is necessary because the data we have (the features of the image) is not linear. An activation function is applied after each convolution operation. The mathematics behind them can be found in the resources listed below.

When dealing with image data, we repeat the convolution and pooling layers multiple times. This helps decompose the input step by step. We use a variety of filters: some operate on shapes and lines, while others work directly on the pixel intensities. Using multiple convolution layers allows a variety of filters to be applied, which helps classify the features of the images better. But remember, deeper is not always better! 

The basic architecture of a convolutional neural network can look something like this:

Input Image -> Convolution1 -> Activation -> Pooling1 -> Convolution2 -> Activation -> Pooling2 -> Fully Connected Layer -> Output
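As a quick sanity check on this pipeline, we can trace how the spatial size of an input changes through it. The sketch below assumes a 64x64 input, 3x3 convolutions without padding, and 2x2 pooling; these numbers are illustrative choices, not part of any fixed architecture:

```python
# Trace the spatial size of a 64x64 input through the pipeline above,
# assuming 3x3 convolutions without padding (size shrinks by 2) and
# 2x2 pooling with stride 2 (size halves).
def conv_size(n, kernel=3):
    return n - kernel + 1

def pool_size(n, window=2):
    return n // window

n = 64
for block in (1, 2):
    n = conv_size(n)   # convolution + activation
    n = pool_size(n)   # pooling
    print(f"after block {block}: {n}x{n}")

# The final n*n grid (times the number of filters) is flattened into a
# vector and fed to the fully connected layer.
```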

Convolutional neural networks, visualised 

So far, this post has talked about the architecture of convolutional neural networks and how they work. The next part will show what each layer does with the help of feature maps. The code used to visualise this can be found in the references. 

I have created a CNN that classifies cats and dogs, using a series of three convolution-pooling layer pairs followed by a final fully connected layer. Through the series of images that follows, you will see how the CNN understands the features of images. But first, a picture of a cute cat. 

This is the cat model I am going to use for the CNN* 

Remember how the first few layers of a CNN detect edges and very basic shapes? This is what the cat looks like to the very first channel of the CNN. 

Notice the thin lines? Yes, those are the borders in the picture recognised by the network. The picture doesn’t tell us very much, because it’s just the first filter. The layers closer to the input image show us a lot of detail; as we go deeper, it becomes difficult for humans to perceive the shapes recognised by the network. In other words, the first few layers capture a lot of detail and small patterns, while the deeper layers identify the general patterns in the image. The following image, from the 3rd layer of the CNN, is almost indiscernible to human eyes: this cat almost looks like the horror show from the thiscatdoesnotexist page. But now the CNN is learning the general patterns of the image, not focusing on the details.


The fifth layer of visualisation is even further away from our original cat. By now we can barely see the ears and the paws of the cat, but the general shape of the cat is what matters to the CNN. 

With multiple layers and a deep architecture, the model learns these generalised features and can use them very effectively when identifying image labels. In the notebook there are some black filters as well; that just means those filters were not activated. As we go deeper, the activations become much less discernible visually, but still retain the major features. 

This post was intended to open the black box of image classification algorithms a little. The visualisations show us, step by step, how a convolutional neural network reads and understands an image. Of course, a lot of mathematics is involved in the construction of CNNs, which I did not explain in order to keep the focus of this article. The list of references will help if you are interested in the background.  

Notebook used for this article

Visualizing filters for CNN using VGG19

Picasso, a free open source visualiser for CNNs 

Deep Visualization Toolbox

The maths behind Activation Functions

*No animals were harmed during the writing of this post. 

On Cybernetics and its influence on AI

In the first blog post of the “History of AI” series, we covered the founding event of Artificial Intelligence: in 1956, Artificial Intelligence was born as a discipline. Many events and discoveries led up to this, one of which is undoubtedly the publication of Wiener’s book “Cybernetics: Or Control and Communication in the Animal and the Machine” in 1948. 

This blog post will give a short introduction to the topic of cybernetics: where it comes from, what it means and, of course, how it relates to AI.

‘Norbert Wiener’s now-classic work entitled The Human Use of Human Beings: Cybernetics and Society reveals the thesis that “society can only be understood through a study of the messages and the communication facilities which belong to it”.’

Broadhurst, A. R., & Darnell, D. K. (1965). An introduction to cybernetics and information theory.

What is Cybernetics?

The term cybernetics was coined by Wiener’s publication, but the word’s origins are Greek, and the idea it describes is not fundamentally new either. So, let us start at the beginning: the Greeks. The term “cybernetics” comes from the Greek word “kubernetes”, which translates to “steersman”. It shares its root with the word “govern”. Plato already talked about the concept of government in terms of control; in the end, a government is steering the country.

The term cybernetics comes from the Greek word kubernetes, to steer. We are constantly steering ourselves through our environment, much like a helmsman steers their ship.

Cybernetics is a school of thought about control (hence the relation to “steering”) and communication: a philosophy, a language that describes systems that have a goal. Notice the vague formulation of “system” here. The crucial part of this theory is that it does not differentiate between human and animal, or even between animal and machine. Cybernetics sees everything that pursues a goal as essentially the same: the components may differ, but the mechanisms are the same.

Action, Sensing, Comparing

But what are these mechanisms that appear in human, animal and machine alike? It is a loop of action, sensing and comparing which every goal-driven system follows. Any action we take can be broken down into these three fundamental steps. Take, for example, putting bread in the toaster. We grab the toast and move it over the toaster (action). With our eyes, we sense whether the current position of our hand allows us to drop the toast into the toaster (sensing). If yes, we drop it in and press down the switch. If not, we correct our hand position and then drop it. We compared our sensory input (what we see) to our goal (putting the bread into the toaster so we can finally have breakfast).

Now comes the interesting part. Once you press down that switch, the toaster starts acting by browning the toast. Through metal wires and current flow, it senses how much time has passed and releases the toast if the perfect-toast time is reached, or continues to brown it if not. Once again: different components, same mechanism. 
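The loop can even be written down as a few lines of code. The sketch below is purely illustrative: the toaster, its "ticks" and the function name are all made up, but the action-sensing-comparing structure is exactly the cybernetic mechanism described above.

```python
# Illustrative sketch of the cybernetic action-sensing-comparing loop.
# The toaster and its "perfect-toast time" are invented for this example.

def toast(perfect_toast_time: int) -> str:
    elapsed = 0
    while True:
        elapsed += 1                                   # act: keep browning the toast
        goal_reached = elapsed >= perfect_toast_time   # sense elapsed time, compare to goal
        if goal_reached:
            return f"toast released after {elapsed} ticks"
        # goal not reached yet: continue browning on the next loop iteration

print(toast(3))  # -> toast released after 3 ticks
```

Whether the comparison is done by neurons or by a timer circuit makes no difference to the structure of the loop; that is the cybernetic point.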

Communication is the key

The main point of cybernetics is that all of this is only possible through communication. Without communication, no goal could be reached. We all have a basic grasp of what happens in our body that allows us to put the toast into the toaster. 

But also inside the toaster, communication happens to achieve the goal of a perfectly browned toast, just through current and metal wires instead of neurons and muscles.

If you look at the world through a cybernetic lens, you will notice that basically everything can be described this way. It is not only the living world of humans and animals: cybernetics can also be applied to social systems. During a conversation, we act by speaking. We sense the answer of our conversation partner, verbal and non-verbal, and compare whether the conversation is going in the direction we want it to. If not, we try to steer it in the desired direction through our next action.

In our blog post about the Turing test, we already discussed what a controversial topic the combination of machinery and intelligence was. And Turing’s paper appeared shortly after Wiener’s book. Wiener’s claim that humans and machinery are not fundamentally different was revolutionary, if not scary, for people at that time. But it was also part of a shift which brought the whole field of Artificial Intelligence alive. 

From Cybernetics to AI

Once the idea that humans and machines are essentially the same became more popular in the research community, scientists began to seriously consider what it would take to develop machines with human-like intelligence. 

And the groundwork for this had already been done. Prior to Wiener’s publication, Turing introduced the Turing machine, also known as the universal computing machine (1937, read more in our post about him here), one of the most influential works in computer science and mathematics. And that was before the first digital computer was built. In 1943, McCulloch and Pitts proposed an artificial neuron, which some consider to be the first work in artificial intelligence (we briefly touched on this in the most recent history of AI blog post, read it here). It was the first theory of mind and brain in mathematical notions. Once it was agreed that both systems have the same expressive power, the barriers between machine and mind were broken.  

From there onwards, the interdisciplinary exchange continued, and mind and machine were brought closer and closer together. The focus of cybernetics on communication was addressed in Claude Shannon’s Information Theory as early as 1948. Paired with other advances in the field of computer science, the path was paved for the Dartmouth conference, where AI became its own discipline. Exploring ways to achieve human-like intelligence with machines was now its own field of research. 

But the other direction was explored, too. Scientists started to see computers as a way to understand the brain. Wiener’s theory not only reduced the barrier between human and machine, but also helped to demystify the basic mechanisms of our brain and of purpose-driven systems in general. Cybernetics and control theory find, to this day, wide application in various disciplines, including environmental science, social science, learning and management. 

Wiener on the future

Wiener expected that machine-to-machine and human-to-machine messages would steadily increase in our society. According to him, it does not matter for my own goal achievement whether I give my instructions to a machine and await its output, or to a human. All that matters is communication: inside me, between me and my potentially mechanical communication partner, and inside them. The basic principle of cybernetics is simple yet effective, and its implications have been immense. It was part of a major shift of thinking which contributed to the digital revolution. And in my eyes, time proved him right. 


Heart Wiring

Why we should strive to make robots as empathic as humanly possible

Slowly but surely, robots are moving from an industrial setting into more interactive and social environments. To navigate social environments and social interactions, robots must be able to interact with humans successfully. Often these robots are modeled with human cognition in mind, both because interaction requires some common ground and mutual understanding, and because for a lot of cognitive capabilities humans are the best example we have. After all, humans generally consider themselves to be the smartest and most social species on earth [6].

Empathy: what is it?

One of these fundamental human capacities is the ability to empathise with one another. Empathy has been thoroughly studied; however, there is no consensus on its definition. Generally, one can speak of three different kinds of empathy: cognitive empathy, affective (or emotional) empathy, and a combination of both [7,12,13]. Cognitive empathy can be roughly defined as people’s ability to understand other people’s emotions and behaviour through mental perspective taking. Affective empathy, also known as the sharing of vicarious emotions, can be defined as a process that makes a person (or agent) have “feelings that are more congruent with another’s situation than with his own situation” [12]. A. Tapus and M. Mataric drew up guidelines for the capacities a social robot needs in order to emulate empathy [14]. In their paper they acknowledge that machines cannot feel and express empathy; instead, they focus on how a social robot can appear empathic. 

B. Duffy likewise states that due to the embodied differences between humans and robots, we cannot ensure robots will have the same empathic emotions. However, he argues that for successful social interactions it would be enough for a robot to appear intelligent or to have social capabilities. Within human-human social interactions, people do not use standardised metrics to measure intelligence and social capacities; we simply observe them. When a robot appears to be (socially) intelligent, it can already facilitate social interactions, as it “speaks our language” [6]. We might not be able to model “the real deal” when trying to make empathic robots, but we can make them appear to be empathic. Here we would use empathy not so much as a set of feelings, but as a driving force in behaviour and social interactions. But why would we want to have empathic robots? What makes empathy so crucial in social interactions?

Empathy: fundamental to human interactions

As stated earlier, empathy is fundamental to human cognition and human-to-human social interactions. It has mainly been associated with, and is a known mediator of, prosocial behaviour, which can significantly impact how people treat each other [10]. An example of prosocial behaviour would be donating money or extending a helping hand when someone has had an accident. It has been argued that cognitive empathy evolved because of the complex social demands of human interactions during evolution [13]. Cognitive empathy enables behaviours such as facilitating conversations, building social expertise and predicting behaviour, but also less savoury behaviours such as lying and deceiving, or recognizing when such things happen to you. Affective empathy, on the other hand, is mainly associated with altruistic behaviour and nurturing relationships, and is related to moral mechanisms. A simple example: you might be less inclined to punch someone in the face because you understand it will hurt and can empathize with the other’s pain. Another example of how empathy modulates social relations is given by Bastian et al., who had students eat spicy chilli peppers in groups and showed that the shared pain facilitated group formation [1].

Empathy and moral decision making

As robots enter social spaces and their autonomy grows, the need for moral agency in robotics increases. Whether and how robots can be moral agents is hotly debated. B. Duffy argues that a humanoid robot specifically designed for social interaction, in particular, needs to have moral rights and duties: it looks like a human and uses our frame of reference, which sets high expectations. Even more fiercely discussed is the role of empathy in moral agency. Morality and empathy are often mentioned in the same breath; however, the relationship between them is not yet very clear [15]. Generally, there is consensus about the importance and influence of perspective taking in moral judgements. Empathy allows agents to understand the effect of their actions on the emotions and mental state of others. This directly influences people’s behaviour towards others, which refers to what we earlier described as cognitive empathy. Others, however, have argued for a more fundamental role of emotions in empathy and moral decision-making. L. Damm describes an account of moral responsibility which critically depends on the empathic capacities of the agent. In her paper, she argues that an agent’s moral responsibility depends on its status as a moral agent; one who cannot satisfy the criteria for moral agency is considered not fully responsible [5]. This would indicate that we also need a form of affective empathy in robots for them to be considered fully morally responsible.

Whether this is possible is highly contested, but still open for debate, and critically depends on whether we believe a biological body is needed to experience emotions [4]. Given the assumption that affective empathy is indeed needed for full moral responsibility, this means a robot cannot be fully morally responsible, much like a child cannot. But even if one believes robots cannot experience emotions and thus cannot be truly and fully empathic, that does not mean the role of empathy in robot morality is over. We can still use the construct of cognitive empathy, or abstract away from biology and view emotions as one of the driving forces behind behaviour. We have now seen how empathy is a foundation for social interactions and morality, and that this will be much needed in the future. But how does this information translate into practice? What applications can we dream about in the future, apart from moral robots?

Empathy from a practical standpoint

Current empathic agents are still in their infancy and are very limited in their empathic abilities. However, there are a few domains in which the use of these systems is being investigated. An example is empathic tutoring. A current, very ambitious project is the EMOTE project, which aims to develop an empathic tutoring system that should facilitate the learning experience of children [3]. Knowledge from social psychology and models of empathy is applied to create emotionally intelligent and responsive tutor agents (robotic and virtual).

A. Tapus and M. Mataric made their recommendations on how to model empathy with the intention of improving therapeutic robots [14]. They found that empathy is a crucial component of therapy, and that a robotic system intended as a therapeutic aid thus also should have empathic capabilities. Social robots may also be used in therapy, for example in supporting ASD patients [8]. Another use case for emotional robots could be robots as companions. Leite et al. developed a social robot companion which reacted in an empathic manner to a chess game played between people, by displaying facial expressions and uttering phrases [10]. People to whom the robot reacted empathically rated the robot as friendlier. Empathy can also go the other way around: the robot evokes empathy in humans. Paiva et al. addressed how an empathy-evoking agent can be used to persuade children to do “the right” action [11]. This demonstrates we can even use robots to enhance people’s moral or social behaviour!

Empathy: a demon we need to tackle

Apart from the debate on whether robots can experience emotions and thus can truly be empathic themselves, it is important to know that affective intelligence in cognitive robotics is highly controversial [4]. A critical moral dilemma accompanying social robots that portray empathy and are there to be a listening ear (for example, for patients) is that of deception [2]. If a robot merely appears to be empathic and understanding, it might betray our trust. The question is whether the end justifies the means.

However, we have seen that empathy is crucial in social interactions, and we would also need to implement empathic capabilities to a certain extent to make robots capable of navigating social environments. Empathy could have practical benefits when we try to use it to make tutoring or therapeutic systems, but it could also be crucial for a companion robot. We have seen that the need for robot ethics is rising and empathy is a crucial component in human morality. All in all, we should strive to make robots as empathic as humanly possible.


[1] Bastian, B., Jetten, J., & Ferris, L. J. (2014). Pain as Social Glue: Shared Pain Increases Cooperation. Psychological Science, 25(11), 2079–2085.

[2] Bradwell, H. L., Winnington, R., Thill, S., & Jones, R. B. (2020). Ethical perceptions towards real-world use of companion robots with older people and people with dementia: Survey opinions among younger adults. BMC Geriatrics, 20(1), 1–10.

[3] Castellano, G., Paiva, A., Kappas, A., Aylett, R., Hastie, H., Barendregt, W., Nabais, F., & Bull, S. (2013). Towards empathic virtual and robotic tutors. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7926 LNAI, 733–736.

[4] Cowie, R. (2012). Ethical issues in affective computing 2 . Formal and informal foundations of ethics. 1740.

[5] Damm, L. (2010). Emotions and moral agency. Philosophical Explorations, 13(3), 275–292.

[6] Duffy, B. R. (2006). Fundamental Issues in Social Robotics. International Review of Information Ethics (IRIE), 6(March 2003), 31–36.

[7] Edele, A., Dziobek, I., & Keller, M. (2013). Explaining altruistic sharing in the dictator game: The role of affective empathy, cognitive empathy, and justice sensitivity. Learning and Individual Differences, 24, 96–102.

[8] Esteban, P. G., Baxter, P., Belpaeme, T., Billing, E., Cai, H., Cao, H. L., Coeckelbergh, M., Costescu, C., David, D., De Beir, A., Fang, Y., Ju, Z., Kennedy, J., Liu, H., Mazel, A., Pandey, A., Richardson, K., Senft, E., Thill, S., … Ziemke, T. (2017). How to build a supervised autonomous system for robot-enhanced therapy for children with autism spectrum disorder. Paladyn, 8(1), 18–38.

[9] Leiberg, S., Eippert, F., Veit, R., & Anders, S. (2012). Intentional social distance regulation alters affective responses towards victims of violence: An FMRI study. Human Brain Mapping, 33(10), 2464–2476.

[10] Leite, I., Pereira, A., Mascarenhas, S., Martinho, C., Prada, R., & Paiva, A. (2013). The influence of empathy in human-robot relations. International Journal of Human Computer Studies, 71(3), 250–260.

[11] Paiva, A., Dias, J., Sobral, D., Woods, S., Aylett, R., Sobreperez, P., Zoll, C., & Hall, L. (2004). Caring for agents and agents that care: Building empathic relations with synthetic agents. Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 2004, 1, 194–201.

[12] Paiva, A., Leite, I., Boukricha, H., & Wachsmuth, I. (2017). Empathy in virtual agents and robots: A survey. ACM Transactions on Interactive Intelligent Systems, 7(3).

[13] Smith, A. (2006). Cognitive empathy and emotional empathy in human behavior and evolution. Psychological Record, 56(1), 3–21.

[14] Tapus, A., & Matarić, M. J. (2007). Emulating empathy in socially assistive robotics. AAAI Spring Symposium – Technical Report, SS-07-07, 93–96.

[15] Ugazio, G., Majdandžić, J., & Lamm, C. (2014). Are empathy and morality linked? Insights from moral psychology, social and decision neuroscience, and philosophy. 1–27.

Funny AI and When Things Go Wrong

For many of our readers, we just closed a very high-stress month: the exam period. Even our non-student readers might have just recuperated from the holiday period and be working hard on their new year’s resolutions, or have most likely failed them by now. Even the blog writers need to start shedding their holiday weight and have buckled down for the busy exam period. All in all, this is the one time you might want to lay back and enjoy something casual and fun. Thus, for this article, all blog team members have made a collection of their favourite funny and weird AI applications. 


InspiroBot

We all have that one family member whose timeline or journal is full of inspirational “quotes”, often accompanied by stock images. We all think they are kinda silly, but yet, we crave more and need to be inspired on a daily basis. In comes InspiroBot, an AI that generates these inspiring quotes together with beautiful stock images. As we’re already in mid-February, the change we dreamed of for the new year is slowly fading away. Some of us have not even started to run that 10km, as it was pretty busy, very cold, and most of the time it is raining, so you really don’t wanna go outside. Also, your cat is now on your lap, so it is the rule of the universe not to stand up. Anyway, what I am trying to say is that we might be in need of some extra inspiration! Just go to InspiroBot and try to find the deeper meaning behind randomly generated quotes. 

Pudding: judge my Spotify

The world is filled with chatbots to assist with your tasks, but this one is a bit different. “Pudding” is how your soul feels after this AI roasts you for your music taste and those guilty pleasures. Trained by Mike Lacher & Matt Daniels, this AI looks at your Spotify history and “judges your awful taste in music”. We say, go for it. Feel what we writers felt when we took it for a test drive ourselves. Just remember, we warned you.


This website is a compilation of all the AI fails that happen when AI is used to create something from scratch. From demon babies with just a tiny resemblance to cats, to creepy Homer Simpson caricatures, to weird food blogs you never want to take directions from, to automobiles that look like they came straight out of the Top Gear workshop: this website contains a cornucopia of hilarious instances of AI trying to take something human and give it an artificial twist. It is as if someone were given a paintbrush and a blank canvas and decided to paint all the bad karma you collected. All of these non-existing things are inspired by thispersondoesnotexist, a website built on a Generative Adversarial Network architecture, which we will not get into today. 

Reface App

Exploring non-existent cats is no doubt fun in its own right, but the game of deepfakes is taken to the next level with apps such as Reface, where you pick a video and a face of your choice and voilà! You get an AI-generated video in which the face in the original video is swapped with the face you selected. If you are a movie buff, you no longer have to wonder what it would be like if Tom Cruise played the role of Iron Man instead of Robert Downey Jr.

I taught an AI to cook pasta

The lockdown forced many of us to eat and cook at home. That means lots of pasta. Sabrina got bored by this and, instead of doing the sensible thing and learning new recipes, decided to teach her computer to generate new pasta recipes. The video is hilarious and informative at the same time. And the best part is that she links all her resources, so you can give it a try, too. We don’t want to give away too much, just watch it. Bonus: you’ll learn about different neural networks, so you can easily justify the procrastination. 

Code Bullet’s teaching an AI how to walk 

Another one in the category of “Nah, I will not feel guilty watching YouTube because it is somewhat educational” is a video called “A.I. learns to Walk” by Code Bullet. In this video, the “totally competent programmer”, to put it in his own words, has made a Creature Creator. In this simulator you can build your own creature and use AI to teach it how to walk. The algorithm used is NEAT, an evolutionary algorithm, here paired with a death laser as an incentive to propel the creature forwards. Sit back, relax and laugh your ass off watching some anthropomorphized blocks flopping away from the death laser and exploiting the physics of the simulator to get to safety. Or, if you feel a bit more active, you can play around with the Creature Creator yourself by going to the website.

AI Generated music

AI is taking away our jobs, and now it is encroaching on our music too? Jukebox, an AI written by OpenAI researchers, creates new music replete with lyrics. It uses a Vector Quantised Variational Autoencoder (a name that sounds straight out of Marvel-movie technobabble) trained on 1.2 million songs. It produces songs in different genres like pop, rock, country and metal, taking inspiration from legends like David Bowie, Frank Sinatra and Bob Marley. Worry not though, because right now it takes 9 hours of computation to render one minute of audio.

AI Dungeon

Feel like playing Dungeons & Dragons but your friends are not up for it? Or maybe machines are your friends? Then this is for you. AI Dungeon uses “the most advanced AI” to generate a story for you. You can do whatever you want, just type it, and it will happen. You can choose from a set of different worlds to base your adventure in. The AI will guide you in good old pen and paper fashion. Just without pen and paper.

AI Dungeon has now started to monetize its service. However, you get a one-week free trial, so try it out! 

Pun Generator

Are you a fan of witty puns, but struggle to come up with them yourself? Then this website will be perfect for you. It will create puns and phrases from any word you type in, within seconds. Impress your friends with punishingly bad puns that you can find here.

AI can be really funny, either with what it is used for or when it just doesn’t exactly work like you expect it to. We as a team hope you enjoyed some time off and had a good laugh. We are sure you’ll find many more instances of hilarious AI implementations. Scared of AI taking over the world? Don’t be. It’s not perfect… yet. 

Seeing was Believing: Part 1

There is a tradition in the United Kingdom every year when Christmas arrives: the Queen addresses the nation on national television. But last year was a bit different. An “alternative” message was delivered by the Queen on Channel 4 [1]. Let’s take a look: 

Given the astounding nature of this video, I am sure you have guessed that this message was in fact not delivered by the real Queen. Despite knowing this, we can’t stop ourselves from wondering whether our eyes have deceived us, even if only for a split second. Such videos are popularly known as DeepFakes, and they are taking the internet by storm.

Observing this increasing popularity of DeepFakes, Turning Magazine is starting this new year with a blog mini-series titled “Seeing was Believing”. The aim of this series is to raise awareness about DeepFakes and how one can fight against them. In Part 1 of this series, I will introduce an up-and-coming area of research called Media Forensics. In Part 2, we will dive into DeepFakes, understanding their origin and their impact on us. 

Introduction to Media Forensics

Every year, key innovations dominate the tech space and generate a significant amount of hype, and 2020 was no different. From recent advances in natural language understanding with OpenAI’s GPT-3 model to the groundbreaking research in protein folding with DeepMind’s AlphaFold [2], needless to say this year has seen some breakthroughs, especially in AI. But not every advancement in AI is for the good of humanity. We are living in an era of misinformation, fueled by fake news and media content. And sadly, AI has played a huge role in this. With the rise of DeepFake technology (Figure 1), all of us are left to wonder: is “seeing is believing” even relevant today?

Figure 1. The growing interest in DeepFake technology via Google Trends (keyword = “deepfake”)

Rise of Fake Media Content

Fake media content, images or videos, goes back as far as digital media itself. In its simplest form, a fake image or video is nothing but a set of changes made to its original (real) version, resulting in a depiction that is not true in reality. This fundamental concept has not changed over time; only the techniques and tools employed to make these changes as realistic as possible have. With this point of view in mind, we should consider all computer-generated movies or movies which use CGI/VFX as fake media, too. Well, not exactly. In today’s socio-political environment, the topic of fake media is more nuanced: we need to consider other aspects, such as its potential to spread misinformation and cause harm to individuals in society. 

Given the rise of AI-assisted generation of fake media content, fake images and videos typically fall into two major categories: CheapFakes and DeepFakes. We already know that DeepFakes are a recent advancement in AI which uses deep neural networks (specifically, generative networks) to create manipulations of original (real) media, and they are getting more realistic than ever. And then there is the other category: CheapFakes. Although the term was coined only recently, this type of fake media has been around for a very long time. It covers manipulations made with conventional tools such as Adobe Photoshop or even MS Paint. If you cut out a celebrity’s face from a newspaper and stick it on someone else’s photo in a manner which makes it appear realistic, you have made a CheapFake!

Both types of fake media share an equal potential to cause serious damage to our society and democracy by spreading misinformation. Nina Schick, a leading author in the realm of DeepFakes, discusses this very issue in the MIT Technology Review. She highlights that the year 2020 belonged not just to DeepFakes, but also to CheapFakes [3]. However, we are not completely helpless in this fight against misinformation. The creation of fake media spurred a community of researchers who develop new methods and technologies to detect and monitor the usage of fake media for malicious purposes. This niche area of research came to be known as Media Forensics. Its relevance only increases day after day, alongside the usage of fake media on the internet, especially on social media platforms, to harm or defame individuals.

Impact of AI in Media Forensics

If I had to boil down the crux of media forensics, I would say that this area of research aims to assess the fidelity of any media content in focus: whether a certain image or video is portraying the truth and not misleading its audience. Given the digital nature of media content, pixels are the basic building blocks of images and also of videos (we ignore the audio component for now). This means that to assess the fidelity of digital media, we must investigate the pixels of the image or video in question. And as the tools and techniques to manipulate pixels convincingly improve, so must the technology to detect and monitor them.

In the pre-AI era, such manipulations were made with software tools such as Adobe Photoshop. Operations like copy-moving or splicing made it possible to transfer elements of a source image to a target image, thus creating a fake image portraying a lie. As real images and videos are captured through a camera, pixel-level manipulations are possible only in post-processing, i.e. after an image or video is available in its digital form. Such manipulations create what researchers in media forensics call artefacts: discrepancies in fake media which can be exploited to design detection technology. But with the use of AI, these artefacts are getting harder to detect through conventional methods, because conventional methods rely on manual, hand-picked features which represent a certain discrepancy. 
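To make the idea of a hand-picked feature concrete, here is a toy sketch (not a production forensic tool) of one classic example: the high-pass "noise residual" of an image. A region spliced in from another source often carries different sensor noise than the rest of the picture, so its residual statistics stand out. The image and the pasted patch below are simulated purely for illustration.

```python
import numpy as np

def noise_residual(img: np.ndarray) -> np.ndarray:
    """Subtract a 3x3 local mean from each pixel (a simple high-pass filter)."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    local_mean = sum(
        padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
        for dy in range(3) for dx in range(3)
    ) / 9.0
    return img - local_mean

# Toy example: a flat image with a noisy "spliced" patch pasted in.
rng = np.random.default_rng(0)
img = np.full((32, 32), 128.0)
img[8:16, 8:16] += rng.normal(0, 10, (8, 8))   # simulated splice with different noise
res = np.abs(noise_residual(img))

# The residual is near zero in the untouched region but large inside the splice:
print(res[8:16, 8:16].mean() > res[20:28, 20:28].mean())  # -> True
```

Real forensic features are of course more sophisticated, but the principle is the same: a human expert decides in advance which statistic will expose the artefact.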

Now, if such discrepancies become more intricate, the process of hand-picking features inevitably fails. Hence, media forensics quickly adapted to this paradigm shift in fake media generation and adopted AI itself, developing so-called DeepFake detection techniques to regain the ability to exploit the discrepancies left behind by DeepFakes. Luisa Verdoliva recently published a survey paper which provides a brilliant overview of the ongoing research in media forensics, with a focus on DeepFake detection [4]. It is highly recommended for any reader interested in knowing more about this research area.  

Figure 2. A growing amount of research into DeepFakes (generation as well as detection) via Web of Science (keyword = “deepfake”)

Future of Media Forensics

This era of misinformation has just begun. With new emerging technologies and social media platforms, we have to accept that anything we see online should be viewed with a healthy level of skepticism. But as I mentioned before, we are not helpless against this threat. Figure 2 shows a heuristic representation of academia steadily increasing its contribution to media forensics. It’s only a matter of time before major tech companies and government bodies fund more research in this field. It might seem like a dark age for truth, but there’s definitely a bright future for media forensics.

Note to Reader: This article was Part 1 of “Seeing was Believing” Blog Series. Part 2 will cover a more in-depth discussion about DeepFakes: how they originated, how they are used in the world and more!


[1] Deepfake queen to deliver Channel 4 Christmas message. BBC News. 23 December 2020.

[2]  Dormehl, L. (2020), A.I. hit some major milestones in 2020. Here’s a recap. Digital Trends. 

[3]  Schick, N. (2020), Don’t underestimate the cheapfake. MIT Technology Review.

[4] Verdoliva, L. (2020). Media forensics and deepfakes: an overview. arXiv preprint arXiv:2001.06564.

The Perceptron

By now, Deep Learning and Machine Learning have become synonymous with Artificial Intelligence. Neural networks seem to be to AI what flour is to a baker. But it has not always been like this. In fact, there was a period of time when neural networks were considered useless for AI applications.

This post is part of “The history of AI”, which is a blog post series about scientific milestones and noteworthy people. If you are new here, check out the other posts, starting with “What is AI and where did it come from” and the blog about the big question “Can machines think?”.

With this post, I will introduce you to the perceptron. At its core, the perceptron is the smallest building block of any neural network, and also a good place to start if you want to understand neural networks. This blog post is accompanied by a Google Colab Notebook. The perceptron implementation is quite straightforward and might help you understand the principle better, so play around with the code a bit to understand the individual parts. Of course, you will need at least some basic Python to follow the code, but the blog post itself can be understood without any coding knowledge whatsoever.

The perceptron

The perceptron is an example of a binary classifier: it can learn to differentiate between two distinct classes. These classes need to be linearly separable. This means that if you plot each of your data points on a graph, you need to be able to draw a straight line which separates the two classes. If that is not possible, the perceptron is not the right classifier for this type of problem (disclaimer: it still might be after transforming the features, for example into polar coordinates, but that goes beyond the scope of this article).

To fulfil all cheesy Christmas cliches with this post, let’s assume Santa wants a program telling him which children deserve a present and which do not. Of course, this depends on two different factors:

(1) naughtiness

(2) the cookies they placed on the plate for Santa.

Conveniently, this data is linearly separable. Our two classes are “gets present” (1) and “no present” (-1), so the binary classifier criterion is satisfied as well.
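To make this concrete, here is a small, entirely made-up dataset of the kind Santa might use (the scores are invented for illustration; only the first child's scores, naughtiness 2 and cookie rating 1, appear later in the article):

```python
# Hypothetical Santa data: (naughtiness, cookie rating, class)
# class 1 = "gets present", class -1 = "no present"
children = [
    (2, 1, -1),
    (1, 3,  1),
    (3, 1, -1),
    (1, 2,  1),
]

# This toy data is linearly separable: the straight line
# "cookie rating = naughtiness" splits the two classes.
for naughtiness, cookies, label in children:
    assert (1 if cookies >= naughtiness else -1) == label
```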

The goal

Before we start looking at how we get a classifier for our Santa-problem, we should have a look at what we are trying to achieve here. Our data is linearly separable, which means that we are able to draw a straight line through the plot dividing the two classes. And this line is exactly what we want to get from the classifier. We need a function in which we can input a child’s naughtiness and cookie score and see if the child will get a present or not.

The setup

The perceptron requires a number of inputs:

  • an input vector φ(x),
  • a set of initial weights w0,
  • and a learning rate η.

The input values are the coordinates of our data points. For our example, this corresponds to the naughtiness x1 and the cookie-count x2. Additionally, it is common to add a bias term x0 = 1 to the input, resulting in an input vector with one more entry than the number of input values per data point.

The initial weights are the first guess of what the function may look like. They come in the same form as the input vector, with exactly one value more than the number of input values. You can use any values here; some starting points may be smarter than others, but you will not always know. We’ll be using (0, 0, 0) here.

Lastly, we need a learning rate. If you are not familiar with learning rates, don’t worry and stick around. It will be much easier to understand when going through the algorithm.

The algorithm

From a broad perspective, a perceptron tries to find the function which correctly divides the data into its two classes by iteratively adjusting that function. Considering one data point per iteration, the algorithm checks whether this specific point is correctly classified by predicting its class with the current weights. If so, it continues to the next point, keeping the weights. If not, it updates them with the perceptron update rule.

Okay, let’s add some math to this.


To predict the class, we need two formulae: the dot product a = w · φ(x), and the activation function f(a), which returns 1 if a ≥ 0 and -1 otherwise.

As a first step, we calculate the dot product between the weights and our input vector. We do this by multiplying each element of the input vector by its respective element of the weight vector (e.g. the first number in the input vector times the first value in the weight vector) and then adding up the resulting values.

If we were to take our first child with a naughtiness score of 2 and a Santa-cookie-rating of 1, the calculation would be the following:

a = 0 × 1 + 0 × 2 + 0 × 1 = 0

Unsurprisingly, we end up at 0. If we plug this into the second function, we get f(a) = 1. But this is not right! This child was not meant to get any present (I mean, a cookie rating of 1? Santa surely does not want to return there). So we need to update our function.
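These two prediction steps translate directly into code. A minimal sketch (function names are my own; the input vector already contains the bias term x0 = 1):

```python
def dot(weights, x):
    # Multiply each element of the input vector by its respective
    # weight, then sum the results.
    return sum(w_i * x_i for w_i, x_i in zip(weights, x))

def f(a):
    # Threshold activation: class 1 if a >= 0, otherwise class -1.
    return 1 if a >= 0 else -1

# First child: bias 1, naughtiness 2, cookie rating 1,
# with the initial weights (0, 0, 0).
a = dot((0, 0, 0), (1, 2, 1))
print(a, f(a))  # 0 1 -> wrongly predicts "gets present"
```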


To update, we have to adjust the weights we’ll be using from now on. We do this with the following function:

wk+1 = wk + η × φ(xn) × tn

There are a couple of symbols here, let me quickly explain.

k is the current iteration, making wk+1 the new weights we are just calculating and wk those which just got us a wrong prediction. We have already seen η, which is the learning rate, and φ(xn) is still our input vector. tn is the class that would have been correct for the current xn.

Here you can see what the learning rate is for. It controls the impact our update has on the new weights. High learning rates cause big updates, small learning rates cause a smaller update.
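The update rule is a one-liner in code. A sketch for the first child (true class tn = -1), using a learning rate of 0.1 (this rate is an assumption on my part; the article does not state the value its notebook uses):

```python
def update(weights, x, t, lr=0.1):
    # Perceptron update rule: w_k+1 = w_k + lr * x * t
    return tuple(w_i + lr * x_i * t for w_i, x_i in zip(weights, x))

# The first child was misclassified, so the weights change:
new_w = update((0, 0, 0), (1, 2, 1), t=-1)
print(new_w)  # (-0.1, -0.2, -0.1)
```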

We are now left with wk+1, which will be the weights for our next iteration. We take the next data point, plug it into the formula from above, using the new weights:

And … wrong again. This child should have gotten a present. So we update our weights again

and use them for iteration 3:

and finally, we get a correct prediction.

But this does not mean that the next prediction will be right again. This cycle has to continue until we (1) find a set of weights that does not change after we have used it on every single data point (i.e. we have found weights that correctly predict for each child whether or not it will get a present), or (2) reach a certain stopping criterion. A stopping criterion could, for example, be a maximum number of iterations we want to loop through before stopping.
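Putting prediction and update together gives the full algorithm, including both stopping criteria: a complete pass over the data without any weight change, or a maximum number of epochs. A sketch on made-up data (only the first child's scores come from the article, so the learned weights will generally differ from the article's final weights):

```python
def train_perceptron(data, lr=0.1, max_epochs=100):
    """data: list of (x1, x2, t); returns weights (w0, w1, w2)."""
    weights = (0.0, 0.0, 0.0)            # initial guess
    for _ in range(max_epochs):          # stopping criterion (2)
        changed = False
        for x1, x2, t in data:
            x = (1, x1, x2)              # prepend the bias term x0 = 1
            a = sum(w_i * x_i for w_i, x_i in zip(weights, x))
            if (1 if a >= 0 else -1) != t:   # misclassified?
                weights = tuple(w_i + lr * x_i * t
                                for w_i, x_i in zip(weights, x))
                changed = True
        if not changed:                  # stopping criterion (1): converged
            break
    return weights

children = [(2, 1, -1), (1, 3, 1), (3, 1, -1), (1, 2, 1)]  # invented scores
w = train_perceptron(children)
# Every child in the toy set is now classified correctly.
for x1, x2, t in children:
    assert (1 if w[0] + w[1] * x1 + w[2] * x2 >= 0 else -1) == t
```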

I used the Santa-helping-tool (Check out the Google Colab Notebook) to calculate the final weights:

w = (0, -0.1, 0.1)

We see that there was no change anymore after we updated w the last time.

Now, what does this mean for Santa? Well, all he has to do now is take the data he has collected on every child and plug it into the formula from above with our weights.

The output will tell him if the child deserves a present or not.
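With the final weights w = (0, -0.1, 0.1), Santa's check becomes a tiny function (a sketch; the helper name is my own):

```python
def gets_present(naughtiness, cookies, w=(0, -0.1, 0.1)):
    # Plug the child's scores (plus the bias 1) into the learned function.
    a = w[0] * 1 + w[1] * naughtiness + w[2] * cookies
    return 1 if a >= 0 else -1

print(gets_present(2, 1))  # -1: the first child gets no present
print(gets_present(1, 3))  # 1: good cookies pay off
```

With these weights the decision boundary is the line where -0.1 × naughtiness + 0.1 × cookies = 0, i.e. where the cookie rating equals the naughtiness score.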

We can transform the weights into a nice function and draw it into the graph from above to see what actually happened:

As the graph shows, our weights gave us the line which nicely separated the kids who will get a present from the poor suckers who won’t get any.

The perceptron and neural networks

Up to this point, it is maybe not really obvious how the perceptron and neural networks tie together. This becomes more apparent when looking at this graph:

What you see here is exactly what we did. The dot product of the input point and the weights is exactly the same as multiplying each xn by its weight wn and summing all resulting values. In the final step, the sum is converted to an output value of either 1 or -1.

Going back to our Santa-example, x1 is our bias term of 1, x2 the child’s naughtiness, and x3 the cookie rating. The output, you will have guessed it, tells us whether or not the child will receive a gift.

If you are familiar with neural networks, you will see that this is what happens in neural networks on a fundamental level.

For all those who are not, let’s have a look at this graphical representation of a simple artificial neural network (ANN). You can see that it is just a more complex perceptron, using the output of one perceptron as the input of another.

The history

Despite its rather simple mechanics, the history of the perceptron is what makes it so interesting. The perceptron was first introduced by Frank Rosenblatt in 1958 and elaborated in “Principles of Neurodynamics” (1962). It combines two important works: the McCulloch & Pitts neuron and Hebb’s rule.

The McCulloch & Pitts neuron is a simple binary neuron that can perform logical operations. It takes 1 or 0 as input and applies Boolean operations to it. However, it has no weighting of the input, nor is it able to learn the way the perceptron does with the perceptron update rule.

The learning part of the perceptron comes from Hebb’s rule. It describes a learning principle commonly summarized as “neurons that fire together, wire together”. The weighted input in the perceptron can be seen as the neural activity: only when the activity reaches a certain threshold does the neuron fire.

Let’s take a quick step back to our Santa example. Here, “firing” means we get an output of 1 and a Santa-helper-elf jumps up to wrap a present. But for this to happen, the dot product of the child’s scores and the weights needs to be greater than or equal to 0. Otherwise, the output of f(a) (remember, a is the dot product and f the second function for the prediction) will be -1. So our dot product needs to reach the threshold of 0 in order for our activation function f(a) to fire.

Let’s go back to 1962. Rosenblatt was extremely confident about his invention and claimed that it could solve any classification problem in finite time. Not everyone shared his enthusiasm. In 1969, Minsky and Papert published a book called “Perceptrons: An Introduction to Computational Geometry”. It discusses the main limitation of the perceptron: it can only find a linearly separable function.

In our Santa example the data was linearly separable (what a coincidence…). But most real-life classification problems are not. Minsky therefore concluded in the book that no significant advances were to be expected from studying the perceptron. This claim put research on perceptrons and neural networks into hibernation until the ’80s.

Fast forward to now: neural networks are a fundamental part of today’s AI. This can be explained by an oversight of Minsky’s. Yes, a single perceptron is not able to classify non-linearly separable data. However, if you connect several perceptrons into a neural network, it can solve complex classification problems and learn non-linear functions. This, together with backpropagation and hyperparameter tuning (stories for another time), was what was needed to revive neural networks and give them a place in the history of AI.
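To see Minsky's oversight in miniature: XOR is the classic non-linearly-separable problem, which a single perceptron cannot solve, yet three perceptrons wired together can. The weights below are hand-picked for illustration, not learned, and I use a 0/1 output convention here instead of the 1/-1 used above:

```python
def unit(weights, x):
    # One threshold unit: bias plus weighted inputs, output 0 or 1.
    a = sum(w_i * x_i for w_i, x_i in zip(weights, (1,) + x))
    return 1 if a >= 0 else 0

def xor(x1, x2):
    h1 = unit((-0.5, 1, 1), (x1, x2))    # OR gate
    h2 = unit((1.5, -1, -1), (x1, x2))   # NAND gate
    return unit((-1.5, 1, 1), (h1, h2))  # AND of the two -> XOR

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor(x1, x2))  # truth table outputs: 0, 1, 1, 0
```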

The moral of the story (to stay in the Christmas spirit of this post)? You had better bake good cookies; they can fix a lot.

Diving into the Wave of the AI Job-Revolution

The AI revolution is here. What has been talked about for the last two decades is finally being realised. It’s here, with all its hype, and it is here to stay. Every company wants the proverbial slice of the pie and wants to ingrain Artificial Intelligence (AI) in its products. There has been a massive amount of funding for labs and companies across the world developing new AI research. With this, fears have arisen about AI taking over the world as well as our jobs. There are statements like “Millions of jobs will be replaced by AI”. Or will they? To answer this question, we have to go back a few decades.

There is no reason anyone would want a computer in their home.

Ken Olsen, founder of Digital Equipment Corporation, 1977.

Somehow, the microcomputer industry has assumed that everyone would love to have a keyboard grafted on as an extension of their fingers. It just is not so.

Erik Sandberg-Diment, New York Times columnist, 1985.

Both of these statements have not aged well at all. This is not to throw shade at their authors, but to show how extremely hard it is to predict the impact of any new technology.

Every new technology, especially in the last two and a half decades, has brought about a revolution in the job market. Yes, many jobs were permanently lost. However, each new technological revolution also created completely new jobs which were hitherto unknown. For all the jobs lost because of the internet, like media distribution stores, encyclopedia salesmen, librarians, and phone-book companies, new jobs emerged, like social media managers, web developers and bloggers. It gave so many people a platform to sell their products and services, as well as the ability to reach a huge number of people faster. In fact, it also transformed and boosted some pre-internet jobs, simply because of social media’s easy reach. Similarly, the smartphone revolution killed off the need for so many separate devices, like radios, mp3 players, point-and-shoot cameras, physical maps, and even wristwatches; smartphones have fast become a behemoth contributor to the economy, with mobile tech generating $3.3 trillion in revenues in 2014 [1]. Jobs like social media influencer and online tutor on Youtube and various educational websites are available at our fingertips thanks to smartphones.

Warehouses in Guanajuato, Mexico

AI development is extremely fast; however, the background frameworks it needs have not grown at the same pace. It also has applications in the real world, and thus paints a target on its own back for criticism. Many journals and articles clamour about how it will replace humans in jobs. While some jobs will be lost to AI, it will also produce new opportunities. According to the Autonomous Research report in 2018, 70% of front-office jobs like tellers, customer service representatives, loan interviewers and clerks will be replaced by AI technologies like chatbots, voice assistants and automated authentication; however, according to Accenture, there will be a net gain in jobs among companies using AI, as well as a net gain in revenues. They claim that AI will create new roles, like explaining the deployment of AI technologies, which would still be done by humans. The reasoning they give is that AI will help people with investment and banking advice, an improvement on human agents. In the same article from American Banker [2], the senior VP of the First National Bank of Wynne in Arkansas argues that people are replaceable, AI is not. That is, if a person makes mistakes repeatedly, they can be replaced by another human more capable of doing the job; a malfunctioning AI, however, cannot be fired, but needs to be shut down and replaced with a human, which makes companies wary of using AI. Research by the Royal Bank of Canada states that humane skills like active listening and social perception will help prospective job applicants complement AI technologies, rather than compete with them.

These discussions come from management employees, but that doesn’t mean we necessarily have to agree with them. They state that there will be more jobs available. However, we also need to look at what kind of jobs AI will create. If we look at job markets as a whole, the trend suggests that new jobs will indeed be created. If we look closely, however, we see that the kind of jobs AI will take away are the main source of livelihood for many working-class people. A taxi driver would not be too amused to see an autonomous vehicle taking away their passengers and wages, especially if they’re struggling to pay their rent each month even with a job. I would also respectfully disagree with the aforementioned VP of the bank; the sentence ‘humans are replaceable’ is not a good look given current job market prospects. We should be protecting our workforce with the help of technology, not replacing it with more technology. We also need to consider that people must adapt to the new technologies, and some might not have the resources to do so. Do they then try to find different jobs, or risk being left behind by the new revolution?

Companies like Accenture argue that AI will maximise profits, which will lead to maximised growth and, in turn, more employees as companies join more global markets. With even more widespread use of AI, it will be necessary to regulate the ethics behind its usage. This will necessitate employing humans to monitor the decisions taken by AI tools. An example of AI being limited to a single-purpose job is given in this New York Times article [3], which quotes an important MIT report [4]: “A finely tuned gripping robot can pluck a glazed doughnut and place it in a box with its shiny glaze undisturbed, but that gripper only works on doughnuts, it can’t pick up a clump of asparagus or a car tyre.” A general-purpose AI is still a few years away. AI is normally designed as a tool to help humans do their jobs better. In fact, ASAPP, a New York-based developer of AI-powered customer service software, uses AI to train human call centre representatives to be more productive, rather than replacing them with AI [5]. Combining human jobs with AI seems to be the best way forward, delicately balancing productivity and human ingenuity.

The effects of AI will be seen in almost all industries. Since AI is a tool which can be applied so broadly, there is a huge push to apply it in domains from medical imaging, neuroscience and business analysis to even sports science. The very essence of AI is such that it pervades all kinds of job markets without discrimination, and it is going to change the landscape of many of them. Whether you are a taxi driver, a doctor, or an assembly-line worker, it is going to affect your job. The hardest hit will be people who are in no position to learn the new skills needed to complement AI entering our lives, because as of now, AI is great at doing single-purpose repetitive tasks efficiently. AI is not a messiah which will take us to a proverbial promised land, nor is it a weapon of mass destruction wiping out the planet; it is somewhere in the middle, and as with every new technology, we need to adapt. In fact, the balance currently tilts towards the negative. Have we thought about the sheer displacement of people as more and more jobs utilize AI? Is AI, and especially complex deep learning models, even necessary for every task? We need to ensure that we are not just falling victim to trendy buzzwords and trying to incorporate the latest technology in our services. We have plenty of experience in dealing with revolutions; we need to find a way to deal with this latest one and steer it closer to the messiah end of the spectrum. Thankfully, we are still some years away from AI pervading all job markets, so we can make concrete plans for a smoother transition.

To tackle the wave of the AI revolution and ensure that we’re not left on the kerb while AI takes our jobs away, we need to keep reinventing ourselves. Yuval Harari, in his article “Reboot for the AI revolution”, sums it up beautifully: “…humankind had to develop completely new models [to adapt to the Industrial Revolution] — liberal democracies, communist dictatorships and fascist regimes. It took more than a century of terrible wars and revolutions to experiment with these, separate the wheat from the chaff and implement the best solutions.” [6] He then adds that we need a working model to overcome the challenges introduced by the new technology. These words may prove seminal in the coming years as AI disseminates into all job markets. We may need to turn the wheel, or reinvent it completely, as we adapt to the new revolution.

This article delved into how AI is affecting the job industries. The next follow-up article, “Riding the Wave of AI Job Revolution”, of the same series will cover how we can improvise and adapt to this challenging situation. 


[1] Dean Takahashi, “Mobile technology has created 11 million jobs and $3.3 trillion in revenues”, (2015),

[2] Penny Crosman, “How artificial intelligence is reshaping jobs in banking”, (2018),

[3] Steve Lohr, “Don’t Fear the Robots, and Other Lessons From a Study of the Digital Economy”, (2020),

[4] Elisabeth Reynolds, David Autor, David Mindell, “The Work of the Future: Building Better Jobs in an Age of Intelligent Machines”, (2020),

[5] Kenrick Cai and Alan Ohnsman, “Meet The AI Designed To Help Humans, Not Replace Them”, (2020),

[6] Yuval Noah Harari, “Reboot for the AI revolution”, Nature News, 550.7676 (2017), p. 324.

A Post-Pandemic Outlook on the Future of AI Research

The year 2020 is likely to play an instrumental role in how Artificial Intelligence (AI) evolves in the years to come. It won’t just shape how AI is perceived by the world (be it good or bad), but also how the AI research community will change. The world right now is in a state of crisis, the COVID-19 pandemic, and we are yet to see the end of this crisis and any lasting repercussions which might follow.

Where Are We?

A myriad of reports and articles are published on a daily basis discussing the socio-economic implications of the COVID-19 pandemic, for example [1]. Additionally, this pandemic has brought forth the rise of AI in ways which are astonishing as well as disconcerting. Since the outbreak of COVID-19, AI has contributed significantly to the healthcare industry [2] through applications such as predicting new cases, drug discovery and more. But the pandemic has also inadvertently fuelled the world of surveillance [3] we live in today. Such rapid development and controversial usage of AI raises concerns regarding our privacy and security in the future. Yuval Noah Harari, an Israeli historian and the author of Sapiens (2011), discusses [4] the future of humanity and poses some daunting questions regarding the expected impact after COVID-19. The article discusses how governments are using the pandemic as an excuse to abuse the state of emergency by imposing “under-the-skin” surveillance and, essentially, setting up a totalitarian regime.

It’s quite natural to find yourself confused as to how this grim description of a post-pandemic world connects to the future of AI research. But it should be considered vital, even instrumental, in shaping that future.

We find ourselves in a thicket of strategic complexity, surrounded by a dense mist of uncertainty

– Nick Bostrom, Superintelligence: Paths, Dangers, Strategies

What Was and What Is?

Industries evolve constantly, but certain trends and historical patterns have enabled us to forecast their future, at least to an extent. The AI industry does not stand alone but is closely associated with a multitude of other industries, such as healthcare, business, agriculture and more. There has been a significant rise in the adoption of AI by such industries in just the last decade, and this has increased the need to forecast AI’s impact and how it will inevitably shape our society. Books like Superintelligence (Nick Bostrom, 2014) and Human Compatible (Stuart J. Russell, 2019) have discussed this and presented their predictions. Stakeholders who fund the advancement of AI make decisions based on such predictions, and this in turn drives the AI research community.

But with the crisis we face today, the status quo is going to change. Looking at how AI research has been conducted since the start of COVID-19, the focus should not only be on the outcomes of a given study but also on the way such studies are conducted. For the sake of simplicity, this article focuses on AI in healthcare, but the presented arguments apply to AI research in general.

In Focus: AI in Healthcare

The astounding increase in the number of cases during this pandemic pushed the AI community, both academia and industry, to divert its resources to providing any form of support which could be essential in the fight against the virus. AI research in medical imaging [5], specific to COVID-19, has enabled researchers as well as doctors to train and deploy predictive models which can contribute to patient diagnosis. There is also an increase in drug discovery research, which will help researchers identify vaccines that can be tested and distributed to everyone. AI in drug discovery compares very favourably with conventional clinical trials, as AI speeds up the process of developing new drugs as well as drug testing in real time [2].

So What’s the Issue?

Once the crisis is over, companies are expected to invest even more into AI research [6] and government bodies are expected to increase their involvement [7] to use AI to plan strategies against future pandemics as well as empower other industries which could benefit from AI. Dana Gardner [8] discusses in his podcast that the data collected over the pandemic will be a key factor of how AI will shape the post-pandemic world.

Despite the extensive amount of AI research produced in such a short time, much of it shares an inherent flaw: it is the product of black-box AI systems. Such systems spit out numbers and leave us humans to derive meaning from them. A majority of AI research is focused purely on final results (such as performance on a benchmark dataset) rather than on how the model arrived at them, especially the research conducted by AI start-ups. Even though this attitude was amplified by the urgency of the crisis, it has been around for quite some time. There are multiple applications, such as image recognition, where numbers alone suffice, but when we enter human-centric applications such as healthcare, mere numbers are simply not enough. A high-accuracy model is not guaranteed to attain high efficacy in such areas, and we need to ask ourselves: will this style of research, a pursuit to train the highest-accuracy model in the shortest time possible, continue in a post-pandemic world?

Explainable AI, commonly referred to as XAI, deals with research into developing interpretable systems which can provide explanations for its decision making in any given scenario. This is a step ahead of the current “black-box” AI systems.

This argument brings us to Explainable AI, an area of research which is still in its infancy. What has already been done cannot be changed, but we can and should learn from these past few months. The amount of data in the future, especially in industries such as healthcare, is going to explode, and how we handle this data and design new AI systems with it should be the crux of future AI research. Until this pandemic, the questions posed by most AI critics regarding the black-box nature of high-accuracy models were mostly hypothetical, for example the trolley problem. But the decisions made by AI systems during the pandemic affected very real humans. We are clearly outside the hypothetical debate, and tackling this issue is of utmost importance.

How Do We Move Ahead?

Besides making AI systems more explainable, at least to the involved stakeholders, we need new policies and legislation to dictate the plan of action for AI research during a crisis such as COVID-19. Such policies will introduce important standards and guidelines, similar to those in other areas like healthcare and environmental sustainability, to tackle problems that are not merely ad hoc but also consider the societal and ethical implications of the research in question.

[1] Nicola, M., Alsafi, Z., Sohrabi, C., Kerwan, A., Al-Jabir, A., Iosifidis, C., … & Agha, R. (2020). The socio-economic implications of the coronavirus pandemic (COVID-19): A review. International journal of surgery (London, England), 78, 185.

[2] Vaishya, R., Javaid, M., Khan, I. H., & Haleem, A. (2020). Artificial Intelligence (AI) applications for COVID-19 pandemic. Diabetes & Metabolic Syndrome: Clinical Research & Reviews.

[3] Yuan, S. (2020). How China is using AI and big data to fight the coronavirus. Al Jazeera.

[4] Harari, Y. N. (2020). The world after coronavirus. Financial Times, 20.

[5] Bullock, J., Pham, K. H., Lam, C. S. N., & Luengo-Oroz, M. (2020). Mapping the landscape of artificial intelligence applications against COVID-19. arXiv preprint arXiv:2003.11336.

[6] Global companies will invest more in AI post-pandemic. Smart Energy International, Sept 18, 2020.

[7] Jad Hajj, Wissam Abdel Samad, Christian Stechel, and Gustave Cordahi. How AI can empower a post COVID-19 world. strategy&, 2020.

[8] Dana Gardner. How data and AI will shape the post-pandemic future. July 7, 2020.