Explained In A Minute: Neural Networks
03.03.2018 - Samuel Arzt
This is the accompanying blogpost to my YouTube video Explained In A Minute: Neural Networks. There were a lot of things that did not fit into the video. This post describes the difference between feedforward and recurrent Neural Networks, different architectures and activation functions, and different methods for training Neural Networks.
In recent years, there has been a huge hype around Neural Networks and Machine Learning in general. My video explains the very basic principles behind Neural Networks and how they process their inputs. Here are some things my video didn't mention:
Feedforward / Recurrent Neural Networks and other Architectures
My video only explains the most basic / simple type of Neural Network. This specific architecture can be referred to as a fully-connected, feedforward Neural Network. Feedforward means that its neurons simply feed their output forward to the next layer, without any connections feeding back to the same or a previous layer. Fully-connected means that each neuron of a layer has an incoming connection from every neuron of the previous layer.
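To make this concrete, here is a minimal sketch in plain Python of how a fully-connected, feedforward network computes its output. The layer sizes and weights are made up for illustration, biases are left out for simplicity, and the sigmoid is used as the activation function, as in the video. Real libraries do the same computation with matrix operations.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, written so it does not overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def dense_layer(inputs: list[float], weights: list[list[float]]) -> list[float]:
    """Fully-connected layer: every output neuron gets a weighted input from every input neuron.

    weights[j][i] is the weight of the connection from input neuron i to output neuron j.
    """
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def feedforward(inputs: list[float], layers: list[list[list[float]]]) -> list[float]:
    """Feed the values forward layer by layer; nothing is fed back."""
    values = inputs
    for weights in layers:
        values = dense_layer(values, weights)
    return values

# Hypothetical 2-3-1 network with hand-picked weights (illustration only).
layers = [
    [[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]],  # 3 hidden neurons, 2 incoming weights each
    [[1.0, -1.0, 0.5]],                      # 1 output neuron, 3 incoming weights
]
print(feedforward([1.0, 0.0], layers))  # a single value between 0 and 1
```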
However, there are many other types of Neural Networks and Neural Network architectures. While they are all composed of the same basic concepts (such as neurons, weighted connections and activation functions), specific architectures can make solving certain tasks easier.
Basically, there are two main types of Neural Networks: Feedforward Neural Networks (FNNs) and recurrent Neural Networks (RNNs). Recurrent Neural Networks differ from feedforward ones in that they have connections to neurons of the same layer or of previous layers. While feedforward networks don't have any sense of time (each input is processed in the same way, independent of previous inputs), recurrent Neural Networks can keep an internal state through these interconnections, which is updated at each timestep. I am planning on doing a separate video on recurrent Neural Networks, but if you want to learn more about them right now, I highly recommend Andrej Karpathy's blogpost about them (it's really good!).
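Here is a minimal sketch of the difference a recurrent connection makes, again in plain Python. It assumes the simplest kind of recurrent layer (often called an Elman network), in which each neuron also receives the layer's own outputs from the previous timestep; the tanh activation and all weights are arbitrary choices for illustration. The point to notice is that the same final input produces a different state depending on what came before it.

```python
import math

def rnn_step(x: list[float], h_prev: list[float],
             w_in: list[list[float]], w_rec: list[list[float]]) -> list[float]:
    """One timestep of a minimal recurrent layer (Elman-style sketch).

    x:           input vector for this timestep
    h_prev:      the layer's own output from the previous timestep (its internal state)
    w_in[j][i]:  weight from input i to neuron j
    w_rec[j][k]: recurrent weight from neuron k (previous timestep) to neuron j
    """
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[j], x))
            + sum(w * hk for w, hk in zip(w_rec[j], h_prev))
        )
        for j in range(len(w_in))
    ]

def run_rnn(sequence: list[list[float]],
            w_in: list[list[float]], w_rec: list[list[float]]) -> list[float]:
    """Process a sequence one timestep at a time and return the final state."""
    h = [0.0] * len(w_in)  # the internal state starts at zero
    for x in sequence:
        h = rnn_step(x, h, w_in, w_rec)  # the state is updated every timestep
    return h

# Hypothetical weights for a layer with 1 input and 2 recurrent neurons.
w_in = [[0.9], [-0.4]]
w_rec = [[0.5, 0.1], [0.2, 0.6]]

# Both sequences end with the input 1.0, but the final states differ.
print(run_rnn([[0.0], [1.0]], w_in, w_rec))
print(run_rnn([[1.0], [1.0]], w_in, w_rec))
```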
Furthermore, there are different architectures for both feedforward as well as recurrent Neural Networks. "Architecture" here means that the connections and topology of the network are arranged in a certain way that is known to work well for specific problems. Probably the most well-known architecture for feedforward Neural Networks is the Convolutional Neural Network (ConvNet / CNN). Convolutional Neural Networks are mainly used for image / object recognition tasks and hold the current state of the art in many related disciplines. They are particularly good at solving these problems because their architecture builds in prior knowledge about the invariances of 2D shapes (for example, that an object is still the same object regardless of where in the image it appears). I am also planning on explaining Convolutional Neural Networks in more detail in a future part of the series.
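The core operation of a ConvNet is the convolution: a small grid of weights (a kernel) is slid over the image, and the same weights are reused at every position, so a pattern is detected no matter where it appears. The sketch below assumes a single-channel image and a hand-picked vertical-edge kernel; like most deep learning libraries, it actually computes a cross-correlation (the kernel is not flipped). In a real ConvNet, the kernel values are learned rather than chosen by hand.

```python
def conv2d(image: list[list[float]], kernel: list[list[float]]) -> list[list[float]]:
    """'Valid' 2D convolution (strictly a cross-correlation) of one image with one kernel.

    The same kernel weights are applied at every position of the image (weight sharing).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[di][dj] * image[i + di][j + dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hypothetical vertical-edge kernel and a tiny 4x4 image with an edge between columns 1 and 2.
kernel = [[-1, 1],
          [-1, 1]]
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
for row in conv2d(image, kernel):
    print(row)  # prints [0, 2, 0] for every row
```

The kernel only responds at the position of the edge. Moving the edge one column to the right moves the response one column to the right as well, without changing any weights.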
Training Neural Networks
My video only very briefly touched upon the topic of actually training Neural Networks at the end. While it is theoretically possible to adjust the individual weights of the network by hand until it outputs the desired values, this becomes infeasible very, very quickly: in practice, Neural Networks typically have hundreds of thousands or millions of weights.
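To get a feeling for these numbers: in a fully-connected network, every pair of adjacent layers contributes (neurons in the first layer) times (neurons in the second layer) weights. The layer sizes below are a hypothetical example, roughly a small network for 28x28 pixel images, and bias values are not counted.

```python
def count_weights(layer_sizes: list[int]) -> int:
    """Number of connection weights in a fully-connected feedforward network.

    Each pair of adjacent layers contributes (neurons in) * (neurons out) weights.
    """
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical network: 784 inputs (28x28 pixels), two hidden layers, 10 outputs.
print(count_weights([784, 256, 128, 10]))  # 234752
```

Even this small example already needs 784 * 256 + 256 * 128 + 128 * 10 = 234,752 weights, far too many to tune by hand.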
The most popular method of training Neural Networks is Machine Learning. Machine Learning can be classified into three main types: Supervised Learning, Unsupervised Learning and Reinforcement Learning. The most popular Machine Learning algorithms use some form of gradient descent, a way of automatically updating the weights step by step in the direction that reduces the network's error (in other words, makes them less wrong). The next part of this series will explain Machine Learning and its subcategories in more detail.
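Here is a minimal sketch of gradient descent, assuming the simplest possible setup: a single linear neuron with one weight w (no activation function), trained in a supervised way on made-up data that follows y = 3x. The error is measured as the mean squared error, and each step moves w a small amount against the gradient of that error. The learning rate and number of steps are arbitrary choices. For a real network, the gradients of all weights are computed with backpropagation, but the update rule follows the same idea.

```python
def gradient_descent(xs: list[float], ys: list[float],
                     learning_rate: float = 0.1, steps: int = 100) -> float:
    """Fit a single weight w so that w * x approximates y.

    Loss:     L(w) = mean((w * x - y)^2)
    Gradient: dL/dw = mean(2 * (w * x - y) * x)
    """
    w = 0.0  # arbitrary starting weight
    n = len(xs)
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad  # step against the gradient, i.e. downhill
    return w

# Hypothetical training data generated by y = 3x; the learned weight approaches 3.
xs = [0.0, 0.5, 1.0, 1.5]
ys = [3 * x for x in xs]
print(gradient_descent(xs, ys))
```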
Then there are also Evolutionary Algorithms, which I personally also consider Machine Learning (but I am not quite sure whether the general public agrees on that). Evolutionary Algorithms are inspired by biological evolution; however, they obviously only represent crude simplifications of natural processes rather than biological reality. There are many different types of Evolutionary Algorithms, but they typically share a cycle of 4 steps: Evaluation, Selection, Recombination, Mutation. In other words: a number of random subjects is generated; these are evaluated on how well they solve the problem; the best are selected and combined to form new subjects; the new subjects are mutated; repeat. For example, I have employed a specific kind of Evolutionary Algorithm, called a Genetic Algorithm, to train Neural Networks in this video. I am also planning on explaining Evolutionary Algorithms in a separate part of the series.
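Here is a minimal sketch of this cycle in plain Python. It is a generic example, not necessarily the exact variant used in that video: the fitness function, population size, selection scheme (keep the best half), uniform crossover and Gaussian mutation are all illustrative assumptions. The genomes are just lists of numbers, which could stand for the weights of a Neural Network.

```python
import random
from typing import Callable

Genome = list[float]

def evolve(fitness: Callable[[Genome], float], genome_length: int,
           population_size: int = 20, generations: int = 100,
           mutation_rate: float = 0.1, mutation_scale: float = 0.5,
           seed: int = 0) -> Genome:
    """Minimal genetic algorithm sketch over real-valued genomes."""
    rng = random.Random(seed)
    # Generate a population of random subjects.
    population = [[rng.uniform(-1, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # 1. Evaluation: score every subject and sort them best-first.
        population.sort(key=fitness, reverse=True)
        # 2. Selection: keep the best half as parents.
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            # 3. Recombination: uniform crossover, each gene comes from either parent.
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]
            # 4. Mutation: randomly perturb some of the child's genes.
            child = [g + rng.gauss(0, mutation_scale) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Hypothetical problem: find a genome close to a target vector.
target = [0.5, -0.3, 0.8]

def fitness(genome: Genome) -> float:
    """Higher is better: negative squared distance to the target."""
    return -sum((g - t) ** 2 for g, t in zip(genome, target))

best = evolve(fitness, genome_length=3)
print(best, fitness(best))
```

Because the selected parents are carried over unchanged, the best fitness in the population can never get worse from one generation to the next.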
Activation Functions
I have only very briefly mentioned the activation function in my video. Before a neuron passes its value to its connected neurons, it typically transforms its calculated value with a specific mathematical function. Every neuron can have its own activation function, although you will mostly see the whole layer using the same function. The function used in my video is the sigmoid function. Historically, this was the standard; however, nowadays many other functions are being used, such as tanh, softsign and ReLU, and the sigmoid has been found to be suboptimal for certain setups.
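For reference, here is a minimal sketch of the activation functions mentioned above as plain Python functions, applied to a few sample inputs. The sigmoid is written in a slightly longer form so that it does not overflow for large negative inputs.

```python
import math

def sigmoid(x: float) -> float:
    """Output in (0, 1); numerically stable for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def tanh(x: float) -> float:
    """Output in (-1, 1), centered around 0."""
    return math.tanh(x)

def softsign(x: float) -> float:
    """Output in (-1, 1); approaches its limits more slowly than tanh."""
    return x / (1.0 + abs(x))

def relu(x: float) -> float:
    """0 for negative inputs, identity for positive inputs."""
    return max(0.0, x)

for f in (sigmoid, tanh, softsign, relu):
    print(f.__name__, [round(f(x), 3) for x in (-2.0, 0.0, 2.0)])
```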
I hope this video and blogpost serve as a good starting point for understanding basic Neural Networks and their underlying concepts. Keep up to date with my upcoming videos by subscribing to my YouTube Channel and by following me on Twitter.