Wednesday, February 7, 2018

27 Neural Networks Explained in Graphics

Neural networks are an essential part of machine learning. A neural network can be understood as a system of computer hardware and/or software that works in a way inspired by the human brain. Through extensive training, such a system learns from examples, generally without task-specific programming, much as a human does. In this article, I’m going to briefly explain 27 neural networks one by one in easy-to-understand language.
(Source: Fjodor van Veen from Asimov Institute)


1.Perceptron (P)
The perceptron is the simplest and oldest neural network we know. It is a linear classifier that takes in inputs, combines them into a weighted sum, and maps the result to an output via an activation function. Nothing really fancy about it.
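To make the "combine the inputs, then map to an output" idea concrete, here is a minimal sketch of a perceptron in plain Python. The function names, the step activation, the learning rate, and the AND-gate data are illustrative assumptions, not something taken from the original chart.

```python
def predict(weights, bias, inputs):
    """Weighted sum of the inputs followed by a step function (1 or 0)."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Classic perceptron learning rule on (inputs, label) pairs."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            error = label - predict(weights, bias, inputs)
            # Nudge each weight in the direction that reduces the error.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so a single perceptron can learn it.
and_gate = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_gate)
print([predict(weights, bias, x) for x, _ in and_gate])  # prints [0, 0, 0, 1]
```

A problem that is not linearly separable, such as XOR, cannot be learned this way, which is one reason hidden layers were introduced.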

2.Feed Forward
Feed Forward (FF) networks are also among the oldest neural networks we know; the approach originated in the 1950s. A feed forward network generally has the following characteristics:
  • All nodes are fully connected
  • No feedback loop to control the output
  • There’s a layer between input and output layer (the hidden layer)
In most cases, this type of network uses back-propagation methods for training.
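The three properties above (full connections, no feedback loop, one hidden layer) can be sketched as a forward pass. In this example the weights are picked by hand so that the network computes XOR; in practice they would be learned, typically with back-propagation. All names and values here are illustrative assumptions.

```python
def step(value):
    """Step activation: fire (1) only for positive input."""
    return 1 if value > 0 else 0


def layer(inputs, weights, biases, activation):
    """One fully connected layer: every input feeds every neuron."""
    return [activation(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]


def xor_network(a, b):
    # Hidden layer: the first neuron behaves like OR, the second like AND.
    hidden = layer([a, b], [[1, 1], [1, 1]], [-0.5, -1.5], step)
    # Output neuron fires for "OR but not AND", which is XOR.
    return layer(hidden, [[1, -1]], [-0.5], step)[0]


print([xor_network(a, b) for a in (0, 1) for b in (0, 1)])  # prints [0, 1, 1, 0]
```

Data only ever flows from input to hidden to output; nothing is fed back.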

3.Radial Basis Network
A Radial Basis Network (RBF) is actually a Feed Forward network that uses a radial basis function as its activation function instead of a logistic function.

So what is the difference between these two kinds of networks?
A logistic function maps an arbitrary value into the range [0, 1] to answer a "yes or no" question. This suits classification systems, but it does not work well for continuous variables.

In contrast, a radial basis function answers the question "how far are we from the target?" This is perfect for function approximation and machine control (for example, as a substitute for a PID controller).
In short, an RBF is an FF with a different activation function and different applications.
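The difference between the two activation functions is easy to see side by side. The sketch below assumes a Gaussian as the radial basis function (a common choice, though not the only one); the `center` and `width` parameter names are illustrative.

```python
import math


def logistic(x):
    """Squashes any real value into (0, 1): a soft "yes or no"."""
    return 1.0 / (1.0 + math.exp(-x))


def gaussian_rbf(x, center=0.0, width=1.0):
    """Equals 1 when x hits the center and decays with distance from it."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


for x in (0.0, 1.0, 2.0, 3.0):
    print(x, round(logistic(x), 3), round(gaussian_rbf(x, center=2.0), 3))
```

The logistic output keeps rising as x grows, while the RBF output peaks at the target value 2.0 and falls off on both sides.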

4.Deep Feed Forward
DFF opened the Pandora's box of deep learning in the early 90s. These are still Feed Forward neural networks, but with more than one hidden layer.

When training a traditional FF model, we pass only a small amount of error information back to the previous layer. Because of this, stacking more layers was impractical for a long time: the required training time grew quickly with every added layer. It was not until the early 2000s that a series of effective methods for training deep feed forward networks was developed. These methods now form the core of modern machine learning systems and serve the same purposes as FF networks.

5.Recurrent Neural Network

RNNs introduce a different type of neuron: the recurrent neuron.
The first network of this type was the Jordan Network, in which every hidden neuron receives its own output after a fixed delay (one or more iterations). Other than this, it is very similar to an ordinary feed forward neural network.
Of course, there are other variations - such as passing state to the input nodes, variable delays, etc. - but the main idea remains the same. This type of neural network is mainly used when context is important, that is, when decisions from past iterations or samples can influence the current ones. The most common example of such context is text analysis: a word can only be analyzed in the context of the preceding words or sentences.
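A recurrent neuron can be sketched in a few lines: its new state depends on the current input and on its own previous output. This is a simplified single-neuron illustration with made-up weights (`w_in`, `w_rec`) and a tanh activation, not the exact Jordan Network formulation.

```python
import math


def rnn_step(x, h_prev, w_in, w_rec, bias):
    """New state mixes the current input with the neuron's previous output."""
    return math.tanh(w_in * x + w_rec * h_prev + bias)


def run_rnn(sequence, w_in=0.5, w_rec=0.8, bias=0.0):
    h = 0.0  # no context before the first sample
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_in, w_rec, bias)
        states.append(h)
    return states


# Only the first input is non-zero, yet later states are still non-zero:
# the network "remembers" the earlier sample through its recurrent link.
print(run_rnn([1.0, 0.0, 0.0]))
```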

6.Long/Short Term Memory(LSTM)
LSTM introduces a memory unit, a special unit that makes it possible to handle data with variable time gaps. A recurrent neural network can process text by "remembering" the previous ten words, while an LSTM network can process video frames by "remembering" what happened many frames earlier. LSTM networks are also widely used for text and speech recognition.

The memory unit is actually made up of components called gates, which are recurrent and control how information is "remembered" and "forgotten". The figure below illustrates the structure of LSTM:
The (x) symbols in the figure are the gates. Each has its own weights and sometimes an activation function. On each sample, they decide whether to pass the data on, erase the memory, and so on.

You can find a more comprehensive explanation of LSTM here:
http://colah.github.io/posts/2015-08-Understanding-LSTMs/

The Input Gate determines how much of the last sample is stored in memory, the Output Gate adjusts the amount of data transferred to the next layer, and the Forget Gate controls the rate at which stored memory is discarded.
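The role of the three gates can be sketched with a single scalar LSTM cell. This follows the commonly used formulation (sigmoid gates, tanh candidate); the dictionaries `w`, `u`, `b` keyed by gate name and all numeric values are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w, u, b):
    """One step of a scalar LSTM cell; w, u, b are dicts keyed by gate name."""
    def pre(gate):
        return w[gate] * x + u[gate] * h_prev + b[gate]

    i = sigmoid(pre("i"))    # input gate: how much new data to store
    f = sigmoid(pre("f"))    # forget gate: how much old memory to keep
    o = sigmoid(pre("o"))    # output gate: how much memory to expose
    g = math.tanh(pre("g"))  # candidate value to write into memory
    c = f * c_prev + i * g   # updated memory cell
    h = o * math.tanh(c)     # output passed on to the next step or layer
    return h, c


gates = "ifog"
w = {g: 1.0 for g in gates}
u = {g: 1.0 for g in gates}
# Input gate forced shut and forget gate forced open: memory is preserved.
b = {"i": -100.0, "f": 100.0, "o": 0.0, "g": 0.0}
h, c = lstm_step(x=0.3, h_prev=0.1, c_prev=2.0, w=w, u=u, b=b)
print(c)
```

With the biases chosen above, the cell keeps its old memory value (about 2.0) almost untouched, which is how an LSTM can carry information across many steps.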

7.Gated Recurrent Unit
GRU is an LSTM with different gates.
The lack of an output gate makes it easier to repeat the same output multiple times for a specific input. It is now most often used in sound (music) and speech synthesis.

The actual composition is a bit different, though: all LSTM gates are combined into a so-called Update Gate, and the Reset Gate is closely tied to the input.
GRUs consume fewer resources than LSTMs but have almost the same effect.
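For comparison with LSTM, here is a hedged sketch of a scalar GRU step with its update and reset gates. Conventions for which side of the blend the update gate controls differ between references; this sketch uses h = (1 - z) * h_prev + z * candidate. Parameter names and values are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One step of a scalar GRU cell; p maps parameter names to floats."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    # The reset gate decides how much of the old state enters the candidate.
    candidate = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # The update gate blends the old state with the candidate.
    return (1.0 - z) * h_prev + z * candidate


params = {"wz": 1.0, "uz": 1.0, "bz": -100.0,
          "wr": 1.0, "ur": 1.0, "br": 0.0,
          "wh": 1.0, "uh": 1.0, "bh": 0.0}
# With the update gate forced shut, the old state passes through unchanged.
print(gru_step(x=1.0, h_prev=0.7, p=params))
```

There is no separate memory cell and no output gate, which is where the resource savings come from.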

8.Auto Encoder
Autoencoders are used for classification, clustering and feature compression.

When you train a feed forward (FF) neural network for classification, you mainly have to provide X examples in Y categories and expect one of the Y output cells to be activated. This is called "supervised learning."

Auto encoders, on the other hand, can be trained without supervision. In their structure, the number of hidden units is smaller than the number of input units, and the number of output units equals the number of input units. Because an auto encoder is trained to make its output as close to its input as possible, it is forced to generalize the data and search for common features.

9.Variational Auto Encoder
Compared with an auto encoder, a VAE compresses probabilities rather than features.

Despite this simple change, an auto encoder can only answer questions like "How do we summarize the data?", while a VAE answers questions like "How strong is the connection between the two things? Should we divide the error between the two things, or are they completely independent?"

A more in-depth explanation can be found here: https://github.com/kvfrans/variational-autoencoder

10.Denoising Auto Encoder
Although auto encoders are cool, they sometimes fail to find the most suitable features and simply adapt to the input data instead (which is in fact an example of over-fitting).

The Denoising Auto Encoder (DAE) adds some noise to the input units: changing data by random bits, randomly shifting bits in the input, and so on. This forces the DAE to reconstruct the output from a somewhat noisy input, which makes it more general and pushes it to select more common features.
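The key trick is in how the training data is prepared: the network sees a corrupted input but is scored against the clean original. The sketch below shows only that data-preparation step for binary inputs; the bit-flip noise model, function names, and probabilities are illustrative assumptions, and the auto encoder itself is not implemented here.

```python
import random


def corrupt(values, flip_probability=0.2, rng=None):
    """Flip each bit (0 or 1) independently with the given probability."""
    rng = rng or random.Random()
    return [1 - v if rng.random() < flip_probability else v for v in values]


def make_training_pairs(samples, flip_probability=0.2, seed=0):
    """Pair each noisy input with its clean original as the target."""
    rng = random.Random(seed)
    return [(corrupt(s, flip_probability, rng), s) for s in samples]


clean = [[0, 1, 1, 0], [1, 1, 0, 0]]
for noisy_input, target in make_training_pairs(clean):
    print(noisy_input, "->", target)
```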

11.Sparse Auto Encoder
A Sparse Auto Encoder (SAE) is another form of auto encoder that can sometimes reveal hidden grouping patterns in the data. The structure is the same as an AE, but the number of hidden cells is greater than the number of input or output cells.

12.Markov Chain
The Markov Chain (MC) is an old graph concept in which each edge is assigned a probability. In the past, it was used to construct text: for example, "dear" appears after "hello" with a probability of 0.0053%, and "you" appears after "hello" with a probability of 0.03551%.

Markov chains are not typical neural networks. They can be used for probability-based classification (like Bayesian filtering), for clustering (of some kinds), and as finite state machines.
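As a small illustration of the text example, the sketch below estimates "which word follows which" probabilities from a toy sentence. The sentence and the function name `build_chain` are made up for this example; the probabilities it prints come from the toy data, not from the figures quoted above.

```python
from collections import Counter, defaultdict


def build_chain(text):
    """Estimate P(next word | current word) from adjacent word pairs."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return {word: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for word, c in counts.items()}


chain = build_chain("hello dear hello you hello you")
print(chain["hello"])  # "you" follows "hello" twice as often as "dear"
```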

13.Hopfield Network
The Hopfield Network (HN) is trained on a limited set of samples so that it responds to a known sample with that same sample.

Each cell serves as an input cell before training, as a hidden cell during training, and as an output cell when the network is used.
Because an HN tries to reconstruct the samples it was trained on, it can be used to denoise and repair inputs. Given half of a learned picture or sequence, it can return the entire sample.
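The repair behaviour can be sketched with a tiny Hopfield network that stores one pattern using the Hebbian rule and then recovers it from a corrupted copy. The +1/-1 encoding, the pattern, and the function names are illustrative assumptions.

```python
def train_hopfield(patterns):
    """Hebbian rule: w[i][j] sums p[i] * p[j] over the stored +1/-1 patterns."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, state, sweeps=5):
    """Update cells one by one until the state settles on a stored pattern."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            total = sum(w * s for w, s in zip(weights[i], state))
            state[i] = 1 if total >= 0 else -1
    return state


stored = [1, -1, 1, -1, 1, -1]
weights = train_hopfield([stored])
noisy = [1, -1, 1, -1, -1, -1]  # one cell flipped
print(recall(weights, noisy))   # prints [1, -1, 1, -1, 1, -1]
```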

14.Boltzmann Machine
The Boltzmann Machine (BM) is very similar to the HN, except that some cells are marked as input cells while the others remain hidden. The input cells become output cells once every hidden cell has updated its state (during training, BMs and HNs update cells one by one rather than in parallel).

This is the first network topology that was successfully trained using the simulated annealing approach.
Multi-layered Boltzmann machines can be used for so-called deep belief networks (to be introduced shortly), and deep belief networks can be used for feature detection and extraction.

15.Restricted Boltzmann Machine
Restricted Boltzmann Machines (RBMs) are very similar to BMs in structure but, because they are restricted, they can be trained with backpropagation just like FFs (the only difference being that data is passed back to the input layer once before the backpropagation pass).

16.Deep Belief Network
A Deep Belief Network (DBN) is actually a stack of Boltzmann machines (surrounded by VAEs). They can be chained together (with one neural network training another) and can be used to generate data from the patterns already learned.

17.Deep Convolutional Network
Today, the Deep Convolutional Network (DCN) is the superstar of artificial neural networks. It has convolution cells (or pooling layers) and kernels, each serving a different purpose.
Convolution kernels process the incoming data, and pooling layers simplify it (in most cases using nonlinear functions such as max) to reduce unnecessary features.
DCNs are usually used for image recognition, where they operate on a small part of the image at a time (about 20x20 pixels). The input window slides along the image one pixel at a time. The data then flows through the convolution layers, which form a funnel (compressing the identified features). In image recognition terms, the first layer identifies gradients, the second identifies lines, the third identifies shapes, and so on, up to the level of particular objects. A DFF is usually attached to the end of the convolutional layers for further data processing.
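The sliding window and the max pooling step can be sketched in plain Python. Like most deep learning libraries, this "convolution" does not flip the kernel (strictly speaking it is cross-correlation). The tiny image, the gradient kernel, and the function names are illustrative assumptions.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image one pixel at a time (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool(feature_map, size=2):
    """Keep only the strongest response in each size x size block."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]


# A 5x5 image that is dark on the left and bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# A kernel that responds to a left-to-right brightness gradient.
kernel = [[-1, 1], [-1, 1]]

features = convolve2d(image, kernel)
print(features)            # each row is [0, 2, 0, 0]: the edge is detected
print(max_pool(features))  # prints [[2, 0], [2, 0]]
```

The pooled map is a quarter of the size but still records where the edge is, which is the "funnel" effect described above.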

18.Deconvolutional Network
The Deconvolutional Network (DN) is the inverted version of the DCN. A DCN can take a picture of a cat and produce a vector such as (dog: 0, lizard: 0, horse: 0, cat: 1). A DN can take this vector and draw a cat from it.

19.Deep Convolutional Inverse Graphics Network
The Deep Convolutional Inverse Graphics Network (DCIGN) looks like a DCN and a DN attached together, but that is not exactly the case.

In fact, it is an auto encoder: the DCN and DN do not act as two separate networks but rather as spacers for the network's input and output. Mostly used for image processing, these networks can process images that they have not been trained on before. Because of their level of abstraction, they can remove something from a picture, redraw it, or replace a horse with a zebra, as the famous CycleGAN does.

20.Generative Adversarial Network
The Generative Adversarial Network (GAN) represents a family of dual networks consisting of a generator and a discriminator. The two constantly try to fool each other: the generator tries to generate some data, and the discriminator receives sample data and tries to tell which samples are real and which are generated. As long as you can keep the training of the two networks in balance, this constantly evolving pair can learn to generate realistic images.

21.Liquid State Machine
Liquid State Machines (LSMs) are sparse (not fully connected) neural networks whose activation functions are replaced by thresholds. Each cell accumulates values from successive samples and emits an output only when the threshold is reached, after which it resets its internal counter to zero.
The idea comes from the human brain. These networks are widely used in computer vision and speech recognition systems, but have produced no major breakthroughs.
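The accumulate-then-fire behaviour of a single cell can be sketched as follows. This is a simplified illustration of one threshold cell only, not a full liquid state machine; the function name and threshold value are assumptions.

```python
def threshold_cell(inputs, threshold=1.0):
    """Accumulate inputs; emit 1 and reset once the threshold is reached."""
    accumulated = 0.0
    outputs = []
    for value in inputs:
        accumulated += value
        if accumulated >= threshold:
            outputs.append(1)
            accumulated = 0.0  # reset the internal counter after firing
        else:
            outputs.append(0)
    return outputs


print(threshold_cell([0.5, 0.5, 0.25, 0.25, 0.5]))  # prints [0, 1, 0, 0, 1]
```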

22.Extreme Learning Machine
Extreme Learning Machines (ELMs) try to reduce the complexity of FF networks by creating sparse hidden layers with random connections. They require less computing power, but their actual efficiency depends heavily on the task and data.

23.Echo State Network
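The Echo State Network (ESN) is a subtype of recurrent network. Data is passed to the input, and the output is then monitored over multiple iterations (which lets the recurrent features kick in). In the standard formulation, the randomly initialized recurrent weights between hidden cells stay fixed and only the output (readout) weights are trained after that.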
To be honest, besides multiple theoretical benchmarks, I don’t know of any practical use of this network. Any comments are welcome.

24.Deep Residual Network
A Deep Residual Network (DRN) passes part of the input values on to later layers unchanged. This feature makes it possible to build very deep networks (up to 300 layers), but they are actually a kind of recurrent neural network (RNN) without an explicit delay.
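Passing the input along can be sketched as a residual block: the layer's output is added to its own input, so the layer only has to learn the difference. The layer shape, the ReLU activation, and the names below are illustrative assumptions.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def dense(values, weights):
    """Plain fully connected layer without bias."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(values, weights):
    """Output = input + transformation(input): the skip connection."""
    transformed = relu(dense(values, weights))
    return [v + t for v, t in zip(values, transformed)]


x = [1.0, -2.0]
# With all-zero weights the block simply passes its input through,
# which is why stacking many such blocks does not block the signal.
print(residual_block(x, [[0.0, 0.0], [0.0, 0.0]]))  # prints [1.0, -2.0]
print(residual_block(x, [[1.0, 0.0], [0.0, 1.0]]))  # prints [2.0, -2.0]
```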

25.Kohonen Network
The Kohonen Network (KN) introduces the "distance to cell" feature. Mostly used for classification, this type of network tries to adjust its cells to produce the strongest response to a particular input. When a cell is updated, its closest neighbors are updated as well.
Much like SVM, these networks are not always considered "real" neural networks.
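The "update the winner and its neighbors" rule can be sketched as one training step of a one-dimensional Kohonen map. The learning rate, the neighborhood radius, and the function names are illustrative assumptions; real implementations usually shrink both values over time.

```python
def best_matching_unit(cells, sample):
    """Index of the cell whose weights are closest to the sample."""
    distances = [sum((c - s) ** 2 for c, s in zip(cell, sample))
                 for cell in cells]
    return distances.index(min(distances))


def som_update(cells, sample, lr=0.5, radius=1):
    """Move the winning cell and its map neighbors toward the sample."""
    winner = best_matching_unit(cells, sample)
    updated = []
    for index, cell in enumerate(cells):
        if abs(index - winner) <= radius:  # neighbor on the 1-D map
            cell = [c + lr * (s - c) for c, s in zip(cell, sample)]
        updated.append(cell)
    return updated


cells = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
updated = som_update(cells, [1.0, 1.5])
print(updated)  # the last cell is outside the neighborhood and stays put
```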

26.Support Vector Machine
Support Vector Machines (SVMs) are used for binary classification: the result is "yes" or "no" regardless of how many dimensions or inputs the network processes.

SVMs are not always considered neural networks.

27.Neural Turing Machine
Neural networks are like black boxes - we can train them, get results, enhance them, but most of the actual decision paths are not visible to us.

The Neural Turing Machine (NTM) tries to solve this problem: it is an FF with the memory cells extracted. Some authors also say that it is an abstraction of LSTM.

The memory is addressed by its content; the network can read from and write to memory depending on its current state, which makes it a Turing-complete neural network.

I hope this summary will be helpful to anyone interested in learning about neural networks. If you feel anything needs to be corrected or added, please contact us at support@octoparse.com.

Source: Octoparse
