Regularization in Neural Networks

In deep learning, it is necessary to limit the complexity of a model to avoid overfitting, so that the model generalizes well to unseen data. Regularization techniques address this problem.

Overfitting is what harms generalization, so our primary goal is to reduce overfitting. Doing so directly improves the model’s ability to generalize.

Reducing overfitting alone can noticeably improve a model’s performance on unseen data.

###### Regularization Techniques

Here, we’ll look at a few common techniques for applying regularization in deep learning.

###### 1. L1 & L2 method

These are the most common regularization methods. They modify the general cost function by adding another term, known as the regularization term:

Cost function = Loss term + Regularization term

Adding this term penalizes large weights, so the values in the weight matrices decrease. This rests on the assumption that a neural network with smaller weight matrices corresponds to a simpler model, so the penalty also reduces overfitting to a considerable extent.
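To make the cost function above concrete, here is a minimal NumPy sketch. It assumes a plain linear model with a mean-squared-error loss term (a stand-in for a neural network's loss) and a hypothetical regularization strength `lam`; the helper names `regularized_cost` and `fit_l2` are illustrative, not from any library. It trains the same model with and without an L2 penalty by gradient descent and compares the size of the learned weights.

```python
import numpy as np


def regularized_cost(w, X, y, lam, kind="l2"):
    """Cost = loss term + regularization term, for a linear model y ≈ X @ w."""
    loss = np.mean((X @ w - y) ** 2)           # loss term: mean squared error
    if kind == "l1":
        penalty = np.sum(np.abs(w))            # L1: sum of absolute weights
    elif kind == "l2":
        penalty = np.sum(w ** 2)               # L2: sum of squared weights
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    return loss + lam * penalty


def fit_l2(X, y, lam, lr=0.05, steps=2000):
    """Minimize MSE + lam * sum(w**2) by plain gradient descent."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        grad_loss = 2 * X.T @ (X @ w - y) / n  # gradient of the MSE term
        grad_penalty = 2 * w                   # gradient of the L2 term
        w -= lr * (grad_loss + lam * grad_penalty)
    return w


# Hypothetical synthetic data: 100 examples, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_w + 0.1 * rng.normal(size=100)

w_plain = fit_l2(X, y, lam=0.0)   # no regularization
w_reg = fit_l2(X, y, lam=1.0)     # with an L2 penalty

print("weight norm without L2:", np.linalg.norm(w_plain))
print("weight norm with L2:   ", np.linalg.norm(w_reg))
```

The regularized weights come out smaller in norm. `lam` controls the trade-off: larger values shrink the weights more, but too large a value can make the model underfit.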

**Mathematical representation of the L1 and L2 regularization terms** –

In L1 regularization, the regularization term is the sum of the absolute values of the weights.

In L2 regularization, the regularization term is the sum of the squares of all the weights. L2 regularization makes the weights small but does not make them exactly zero, so it produces non-sparse solutions.
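Written out, and assuming a regularization strength $\lambda > 0$ (a hyperparameter not named in the text above) and network weights $w_i$, the two penalties are commonly expressed as:

$$
\text{Cost} = \text{Loss} + \lambda \sum_i |w_i| \qquad \text{(L1)}
$$

$$
\text{Cost} = \text{Loss} + \lambda \sum_i w_i^2 \qquad \text{(L2)}
$$

Some texts scale the L2 term by $\tfrac{1}{2}$ or by $\tfrac{1}{2m}$ (with $m$ training examples), which only rescales $\lambda$. In the gradient, the L2 term contributes $2\lambda w_i$, a pull toward zero proportional to the weight, while the L1 term contributes $\lambda\,\operatorname{sign}(w_i)$, a push of constant size. This is why L1 tends to drive small weights to exactly zero while L2 only shrinks them.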

###### 2. Data Augmentation

Particularly in computer vision, a model’s prediction should remain unchanged under one or more transformations of the input image. In other words, test accuracy should not be affected by a change in the position or size of an object within the frame.

If a sufficiently large number of training patterns is available, an adaptive model such as a neural network can learn this invariance, at least approximately.

This requires the training set to include a sufficiently large number of examples showing the effects of the various transformations.

Thus, for translation invariance in an image, the training set should include examples of objects at many different positions. This is called data augmentation.

Data augmentation increases the size of the training dataset. Common ways of doing so include rotating, flipping, scaling, and shifting the images.
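A minimal NumPy sketch of flipping, rotating, and shifting, assuming square grayscale images stored as 2-D arrays. The helper names `shift` and `augment` are hypothetical; scaling is left out to keep the example dependency-free, and real pipelines usually apply random transformations on the fly each epoch rather than precomputing a fixed set.

```python
import numpy as np


def shift(image, dy, dx):
    """Translate a 2-D image by (dy, dx) pixels; vacated pixels are filled with 0."""
    h, w = image.shape
    out = np.zeros_like(image)
    if abs(dy) >= h or abs(dx) >= w:
        return out                                  # shifted completely out of frame
    # Destination and source regions of the part that stays inside the frame.
    dst = (slice(max(0, dy), h + min(0, dy)), slice(max(0, dx), w + min(0, dx)))
    src = (slice(max(0, -dy), h - max(0, dy)), slice(max(0, -dx), w - max(0, dx)))
    out[dst] = image[src]
    return out


def augment(image, label):
    """Return transformed copies of one image, each paired with the unchanged label."""
    variants = [
        image,
        np.fliplr(image),        # horizontal flip
        np.rot90(image),         # 90-degree counterclockwise rotation
        shift(image, 2, 0),      # shift down by 2 pixels
        shift(image, 0, -2),     # shift left by 2 pixels
    ]
    return np.stack(variants), np.full(len(variants), label)


image = np.arange(16, dtype=float).reshape(4, 4)    # hypothetical 4x4 image
images, labels = augment(image, label=7)
print(images.shape, labels)  # (5, 4, 4) [7 7 7 7 7]
```

A transformation should only be used when it leaves the label unchanged; for example, rotating a handwritten "6" by 180 degrees can turn it into a "9".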

###### 3. Early Stopping

During training, the model is evaluated on a held-out validation set. When the validation loss begins to increase and validation accuracy begins to decrease, training is stopped. At this point the model has low variance and generalizes the data well; training it further would increase the variance and lead to overfitting. This regularization technique is called “early stopping”.
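A minimal sketch of early stopping in plain Python. It assumes a `patience` setting (a common convention not described above): training stops once the validation loss has failed to improve for `patience` consecutive epochs, and the epoch with the lowest validation loss is remembered. The class name `EarlyStopping` and the loss values are hypothetical.

```python
class EarlyStopping:
    """Stop training once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as an improvement
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record this epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch   # in practice, also save the model weights here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation losses: they fall, bottom out at epoch 3, then rise.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    # ... one epoch of training would go here ...
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print(f"best epoch: {stopper.best_epoch}, stopped at epoch: {stopped_at}")
# → best epoch: 3, stopped at epoch: 6
```

The patience window keeps a single noisy uptick in validation loss from ending training too early; after stopping, the weights saved at the best epoch are the ones to keep.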

###### 4. Dropout

Dropout is one of the most common and widely used regularization techniques in deep learning because it performs well in practice and is very effective at reducing overfitting.

Let’s try to understand it with the help of the figure given below:

As shown in the second figure, some of the nodes are randomly removed.

At each iteration, a different random set of nodes is removed, so the network produces different outputs for the same input. This keeps the model from relying too heavily on any single node and stops it from over-learning the training data.

Dropout is usually preferred for large neural networks, where the added randomness helps prevent overfitting.
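A minimal NumPy sketch of dropout applied to a layer's activations. It uses the *inverted dropout* variant (an assumption; the text above does not specify one), which scales the surviving activations by 1/(1 - p) during training so that nothing needs to change at inference time. The function name `dropout` and the drop probability `p` are illustrative.

```python
import numpy as np


def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p during training."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return x                              # all nodes are kept at inference time
    keep = rng.random(x.shape) >= p           # True for units that survive this iteration
    return x * keep / (1.0 - p)               # rescale so the expected activation is unchanged


rng = np.random.default_rng(0)
h = np.ones((1, 8))                           # activations of a hypothetical 8-unit layer

# Each call draws a fresh mask, so a different set of nodes is dropped every iteration.
for step in range(3):
    print(step, dropout(h, p=0.5, rng=rng))

print("inference:", dropout(h, p=0.5, rng=rng, training=False))
```

Because the mask is resampled on every forward pass, the network effectively trains many thinned sub-networks that share weights, and at inference the full network is used unchanged.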
