A Convolutional Neural Network (CNN) is a class of deep learning model, mainly used for computer vision. It is similar to an ordinary artificial neural network; the only difference is that it uses convolution, a linear mathematical operation, instead of simple matrix multiplication in at least one of its layers. Loosely speaking, building a Convolutional Neural Network is like building a human eye: it mimics how humans see the world and recognize patterns.

Steps involved in a Convolutional Neural Network
Step 1 – Convolution
The convolution of two functions f and g is given by:
$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$

A convolution is basically a combined integration of two functions, and it shows how one function modifies the other. We apply convolution to filter the input image, that is, to extract features from it. Let’s consider a 7×7 input image matrix. Pixel values normally range from 0 to 255, but to keep things simple we use only the values 0 and 1. We also consider a feature detector, or in simple words a filter, in the form of a 3×3 matrix.
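For a digital image, the integration described above becomes a sum over pixels. As a rough sketch, with I for the input image, K for a k×k filter and S for the resulting feature map (these symbol names are assumptions for illustration, not from the original figures), the operation can be written as:

```latex
S(i, j) = \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} I(i + m,\; j + n)\, K(m, n)
```

Strictly speaking, this form is cross-correlation, because the filter is not flipped. Most deep learning libraries compute it this way, and since the filter values are learned, the distinction rarely matters in practice.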

From the figure above, we can see that the input image gets compressed a bit after passing through the filter, and the result is a feature map. A big part of the point of a feature map is to make the image smaller, because it is then easier to process, while still keeping the features the filter detects.
Calculation –

In this way we get a feature map from the input image. To capture more kinds of features, we can use more than one filter, each producing its own feature map.
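The calculation can be sketched in a few lines of plain Python. The image and filter values below are made up for illustration (they are not the ones from the figure), and `convolve` is a hypothetical helper name. Real projects would normally use a library such as NumPy or a deep learning framework instead.

```python
def convolve(image, kernel):
    """Slide `kernel` over `image` one pixel at a time (no padding)."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Multiply the filter with the patch under it, value by value, and add up.
            total = sum(
                image[i + m][j + n] * kernel[m][n]
                for m in range(k)
                for n in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Made-up 7x7 image of 0s and 1s and a made-up 3x3 filter.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
kernel = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
]

feature_map = convolve(image, kernel)
print(len(feature_map), len(feature_map[0]))  # 5 5
```

A 7×7 image and a 3×3 filter give a 5×5 feature map, because the filter fits in 5 positions along each side.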
Stride –
Stride is the number of pixels by which we slide our filter matrix over the input matrix. When the stride is 1, we move the filter one pixel at a time. When the stride is 2, the filter jumps 2 pixels at a time as we slide it around. With real-life images, a stride greater than 1 is often used to speed up processing, at the cost of a smaller, coarser feature map.
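To see the effect of the stride, here is a small sketch that lists where the filter lands along one side of the image. The function name `filter_positions` is made up for this example, and it assumes a square image with no padding.

```python
def filter_positions(input_size, filter_size, stride):
    """Top-left offsets at which the filter is placed along one axis."""
    return list(range(0, input_size - filter_size + 1, stride))


for stride in (1, 2):
    positions = filter_positions(7, 3, stride)
    print(f"stride {stride}: offsets {positions}, output size {len(positions)}")
```

For the 7×7 image and 3×3 filter, stride 1 gives 5 positions per side and stride 2 gives 3. In general the output size per side is floor((n - f) / s) + 1 for input size n, filter size f and stride s.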

Padding –
We can see that the output image is smaller than the input image. To keep the output the same size as the input, we use padding: we add extra pixels, usually zeros, around the border of the input before applying the filter.
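A minimal sketch of zero padding in plain Python follows. The helper name `zero_pad` is hypothetical, and the example assumes a 3×3 filter with stride 1, for which one ring of zeros is enough to keep the size unchanged.

```python
def zero_pad(image, pad):
    """Surround `image` with `pad` rows and columns of zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]          # zero rows on top
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)  # zeros left and right
    padded.extend([0] * width for _ in range(pad))      # zero rows at the bottom
    return padded


image = [[1] * 7 for _ in range(7)]   # made-up 7x7 image
padded = zero_pad(image, pad=1)

filter_size = 3
output_size = len(padded) - filter_size + 1
print(len(padded), output_size)  # 9 7
```

The padded image is 9×9, so a 3×3 filter with stride 1 produces a 7×7 feature map, the same size as the original input.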

Rectifier Function –
The data we use in real life is non-linear, while convolution itself is a linear operation. So, to bring non-linearity back, we apply the rectifier function, or ReLU layer, to the feature map. ReLU stands for Rectified Linear Unit and is a non-linear operation. Its output is given by:
$$f(x) = \max(0, x)$$
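ReLU keeps positive values and replaces negative ones with zero. A minimal sketch, using a made-up feature map that contains negative values (these appear when a filter has negative weights):

```python
def relu(feature_map):
    """Replace every negative value in the feature map with 0."""
    return [[max(0, value) for value in row] for row in feature_map]


feature_map = [[-3, 0, 2], [1, -1, 4]]   # made-up values
rectified = relu(feature_map)
print(rectified)  # [[0, 0, 2], [1, 0, 4]]
```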

Step 2 – Pooling
Pooling reduces the size of the feature map while keeping its important information: max pooling keeps the sharpest responses, while average pooling smooths them. There are different types of pooling available, such as max pooling, average pooling and sum pooling. Max pooling is the most popular technique and often performs better than average pooling in practice. Max pooling works by selecting the maximum element from each region of the feature map. The output of a max-pooling layer is therefore a feature map containing the most important features of the previous feature map.
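Max pooling can be sketched as follows. The 2×2 window with stride 2 is a common choice but an assumption here, since the original does not state a window size, and the feature map values are made up.

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value in each `size` x `size` window."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + m][j * stride + n]
                for m in range(size)
                for n in range(size)
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


feature_map = [
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
]
pooled = max_pool(feature_map)
print(pooled)  # [[6, 8], [3, 4]]
```

Each 2×2 block of the 4×4 feature map is reduced to its largest value, so the result is 2×2.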

Step 3 – Flattening
Flattening is the process of converting the pooled feature maps into a 1-dimensional vector, because a fully connected layer expects a one-dimensional input vector rather than a 2-dimensional (or higher) array.
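In code, flattening is just reading the values out row by row. A small sketch, assuming two made-up 2×2 pooled feature maps (one per filter) and a hypothetical helper named `flatten`:

```python
def flatten(pooled_maps):
    """Concatenate all pooled feature maps, row by row, into one flat list."""
    return [value for fmap in pooled_maps for row in fmap for value in row]


pooled_maps = [
    [[6, 8], [3, 4]],   # pooled map from a first filter (made-up values)
    [[1, 0], [2, 5]],   # pooled map from a second filter (made-up values)
]
flattened = flatten(pooled_maps)
print(flattened)  # [6, 8, 3, 4, 1, 0, 2, 5]
```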

Step 4 – Fully connected Neural Network layer
Fully connected layers play the same role as the hidden layers of an ordinary artificial neural network. The whole vector of outputs that we have after flattening is passed to the input layer of an artificial neural network. The main purpose of this network is to combine our features into attributes that predict the classes even better. Over a series of training epochs, the model learns to distinguish dominant features from certain low-level features in images and classifies them using the softmax function, which converts the outputs of the final layer into class probabilities.
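As a rough illustration of this last step, here is a single fully connected layer followed by softmax. The weights and biases are made-up numbers for two classes; in a real network they are learned during training, and there are usually several such layers. The names `dense` and `softmax` are chosen for this sketch only.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


flattened = [6, 8, 3, 4]   # made-up flattened vector
# Hypothetical weights and biases: one row of weights per class.
weights = [[0.1, 0.2, -0.1, 0.0], [-0.2, 0.1, 0.3, 0.1]]
biases = [0.0, 0.5]

scores = dense(flattened, weights, biases)
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```

The class with the highest probability is taken as the prediction. With these made-up weights the first class has the higher score, so it is the one predicted.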

I discussed artificial neural networks in a previous blog post.