A Convolutional Neural Network (CNN) processes an image by detecting low-level features such as edges and corners in its early layers. This is done with the help of filters, and these layers are successful at capturing the spatial dependencies in an image. However, training a simple CNN from scratch requires a large amount of data, and sometimes the available data is too small. In that case we can either build and train a more complex model, or use a pre-trained model instead of going through the long process of training from scratch. Reusing the layer weights of an already trained model in a new domain is called transfer learning.
Transfer learning is a process that allows us to use knowledge obtained from other tasks.
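To make the idea of a filter concrete, here is a minimal sketch in NumPy. It assumes a hand-written, Sobel-like vertical-edge kernel and a tiny synthetic image, both chosen only for illustration. A CNN learns the values of such kernels during training instead of having them fixed by hand.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1), as a CNN layer does."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Synthetic 5x6 image: dark left half (0), bright right half (1)
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Sobel-like kernel that responds to left-to-right intensity increases
vertical_edge_kernel = np.array([[-1, 0, 1],
                                 [-2, 0, 2],
                                 [-1, 0, 1]], dtype=float)

response = convolve2d_valid(image, vertical_edge_kernel)
print(response)  # each row is [0. 4. 4. 0.]
```

The response is non-zero only where the window straddles the dark-to-bright boundary, which is what "detecting an edge" means here.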

Transfer Learning Methods
1. Reusable Models
A reusable model is a model that is trained on one task and then used on another task with similar properties when data for the second task is insufficient. Suppose you want to detect cars on the road and you have very few car images but a large set of truck images. You can then train a model on the truck images and continue training it on the car images to detect cars.
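The truck-to-car idea can be sketched roughly as follows. This is only an illustration under stated assumptions: random arrays stand in for real truck and car images, and the small network, the helper name build_vehicle_model, and the learning rate are all hypothetical choices.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_vehicle_model(input_shape=(32, 32, 3)):
    """Small CNN for a binary 'vehicle present / absent' task (hypothetical architecture)."""
    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(8, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(16, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),
    ])

rng = np.random.default_rng(0)
# Random arrays stand in for real images: many trucks, few cars
truck_x = rng.random((64, 32, 32, 3)).astype("float32")
truck_y = rng.integers(0, 2, size=(64, 1)).astype("float32")
car_x = rng.random((8, 32, 32, 3)).astype("float32")
car_y = rng.integers(0, 2, size=(8, 1)).astype("float32")

# Step 1: train on the large truck dataset
model = build_vehicle_model()
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(truck_x, truck_y, epochs=1, batch_size=16, verbose=0)

# Step 2: keep the learned weights and continue training on the small car
# dataset, with a lower learning rate so the truck knowledge is not overwritten
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy")
model.fit(car_x, car_y, epochs=1, batch_size=4, verbose=0)

car_predictions = model.predict(car_x, verbose=0)
print(car_predictions.shape)  # (8, 1)
```

With real data you would train for more epochs and check performance on held-out car images.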
2. Pre-trained models
In the previous method you trained your own model on truck images and reused it for car images. A pre-trained model, by contrast, is a model created by someone else to solve a similar problem. Here, you don’t have to build and train a model from scratch, which can be impractical on a low-specification PC when the dataset is large. Instead, many trained models are publicly available. This type of transfer learning is the most commonly used in deep learning. At the time of writing, Keras provided nine pre-trained models:

The top-1 and top-5 accuracy refer to the model’s performance on the ImageNet validation dataset.
Depth refers to the topological depth of the network. This includes activation layers, batch normalization layers etc.
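As a small illustration of what top-1 and top-5 accuracy measure, the sketch below computes both on made-up scores for three samples and six classes. The helper top_k_accuracy and the numbers are assumptions for the example, not ImageNet results.

```python
import numpy as np

def top_k_accuracy(probabilities, true_labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    # Indices of the k largest scores per row (order within the top k does not matter)
    top_k = np.argsort(probabilities, axis=1)[:, -k:]
    hits = [label in row for label, row in zip(true_labels, top_k)]
    return float(np.mean(hits))

# Made-up scores for 3 samples over 6 classes
probabilities = np.array([
    [0.50, 0.20, 0.10, 0.10, 0.05, 0.05],   # true class 0: ranked first
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],   # true class 2: ranked third
    [0.40, 0.30, 0.15, 0.10, 0.04, 0.01],   # true class 5: ranked last
])
true_labels = np.array([0, 2, 5])

top1 = top_k_accuracy(probabilities, true_labels, k=1)
top5 = top_k_accuracy(probabilities, true_labels, k=5)
print(top1, top5)  # 1 of 3 correct at top-1, 2 of 3 at top-5
```

Top-5 accuracy is always at least as high as top-1, because a prediction counts as correct whenever the true class appears anywhere among the five best guesses.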
Popular pre-trained models description
VGG16
This network is characterized by its simplicity, using only 3×3 convolutional layers stacked on top of each other in increasing depth. Reducing volume size is handled by max pooling. Two fully-connected layers, each with 4,096 nodes are then followed by a softmax classifier.
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg16 import preprocess_input
import numpy as np
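# Load only the convolutional base (no classifier head), with ImageNet weights
model = VGG16(weights='imagenet', include_top=False)
img_path = 'elephant.jpg'
# Load the image at the 224x224 size VGG16 was trained on
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)  # add a batch dimension
x = preprocess_input(x)  # apply VGG16-specific preprocessing
features = model.predict(x)  # extracted feature maps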
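As a rough check on how max pooling shrinks the volume, the sketch below assumes the standard VGG16 layout: five 2×2 max-pooling layers with stride 2, and 3×3 convolutions padded so that only pooling changes the spatial size. The helper name is hypothetical.

```python
def spatial_sizes_after_pooling(input_size=224, num_pools=5, pool_stride=2):
    """Return the feature-map side length after each stride-2 max-pooling layer."""
    sizes = []
    size = input_size
    for _ in range(num_pools):
        size //= pool_stride   # each pooling layer halves height and width
        sizes.append(size)
    return sizes

sizes = spatial_sizes_after_pooling()
print(sizes)  # [112, 56, 28, 14, 7]
```

This matches the feature maps produced with include_top=False for a 224×224 input, which have a spatial size of 7×7.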
Inception V3
The Inception V3 model allows the depth and width of the network to increase while keeping the computational cost constant. This model was trained on the original ImageNet dataset with over 1 million training images. It works as a multi-level feature generator by computing 1 × 1, 3 × 3 and 5 × 5 convolutions.
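from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.layers import Input
from tensorflow.keras.preprocessing import image
import numpy as np
# this could also be the output of a different Keras model or layer
# InceptionV3 was trained on 299x299 inputs
input_tensor = Input(shape=(299, 299, 3))
model = InceptionV3(input_tensor=input_tensor, weights='imagenet', include_top=True)
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(299, 299))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
# use InceptionV3's own preprocessing, which differs from VGG16's
x = preprocess_input(x)
# with include_top=True the output is the 1000 ImageNet class probabilities
preds = model.predict(x)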
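The "multi-level feature generator" idea can be sketched as a single block that runs 1 × 1, 3 × 3 and 5 × 5 convolutions side by side and concatenates the results. This is a simplified, "naive" Inception-style block written for illustration. It is not the exact block used in Inception V3, which factorizes the larger convolutions. The function name and filter counts are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

def naive_inception_block(x, filters=16):
    """Parallel 1x1, 3x3 and 5x5 convolutions plus pooling, concatenated along channels."""
    branch_1x1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    branch_3x3 = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    branch_5x5 = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)
    branch_pool = layers.MaxPooling2D(pool_size=3, strides=1, padding="same")(x)
    # "same" padding keeps height and width equal, so the branches can be stacked
    return layers.Concatenate(axis=-1)([branch_1x1, branch_3x3, branch_5x5, branch_pool])

inputs = keras.Input(shape=(32, 32, 8))
outputs = naive_inception_block(inputs)
block = keras.Model(inputs, outputs)
print(block.output_shape)  # (None, 32, 32, 56): 16 + 16 + 16 + 8 channels
```

Each branch looks at the same input at a different scale, and the next layer receives all of these views at once.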
ResNet50
The ResNet model comes with a residual learning framework to simplify the training of deeper networks. The architecture is based on the reformulation of network layers as learning residual functions with respect to the layer inputs. Residual networks can be up to eight times deeper than VGG nets while still having lower complexity.
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np
model = ResNet50(weights='imagenet')
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
# Predicted: [(u'n02504013', u'Indian_elephant', 0.82658225), (u'n01871265', u'tusker', 0.1122357), (u'n02504458', u'African_elephant', 0.061040461)]
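To illustrate what "learning residual functions with respect to the layer inputs" means, here is a minimal NumPy sketch of one residual block. It assumes two small dense layers in place of the convolutions used in the real network, purely to keep the example short.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F(x) = relu(x @ w1) @ w2 is the learned residual."""
    residual = relu(x @ w1) @ w2
    return relu(residual + x)   # the identity shortcut adds the input back

x = np.array([[1.0, 2.0, 3.0]])

# With zero weights the residual F(x) is zero, so the block passes x through unchanged
zero_w = np.zeros((3, 3))
identity_output = residual_block(x, zero_w, zero_w)
print(identity_output)  # [[1. 2. 3.]]
```

Because the block only has to learn the difference from its input, stacking many such blocks does not force each one to relearn the identity mapping, which is part of why very deep residual networks remain trainable.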
DenseNet
The DenseNet architecture requires fewer parameters than a traditional CNN because its layers are narrow: each layer uses only a small number of filters (for example, 12) and adds a small set of new feature maps (Figure 5). Very deep networks can also be hard and slow to train, because information and gradients must pass through many layers. DenseNet addresses this by feeding every layer with the outputs of all previous layers, which gives each layer direct access to the gradients from the loss function and to the input image. Together with the small number of parameters, this keeps the computational cost down and makes the model an attractive choice.
from tensorflow.keras.applications.densenet import DenseNet121, preprocess_input
from tensorflow.keras.preprocessing import image
import numpy as np
# DenseNet121 is one of the DenseNet variants shipped with Keras
model = DenseNet121(include_top=True,
                    weights="imagenet",
                    input_tensor=None,
                    input_shape=None,
                    pooling=None,
                    classes=1000)
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
# with include_top=True the output is the 1000 ImageNet class probabilities
preds = model.predict(x)
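The dense connectivity pattern can be made concrete by counting channels. The sketch below assumes a growth rate of 12 new feature maps per layer (the figure mentioned in the description) and a hypothetical block with 24 input channels and 4 layers.

```python
def dense_block_channels(input_channels, num_layers, growth_rate=12):
    """Channels seen by each layer when every layer receives all earlier feature maps."""
    channels_in = []
    total = input_channels
    for _ in range(num_layers):
        channels_in.append(total)   # layer input = concatenation of everything so far
        total += growth_rate        # the layer contributes `growth_rate` new maps
    return channels_in, total

channels_in, channels_out = dense_block_channels(input_channels=24, num_layers=4)
print(channels_in, channels_out)  # [24, 36, 48, 60] 72
```

Each layer adds only 12 feature maps of its own, yet it can draw on everything computed before it, which is how the network stays narrow without losing information.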
3. Feature Extraction
Another approach is to use deep learning to discover the best representation of your problem, which means finding the most important features. This approach is also known as representation learning, and can often result in much better performance than can be obtained with hand-designed representations.
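One common way to put this into practice, assumed here rather than taken from the text above, is to freeze the convolutional base of a pre-trained network and train only a small new classifier on top of the features it produces. The function name and the two-class head are hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

def build_feature_extractor_classifier(num_classes, weights="imagenet"):
    """Frozen VGG16 convolutional base with a new, trainable classification head."""
    base = VGG16(weights=weights, include_top=False, input_shape=(224, 224, 3))
    base.trainable = False   # keep the pre-trained features fixed

    inputs = keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)

# weights=None skips the ImageNet download so the sketch runs offline;
# pass weights="imagenet" for actual transfer learning
model = build_feature_extractor_classifier(num_classes=2, weights=None)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
print(len(model.trainable_weights))  # 2: the new Dense layer's kernel and bias
```

Only the new head is updated during training, so this works even with a small dataset and modest hardware.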

Summary
From the above discussion we can say that:
Transfer learning is an optimization, a shortcut to saving time or getting better performance.
In general, it is not obvious that there will be a benefit to using transfer learning in the domain until after the model has been developed and evaluated.