Logistic regression is a supervised machine learning algorithm used for classification problems. Now, you might wonder how a regression method can be used for a classification problem.
It builds on the principle of simple linear regression. Logistic regression comes from the field of statistics, where it is used to classify observations into two classes: true or false, spam or not spam, and so on.
Logistic regression uses one more function on top of the linear model: the sigmoid function, also called the logistic function.
Sigmoid Function
The sigmoid function gives an output in the range (0, 1) and is mathematically represented as:

$$S(x) = \frac{1}{1 + e^{-x}}$$
The sigmoid function produces an ‘S’-shaped curve. S(x) is the output, bounded between 0 and 1, where ‘x’ is the input to the function.
This curve has finite limits:
- 0 as x approaches −∞
- 1 as x approaches +∞
import numpy as np

def sigmoid(x):
    # Squash any real-valued input into the (0, 1) range
    return 1 / (1 + np.exp(-x))
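For example, an input of 0 maps to exactly 0.5, and large positive or negative inputs saturate toward the limits:

print(sigmoid(0))    # 0.5
print(sigmoid(10))   # ≈ 0.99995
print(sigmoid(-10))  # ≈ 0.0000454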
Here, our function returns a probability between 0 and 1. But what if we want a discrete class label like spam / not spam? Then we have to set a threshold value.
Let’s take an example to understand this:
Suppose you have a dataset of 10,000 dog images and you train your model on that dataset. Now you are given the job of separating cats and dogs in a dataset of 1 million images, and you want to reuse the model you trained on the dog dataset. So you set the threshold value to 0.5: if the predicted value is above 0.5, the image belongs to the dog class; if it is below 0.5, it belongs to the cat class. In this way you can classify the 1 million images into two classes.
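As a minimal sketch of this idea (the weights, bias, and feature matrix X here are hypothetical placeholders, and we reuse the sigmoid function defined above):

import numpy as np

def predict(X, weights, bias, threshold=0.5):
    # Linear combination of features, squashed into a probability
    probabilities = sigmoid(X @ weights + bias)
    # Apply the threshold: 1 = dog class, 0 = cat class in the example above
    return (probabilities >= threshold).astype(int)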
Error Calculations in Logistic Regression
In logistic regression we cannot use the mean squared error cost function. Logistic regression involves the sigmoid function, which is non-linear; squaring its error produces a non-convex cost function with many local minima, so gradient descent may get stuck in a local minimum.

To calculate the error or cost, we mainly use the log-loss function. For predicted probabilities $h$ and true labels $y$ over $m$ training examples, it is given as:

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y_i \log(h_i) + (1 - y_i)\log(1 - h_i) \,\right]$$
# epsilon is a small constant (e.g., 1e-7) that guards against log(0)
cost = (1 / m) * (((-y).T @ np.log(h + epsilon)) - ((1 - y).T @ np.log(1 - h + epsilon)))
- y = vector of true labels (0 or 1)
- h = vector of predicted probabilities (the sigmoid outputs)
- m = number of training examples
- epsilon = a small constant to avoid taking the log of zero
Log-loss is an appropriate performance measure when your model's output is the probability of a binary outcome.
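As a quick sanity check, here is a self-contained version of the same computation on a tiny, made-up set of labels and predictions:

import numpy as np

y = np.array([1.0, 0.0, 1.0])  # hypothetical true labels
h = np.array([0.9, 0.2, 0.7])  # hypothetical predicted probabilities
m = y.shape[0]
epsilon = 1e-7

cost = (1 / m) * (((-y).T @ np.log(h + epsilon)) - ((1 - y).T @ np.log(1 - h + epsilon)))
print(cost)  # ≈ 0.228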
To minimize the cost, we can use gradient descent, just as we did in linear regression.
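A minimal sketch of that update loop, assuming X already includes a bias column and using an illustrative learning rate and epoch count:

import numpy as np

def gradient_descent(X, y, lr=0.1, epochs=1000):
    m, n = X.shape
    weights = np.zeros(n)               # start from all-zero weights
    for _ in range(epochs):
        h = sigmoid(X @ weights)        # current predicted probabilities
        gradient = (X.T @ (h - y)) / m  # gradient of the log-loss
        weights -= lr * gradient        # step opposite the gradient
    return weights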