Linear and logistic regression are among the most commonly used supervised learning algorithms in machine learning, yet many people still confuse them. In this article, “Linear Regression Vs Logistic Regression”, we look at each algorithm in detail and then at the differences between them.
Both linear and logistic regression are supervised learning algorithms, so let us start with a brief overview of machine learning.
What is Machine learning?
Machine learning is the ability of a system to learn and improve from previous experience without being explicitly programmed. It is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without explicit instructions, relying on patterns and inference instead.
Machine learning is broadly classified into three main types:
- Supervised learning: Learning under human supervision, using labeled data.
- Unsupervised learning: Learning without any supervision, using unlabeled data.
- Reinforcement learning: Learning in an uncertain environment without labeled answers. Here an agent learns by trial and error while interacting with the environment.
Today’s topic, Linear Regression Vs Logistic Regression, belongs to supervised learning, so we will focus on supervised learning.
What is Supervised learning and its types?
Supervised learning is a type of machine learning that uses a labeled dataset to train the algorithm. Here a labeled dataset simply means data along with the correct answers. Imagine a teacher teaching students with training data that contains both the inputs and their correct answers.
In supervised learning, humans act as a guide, teaching the algorithm what conclusions it should reach. During the training phase, the algorithm searches for relationships in the data that correlate with the desired outputs.
Consider how a small child learns to identify different fruits. While learning, the child takes in the features of a fruit, such as its shape, color, size and texture, along with the name of the fruit (the label). During this learning phase, relationships are established between the features and the name, e.g. the color and shape of a banana together identify it. So in the future, whenever the child is shown a picture of a fruit, he can identify it immediately, since he has already learned the information (data) about the fruit along with its name (label).
Supervised learning is used to predict two kinds of values: continuous values and categorical values. Based on the kind of result predicted, supervised learning is further classified into two types: regression and classification.
In both types, we try to find the value of a variable known as the dependent variable.
A regression algorithm predicts a continuous-valued output. The output is called continuous because it can take any value within a range, i.e. there are infinitely many possible values a regression algorithm can predict.
For example, consider predicting the weight of a person based on his height. There are limitless possibilities for the weight of a person given his height. Other examples of regression problems are crop yield based on rainfall data, the price of a house based on its size, or a stock price at a specific point in time. In all these examples the output is a number that can take any possible value.
Linear regression is one of the most common algorithms for solving regression problems.
A classification algorithm predicts the label for input data. When we want to assign our input data to specific classes, we can use classification algorithms. Examples include labeling incoming emails as spam or non-spam, identifying a dog or cat breed, etc.
A classification algorithm predicts the answer in a categorical format. In the simplest case this is binary, i.e. Yes/No, True/False, 0/1, etc.
When the input data is classified into two distinct classes, it is called binary classification. If the input data is classified into more than two classes, it is called multi-class classification.
Logistic regression is one of the most common algorithms for solving classification problems.
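The difference between binary and multi-class classification comes down to how many distinct labels appear in the data. A minimal Python sketch of that check follows; the helper name `classification_kind` and the label values are made up for illustration.

```python
def classification_kind(labels):
    """Return 'binary' for two distinct classes, 'multi-class' for more."""
    distinct = set(labels)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct classes")
    return "binary" if len(distinct) == 2 else "multi-class"


# Hypothetical labels, for illustration only.
email_labels = ["spam", "non-spam", "spam", "non-spam"]
breed_labels = ["beagle", "poodle", "pug", "beagle"]

print(classification_kind(email_labels))  # binary
print(classification_kind(breed_labels))  # multi-class
```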
What is Linear regression?
Linear regression is a method of predicting the value of a dependent variable (y) based upon the value of an independent variable (x). Here the dependent variable is a continuous quantity, i.e. the predicted value can be any real number.
For example, consider predicting a stock price over a period of time. The stock price can be any number, but it depends on the time. So the price is the dependent variable and time is the independent variable.
- Dependent Variable (y): It is the variable whose value we want to find. In the linear regression model, the dependent variable is a continuous quantity.
- Independent Variable (x): It is a variable that does not depend on any other variable in the model but affects the value of the dependent variable. For example, the stock price varies with time.
In this regression model, the dependent variable is assumed to vary linearly with the independent variable, which is why it is called a linear regression model. For example, the price of a house increases with its size. Because the relationship is linear, plotting the dependent variable against the independent variable gives points that lie roughly along a straight line rather than a curve.
The equation relating the dependent and independent variables is the equation of a straight line, commonly written as y = mx + c, where m is the slope of the line and c is the intercept (the value of y when x is 0).
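The straight-line relationship can be estimated from data with ordinary least squares, a standard fitting method that this article does not name and that is used here as an assumption. The sketch below fits a slope and an intercept for house price against house size; the function name `fit_line` and the numbers are hypothetical.

```python
def fit_line(inputs, targets):
    """Fit a straight line by ordinary least squares (one input variable)."""
    n = len(inputs)
    mean_in = sum(inputs) / n
    mean_out = sum(targets) / n
    # Covariance of input and target, and variance of the input (both unscaled).
    covariance = sum((a - mean_in) * (b - mean_out) for a, b in zip(inputs, targets))
    variance = sum((a - mean_in) ** 2 for a in inputs)
    slope = covariance / variance
    intercept = mean_out - slope * mean_in
    return slope, intercept


# Hypothetical data: house size (square feet) and price (in thousands).
sizes = [500, 750, 1000, 1250, 1500]
prices = [100, 150, 200, 250, 300]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 1100 + intercept  # prediction for an unseen size
print(slope, intercept, predicted_price)
```

The prediction is a continuous number, not one of a fixed set of labels, which is what makes this a regression problem.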
What is Logistic regression?
Logistic regression is a method of predicting the value of a dependent variable based upon the independent variable. The main point to note here is that the dependent variable is categorical. In its basic form, categorical means the value is binary, such as True/False, Yes/No, 0/1, etc.
Logistic regression is used for solving classification problems, such as predicting whether an incoming email is spam or not, or whether it will rain today based on the weather conditions.
Logistic regression predicts the probability of an event occurring. The predicted value is therefore not simply 0 or 1; it lies somewhere in the range between 0 and 1.
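One way to see how such a probability arises: logistic regression passes a linear score through the sigmoid (logistic) function, which squeezes any number into the range 0 to 1. The sketch below assumes a single input variable; the `weight` and `bias` values are hypothetical rather than learned from data.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1 (numerically stable)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(x, weight, bias):
    """Probability of the positive class for a single input value."""
    return sigmoid(weight * x + bias)


# Hypothetical coefficients, chosen only for illustration.
weight, bias = 1.5, -3.0
for x in [0, 1, 2, 3, 4]:
    print(x, round(predict_probability(x, weight, bias), 3))
```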
Consider a situation where the predicted probability is 0.7. Here a threshold helps us decide whether to label the prediction 0 or 1. Any value greater than the threshold is labeled 1, and any value less than it is labeled 0. For the time being, take 0.5 as the threshold, so the predicted value 0.7 can safely be labeled 1.
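The thresholding step can be sketched in a few lines of Python. The function name `to_label` is hypothetical, and sending a probability exactly equal to the threshold to 0 is an arbitrary choice, since the text does not say how ties are handled.

```python
def to_label(probability, threshold=0.5):
    """Convert a predicted probability into a 0/1 class label."""
    # Assumption: a probability exactly equal to the threshold maps to 0.
    return 1 if probability > threshold else 0


print(to_label(0.7))                 # 1
print(to_label(0.3))                 # 0
print(to_label(0.7, threshold=0.8))  # 0
```

Raising the threshold makes the model more reluctant to predict 1, which is why the same probability of 0.7 gets a different label in the last call.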
For a logistic regression problem, if we plot the predicted probability against the independent variable on graph paper, the result is a curve and not a straight line. The curve is an S-shaped sigmoid curve.
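To see where the sigmoid curve comes from, the sketch below fits a one-variable logistic regression by plain gradient descent on the log loss. This is a common training method that the article itself does not describe, so treat it as an assumption. The data (hours studied versus pass/fail), the function names and the learning-rate settings are all hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1 (numerically stable)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(inputs, labels, learning_rate=0.3, epochs=5000):
    """Fit weight and bias by batch gradient descent on the log loss."""
    weight, bias = 0.0, 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_weight, grad_bias = 0.0, 0.0
        for x, y in zip(inputs, labels):
            error = sigmoid(weight * x + bias) - y  # predicted minus actual
            grad_weight += error * x
            grad_bias += error
        weight -= learning_rate * grad_weight / n
        bias -= learning_rate * grad_bias / n
    return weight, bias


# Hypothetical data: hours studied and whether the student passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

weight, bias = fit_logistic(hours, passed)
probabilities = [sigmoid(weight * h + bias) for h in hours]
for h, p in zip(hours, probabilities):
    print(h, round(p, 3))
```

Plotting `probabilities` against `hours` would give the S-shaped curve: values close to 0 for few hours, close to 1 for many hours, and a crossing of 0.5 somewhere in between.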
Linear Regression Vs Logistic Regression
| |Linear Regression|Logistic Regression|
|---|---|---|
|Definition|Linear regression is a method of predicting the value of a continuous dependent variable based upon the value of an independent variable.|Logistic regression is a method of predicting the value of a categorical dependent variable based upon the independent variable.|
|Predicted output value|Continuous value, i.e. any real number|Categorical, i.e. a binary value such as 0/1, True/False, etc.|
|Problems solved|Used to solve regression problems.|Used to solve classification problems.|
|Fit|Straight line|Sigmoid curve|
|Relationship|The dependent and independent variables are linearly related.|The relationship between the dependent and independent variables need not be linear.|
Below are the key points from today’s article, Linear Regression Vs Logistic Regression.
- Supervised learning is classified into two types: regression and classification.
  - Regression: It predicts an output that is a continuous value.
  - Classification: It predicts a label for the input data.
- Linear regression:
  - It is used to solve regression problems in supervised learning.
  - The predicted value can be any real number.
  - The relationship between the dependent and independent variables is linear.
  - The relationship comes out as a straight line when plotted on graph paper.
- Logistic regression:
  - It is used to solve classification problems in supervised learning.
  - It predicts a probability between 0 and 1, which is then converted into a binary label such as 0 or 1.
  - The relationship comes out as a sigmoid curve when plotted on graph paper.