Logistic Regression



Logistic Regression is used when our dependent variable is dichotomous, or binary: a variable that has only two possible outputs, for example, whether a student will pass an exam or not.

It can also be used for multiclass classification. For example: Will you attend the seminar? [Yes, Maybe, No]

Types of Logistic Regression:

1) Binary logistic regression

It is used to predict the probability of a binary outcome, such as yes or no, true or false, or 0 or 1.

2) Multinomial logistic regression

It is used to predict the probability of one of three or more possible unordered outcomes, such as the type of product a customer will buy [cheap, costly, affordable, popular]

3) Ordinal logistic regression

It is used to predict the probability of an outcome that falls into a predetermined order, such as the level of customer satisfaction, the severity of a disease, or the stage of cancer.
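For a concrete feel, here is a minimal sketch of the first two types using scikit-learn's LogisticRegression; all of the data below is made up for illustration, and the ordinal case (which needs a different model) is omitted:

```python
# A minimal sketch with made-up data; LogisticRegression handles both the
# binary case and (via softmax) the multinomial case.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Binary: pass (1) / fail (0) from study hours
X_bin = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]])
y_bin = np.array([0, 0, 0, 1, 1, 1])
binary_model = LogisticRegression().fit(X_bin, y_bin)
print(binary_model.predict_proba([[1.8]]))  # [[P(fail), P(pass)]]

# Multinomial: three unordered classes, e.g. 0 = No, 1 = Maybe, 2 = Yes
X_multi = np.array([[1.0], [1.5], [3.0], [3.5], [5.0], [5.5]])
y_multi = np.array([0, 0, 1, 1, 2, 2])
multi_model = LogisticRegression().fit(X_multi, y_multi)
print(multi_model.predict([[4.0]]))         # predicted class label
```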

Why use Logistic Regression instead of Linear Regression?

Let's understand this with an example:

Suppose: if study hours > 1.8 => pass, else => fail

Now, if you build a model for this condition using linear regression, it will work for well-behaved inputs, but if there are outliers, the model will not predict the correct output.




Consider 0 = fail, 1 = pass.

[Figure: linear regression fits on the study-hours data, without and with an outlier]

As you can see in the figure, the slope changes because of the outliers. If you observe the orange line, then for 2.5 hours the first figure predicts the right outcome, but the second figure gives 0, which is a wrong prediction.

Another problem is that a probability must always lie between 0 and 1, but if we use linear regression the predicted value may exceed 1 or go below 0.

Logistic Regression Solves this Problem!
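Here is a toy sketch of both problems on hypothetical study-hours data (every number below is made up): one outlier flattens the linear fit enough to flip a prediction, while logistic regression keeps its output a valid probability:

```python
# Hypothetical pass/fail data: everyone studying > 1.8 hours passes.
# The student at 20.0 hours is the outlier that drags the linear fit.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

hours = np.array([[0.25], [0.5], [0.75], [1.0], [1.25], [1.5],
                  [2.0], [2.25], [2.5], [20.0]])
passed = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

lin = LinearRegression().fit(hours, passed)
clf = LogisticRegression().fit(hours, passed)

x = np.array([[2.5]])
print(lin.predict(x))              # ~0.37: thresholding at 0.5 wrongly says "fail"
print(clf.predict_proba(x)[0, 1])  # P(pass) above 0.5, and always inside (0, 1)
```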

How does it solve it? We squash the straight line using the sigmoid (logistic) function:

sigmoid(z) = 1 / (1 + e^(−z))
The graph of linear regression is a straight line, whereas for logistic regression the graph is S-shaped.
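A minimal sketch of the sigmoid and its S-shaped curve using NumPy and Matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10, 10, 200)
plt.plot(z, sigmoid(z))   # the characteristic S-shaped curve
plt.xlabel("z")
plt.ylabel("sigmoid(z)")
plt.show()
```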

Sigmoid or Logistic Function:


The equation of the best-fit line in linear regression is:

y = β0 + β1x1 + β2x2 + … + βnxn
Let’s say that instead of y we are predicting a probability (P). But there is an issue here: the value of (P) can exceed 1 or go below 0, and we know that the range of a probability is (0, 1). To overcome this issue we take the “odds” of P.

The odds of an event happening are defined as the probability of the event divided by the probability of the event not happening:

Odds = P / (1 − P)

Odds are always in the range (0, +∞), but the right-hand side of our equation, β0 + β1x1 + … + βnxn, can take any real value. To bridge this gap we take the log of odds (also known as the logit function), which has a range of (−∞, +∞):
logit(P) = log(P / (1 − P)) = β0 + β1x1 + β2x2 + … + βnxn


The log of odds has a range of (−∞, +∞) and odds have a range of (0, +∞), so how do we get back to the probability P, whose range is (0, 1)?

To do so we exponentiate both sides and then solve for P:

P / (1 − P) = e^(β0 + β1x1 + … + βnxn)

P = e^z / (1 + e^z) = 1 / (1 + e^(−z)), where z = β0 + β1x1 + … + βnxn

This is exactly the sigmoid function from earlier, applied to the linear combination of inputs.
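A tiny numeric check of this round trip; the helper names odds, logit, and sigmoid below are just illustrative:

```python
import numpy as np

def odds(p):
    """Odds: maps a probability in (0, 1) to (0, +inf)."""
    return p / (1.0 - p)

def logit(p):
    """Log of odds: maps (0, 1) to (-inf, +inf)."""
    return np.log(odds(p))

def sigmoid(z):
    """Inverse of logit: maps (-inf, +inf) back to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

p = 0.8
z = logit(p)
print(odds(p))     # 4.0    -> in (0, +inf)
print(z)           # ~1.386 -> in (-inf, +inf)
print(sigmoid(z))  # 0.8    -> recovers the original probability
```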




Assumptions of Logistic Regression:

  1. Binary outcome: The dependent variable should be dichotomous (e.g., 0 or 1, yes or no).
  2. Independence of observations: The outcome of one observation should not be influenced by the outcome of another observation.

  3. Linearity of independent variables and log odds: The relationship between the independent variables and the log odds of the dependent variable should be linear.

  4. No multicollinearity: Multicollinearity occurs when two or more independent variables are highly correlated, which can make it difficult to determine the effect of each variable on the outcome (a quick check is sketched below, after this list).

  5. Large sample size: To ensure that the model is not overfitting the data.

  6. No outliers: Outliers can affect the estimates of the coefficients in logistic regression, so it's important to check and address outliers in the dataset.
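As a check for assumption 4, here is a minimal sketch using variance inflation factors (VIF) from statsmodels on a synthetic feature matrix; all names and data below are illustrative:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.1, size=200)  # deliberately correlated with x1
x3 = rng.normal(size=200)
X = pd.DataFrame({"const": np.ones(200), "x1": x1, "x2": x2, "x3": x3})

# Rule of thumb: VIF above ~5-10 signals problematic multicollinearity;
# here x1 and x2 should both come out very large, x3 close to 1.
for i, name in enumerate(X.columns):
    print(name, round(variance_inflation_factor(X.values, i), 2))
```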

Cost Function in Logistic Regression

The cost function in linear regression was the mean squared error:

J = (1/n) Σ (yᵢ − Ŷᵢ)²

where Ŷᵢ = β0 + β1xᵢ.

So the graph of that cost function was a convex, bowl-shaped curve. But here, for logistic regression, Ŷ is a non-linear function of the parameters (Ŷ = 1 / (1 + e^(−z))), so plugging it into the squared-error cost gives a non-convex graph with many local minima, as shown:

[Figure: non-convex cost curve with multiple local minima]

So, to solve the problem of local minima, the cost function for logistic regression is defined as:

Cost(Ŷ, y) = −[y · log(Ŷ) + (1 − y) · log(1 − Ŷ)]

Let’s see what the graph of the cost function looks like when y = 1 and when y = 0:

When y = 1: Cost = −log(Ŷ), which approaches 0 as Ŷ → 1 and grows without bound as Ŷ → 0.
When y = 0: Cost = −log(1 − Ŷ), which approaches 0 as Ŷ → 0 and grows without bound as Ŷ → 1.

[Figure: cost curves for y = 1 and y = 0]
If we combine both graphs, we get a convex curve with a single global minimum, and now it is easy to apply gradient descent.

[Figure: combined convex cost curve with a single minimum]
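A small sketch of this cost function in NumPy; the helper name log_loss is illustrative, not from a particular library:

```python
import numpy as np

def log_loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy: -[y*log(y_hat) + (1 - y)*log(1 - y_hat)]."""
    y_hat = np.clip(y_hat, eps, 1 - eps)   # guard against log(0)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y_hat = np.array([0.01, 0.25, 0.5, 0.75, 0.99])
print(log_loss(1, y_hat))  # falls toward 0 as y_hat -> 1, blows up as y_hat -> 0
print(log_loss(0, y_hat))  # falls toward 0 as y_hat -> 0, blows up as y_hat -> 1
```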



Source: Analytics Vidhya














