Ridge & Lasso Regression

Before diving into ridge and lasso regression, you need to understand a few terms first.

So, let's start!

Imagine you're teaching a robot to recognize different types of fruits based on their weight. You decide to use a simple model: if the fruit weighs less than 100 grams, it's a grape; if it weighs more, it's an apple.

  • Bias: Imagine all the fruits in your training data are apples, and there are no grapes. Your model will learn that everything is an apple, leading to a high bias. This is like teaching the robot that everything, regardless of weight, is an apple.

  • Variance: Now, imagine your training data has a mix of apples and grapes, but you teach the robot to recognize each fruit by its exact weight, including the weight of imperfections and stickers. Your model will learn these specific details, leading to a high variance. This is like teaching the robot to identify each fruit by its unique weight, even if it's just a tiny bit different.

In this example, a balanced model would consider a reasonable weight range for each fruit, ignoring small variations that don't affect the overall classification.

Overfitting:

  • Model performs well on -> Training data (Low Bias)
  • Fails to perform well on -> Testing data (High Variance)

Underfitting:

  • Model's accuracy is bad for both Training and Testing data (High Bias). Since it is consistently bad on both sets, its variance is actually low.

                      Model 1          Model 2             Model 3
Training Accuracy     90%              92%                 70%
Testing Accuracy      80%              91%                 65%
Verdict               Overfitting      Generalized Model   Underfitting
Bias                  Low Bias         Low Bias            High Bias
Variance              High Variance    Low Variance        Low Variance

(Note on Model 3: its training and testing accuracies are both poor but close together, which is the signature of high bias with low variance; the model is simply too simple to capture the pattern.)
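To see all three behaviours concretely, here is a minimal sketch using scikit-learn on synthetic data (the accuracy numbers it prints are illustrative, not the exact ones from the table above): a degree-1 polynomial underfits, degree 2 generalizes, and degree 15 overfits.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: a noisy quadratic relationship
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X.ravel() ** 2 + X.ravel() + rng.normal(scale=1.0, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Degree 1 underfits, degree 2 generalizes, degree 15 overfits
for degree in (1, 2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:2d}  "
          f"train R^2={model.score(X_train, y_train):.2f}  "
          f"test R^2={model.score(X_test, y_test):.2f}")
```

The overfit model scores noticeably better on the training split than on the test split, while the underfit model scores poorly on both.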



In overfitting, the cost function on the training data is driven all the way to 0, meaning the model has memorized the training points. To avoid overfitting, we want the training cost to stay near 0, but not exactly 0, and that is what regularization does.

  • Ridge Regression (L2 Regularization):

     -   Prevents Overfitting. How?
     -   The ridge regression model adds a penalty term to the cost function that shrinks the coefficients.

$$\text{Cost} = \text{MSE} + \lambda \sum_{i=1}^{n} \beta_i^2$$


where $\lambda$ is the regularization parameter and the $\beta_i$ are the coefficients.

- Ridge regression ends up with smaller coefficients than plain linear regression. The penalty term (proportional to $\beta_i^2$) penalizes large coefficient values, leading to a simpler model that is less likely to overfit the training data.
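As a quick sanity check of the formula above, here is a minimal sketch in plain NumPy that computes the ridge cost for some made-up predictions and coefficients (all the numbers are hypothetical):

```python
import numpy as np

def ridge_cost(y_true, y_pred, coefs, lam):
    """MSE plus the L2 penalty: lam * sum of squared coefficients."""
    mse = np.mean((y_true - y_pred) ** 2)
    return mse + lam * np.sum(np.square(coefs))

# Toy values: same fit, but increasing lambda punishes the coefficients more
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.8, 5.1, 7.3])
coefs  = np.array([2.0, -0.5])

for lam in (0.0, 0.1, 1.0):
    print(f"lambda={lam:4.1f}  cost={ridge_cost(y_true, y_pred, coefs, lam):.3f}")
```

With lambda = 0 this reduces to the ordinary MSE, i.e. plain linear regression.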


Notice in the image that as lambda increases, the coefficient values keep shrinking, which gives us a more generalized model.

The value of $\lambda$ is typically chosen using techniques like cross-validation, to balance fitting the data well against keeping the model simple.
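For instance, with scikit-learn (where lambda is called alpha) you can watch the coefficients shrink as alpha grows, and let RidgeCV pick alpha by cross-validation. A minimal sketch on synthetic data, with illustrative alpha values:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV

# Synthetic regression problem with 5 features
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Coefficients shrink toward 0 as alpha (lambda) increases
for alpha in (0.01, 1.0, 100.0):
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:6.2f}  coefficients={coefs.round(2)}")

# Cross-validation over a grid of candidate alphas
ridge_cv = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
print("alpha chosen by cross-validation:", ridge_cv.alpha_)
```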

  • Lasso Regression (L1 Regularization):

      - Prevents Overfitting
      - Helps with Feature Selection

$$\text{Cost} = \text{MSE} + \lambda \sum_{i=1}^{n} |\beta_i|$$

where $\lambda$ is the regularization parameter and the $\beta_i$ are the coefficients.

Here, the penalty uses the absolute value of the slope instead of its square, so lasso can shrink a coefficient all the way to 0. If $\beta_i$ becomes 0 (or very close to it), we can remove that feature.








Notice in the image that for lambda = 40, the coefficient becomes 0, which means that we can remove that particular feature.

Overall, Lasso regression is useful when dealing with high-dimensional datasets with many features, as it can help identify and focus on the most important features while discarding the less important ones.
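Here is a minimal sketch of that idea with scikit-learn's Lasso, on synthetic data where only 3 of the 10 features carry real signal (the alpha value is just an illustrative choice):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 10 features, but only 3 carry real signal
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=5.0).fit(X, y)
print("coefficients:", lasso.coef_.round(2))

# Features whose coefficient was driven exactly to 0 can be dropped
kept = np.flatnonzero(lasso.coef_)
print("features kept:", kept)
```

The coefficients printed as exactly 0.0 correspond to the features the model has effectively discarded.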








Image source: https://www.youtube.com/watch?v=Xm2C_gTAl8c
