Step-by-Step Guide to Implement Linear Regression Model: Practical Example

Step 1: Import the Libraries

import matplotlib.pyplot as plt
import numpy as np

Matplotlib is used for creating visualizations like charts and plots.
NumPy is used for numerical operations and array manipulations.



Step 2: Define the Dataset

x = np.array([2.4,5.0,1.5,3.8,8.7,3.6,1.2,8.1,2.5,5.0,1.6,1.6,2.4,3.9,5.4])  # Experience
y = np.array([2.1,4.7,1.7,3.6,8.7,3.2,1.0,8.0,2.4,6.0,1.1,1.3,2.4,3.9,4.8])  # Salary
n = np.size(x)

x stores the data for 'Experience'.
y stores the data for 'Salary'.
n is the total number of elements.


Step 3: Plot the Data points


plt.scatter(x, y, color = 'red')
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.show()

For the scatter plot, plt.scatter() is used.
x-axis: Experience
y-axis: Salary



Step 4: Calculate the Values of the Coefficients (Intercept & Slope)
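
We find a0 and a1 with gradient descent. The hypothesis is y_pred = a0 + a1*x, and the cost for one pass over the data is the sum of squared errors:

SSE = sum over i of (y_i - (a0 + a1*x_i))^2

The partial derivatives of this cost give the gradients used in the code below:

dSSE/da0 = sum of -2*(y_i - (a0 + a1*x_i))
dSSE/da1 = sum of -2*x_i*(y_i - (a0 + a1*x_i))

Each iteration moves both coefficients a small step (scaled by the learning rate lr) against their gradients: a0 = a0 - lr*dSSE/da0 and a1 = a1 - lr*dSSE/da1.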


# Initialize the parameters
a0 = 0                  # Intercept
a1 = 0                  # Slope
lr = 0.0001             # Learning rate
iterations = 1000       # Number of iterations
error = []              # Stores the cost for each iteration

for itr in range(iterations):
    error_cost = 0
    cost_a0 = 0
    cost_a1 = 0
    for i in range(len(x)):
        y_pred = a0 + a1*x[i]                        # Hypothesis function for simple linear regression
        error_cost = error_cost + (y[i]-y_pred)**2   # Cost function (sum of squared errors)
        cost_a0 = cost_a0 + (-2)*(y[i]-y_pred)       # Partial derivative w.r.t. a0
        cost_a1 = cost_a1 + (-2*x[i])*(y[i]-y_pred)  # Partial derivative w.r.t. a1
    a0 = a0 - lr * cost_a0    # Update a0 once per full pass over the data
    a1 = a1 - lr * cost_a1    # Update a1 once per full pass over the data
    print(itr, a0, a1)        # Check the iteration and the updated a0 and a1
    error.append(error_cost)  # Record this iteration's cost



Step 5: Observe the Values of a0 and a1


print('intercept',a0)
print('slope',a1)
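
As an optional sanity check, the same coefficients can also be computed directly from the ordinary least squares closed-form formulas. This is a minimal NumPy sketch (the closed_* variable names are our own), and its output should land very close to the gradient descent values:

x_mean = np.mean(x)
y_mean = np.mean(y)
closed_slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # a1
closed_intercept = y_mean - closed_slope * x_mean                               # a0
print('closed-form intercept', closed_intercept)
print('closed-form slope', closed_slope)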

Step 6: Observe That the Error Is Minimized


As the number of iterations increases, the error drops close to 0.
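
A quick numeric check of this claim is to compare the first and last recorded costs:

print("first cost:", error[0], "last cost:", error[-1])  # The last cost should be far smaller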

plt.figure(figsize=(10,5))
plt.plot(np.arange(1,len(error)+1), error, color='red', linewidth = 5)
plt.title("Iteration vs Error")
plt.xlabel("Iterations")
plt.ylabel("Error")
plt.show()



Step 7: Predict Salary


pred = a0+a1*x
print(pred)


Raw numbers are hard to judge, so let's plot a graph and see the predictions clearly.

plt.scatter(x, y, color = 'red')   # Scatter plot of the actual values
plt.plot(x, pred, color = 'green') # Best-fit line found by gradient descent
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.show()


Step 8: Analyze the Model's Performance by Calculating the Mean Squared Error


error1 = y - pred
se = np.sum(error1 ** 2) #squared error
mse = se/n #mean of squared error
print("mean squared error is", mse)


Step 9: Use the scikit-learn Library to Confirm the Above Calculations


from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

experience = x.reshape(-1,1)  # sklearn expects a 2-D feature array
salary = y
model = LinearRegression()
model.fit(experience, salary)
salary_pred = model.predict(experience)
mse_sklearn = mean_squared_error(salary, salary_pred)
print('slope', model.coef_)
print('intercept', model.intercept_)
print('MSE', mse_sklearn)


Observe that both times all three values (a0, a1, MSE) match closely.
So our calculations are correct, and the model is working well!








