Step-by-Step Guide to Implement Linear Regression Model: Practical Example

Step 1: Import the Libraries

import matplotlib.pyplot as plt
import numpy as np

Matplotlib is used for creating visualizations like charts and plots.
NumPy is used for numerical operations and array manipulations.



Step 2: Define the Dataset

x = np.array([2.4,5.0,1.5,3.8,8.7,3.6,1.2,8.1,2.5,5.0,1.6,1.6,2.4,3.9,5.4])  # Experience
y = np.array([2.1,4.7,1.7,3.6,8.7,3.2,1.0,8.0,2.4,6.0,1.1,1.3,2.4,3.9,4.8])  # Salary
n = np.size(x)

x stores the data for 'Experience'.
y stores the data for 'Salary'.
n is the total number of elements.


Step 3: Plot the Data points


plt.scatter(x, y, color = 'red')
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.show()

For the scatter plot, plt.scatter() is used.
x-axis: Experience
y-axis: Salary



Step 4: Calculate the Values of the Coefficients (Intercept & Slope)
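
We find a0 and a1 with gradient descent. The hypothesis is y_pred = a0 + a1*x, and the cost for one pass over the data is the sum of squared errors:

SSE = sum over i of (y_i - (a0 + a1*x_i))^2

The partial derivatives of this cost give the gradients used in the code below:

dSSE/da0 = sum of -2*(y_i - (a0 + a1*x_i))
dSSE/da1 = sum of -2*x_i*(y_i - (a0 + a1*x_i))

Each iteration moves both coefficients a small step (scaled by the learning rate lr) against their gradients: a0 = a0 - lr*dSSE/da0 and a1 = a1 - lr*dSSE/da1.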


# Initialize the parameters
a0 = 0                  # Intercept
a1 = 0                  # Slope
lr = 0.0001             # Learning rate
iterations = 1000       # Number of iterations
error = []              # Stores the cost for each iteration

for itr in range(iterations):
    error_cost = 0
    cost_a0 = 0
    cost_a1 = 0
    for i in range(len(x)):
        y_pred = a0 + a1*x[i]                        # Hypothesis function for simple linear regression
        error_cost = error_cost + (y[i]-y_pred)**2   # Cost function (sum of squared errors)
        cost_a0 = cost_a0 + (-2)*(y[i]-y_pred)       # Partial derivative w.r.t. a0
        cost_a1 = cost_a1 + (-2*x[i])*(y[i]-y_pred)  # Partial derivative w.r.t. a1
    a0 = a0 - lr * cost_a0    # Update a0 once per full pass over the data
    a1 = a1 - lr * cost_a1    # Update a1 once per full pass over the data
    print(itr, a0, a1)        # Check the iteration and the updated a0 and a1
    error.append(error_cost)  # Record this iteration's cost



Step 5: Observe the Values of a0 and a1


print('intercept',a0)
print('slope',a1)
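
As an optional sanity check, the same coefficients can also be computed directly from the ordinary least squares closed-form formulas. This is a minimal NumPy sketch (the closed_* variable names are our own), and its output should land very close to the gradient descent values:

x_mean = np.mean(x)
y_mean = np.mean(y)
closed_slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # a1
closed_intercept = y_mean - closed_slope * x_mean                               # a0
print('closed-form intercept', closed_intercept)
print('closed-form slope', closed_slope)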

Step 6: Observe That the Error Is Minimized


As the number of iterations increases, the error drops close to 0.
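
A quick numeric check of this claim is to compare the first and last recorded costs:

print("first cost:", error[0], "last cost:", error[-1])  # The last cost should be far smaller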

plt.figure(figsize=(10,5))
plt.plot(np.arange(1,len(error)+1), error, color='red', linewidth = 5)
plt.title("Iteration vs Error")
plt.xlabel("Iterations")
plt.ylabel("Error")
plt.show()



Step 7: Predict Salary


pred = a0+a1*x
print(pred)


Raw numbers are hard to judge, so let's plot a graph and see the predictions clearly.

plt.scatter(x, y, color = 'red')   # Scatter plot of the actual values
plt.plot(x, pred, color = 'green') # Best-fit line found by gradient descent
plt.xlabel("Experience")
plt.ylabel("Salary")
plt.show()


Step 8: Analyze the Model's Performance by Calculating the Mean Squared Error


error1 = y - pred
se = np.sum(error1 ** 2) #squared error
mse = se/n #mean of squared error
print("mean squared error is", mse)


Step 9: Use the scikit-learn Library to Confirm the Above Calculations


from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

experience = x.reshape(-1,1)  # sklearn expects a 2-D feature array
salary = y
model = LinearRegression()
model.fit(experience, salary)
salary_pred = model.predict(experience)
mse_sklearn = mean_squared_error(salary, salary_pred)
print('slope', model.coef_)
print('intercept', model.intercept_)
print('MSE', mse_sklearn)


Observe that both times all three values (a0, a1, MSE) match closely.
So our calculations are correct, and the model is working well!








