In contrast to the NumPy polyfit way of getting a linear regression, the approach using scikit-learn (sklearn) is somewhat more involved.
Step 1 – Get the data (normally from a .csv file)
# credit https://databasetown.com/machine-learning-with-python-a-real-life-example/
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt

mango_data = {'Year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
              'Mango_Price': [40, 50, 55, 60, 65, 70, 75, 80, 90]}

# This is the code that you would use if you already had a csv file:
# df = pd.read_csv('mangoes_price.csv')
# df

mango_prices = pd.DataFrame(data=mango_data)
mango_prices
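
The commented-out read_csv lines above can be tried without a file on disk. The following is a minimal sketch, assuming a CSV with the same two columns as mango_data; the csv_text string and the name mango_prices_from_csv are made up for illustration, and io.StringIO stands in for a real file such as 'mangoes_price.csv'.

```python
import io

import pandas as pd

# Hypothetical CSV content with the same columns and values as mango_data.
csv_text = """Year,Mango_Price
2011,40
2012,50
2013,55
2014,60
2015,65
2016,70
2017,75
2018,80
2019,90
"""

# pd.read_csv accepts a path or a file-like object,
# so a StringIO buffer can replace the file name here.
mango_prices_from_csv = pd.read_csv(io.StringIO(csv_text))

print(mango_prices_from_csv.shape)
print(list(mango_prices_from_csv.columns))
```

With a real file, the StringIO argument would simply be replaced by the file path.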

Step 2 – Visualise the data with scatter plot
plt.title('Our historic mango prices', fontsize=12)
plt.xlabel('Year')
plt.ylabel('Mango Price')
# linewidth is given as a number rather than a string
plt.scatter(mango_prices.Year, mango_prices.Mango_Price, color='blue', marker='.', linewidth=5)
plt.show()

Step 3 – Reformatting values
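# the features: everything except the price column (a one-column DataFrame holding Year)
new_df = mango_prices.drop('Mango_Price', axis='columns')
new_df

# the target: the price column on its own
Mango_Price = mango_prices.Mango_Price
Mango_Price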
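
The reason for this split is that sklearn estimators expect the features as a 2D table (rows are samples, columns are features) and the target as a 1D sequence. A small sketch of the same split using bracket indexing, with the shapes printed; the names X and y are only a common convention and are not used elsewhere in this post.

```python
import pandas as pd

mango_prices = pd.DataFrame({
    'Year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    'Mango_Price': [40, 50, 55, 60, 65, 70, 75, 80, 90],
})

# Double brackets keep a 2D DataFrame with one column (the feature).
X = mango_prices[['Year']]
# Single brackets return a 1D Series (the target).
y = mango_prices['Mango_Price']

print(X.shape)  # (9, 1)
print(y.shape)  # (9,)
```

Here X holds the same data as new_df above, and y the same data as Mango_Price.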
Step 4 – Use Sklearn fit() method to get Linear Regression
# To train the model, we create an object of the
# LinearRegression class and call its fit() method
reg_model = linear_model.LinearRegression()
reg_model.fit(new_df, mango_prices.Mango_Price)
LinearRegression()
Step 5 – Use the Sklearn regression model to predict prices for two future years
# predict the price of the mangoes in 2020 and 2021
# (each inner list is one sample with a single feature, the year)
reg_model.predict([[2020], [2021]])
array([93.33333333, 99. ])
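
Because the model was fitted on a DataFrame with a named column, recent scikit-learn versions (1.0 and later, as far as I know) warn about missing feature names when predict() receives a plain nested list. A sketch of one way around that is to pass the new years as a DataFrame with the same column name; future_years and predictions are names chosen for this example.

```python
import pandas as pd
from sklearn import linear_model

mango_prices = pd.DataFrame({
    'Year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    'Mango_Price': [40, 50, 55, 60, 65, 70, 75, 80, 90],
})

reg_model = linear_model.LinearRegression()
reg_model.fit(mango_prices[['Year']], mango_prices['Mango_Price'])

# Same column name as in training, so the feature names match.
future_years = pd.DataFrame({'Year': [2020, 2021]})
predictions = reg_model.predict(future_years)
print(predictions)
```

The predicted values should agree with the array shown above.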
Step 6 – Use Sklearn to get coefficient and intercept
# find the slope (coefficient) and the intercept
print(reg_model.coef_)
print(reg_model.intercept_)
[5.66666667]
-11353.333333333334
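
As a cross-check against the NumPy polyfit approach mentioned at the start, a degree-1 polyfit on the same data should give the same slope and intercept up to floating-point error. This is a small sketch; the variable names are chosen for this example only.

```python
import numpy as np

years = np.array([2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019], dtype=float)
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90], dtype=float)

# A degree-1 polynomial fit returns the coefficients highest power first:
# slope, then intercept.
slope, intercept = np.polyfit(years, prices, 1)
print(slope, intercept)
```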
Step 7 – Test the prediction by plugging the coefficient and intercept into the formula
# y = mx + b  <-- m is the slope and b is the intercept.
# Plug the (rounded) coefficient and the intercept printed above into the equation
2020 * 5.66666667 + (-11353.333333333334)
93.33334006666519
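
The result differs from the model's own prediction (93.33333333) in the sixth decimal place because the coefficient typed into the formula was rounded to eight decimals. A minimal sketch that reads the unrounded values straight from the fitted model instead; the names m, b, manual and model_prediction are illustrative.

```python
import pandas as pd
from sklearn import linear_model

mango_prices = pd.DataFrame({
    'Year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    'Mango_Price': [40, 50, 55, 60, 65, 70, 75, 80, 90],
})

reg_model = linear_model.LinearRegression()
reg_model.fit(mango_prices[['Year']], mango_prices['Mango_Price'])

m = reg_model.coef_[0]     # slope, full precision
b = reg_model.intercept_   # intercept, full precision

# y = mx + b evaluated by hand for the year 2020
manual = m * 2020 + b
# the same year passed through predict()
model_prediction = reg_model.predict(pd.DataFrame({'Year': [2020]}))[0]

print(manual, model_prediction)
```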
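Step 8 – Get the accuracy (R² score)
# check how well the model fits the data: for a regression model,
# score() returns the coefficient of determination R² (1.0 is a perfect fit)
reg_model.score(new_df, Mango_Price)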
0.9880341880341843
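
To make that number less of a black box, R² can be recomputed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this on the same data and compares it with score(); ss_res, ss_tot and r_squared are names introduced for this example.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

mango_prices = pd.DataFrame({
    'Year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
    'Mango_Price': [40, 50, 55, 60, 65, 70, 75, 80, 90],
})
X = mango_prices[['Year']]
y = mango_prices['Mango_Price']

reg_model = linear_model.LinearRegression()
reg_model.fit(X, y)

predicted = reg_model.predict(X)

# Residual sum of squares: variation the line does not explain.
ss_res = np.sum((y - predicted) ** 2)
# Total sum of squares: variation of the prices around their mean.
ss_tot = np.sum((y - y.mean()) ** 2)

r_squared = 1 - ss_res / ss_tot
print(r_squared, reg_model.score(X, y))
```

Both printed values should match, since this is the definition that score() uses for regressors.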
Step 9 – Visualise the linear regression in a scatter plot
# add an 11-year range (2015 to 2025)
year_df = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025]
# reshape that 1D list of years to a 2D array that we can use for our model
B = np.reshape(year_df, (-1, 1))
# mango price predictions
price = reg_model.predict(B)
# the same predictions as a 2D column (not used in the plot below)
C = np.reshape(price, (-1, 1))
plt.title('Our historic prices (blue) and our predictions (red)', fontsize=12)
# the predictions based on linear regression
plt.scatter(year_df, price, c='r')
# the actual prices in blue
plt.scatter(mango_prices.Year, mango_prices.Mango_Price, c='b')
plt.xlabel('Year')
plt.ylabel('Mango Price')
plt.show()
