Note, by the way, that model and sbs.model are the same object:
assert id(sbs.model) == id(model)
Predict
test = np.random.rand(100, 1)
test_predictions = sbs.predict(test)
plt.plot(x, y, '.')
plt.plot(test, test_predictions, '.')
plt.show()
Save/load model
sbs.save_checkpoint('linear.pth')
sbs.load_checkpoint('linear.pth')
Visualize model
One can use make_dot(yhat) locally. I can’t make graphviz work on GitHub, but the output looks like this:
Set up tensorboard
One can add TensorBoard to monitor losses, which becomes important for long training runs. We can start TensorBoard from a terminal with tensorboard --logdir runs (or from a notebook, if using the extension, via %load_ext tensorboard). TensorBoard should then be running at http://localhost:6006/ (ignore the "TensorFlow installation not found" message; we don’t need it). Make sure the path is right: TensorBoard will be empty if it can’t find the runs folder.
Tips dataset
Let’s study the linear regression model from a classical perspective. Let’s load the tips dataset, where the independent variables are total_bill, sex, smoker, day, size, and time, and the dependent variable is tip. First we simplify the model by keeping only one independent variable, total_bill:
# Load the dataset
tips = sns.load_dataset("tips")
tips.head()
Interestingly, w1 is slightly off: 12.1% vs 10% (regularization is not the culprit). Let’s plot the fitted line together with the scatter plot:
# Create scatterplot
sns.scatterplot(x="total_bill", y="tip", data=tips)
# Add title and axis labels
plt.title("Tips vs Total Bill")
plt.xlabel("Total Bill")
plt.ylabel("Tip")
# Plot the fitted line with seaborn
sns.lineplot(x=X_train.flatten(), y=w0 + w1*X_train.flatten(), color='red', label='Linear Regression')
# Show the plot
plt.show()
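As a cross-check on a fitted w0 and w1, the one-variable least-squares problem has a closed-form solution: w1 = cov(x, y) / var(x) and w0 = mean(y) - w1 * mean(x). The sketch below is only an illustration on synthetic data (a made-up line with slope 0.1 plus noise), not the tips dataset; the names x_demo, y_demo, w0_closed and w1_closed are hypothetical.

```python
import numpy as np

# Synthetic stand-in data (NOT the tips dataset): y = 1 + 0.1 * x + noise
rng = np.random.default_rng(0)
x_demo = rng.uniform(5.0, 50.0, size=200)
y_demo = 1.0 + 0.1 * x_demo + rng.normal(0.0, 1.0, size=200)

# Closed-form ordinary least squares for a single feature
x_mean, y_mean = x_demo.mean(), y_demo.mean()
w1_closed = np.sum((x_demo - x_mean) * (y_demo - y_mean)) / np.sum((x_demo - x_mean) ** 2)
w0_closed = y_mean - w1_closed * x_mean

# Reference fit: np.polyfit returns coefficients from highest degree down
w1_ref, w0_ref = np.polyfit(x_demo, y_demo, deg=1)

print(f"closed form: w0={w0_closed:.4f}, w1={w1_closed:.4f}")
print(f"np.polyfit:  w0={w0_ref:.4f}, w1={w1_ref:.4f}")
```

If a gradient-descent fit disagrees noticeably with the closed-form values on the same training data, the usual suspects are too few epochs, the learning rate, or unscaled features.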
This graph might not mean that males are bigger tippers, since it might be that more males ate in bigger groups as well. Plotting the relative tip (i.e. tip/total_bill) might be more informative:
Indeed, women left a larger percentage as a tip (then again, they might have had smaller portions; there are many angles from which one can look at this data). How about comparing group sizes:
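A minimal sketch of how the relative tip could be computed, assuming a DataFrame with the tip, total_bill and sex columns described earlier. The helper add_relative_tip and the column name tip_pct are hypothetical, and the rows below are made up purely to show the mechanics; they say nothing about the real data.

```python
import pandas as pd

def add_relative_tip(df):
    """Return a copy of df with a tip_pct column equal to tip / total_bill."""
    out = df.copy()
    out["tip_pct"] = out["tip"] / out["total_bill"]
    return out

# Made-up rows for illustration only (not taken from the tips dataset)
toy = pd.DataFrame({
    "total_bill": [10.0, 30.0, 20.0, 40.0],
    "tip": [1.5, 6.0, 3.0, 8.0],
    "sex": ["Female", "Female", "Male", "Male"],
})

toy_rel = add_relative_tip(toy)
mean_by_sex = toy_rel.groupby("sex")["tip_pct"].mean()
print(mean_by_sex)
```

With the real data, the same column could be passed to a plot in the style used elsewhere in this post, e.g. sns.violinplot(x="sex", y="tip_pct", data=add_relative_tip(tips)).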
_ = sns.violinplot(x="sex", y="size", data=tips)
The two distributions look very similar.
_ = sns.boxplot(x="size", y="tip", data=tips)
Multivariable linear regression
Let’s run a multivariable linear regression. We first need to encode all the categorical variables as numbers:
tips.dtypes
total_bill float64
tip float64
sex category
smoker category
day category
time category
size int64
dtype: object
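One common way to turn the category columns above into numbers is one-hot (dummy) encoding with pd.get_dummies. The sketch below runs on a few made-up rows that mirror the column names; it is not necessarily the encoding used to produce the results in this post, and the names toy, categorical and encoded are hypothetical.

```python
import pandas as pd

# Made-up rows mirroring the tips columns (illustration only)
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "sex": ["Female", "Male", "Male", "Female"],
    "smoker": ["No", "Yes", "No", "No"],
    "day": ["Sat", "Sun", "Thur", "Sat"],
    "time": ["Dinner", "Dinner", "Lunch", "Dinner"],
    "size": [2, 3, 2, 4],
})

categorical = ["sex", "smoker", "day", "time"]

# drop_first=True drops one level per variable, so the dummy columns are not
# perfectly collinear with the intercept; dtype=float keeps everything numeric
encoded = pd.get_dummies(toy, columns=categorical, drop_first=True, dtype=float)
print(encoded.columns.tolist())
```

Dropping the first level makes each remaining coefficient read as a difference from that baseline level (for example, sex_Male relative to Female).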
and these are the coefficients. Let’s predict on the validation dataset:
y_valid_pred = lr.predict(X_valid)
and let’s plot the predictions against the targets:
# Plot y_valid_pred vs y_valid
plt.scatter(y_valid, y_valid_pred)
plt.xlabel('Actual output')
plt.ylabel('Predicted output')
plt.title('y_valid_pred vs y_valid')
plt.show()
A common measure of accuracy is R^2 (i.e. the coefficient of determination):
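For reference, R^2 compares the model’s squared error with that of always predicting the mean of the targets: R^2 = 1 - SS_res / SS_tot. The sketch below spells out that definition in NumPy; the function name r_squared and the toy arrays are hypothetical and not part of the original notebook.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Toy targets for illustration only
y_demo = np.array([1.0, 2.0, 3.0, 4.0])

perfect = r_squared(y_demo, y_demo)                                # exact predictions
baseline = r_squared(y_demo, np.full_like(y_demo, y_demo.mean()))  # always predict the mean
print(perfect, baseline)
```

A perfect fit gives 1, predicting the mean everywhere gives 0, and a model worse than that baseline gives a negative value.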