Introduction
Linear regression is a parametric regression method used to estimate real-valued outputs from real-valued input vectors. In this post, mean squared error is used as the loss function and mini-batch gradient descent is used to learn the parameters.
The computation graph below shows how linear regression works. The dot product of each input $x_i$, a vector of size $d$, and the transposed weight vector $w$ is taken to produce the output $\hat{y}_i$. The loss is then calculated as the mean squared error of the label $y_i$ and the output $\hat{y}_i$.
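Written out, the model output and the loss over a data set of size $N$ are:

$$\hat{y}_i = x_i w^T \qquad L = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$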
The bias term is ignored for the purposes of this post, but can easily be appended to the weight vector $w$ after appending a $1$ to each input vector $x_i$.
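As a minimal sketch of that trick (the tensor x here is a made-up example):

import torch

x = torch.randn(100, 3)                               # example inputs: N=100, d=3
x = torch.cat([x, torch.ones(x.shape[0], 1)], dim=1)  # append a 1 to each input vector
print(x.shape)                                        # torch.Size([100, 4]); w gains a bias entry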
Learning
Given a data set of size $N$ with $d$ dimensions, parameters $w$ must be learned that minimize our loss function $L$. The weight vector $w$ is learned using gradient descent. The derivation for the $\frac{\partial L}{\partial w_j}$ term in the weight update is displayed in the derivations section of this post.
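Each update takes a step of size $\alpha$ (the learning rate) down the gradient of the loss:

$$w := w - \alpha \frac{\partial L}{\partial w}$$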
Code
Code for a linear regressor class is shown in the block below.
from tqdm import trange
import torch


def MeanSquaredError(y: torch.Tensor, yhat: torch.Tensor) -> torch.Tensor:
    """ Calculate mean squared error

    Args:
        y: true labels
        yhat: predicted labels

    Returns:
        mean squared error
    """
    return torch.sum((y - yhat)**2) / y.shape[0]


class LinearRegressor:

    def __init__(self) -> None:
        """ Instantiate linear regressor
        """
        self.w = None
        self.calcError = MeanSquaredError

    def fit(self, x: torch.Tensor, y: torch.Tensor, alpha: float=0.00001, epochs: int=1000, batch: int=32) -> None:
        """ Fit linear regressor to data set

        Args:
            x: input data
            y: input labels
            alpha: learning rate for weight update
            epochs: number of epochs to train
            batch: size of batches for training
        """
        self.w = torch.zeros((1, x.shape[1]))

        epochs = trange(epochs, desc='Error')
        for epoch in epochs:

            # iterate over mini-batches, rounding up so a final partial batch is included
            start, end = 0, batch
            for b in range((x.shape[0] + batch - 1) // batch):
                hx = self.predict(x[start:end])
                dw = self.calcGradient(x[start:end], y[start:end], hx)
                self.w = self.w - (alpha * dw)
                start += batch
                end += batch

            hx = self.predict(x)
            error = self.calcError(y, hx)
            epochs.set_description('MSE: %.4f' % error)

    def predict(self, x: torch.Tensor) -> torch.Tensor:
        """ Predict output values

        Args:
            x: input data

        Returns:
            regression output for each member of input
        """
        # dot product of each input vector with the weight vector: (N, d) x (1, d) -> (N,)
        return torch.einsum('ij,kj->i', x, self.w)

    def calcGradient(self, x: torch.Tensor, y: torch.Tensor, hx: torch.Tensor) -> torch.Tensor:
        """ Calculate weight gradient

        Args:
            x: input data
            y: input labels
            hx: predicted output

        Returns:
            tensor of gradient values the same size as weights
        """
        # average of -x_ij * (y_i - hx_i) over the batch
        return torch.einsum('ij,i->j', -x, (y - hx)) / x.shape[0]
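A quick usage sketch on synthetic data (the data and hyperparameters here are made up for illustration):

torch.manual_seed(0)
x = torch.randn(500, 3)                          # 500 examples, 3 features
w_true = torch.tensor([2.0, -1.0, 0.5])
y = torch.einsum('ij,j->i', x, w_true) + 0.01 * torch.randn(500)

model = LinearRegressor()
model.fit(x, y, alpha=0.01, epochs=200, batch=32)
print(model.w)                                   # should approach [2.0, -1.0, 0.5]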
Derivations
Derivative of the loss function $L$ with respect to the regression output $\hat{y}_i$:

$$\frac{\partial L}{\partial \hat{y}_i} = \frac{\partial}{\partial \hat{y}_i}\left[\frac{1}{N}\sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2\right] = -\frac{2}{N}\left(y_i - \hat{y}_i\right)$$

Derivative of the regression output $\hat{y}_i$ with respect to weight $w_j$:

$$\frac{\partial \hat{y}_i}{\partial w_j} = \frac{\partial}{\partial w_j}\left[x_i w^T\right] = x_{ij}$$

Derivative of the loss function $L$ with respect to weight $w_j$, by the chain rule:

$$\frac{\partial L}{\partial w_j} = \sum_{i=1}^{N}\frac{\partial L}{\partial \hat{y}_i}\frac{\partial \hat{y}_i}{\partial w_j} = -\frac{2}{N}\sum_{i=1}^{N}x_{ij}\left(y_i - \hat{y}_i\right)$$
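Note that calcGradient above drops the constant factor of $2$, folding it into the learning rate $\alpha$. As a sanity check (a sketch, assuming the definitions above are in scope), the analytic gradient can be compared against torch.autograd:

torch.manual_seed(0)
x = torch.randn(8, 3)
y = torch.randn(8)

model = LinearRegressor()
model.w = torch.zeros((1, 3), requires_grad=True)   # enable autograd on the weights

hx = model.predict(x)
loss = MeanSquaredError(y, hx)
loss.backward()

analytic = model.calcGradient(x, y, hx.detach())
# autograd agrees once the dropped factor of 2 is restored
print(torch.allclose(2 * analytic, model.w.grad.squeeze()))  # True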
Resources
- Russell, Stuart J., and Peter Norvig. Artificial Intelligence: A Modern Approach. 3rd ed., Prentice Hall, 2010.
- Burkov, Andriy. The Hundred-Page Machine Learning Book. 2019.