The mean squared error loss function

A prominently used loss function is the mean squared error (MSE) function, represented algebraically in the following formula. As you will notice, this function, at its core, simply compares the actual model output ($y$) with the predicted model output ($\hat{y}$). This function is particularly helpful for assessing our predictive power, as it models the loss quadratically. That is to say, as our predicted and actual output values diverge, the loss grows with the square of their difference, allowing us to penalize larger errors more severely:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

The average MSE between output values $y_i$ and predicted values $\hat{y}_i$.
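
To make the formula concrete, here is a minimal sketch of computing the MSE with NumPy. The function name mean_squared_error and the sample arrays are illustrative, not taken from any particular library:

```python
import numpy as np

def mean_squared_error(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Average of the squared differences between actual and predicted outputs."""
    errors = y_true - y_pred             # per-sample error (y_i - y_hat_i)
    return float(np.mean(errors ** 2))   # squaring penalizes larger errors more

# Illustrative values: doubling an error quadruples its contribution to the loss
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 1.0])
print(mean_squared_error(y_true, y_pred))  # (0.25 + 0.0 + 4.0) / 3 ≈ 1.4167
```

Note how the third sample, with an error of 2.0, contributes 4.0 to the sum, while the first sample's error of 0.5 contributes only 0.25; this is the quadratic penalty at work.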

We will revisit this notion to understand how we reduce the difference between what our model predicts and the actual output, using various types of loss functions. For now, it suffices to know that our model's loss may be minimized through a process known as gradient descent. As we will soon see, gradient descent is simply grounded in calculus and implemented through backpropagation-based algorithms. This process of mathematically reducing the difference between the predicted and actual output, by tuning the parameters of a network, is what actually makes the network learn. This tuning occurs as we train our model, by showing it new examples of inputs and associated outputs.
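
As a preview of the idea, the following sketch applies gradient descent to the MSE of a one-parameter linear model. The toy data, the learning rate lr, and the step count are illustrative assumptions; the gradient of the MSE with respect to the weight w is derived by the chain rule:

```python
import numpy as np

# Toy data generated from y = 3x; the true slope is what we hope to recover
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0    # initial parameter guess
lr = 0.01  # learning rate (illustrative choice)

for step in range(200):
    y_pred = w * x                        # forward pass: model prediction
    grad = np.mean(2 * (y_pred - y) * x)  # d(MSE)/dw via the chain rule
    w -= lr * grad                        # step against the gradient to reduce the loss

print(w)  # approaches 3.0 as the MSE is driven toward zero
```

Each iteration nudges w in the direction that decreases the loss; over many such updates, the predicted outputs converge toward the actual outputs, which is precisely the tuning process described above.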