Introduction
When you’re just starting out with machine learning, the initial steps can feel deceptively simple. You load a dataset, train a model, and suddenly you’re faced with terms like “loss = mse” or “criterion = nn.CrossEntropyLoss()”. Quickly, tutorials dive into complex topics like equations, gradients, and optimization, often leaving beginners nodding without fully comprehending. If you’ve ever felt lost, rest assured you’re not alone. This article aims to demystify loss functions, a core concept in machine learning, by focusing on the foundational ideas before delving into the mathematics. This is part of my noob series, designed to make these concepts more accessible.
What is a Loss Function?
A loss function is fundamentally a tool that informs a machine learning model how incorrect its predictions are. When a model makes a prediction, the loss function compares this prediction to the actual result, providing a numerical value that indicates the magnitude of the model’s error.
A high loss signifies that the model’s prediction was significantly off. Conversely, a low loss indicates that the model’s prediction was reasonably close to the target. Throughout the training process, the model continuously adjusts its parameters to minimize this loss, analogous to a game of darts where feedback helps improve aim. The loss function effectively measures the distance between the dart (prediction) and the target (correct answer), giving the model essential feedback to refine its predictions.
Just like how throwing too close or too far from the target can be measured, a loss function quantifies how wrong a model’s prediction was, helping it to learn and improve.
Mean Squared Error
One of the most commonly used loss functions for predicting numerical values is the Mean Squared Error (MSE). It’s frequently employed in scenarios where models predict continuous values, such as property prices or temperatures. The concept is straightforward:
- Error: For each prediction, calculate the difference between the predicted value and the actual value.
- Square: Multiply each difference by itself to square it.
- Mean: Take the average of all these squared differences.
Here’s how you might implement it in Python:
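def mean_squared_error(predictions, actuals):
    # Square each difference so positive and negative errors cannot cancel out
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    # Average the squared differences
    return sum(squared_errors) / len(squared_errors)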
Squaring the errors serves two purposes: it ensures all errors are positive (a +3 error and a -3 error both become 9), and it disproportionately penalizes larger errors, which is useful when one large mistake is worse than several small ones.
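To make both effects concrete, here is a small sketch. The target value of 100 and the predictions are made up purely for illustration, and the function is repeated so the snippet runs on its own.

```python
def mean_squared_error(predictions, actuals):
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return sum(squared_errors) / len(squared_errors)

# Illustrative values only: two targets, both equal to 100
actuals = [100, 100]

# Effect 1: the sign of the error disappears after squaring
over = mean_squared_error([103, 103], actuals)   # every error is +3
under = mean_squared_error([97, 97], actuals)    # every error is -3

# Effect 2: an error 10 times larger costs 100 times more
small = mean_squared_error([101, 101], actuals)  # every error is 1
large = mean_squared_error([110, 110], actuals)  # every error is 10

print(over, under)   # 9.0 9.0
print(small, large)  # 1.0 100.0
```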
Mean Absolute Error
The Mean Absolute Error (MAE) is another widely used loss function. Like MSE, it measures the deviation between predictions and actual outcomes, but instead of squaring the errors, it takes their absolute values.
Here’s a Python implementation:
def mean_absolute_error(predictions, actuals):
    # Take the size of each difference, ignoring its sign
    absolute_errors = [abs(p - a) for p, a in zip(predictions, actuals)]
    # Average the absolute differences
    return sum(absolute_errors) / len(absolute_errors)
MAE is more forgiving of outliers compared to MSE. It doesn’t exaggerate the impact of large errors to the extent that MSE does, making it suitable for datasets with outliers.
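A quick comparison shows the difference in how the two losses react to a single bad prediction. The numbers below are invented for illustration, and both functions are repeated so the snippet is self-contained.

```python
def mean_squared_error(predictions, actuals):
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return sum(squared_errors) / len(squared_errors)

def mean_absolute_error(predictions, actuals):
    absolute_errors = [abs(p - a) for p, a in zip(predictions, actuals)]
    return sum(absolute_errors) / len(absolute_errors)

# Illustrative values only
actuals = [10, 20, 30, 40]
close_predictions = [11, 19, 31, 39]    # every prediction is off by 1
outlier_predictions = [11, 19, 31, 60]  # the last prediction is off by 20

mae_close = mean_absolute_error(close_predictions, actuals)
mse_close = mean_squared_error(close_predictions, actuals)
mae_outlier = mean_absolute_error(outlier_predictions, actuals)
mse_outlier = mean_squared_error(outlier_predictions, actuals)

print(mae_close, mse_close)      # 1.0 1.0
print(mae_outlier, mse_outlier)  # 5.75 100.75
```

One outlier raises MAE from 1.0 to 5.75, but it raises MSE from 1.0 to 100.75, so a model trained with MSE will work much harder to fix that single point.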
Cross Entropy Loss
While MSE and MAE are excellent for numerical predictions, many machine learning tasks revolve around categorizing data. In such scenarios, Cross Entropy Loss is often employed.
For classification tasks, models predict probabilities for each class. If a model predicts 70% probability for a dog, 20% for a cat, and 10% for a fish, and the actual image is of a dog, the loss function evaluates how well these probabilities align with reality.
- Correct and confident predictions result in low loss.
- Correct but uncertain predictions lead to moderate loss.
- Incorrect and confident predictions incur high loss.
Cross entropy is particularly effective for classification because it takes into account both the correctness and the confidence of the model’s predictions.
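For a single example with one correct class, cross entropy reduces to the negative logarithm of the probability the model assigned to that class. The sketch below uses the dog, cat, and fish example; the helper name cross_entropy and the extra probability sets are assumptions made for illustration, not part of any library.

```python
import math

def cross_entropy(predicted_probs, true_class):
    """Loss for one example: negative log of the probability given to the true class."""
    return -math.log(predicted_probs[true_class])

# The example from the text: the image really is a dog
loss_dog = cross_entropy({"dog": 0.7, "cat": 0.2, "fish": 0.1}, "dog")

# Hypothetical predictions illustrating the three cases in the list above
confident_correct = cross_entropy({"dog": 0.95, "cat": 0.03, "fish": 0.02}, "dog")
uncertain_correct = cross_entropy({"dog": 0.40, "cat": 0.35, "fish": 0.25}, "dog")
confident_wrong = cross_entropy({"dog": 0.05, "cat": 0.90, "fish": 0.05}, "dog")

print(round(loss_dog, 3))           # 0.357
print(round(confident_correct, 3))  # 0.051
print(round(uncertain_correct, 3))  # 0.916
print(round(confident_wrong, 3))    # 2.996
```

Library implementations can differ in what they expect as input. PyTorch's nn.CrossEntropyLoss, for instance, takes raw scores (logits) rather than probabilities.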
Loss vs Accuracy
It’s important to distinguish between loss and accuracy. Accuracy measures the proportion of predictions that are correct, while loss also reflects how wrong, and how confidently wrong, each prediction was.
Consider two models, both achieving 90 correct predictions out of 100. Their accuracy is identical, but their loss could differ if one model is more confident in its correct predictions and less wrong in its incorrect ones.
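Here is a scaled-down sketch of that situation with four examples instead of 100. Each number is the probability a hypothetical model gave to the correct class, and a prediction counts as correct when that probability is above 0.5. The values and helper names are assumptions chosen for illustration.

```python
import math

def fraction_correct(true_class_probs, threshold=0.5):
    """Share of examples where the true class received more than the threshold."""
    return sum(p > threshold for p in true_class_probs) / len(true_class_probs)

def average_cross_entropy(true_class_probs):
    """Mean negative log of the probability given to the true class."""
    return sum(-math.log(p) for p in true_class_probs) / len(true_class_probs)

# Illustrative values only: probability assigned to the correct class
model_a = [0.9, 0.9, 0.9, 0.4]  # confident when right, slightly wrong once
model_b = [0.6, 0.6, 0.6, 0.1]  # hesitant when right, badly wrong once

correct_a = fraction_correct(model_a)
correct_b = fraction_correct(model_b)
loss_a = average_cross_entropy(model_a)
loss_b = average_cross_entropy(model_b)

print(correct_a, correct_b)                # 0.75 0.75
print(round(loss_a, 3), round(loss_b, 3))  # 0.308 0.959
```

Both models get three out of four right, yet the second model's loss is about three times higher.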
The Training Loop
Once a model receives its loss score, it can begin to improve through a process known as the training loop:
- The model makes predictions.
- The loss function evaluates the errors.
- An optimizer updates the model’s parameters.
- The model refines its predictions.
- Ideally, the loss decreases over time.
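The steps above can be sketched in plain Python. This is a minimal illustration, not how a real framework does it: the model is assumed to be a single weight w in y = w * x, the data is made up so that the right answer is w = 2, and the "optimizer" is a hand-written gradient descent step.

```python
def mean_squared_error(predictions, actuals):
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return sum(squared_errors) / len(squared_errors)

# Illustrative data following y = 2 * x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0               # the model's only parameter, starting from a bad guess
learning_rate = 0.01  # assumed step size
loss_history = []

for step in range(100):
    # 1. The model makes predictions
    predictions = [w * x for x in xs]
    # 2. The loss function evaluates the errors
    loss = mean_squared_error(predictions, ys)
    loss_history.append(loss)
    # 3. The optimizer updates the parameter using the slope of the loss
    #    with respect to w, which for MSE is the mean of 2 * (prediction - y) * x
    gradient = sum(2 * (p - y) * x for p, y, x in zip(predictions, ys, xs)) / len(xs)
    w -= learning_rate * gradient

print(loss_history[0])  # 30.0
print(round(w, 3))      # 2.0
```

The list loss_history records one loss value per step, which is the series you would plot to see a training curve.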
During training, plotting the loss over time can help visualize the model’s progress. Initially, the loss is high due to numerous errors, but as training progresses and the model learns, the loss should decrease, eventually leveling off. A typical training curve starts with a steep drop in loss, followed by gradual flattening.
Flattening indicates that the model has learned simple patterns and is now making incremental improvements. However, a rising validation loss while training loss decreases suggests overfitting, where the model memorizes training data instead of generalizing patterns.
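One simple way to spot that pattern is to compare the two loss curves epoch by epoch. The helper below is a hypothetical sketch, and the loss values are invented to show the shape of the problem rather than taken from a real training run.

```python
def first_divergence_epoch(train_losses, val_losses):
    """Return the first epoch where training loss falls but validation loss rises."""
    for epoch in range(1, len(train_losses)):
        train_falling = train_losses[epoch] < train_losses[epoch - 1]
        val_rising = val_losses[epoch] > val_losses[epoch - 1]
        if train_falling and val_rising:
            return epoch
    return None  # no sign of divergence in the recorded epochs

# Illustrative values only, one entry per epoch
train_losses = [1.00, 0.60, 0.40, 0.30, 0.22, 0.15]
val_losses = [1.05, 0.70, 0.55, 0.50, 0.56, 0.65]

epoch = first_divergence_epoch(train_losses, val_losses)
print(epoch)  # 4
```

In practice validation loss is noisy, so a single uptick is weak evidence. A common refinement is to wait for several consecutive rises before concluding that the model is overfitting.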
Final Thoughts
In essence, a loss function provides an error score for a model, guiding its learning process by indicating how incorrect its predictions are. Understanding loss functions lays the groundwork for comprehending more complex concepts in machine learning, such as gradient descent, backpropagation, optimization, and evaluation metrics.
Start with the core idea:
- The model makes a guess.
- The loss function assesses the guess.
- The model adjusts to reduce the loss.
At the heart of machine learning, loss functions are how models recognize their mistakes and learn to err less over time. This concludes our exploration of loss functions. Stay tuned for more in my noob series.
Kanwal Mehreen is a machine learning engineer and technical writer passionate about data science and the intersection of AI and medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a 2022 Google Generation Scholar for APAC, she champions diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Fellow, Mitacs Globalink Research Fellow, and Harvard WeCode Fellow. Kanwal is a strong advocate for change, having founded FEMCodes to empower women in STEM fields.

