Loss function explained for noobs (how models know they are wrong)

Introduction

When you’re just starting out with machine learning, the initial steps can feel deceptively simple. You load a dataset, train a model, and suddenly you’re faced with terms like “loss = mse” or “criterion = nn.CrossEntropyLoss()”. Quickly, tutorials dive into complex topics like equations, gradients, and optimization, often leaving beginners nodding without fully comprehending. If you’ve ever felt lost, rest assured you’re not alone. This article aims to demystify loss functions, a core concept in machine learning, by focusing on the foundational ideas before delving into the mathematics. This is part of my noob series, designed to make these concepts more accessible.

What is a Loss Function?

A loss function is fundamentally a tool that informs a machine learning model how incorrect its predictions are. When a model makes a prediction, the loss function compares this prediction to the actual result, providing a numerical value that indicates the magnitude of the model’s error.

A high loss signifies that the model’s prediction was significantly off. Conversely, a low loss indicates that the model’s prediction was reasonably close to the target. Throughout the training process, the model continuously adjusts its parameters to minimize this loss, analogous to a game of darts where feedback helps improve aim. The loss function effectively measures the distance between the dart (prediction) and the target (correct answer), giving the model essential feedback to refine its predictions.

Visualizing the Dart Analogy

Just as you can measure how far a dart lands from the bullseye, a loss function quantifies how far a model’s prediction is from the correct answer, which is the feedback the model needs to learn and improve.
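The dart analogy can be written down in a few lines. This is only a sketch: the name `dart_loss` and the coordinates are made up for illustration, and `math.dist` needs Python 3.8 or newer.

```python
import math


def dart_loss(dart, bullseye=(0.0, 0.0)):
    """Straight-line distance from where the dart landed to the bullseye."""
    return math.dist(dart, bullseye)


# Two hypothetical throws, given as (x, y) positions on the board.
near_miss = dart_loss((1.0, 0.0))  # lands 1 unit from the centre
wide_miss = dart_loss((3.0, 4.0))  # lands 5 units from the centre

print(near_miss, wide_miss)
```

The throw that lands farther away gets the larger number, which is all a loss function has to do.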

Mean Squared Error

One of the most commonly used loss functions for predicting numerical values is the Mean Squared Error (MSE). It’s frequently employed in scenarios where models predict continuous values, such as property prices or temperatures. The concept is straightforward:

Error: For each prediction, calculate the difference between the predicted value and the actual value.

Square: Multiply each difference by itself to square it.

Mean: Take the average of all these squared differences.

Here’s how you might implement it in Python:



def mean_squared_error(predictions, actuals):
    # Square each prediction error, then average the squares.
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return sum(squared_errors) / len(squared_errors)

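Squaring the errors serves two purposes. First, it ensures no error is negative (a +3 error and a -3 error both become 9), so errors in opposite directions cannot cancel out. Second, it penalizes large errors disproportionately: an error of 10 contributes 100 to the total, while an error of 2 contributes only 4. That is useful when one large miss is much worse than several small ones.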
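To see the effect of squaring in numbers, here is a small worked example. The temperature values are invented for illustration; both sets of predictions are off by a total of 3 degrees, but they receive different MSE scores.

```python
def mean_squared_error(predictions, actuals):
    """Average of the squared differences between predictions and targets."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical targets and two sets of predictions (illustrative values only).
actuals = [20.0, 22.0, 25.0]
small_misses = [21.0, 21.0, 26.0]  # every prediction is off by 1
one_big_miss = [20.0, 22.0, 28.0]  # two exact hits, one miss of 3

mse_small = mean_squared_error(small_misses, actuals)  # (1 + 1 + 1) / 3 = 1.0
mse_big = mean_squared_error(one_big_miss, actuals)    # (0 + 0 + 9) / 3 = 3.0

print(mse_small, mse_big)
```

The single miss of 3 is scored three times worse than three misses of 1, even though the total distance from the targets is the same.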

Mean Absolute Error

The Mean Absolute Error (MAE) is another widely used loss function. Like MSE, it measures the deviation between predictions and actual outcomes, but instead of squaring the errors, it takes their absolute values.

Here’s a Python implementation:



def mean_absolute_error(predictions, actuals):
    # Take the size of each prediction error, then average the sizes.
    absolute_errors = [abs(p - a) for p, a in zip(predictions, actuals)]
    return sum(absolute_errors) / len(absolute_errors)

MAE is more forgiving of outliers than MSE. Because each error counts in proportion to its size rather than its square, a single large miss does not dominate the average the way it does with MSE, which makes MAE a reasonable choice for datasets with outliers.
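A quick way to check this is to compute both losses on the same predictions, with and without one outlier. The house prices below are hypothetical numbers chosen to make the arithmetic easy to follow.

```python
def mean_squared_error(predictions, actuals):
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)


def mean_absolute_error(predictions, actuals):
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


# Hypothetical house prices in thousands; every prediction is off by 5.
actuals = [200.0, 210.0, 190.0]
predictions = [205.0, 205.0, 195.0]

mae_clean = mean_absolute_error(predictions, actuals)  # (5 + 5 + 5) / 3 = 5.0
mse_clean = mean_squared_error(predictions, actuals)   # (25 + 25 + 25) / 3 = 25.0

# Add one outlier: a house that sold for 900 but was predicted at 200.
actuals_outlier = actuals + [900.0]
predictions_outlier = predictions + [200.0]

mae_outlier = mean_absolute_error(predictions_outlier, actuals_outlier)  # 715 / 4
mse_outlier = mean_squared_error(predictions_outlier, actuals_outlier)   # 490075 / 4

print(f"MAE grew by a factor of {mae_outlier / mae_clean:.1f}")
print(f"MSE grew by a factor of {mse_outlier / mse_clean:.1f}")
```

One outlier multiplies the MAE by a few dozen, but it multiplies the MSE by several thousand, so a model trained on MSE would be pushed much harder to accommodate that single point.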

Comparison of MSE and MAE curves

Cross Entropy Loss

While MSE and MAE are excellent for numerical predictions, many machine learning tasks revolve around categorizing data. In such scenarios, Cross Entropy Loss is often employed.

For classification tasks, models predict probabilities for each class. If a model predicts 70% probability for a dog, 20% for a cat, and 10% for a fish, and the actual image is of a dog, the loss function evaluates how well these probabilities align with reality.

Correct and confident predictions result in low loss.

Correct but uncertain predictions lead to moderate loss.

Incorrect and confident predictions incur high loss.

Cross entropy loss curve

Cross entropy is particularly effective for classification because it takes into account both the correctness and the confidence of the model’s predictions.
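The three cases above can be sketched for a single image. This assumes exactly one correct class and the natural logarithm, in which case cross entropy reduces to the negative log of the probability given to the correct class. The helper `cross_entropy` is an illustrative function, not a library call, and only the 70/20/10 split comes from the example in the text; the other probabilities are made up.

```python
import math


def cross_entropy(predicted_probs, true_class):
    """Loss for one example: -log of the probability given to the true class."""
    return -math.log(predicted_probs[true_class])


# The true label is "dog" in all three cases.
confident_correct = {"dog": 0.90, "cat": 0.05, "fish": 0.05}
uncertain_correct = {"dog": 0.70, "cat": 0.20, "fish": 0.10}
confident_wrong = {"dog": 0.05, "cat": 0.90, "fish": 0.05}

loss_confident_correct = cross_entropy(confident_correct, "dog")
loss_uncertain_correct = cross_entropy(uncertain_correct, "dog")
loss_confident_wrong = cross_entropy(confident_wrong, "dog")

print(f"confident and correct: {loss_confident_correct:.3f}")
print(f"uncertain but correct: {loss_uncertain_correct:.3f}")
print(f"confident and wrong:   {loss_confident_wrong:.3f}")
```

The loss rises from about 0.1 to about 0.36 to about 3.0 across the three cases. Note that library implementations can expect different inputs: PyTorch’s nn.CrossEntropyLoss, for instance, takes raw scores (logits) rather than probabilities.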

Loss vs Accuracy

It’s important to distinguish between loss and accuracy. Accuracy measures the proportion of predictions that are correct, while loss also reflects how confident the model was in each prediction, whether right or wrong.

Consider two models that each get 90 predictions out of 100 correct. Their accuracy is identical (90%), but their loss can differ: the model that is more confident when it is right and less confident when it is wrong will have the lower loss.
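The two-model comparison can be sketched with made-up numbers. The assumptions here are a binary task, a prediction counted as correct when the true class gets more than 0.5 probability, and cross entropy as the loss; the function names and probabilities are hypothetical.

```python
import math


def mean_cross_entropy(true_class_probs):
    """Average of -log(p), where p is the probability given to the correct class."""
    return sum(-math.log(p) for p in true_class_probs) / len(true_class_probs)


def fraction_correct(true_class_probs, threshold=0.5):
    """Share of examples where the correct class got more than `threshold`."""
    return sum(p > threshold for p in true_class_probs) / len(true_class_probs)


# Probability each hypothetical model assigned to the correct class, for 100 examples.
model_a = [0.95] * 90 + [0.40] * 10  # confident when right, only mildly wrong
model_b = [0.55] * 90 + [0.05] * 10  # hesitant when right, confidently wrong

for name, probs in [("A", model_a), ("B", model_b)]:
    print(name, fraction_correct(probs), round(mean_cross_entropy(probs), 3))
```

Both models get 90 out of 100 right, yet model A’s average loss is roughly 0.14 while model B’s is roughly 0.84.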

The Training Loop

Once a model receives its loss score, it can begin to improve through a process known as the training loop:

The model makes predictions.

The loss function evaluates the errors.

An optimizer updates the model’s parameters.

The model refines its predictions.

Ideally, the loss decreases over time.
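The steps above can be sketched in plain Python for the smallest possible model: one parameter `w` that predicts `w * x`. Plain gradient descent stands in for the optimizer here, and the data, learning rate, and number of steps are all assumptions chosen for illustration; real frameworks handle the gradient computation for you.

```python
# Toy data that follows y = 3x (made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0               # the model's single parameter, starting from a bad guess
learning_rate = 0.01  # assumed step size
loss_history = []

for step in range(200):
    # 1. The model makes predictions.
    predictions = [w * x for x in xs]

    # 2. The loss function evaluates the errors (MSE here).
    loss = sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(xs)
    loss_history.append(loss)

    # 3. The optimizer updates the parameter, using the gradient of MSE
    #    with respect to w: the mean of 2 * (prediction - target) * x.
    gradient = sum(2 * (p - y) * x for p, y, x in zip(predictions, ys, xs)) / len(xs)
    w -= learning_rate * gradient

print(f"learned w = {w:.3f}")
print(f"first loss = {loss_history[0]:.3f}, last loss = {loss_history[-1]:.6f}")
```

Each pass through the loop nudges `w` toward 3, and the recorded loss falls from its starting value toward zero.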

During training, plotting the loss over time can help visualize the model’s progress. Initially, the loss is high due to numerous errors, but as training progresses and the model learns, the loss should decrease, eventually leveling off. A typical training curve starts with a steep drop in loss, followed by gradual flattening, as shown below:

Training loss curve

Flattening indicates that the model has learned simple patterns and is now making incremental improvements. However, a rising validation loss while training loss decreases suggests overfitting, where the model memorizes training data instead of generalizing patterns.
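One simple way to spot that pattern is to compare the two loss curves epoch by epoch. The sketch below uses invented loss values and a deliberately naive rule (training loss fell while validation loss rose); it is an illustration, not a standard overfitting test.

```python
# Hypothetical per-epoch losses (made-up numbers for illustration).
train_loss = [1.20, 0.80, 0.55, 0.40, 0.30, 0.22, 0.16]
val_loss = [1.25, 0.90, 0.70, 0.62, 0.65, 0.71, 0.80]

# Epoch (0-indexed) where the validation loss was lowest.
best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])

# Later epochs where training loss kept falling but validation loss rose.
diverging = [
    epoch
    for epoch in range(best_epoch + 1, len(val_loss))
    if train_loss[epoch] < train_loss[epoch - 1]
    and val_loss[epoch] > val_loss[epoch - 1]
]

print(f"best validation loss at epoch {best_epoch}")
print(f"possible overfitting at epochs {diverging}")
```

With these numbers the validation loss bottoms out at epoch 3 and climbs afterward while the training loss keeps improving, which is the signature described above.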

Final Thoughts

In essence, a loss function provides an error score for a model, guiding its learning process by indicating how incorrect its predictions are. Understanding loss functions lays the groundwork for comprehending more complex concepts in machine learning, such as gradient descent, backpropagation, optimization, and evaluation metrics.

Start with the core idea:

The model makes a guess.

The loss function assesses the guess.

The model adjusts to reduce the loss.

At the heart of machine learning, loss functions are how models recognize their mistakes and learn to err less over time. This concludes our exploration of loss functions. Stay tuned for more insights in our noob series.

Kanwal Mehreen is a machine learning engineer and technical writer passionate about data science and the intersection of AI and medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a 2022 Google Generation Scholar for APAC, she champions diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Fellow, Mitacs Globalink Research Fellow, and Harvard WeCode Fellow. Kanwal is a strong advocate for change, having founded FEMCodes to empower women in STEM fields.

For further reading, visit the original article Here.
