# Index

There Is No One Ring to Rule them All
Be Sure About the Type of the Problem
Build a Meaningful Test Set
Select a Metric
Evaluate the Model
Hyperparameters Tuning
Conclusions

# There Is No One Ring to Rule them All

In many of Virgilio's guides, you have learned that there is rarely a "best" way to do something, as is common in all engineering disciplines.

Data Science is no exception. On the contrary, it takes the concept of tradeoff to the extreme: you are constantly choosing among several possibilities, and the best choice is often guided by the details of the specific problem.

This concept is well illustrated by the "No Free Lunch Theorem", which states:

> All algorithms that search for an extremum of a cost function perform the same when averaged over all possible cost functions.

In plain words, there is no single model that can fit every need or solve every problem.

Sure, there are certain classes of algorithms that work better for certain classes of problems, or algorithms that have been completely outclassed by more "intelligent" versions, but in general, it is not possible to state this with certainty for every problem.

The bottom line is that you must learn how to choose the right algorithm that will give you the best-trained model for your specific problem.

How do you do this?

Well, experience plays the leading role here, and even after years of doing Data Science, you'll find yourself learning new things about how an algorithm can perform better or worse under certain conditions.

But luckily you can measure an algorithm's performance on your problem and discover what works best!

To do that, you need to do three things:

- Build a meaningful test set
- Choose the right metric to measure your model's performance against it
- Track the parameters and the associated results of each evaluation
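
As a rough illustration of the third point, the sketch below keeps a small in-memory log of runs, each pairing a configuration with the score it obtained. The names `Run`, `ExperimentLog`, `record`, and `best`, as well as the parameter names and scores in the usage example, are hypothetical and only meant to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One evaluation: the configuration used and the score it obtained."""
    params: dict
    metric_name: str
    score: float


@dataclass
class ExperimentLog:
    """Hypothetical in-memory log of evaluation runs."""
    runs: list = field(default_factory=list)

    def record(self, params, metric_name, score):
        # Copy the params so later mutations do not alter the logged run.
        self.runs.append(Run(dict(params), metric_name, score))

    def best(self, metric_name, higher_is_better=True):
        # Only compare runs that were scored with the same metric.
        candidates = [r for r in self.runs if r.metric_name == metric_name]
        if not candidates:
            raise ValueError(f"no runs recorded for metric {metric_name!r}")
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r.score)


# Usage with made-up numbers, purely for illustration.
log = ExperimentLog()
log.record({"learning_rate": 0.1, "epochs": 10}, "accuracy", 0.81)
log.record({"learning_rate": 0.01, "epochs": 30}, "accuracy", 0.86)
best_run = log.best("accuracy")
print(best_run.params, best_run.score)
```

In a real project you would probably persist these records to a file or use a dedicated experiment tracker, but the principle stays the same: never evaluate without writing down what you evaluated.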

# Be Sure About the Type of the Problem

It would be impossible to list every existing type and sub-type of problem, partly because some of them can be framed as either classification or regression problems depending on the approach!

Virgilio's advice is therefore the following:

When you face a problem, first understand what type of problem it is, and from there start to search for, learn, and understand the metrics that may interest you!

Be sure you know what kind of problem you're dealing with (ideally, you did this before you started solving it):

Common ML Problems

# Build a Meaningful Test Set

To evaluate your model, you should have a clear picture of the kind of "real world" it will operate in, which means having a test set that is representative of the actual problem you want to solve.

Often, if you pick a pre-built dataset, it comes with a ready-to-go test set, but in real-world scenarios, you don't have that!

One technique for generating a test set from a dataset is to shuffle it and randomly draw a subset from it. This can work in certain situations, but it's not a general rule: the approach assumes, among other things, that your dataset is balanced and representative of the real-world problem.
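
A minimal sketch of the shuffle-and-draw technique, using only the Python standard library. The helper name `shuffle_split` and its `test_fraction` and `seed` parameters are assumptions made for this example, not part of any specific library.

```python
import random


def shuffle_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and split it into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(100))
train, test = shuffle_split(data, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

Fixing the seed matters more than it seems: if the test set changes at every run, you can no longer compare the results of two evaluations.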

In general, when you build a test set you should make it:

- Big enough to draw conclusions about the performance of the model
- Representative of the real-world scenario in which the model will be deployed
- Representative of the training set (if it is not, you are unlikely to get good performance)
- Free of biases introduced by pre-processing transformations or outlier removal
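
One way to keep the test set representative when the classes are imbalanced is a stratified split, which draws the test examples class by class so that both sets keep roughly the same class proportions. The sketch below is one possible implementation under that assumption. The function name `stratified_split` and the toy 90/10 dataset are hypothetical, and libraries such as scikit-learn ship their own utilities for this.

```python
import random
from collections import Counter, defaultdict


def stratified_split(samples, labels, test_fraction=0.2, seed=0):
    """Split so each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append((sample, label))

    train, test = [], []
    for items in by_class.values():
        rng.shuffle(items)
        # Aim for at least one test example per class while always leaving
        # at least one for training (a single-example class stays in train).
        n_test = round(len(items) * test_fraction)
        n_test = min(max(n_test, 1), len(items) - 1)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Imbalanced toy dataset: 90 examples of class "a", 10 of class "b".
samples = list(range(100))
labels = ["a"] * 90 + ["b"] * 10
train, test = stratified_split(samples, labels, test_fraction=0.2, seed=1)
train_counts = Counter(label for _, label in train)
test_counts = Counter(label for _, label in test)
print(train_counts, test_counts)
```

With a plain random draw, the rare class "b" could end up with zero or one example in the test set. Here it always keeps its 10% share.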

Luckily for us, Andrew Ng has collected an exhaustive list of tips and tricks for building a meaningful test set, and you can find them in his practical book:

Machine Learning Yearning

# Select a Metric

Google is your friend: someone has almost certainly been in the same situation as you, and a simple query like 'ML metric for type_X problem' will probably give you a ton of good answers.

A good starting point though can be the following:

Selecting Metrics for Machine Learning

Even though you can easily find plenty of literature and good information about each ML metric on Google, be sure to read this presentation about the ML evaluation phase in general:

Performance Evaluation for Learning Algorithms

Then you can dive deeper into these more detailed resources:

Bonus resources about model evaluation:

Check this library for testing machine learning models, specifically those in scikit-learn:

Drifter_ML

# Evaluate the Model

After you've built your test set, it's time to evaluate your trained model against it!

But to do that, you should have a clear idea of which metric is valuable for measuring the performance of your algorithms.

It turns out that this heavily depends on the problem you are facing, and in particular the type of problem: are you facing a supervised problem or not? If yes, are you solving a classification or regression problem?

And so on...
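
To make the dependence on the problem type concrete, here is a small sketch with a few common metrics written in plain Python: accuracy, precision, recall, and F1 for a binary classification problem, and MAE and RMSE for a regression problem. The function names `classification_report` and `regression_report` and the toy labels are assumptions made for this example.

```python
import math


def classification_report(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def regression_report(y_true, y_pred):
    """Mean absolute error and root mean squared error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


# Classification: 95 negatives, 5 positives, and a model that always says 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
clf = classification_report(y_true, y_pred)
print(clf)

# Regression: continuous targets call for error-based metrics instead.
reg = regression_report([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(reg)
```

In the classification example the model never finds a single positive case, yet its accuracy is 0.95 because the dataset is imbalanced. Recall and F1 are both 0, which is why the choice of metric has to follow the problem and not the other way around.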

To learn how to evaluate your model after the training phase, read through Chapter 6 of the book Machine Learning Engineering by Andriy Burkov:

Chapter 6 - Model Evaluation

# Hyperparameters Tuning

When you evaluate a Machine Learning model, you evaluate a model trained with a certain configuration, and the parameters in this configuration are called "hyperparameters".

**Choosing the right hyperparameters to let the ML algorithm learn well is probably the most difficult job in the Data Science process** (some call it "Black Magic").

There are no golden rules (remember: There Is No One Ring to Rule them All), although some best practices are known to work well in a variety of situations.

But when you tackle a novel problem with a custom dataset, it's difficult to get these hyperparameters right on the first try.

For example, imagine you're training a neural network (even a simple one). You can have a lot of these hyperparameters.

Some concern the topology of the network:

- the type of layers
- the number of layers
- the number of neurons in each layer
- the activation functions

Others concern the optimization phase (the actual training):

- the number of training epochs
- the batch size
- the learning rate
- the learning rate decay
- regularization techniques
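
One way to keep such a configuration under control is to gather all of it in a single object. The sketch below does that for the hyperparameters listed above. The class name `NetworkHyperparameters`, the field names, and every default value are assumptions chosen for illustration, not recommendations.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class NetworkHyperparameters:
    """Hypothetical container for the hyperparameters of a small network."""
    # Topology of the network
    layer_type: str = "dense"
    hidden_layer_sizes: tuple = (64, 32)  # number of layers = len(...)
    activation: str = "relu"
    # Optimization phase
    epochs: int = 20
    batch_size: int = 32
    learning_rate: float = 1e-3
    learning_rate_decay: float = 0.0
    l2_regularization: float = 0.0

    def validate(self):
        """Reject obviously invalid values before any training starts."""
        if not self.hidden_layer_sizes or min(self.hidden_layer_sizes) < 1:
            raise ValueError("every layer needs at least one neuron")
        if self.epochs < 1 or self.batch_size < 1:
            raise ValueError("epochs and batch_size must be positive")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        return self


config = NetworkHyperparameters(hidden_layer_sizes=(128, 64, 32),
                                learning_rate=0.01).validate()
print(asdict(config))
```

Because the object is frozen and converts to a plain dictionary, it is easy to log next to the score it produced, which is exactly the tracking habit recommended at the beginning of this guide.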

What a mess! How do you get these right?

Data scientists often tackle this problem with search techniques such as Grid Search, Random Search, or Evolutionary Algorithms.
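
The three ideas can be sketched in a few lines of plain Python. Everything here is a simplification under stated assumptions: `validation_score` is a toy stand-in for "train a model and score it on held-out data" (its optimum at a learning rate of 0.01 and a batch size of 64 is invented), and the evolutionary loop is a bare-bones "mutate the best, keep improvements" scheme, far simpler than what dedicated libraries implement.

```python
import itertools
import random


def validation_score(learning_rate, batch_size):
    """Toy stand-in for 'train a model and score it'; higher is better.

    Assumption for illustration only: the best configuration is
    learning_rate=0.01 and batch_size=64.
    """
    return -((learning_rate - 0.01) * 100) ** 2 - ((batch_size - 64) / 64) ** 2


def grid_search(grid):
    """Try every combination of the listed values."""
    names = list(grid)
    candidates = [dict(zip(names, values))
                  for values in itertools.product(*grid.values())]
    return max(candidates, key=lambda c: validation_score(**c))


def random_search(n_trials, seed=0):
    """Sample configurations at random from chosen ranges."""
    rng = random.Random(seed)
    candidates = [{"learning_rate": 10 ** rng.uniform(-4, 0),  # log-uniform
                   "batch_size": rng.choice([16, 32, 64, 128, 256])}
                  for _ in range(n_trials)]
    return max(candidates, key=lambda c: validation_score(**c))


def evolutionary_search(start, generations=30, children=8, seed=0):
    """Very simple evolutionary loop: mutate the best, keep improvements."""
    rng = random.Random(seed)
    best = dict(start)
    for _ in range(generations):
        offspring = []
        for _ in range(children):
            child = dict(best)
            child["learning_rate"] = best["learning_rate"] * 10 ** rng.gauss(0, 0.3)
            child["batch_size"] = max(1, best["batch_size"] + rng.choice([-16, 0, 16]))
            offspring.append(child)
        # Elitism: the parent survives unless a child scores higher.
        best = max(offspring + [best], key=lambda c: validation_score(**c))
    return best


grid = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}
start = {"learning_rate": 0.1, "batch_size": 16}
grid_best = grid_search(grid)
random_best = random_search(n_trials=20, seed=0)
evo_best = evolutionary_search(start)
print("grid:        ", grid_best)
print("random:      ", random_best)
print("evolutionary:", evo_best)
```

Grid Search is exhaustive but its cost grows quickly with the number of hyperparameters, Random Search spends a fixed budget of trials, and the evolutionary loop spends its budget around the best configuration found so far. In practice each call to `validation_score` would be a full training run, so the number of trials is usually the real constraint.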

Be sure to go through the following resources to understand how to tune the hyperparameters of your model!

# Conclusions

In this guide, you've learned how to choose the right test set for your evaluation, how to choose a meaningful metric for the problem at hand, and how to evaluate the model against it.

You've also seen some techniques that help you navigate the vast space of hyperparameter configurations.

Now, once you're satisfied with the performance of your model, you're ready to use it in the real world!
