# Index
- Buy This Book
- The Machine Learning Landscape
- End-to-End Machine Learning project
- Machine Learning Full Course
- Code an Algorithm from Scratch
- Must-Know Supervised Learning Algorithms
- Unsupervised Learning Algorithms
- Conclusions
# Buy This Book
Virgilio strongly recommends that you buy this phenomenal book: Hands-On Machine Learning with Scikit-Learn and TensorFlow.
The book has inspired the birth of Virgilio and has driven most of the organization and hierarchy of the content listed below.
WARNING
Be sure to buy the 2nd edition of the book, which comes with TensorFlow 2.0 and many of the chapters updated.
Apart from this, everything listed here is open source and free, from world-renowned universities and open source associations, in pure Virgilio's spirit.
Note: if you think the price of the book is high, remember that you won't find a higher-quality hands-on book on Machine and Deep Learning. Don't hesitate; the book is definitely worth its price.
Avoiding confusion is essential when learning something new, especially when the topic is as wide and complex as Machine Learning. Where possible, we've built this guide and the following ones preferring content from the same author or context.
# The Machine Learning Landscape
First things first!
Directly from the book cited earlier, this is the most concise and illuminating overview of what Machine Learning is and when you need it. Let's stop using buzzwords!
Check it here: The Machine Learning Landscape.
Also check this: A Visual Introduction to Machine Learning.
# End-to-End Machine Learning project
Virgilio wants you to get a feel for what a complete Data Science project looks like, including model creation and selection.
For a first taste, go through this Kaggle notebook, which contains a classic example of an ML task.
The goal is to try to predict if a Titanic passenger would have been most likely to survive or not.
This is commonly considered the "Hello World" problem for new Machine Learning practitioners.
Many things will be unclear for now, but don't worry: they will all be explained comprehensively later. It's valuable to get a picture of an "applied" project and walk through the classical steps of applied Machine Learning (problem framing, data exploration, question formulation...).
The notebook is on Kaggle, the go-to platform for ML and general Data Science projects, which provides a lot of free datasets and offers interesting challenges and ML model experiments.
Remember: Read the notebook and try to understand the big picture of the process. Some details, functions, and code will be clearer later.
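If you want to reproduce a bare-bones version of that workflow yourself, here is a minimal sketch in Python with Scikit-Learn. It assumes you've downloaded the competition's train.csv into your working directory; the feature choices and the model are illustrative, not the notebook's exact pipeline:

```python
# Minimal sketch, assuming Kaggle's train.csv (Titanic competition) is in
# the working directory; features and model are illustrative choices.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")

# Tiny bit of feature preparation: encode sex, fill missing ages.
df["Sex"] = (df["Sex"] == "female").astype(int)
df["Age"] = df["Age"].fillna(df["Age"].median())

X = df[["Pclass", "Sex", "Age", "Fare"]]
y = df["Survived"]

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))
```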
# Machine Learning Full Course
Now that you've been exposed to your first end-to-end machine learning project, you may start wondering how you should choose an algorithm to try on your data, and what the learning theory behind these algorithms is.
The best thing you can do now is to take a full course on Machine Learning theory.
There are plenty of those out there, but the most classical and complete course is probably the most famous one too.
Plan some weeks of study and prepare to follow:
Machine Learning Course from Andrew Ng
This course takes you through the basics of Machine Learning algorithms, plus the math behind the training process. Concepts like overfitting, regularization, and loss functions are explained in depth.
The course also has a part on Deep Learning; you're not obliged to take it (even though it's recommended).
In the next guide, "Deep Learning", we'll give you specific courses about it.
The course comes with homework assignments (highly recommended), but unfortunately they are designed around the Octave programming language, which is rather outdated and limited compared to Python.
But don't be scared!
Awesome people out there have re-created the course assignments in Python, as Jupyter notebooks!
Check them here:
Coursera Machine Learning by Andrew Ng - Python Programming Assignments
Thanks to this course and its exercises, you should grasp most of the basic concepts behind Machine Learning theory and the process of training models.
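As a small taste of those concepts, here is a hedged sketch of overfitting and regularization in Scikit-Learn: a high-degree polynomial fit with and without an L2 penalty (the dataset, degree, and alpha are arbitrary choices for illustration). The unregularized model typically scores worse under cross-validation because it memorizes the noise:

```python
# Overfitting vs. regularization: a degree-15 polynomial fit to 30 noisy
# points, with no penalty (LinearRegression) and with an L2 penalty (Ridge).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)

models = {
    "no penalty": make_pipeline(PolynomialFeatures(15), LinearRegression()),
    "ridge (L2)": make_pipeline(PolynomialFeatures(15), Ridge(alpha=1e-3)),
}
for name, model in models.items():
    # Cross-validation exposes the overfitting that the training score hides.
    print(name, "mean CV R^2:", cross_val_score(model, X, y, cv=5).mean())
```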
Once you're done with the course, also check the following course from Google:
Machine Learning Crash Course
This second one should take you no more than a few days to get through, and it gives you a more practical perspective on the Machine Learning modeling process (selection, training, evaluation).
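To make that selection-training-evaluation loop concrete, here is a minimal sketch; the candidate models and dataset are arbitrary illustrative choices, not anything prescribed by the course:

```python
# Selection, training, evaluation: compare candidate models with
# cross-validation and keep the best (dataset and candidates are arbitrary).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}
for name, model in candidates.items():
    print(name, "mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```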
In the next sections of this guide, we'll look at some criteria for choosing which algorithms to deepen your knowledge of.
It's nearly impossible to know all the Machine Learning algorithms along with their variants, and more algorithms are being developed every month!
Nevertheless, there are some algorithms which are the foundation of statistical learning theory, so you want to have them clear in mind.
These are also the algorithms a recruiter is most likely to ask you about!
# Must-Know Supervised Learning Algorithms
The algorithms listed here are of the "Supervised" type, in the sense that you need labeled data to make your models work.
Read here about: Difference between supervised and unsupervised algorithms
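In Scikit-Learn the difference shows up directly in the API: supervised estimators are fit on features and labels, unsupervised ones on features alone. A minimal sketch (the models and dataset are illustrative choices):

```python
# Supervised: fit on features X plus labels y. Unsupervised: fit on X alone.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

supervised = LogisticRegression(max_iter=1000).fit(X, y)  # needs labels
unsupervised = KMeans(n_clusters=3, n_init=10).fit(X)     # ignores labels

print(supervised.predict(X[:3]))   # predicted classes
print(unsupervised.labels_[:3])    # discovered cluster ids
```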
The algorithms we consider the most foundational are:
# Classification and Logistic Regression
Classification is one of the most important ML tasks: predicting an outcome from a set of different possibilities. For example, given handwritten digits, classify them with the lowest error possible. The simplest case is binary classification (Yes or No, Survived or Not Survived); have a look here. Check here for a brief explanation of the theory of logistic regression for classification, and check here for a deeper treatment (using the Titanic dataset). You can use a lot of different ML models to classify things, even neural networks! For now, just take a look here, where you'll see an example of an accuracy and recall comparison among different models. Here you have an article about the metrics used to evaluate your classifiers.
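As a concrete (and deliberately simple) illustration of a binary classifier and the accuracy/recall metrics mentioned above, here is a Scikit-Learn sketch; the built-in dataset is an arbitrary illustrative choice, not the Titanic data used in the linked tutorial:

```python
# A binary classifier (logistic regression) evaluated with accuracy and
# recall; scaling the features first helps the solver converge.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print("recall:  ", recall_score(y_test, pred))
```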
# Support Vector Machines
This is another classical algorithm for creating ML models. Here you have the explanation of the theory, and here a more practical approach. Check both. Here is a very good explanation plus a practical application in Scikit-Learn.
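If you want to try an SVM right away, here is a minimal Scikit-Learn sketch; the RBF kernel and C value are common starting points, chosen here purely for illustration:

```python
# A support vector classifier; scaling matters for SVMs, so the model is
# wrapped in a pipeline. Kernel and C are illustrative starting points.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```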
# Decision Trees
Decision Trees are one of the simplest but most effective ideas behind predicting outcomes, and they're used in many ways (e.g. Random Forest). Check here and go through the playlist to get a theoretical overview of Decision Trees (ID3). Here you have a practical application of ID3. Below you have some end-to-end examples with Scikit-Learn, followed by a runnable sketch:
- Example 1
- Example 2
- Example 3
- Example 4 couples decision trees with genetic algorithms.
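And here is the runnable sketch mentioned above: a small decision tree whose learned if/else rules are printed out, which is what makes trees so interpretable (the dataset and depth are illustrative choices):

```python
# A small decision tree; export_text prints the learned if/else rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data.data, data.target)

print(export_text(tree, feature_names=list(data.feature_names)))
```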
# Ensemble Learning and Random Forest
The idea of Ensemble Learning is to leverage all the different features, pros, and cons of several ML models to obtain a group of "voters" that, for each prediction, gives you the most likely outcome, as voted by the different classifiers (SVM, ID3, maybe Logistic Regression). Here you get the basics of ensemble learning models, and here you find the most classic of them, the Random Forest. Although the idea is simple, this ensemble model has turned out to be effective at tackling even some "hard" classification problems, or problems with a lot of data.
Here you get a complete overview of the best practices for ensemble learning, and here you find an example of Random Forest with Scikit-Learn. Both links come with a bunch of useful techniques to use in practice.
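As a hedged sketch of both ideas, here is a hand-built voting ensemble over heterogeneous classifiers next to a Random Forest (which is itself an ensemble of trees); the models, dataset, and hyperparameters are illustrative assumptions, not recommendations from the linked articles:

```python
# Two flavors of ensembling: a hand-built hard-voting ensemble over
# heterogeneous models, and a Random Forest (an ensemble of trees).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

voter = VotingClassifier([
    ("lr", make_pipeline(StandardScaler(), LogisticRegression())),
    ("svm", make_pipeline(StandardScaler(), SVC())),
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
])
forest = RandomForestClassifier(n_estimators=200, random_state=0)

for name, model in [("voting ensemble", voter), ("random forest", forest)]:
    print(name, "mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```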
# Unsupervised Learning Algorithms
Up to now, we've mostly considered the supervised type of learning, where you have labeled data and you learn from it.
But the world is often full of unlabeled data, and the labeling process is tedious and costly.
So it's important to be aware of the unsupervised classes of learning algorithms.
- This is a brief introductory video.
- These are the Unsupervised Learning lectures from Stanford; take them if you want to go deeper.
- This is a very good Reddit post on why Unsupervised Learning is so important.
- Here is an interesting read about the difference between Supervised Learning, Unsupervised Learning, and Reinforcement Learning.
The two most important techniques here are Association Rules Exploration and Clustering; a runnable clustering sketch follows the links below.
Association Rules tutorials and examples: 1, 2, 3, 4, 5.
Clustering tutorials and examples: 1, 2, 3, 4, 5, 6.
Dive deep with: Stanford slides. MIT slides.
Tips & best practices when dealing with unsupervised datasets: 1, 2, 3, 4, 5.
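And here is the clustering sketch promised above: K-Means on data whose labels we deliberately ignore, judged with the silhouette score, an internal quality measure that needs no labels. (Association rules usually rely on a dedicated library such as mlxtend, so they're not sketched here; the dataset and k are illustrative choices.)

```python
# K-Means on deliberately un-labeled data, judged with the silhouette
# score (an internal measure that needs no labels). k=3 is illustrative.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X, _ = load_iris(return_X_y=True)  # pretend the labels don't exist

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [int((kmeans.labels_ == k).sum()) for k in range(3)])
print("silhouette score:", silhouette_score(X, kmeans.labels_))
```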
# Conclusions
This guide is very dense; assuming average skills in programming, math, and statistics, you should allow at least a month to digest all the content listed here. We know that you're excited to put things into practice, but don't underestimate the importance of building a solid theoretical "ground floor" on which to build the rest of your knowledge.
This guide is probably the most important one of the Purgatorio in terms of the individual concepts to learn, so if you feel that two or three months are needed to grasp all the concepts, don't be afraid!