# Index

What is Transfer Learning
Transfer Learning Magical Properties
How Does Transfer Learning Work
Where to Find Pretrained Models
Conclusions

# What is Transfer Learning

Transfer Learning is a research problem in Machine Learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.

This is done by taking an existing model, pre-trained on another dataset (usually a bigger one with generic content, like ImageNet), and then fine-tuning it on a smaller, similar dataset.

For example, you can take a model pre-trained on ImageNet and fine-tune it on your own, much smaller, dataset of images, in order to benefit from the pre-training and obtain a series of almost-magical properties and results.

For a comprehensive overview of the why, what, and how of Transfer Learning, plus possible applications, read this awesome article:

Transfer Learning - Machine Learning's Next Frontier

# Transfer Learning Magical Properties

The theory of Transfer Learning suggests that when using a pre-trained model on a similar dataset:

You need less:

Data: you need much smaller datasets to obtain the same (or better) model performance
Computational Power / Computational Time: empirical results have shown that pre-trained models reach top performance after far fewer training epochs than models trained from scratch

You improve:

Out-of-sample generalization: when predicting on test data you often obtain better accuracy (or whatever metric you're using), which means less overfitting
Robustness: you make the model more "robust" to real-world low-quality data

So, the first thing to do every time you frame a new problem is to ask yourself:

Can I leverage Transfer Learning in solving this problem?

If the answer is yes, make sure that:

the model is robust and produces sound results
its results are reproducible
its input is compatible with your data
the source dataset (on which the model has been pre-trained) and the target dataset (yours) are similar enough

See:

# How Does Transfer Learning Work

The practice of Transfer Learning allows us to reuse most of the parameters (weights) of a neural network previously trained on a problem similar to the one we have to solve. Training then focuses only on the last layers, which are usually those dedicated to classification and/or regression on the features extracted by the previous layers.

This allows us to obtain two key results:

reuse the behavior of a model already trained to effectively extract features from input data
limit training to a significantly smaller number of parameters (those of the last layers), thus speeding up training by a big margin
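To get a feel for the second point, the arithmetic below counts the parameters of a small fully connected network and the share that stays trainable when only the last layer is trained. The layer widths and the `dense_params` helper are made up for illustration; real backbones are usually much larger and not purely fully connected, so treat this as a rough sketch of the idea rather than a measurement.

```python
def dense_params(n_in, n_out):
    """Number of weights plus biases in a fully connected layer."""
    return n_in * n_out + n_out


# Hypothetical layer widths: input -> hidden -> hidden -> features -> classes.
layer_sizes = [1024, 512, 256, 128, 10]

# Parameter count of each layer, from the first to the last one.
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

total = sum(per_layer)
trainable = per_layer[-1]  # only the last layer is trained, the rest is reused

print(f"total parameters:     {total}")
print(f"trainable parameters: {trainable}")
print(f"trainable share:      {trainable / total:.2%}")
```

In this toy setup the last layer holds well under 1% of all parameters, which is why training only the head is so much cheaper than training the whole network.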

For example, if we wanted to classify apple varieties from images, we could start from a neural network already trained to classify images of planes, cars, dogs, cats, eggs, etc. This works because the greater variety in the training dataset ensures a better ability to extract features of various kinds from images.

From the pre-trained neural network we would keep only the initial layers, because they extract lower-level features (for example lines and edges, which are common across virtually all 2D images), and redefine only the last layers, those responsible for classification.

The reused layers are marked as "read-only" or "frozen", so that only the parameters of the last layers are trained. This speeds up training, reduces the processing power required, and generally improves accuracy.
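The freezing idea described above can be sketched without any deep learning framework. In the toy example below, `BACKBONE_W` stands in for pre-trained weights (the values are invented, not taken from a real model), and only a small logistic-regression head is trained on top of the frozen features. All names and the tiny dataset are assumptions made for illustration; a real project would use the freezing mechanism of its framework instead.

```python
import math

# Hypothetical "pre-trained" backbone: a fixed 2 -> 3 linear layer with tanh.
# In a real project these weights would be loaded from a pre-trained model.
BACKBONE_W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]


def backbone(x):
    """Frozen feature extractor: its weights are read, never updated."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in BACKBONE_W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=200, lr=0.5):
    """Train only the new head (logistic regression on backbone features)."""
    head_w = [0.0, 0.0, 0.0]
    head_b = 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = backbone(x)  # forward pass through the frozen layers
            pred = sigmoid(sum(w * f for w, f in zip(head_w, feats)) + head_b)
            err = pred - y  # gradient of the log loss w.r.t. the logit
            # Only the head parameters receive updates.
            head_w = [w - lr * err * f for w, f in zip(head_w, feats)]
            head_b -= lr * err
    return head_w, head_b


def predict(x, head_w, head_b):
    feats = backbone(x)
    return sigmoid(sum(w * f for w, f in zip(head_w, feats)) + head_b) >= 0.5


# Tiny made-up target dataset: label 1 when the first value is the larger one.
target_data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([2.0, 0.5], 1), ([0.5, 2.0], 0)]

head_w, head_b = train_head(target_data)
for x, y in target_data:
    print(x, "label:", y, "predicted:", int(predict(x, head_w, head_b)))
```

The training loop never touches `BACKBONE_W`: the gradient is computed and applied only for `head_w` and `head_b`, which is the whole point of freezing.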

In general, the set of layers reused from a pre-trained network is called a backbone or feature extractor.

The practice of applying Transfer Learning theory to real-world models is called "fine-tuning".

Often, when using a pre-trained model, it's good practice to reuse all the layers except the last one, but you can experiment with removing the last N layers (usually not more than 3-4).
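As a rough illustration of dropping the last N layers, the sketch below models a network as a plain list of functions and swaps the old head for a new one. The layers, the `replace_head` helper, and its `n_removed` parameter are hypothetical stand-ins, not the API of any library; frameworks typically offer their own way to slice or rebuild a model.

```python
def run(layers, x):
    """Apply the layers in order, feeding each output into the next layer."""
    for layer in layers:
        x = layer(x)
    return x


def replace_head(pretrained_layers, new_head, n_removed=1):
    """Keep all but the last n_removed layers and append a new head."""
    if not 1 <= n_removed < len(pretrained_layers):
        raise ValueError("n_removed must leave at least one pre-trained layer")
    return pretrained_layers[:-n_removed] + [new_head]


# Hypothetical stand-ins for the layers of a pre-trained network.
def early_features(x):
    return [2 * v for v in x]


def activation(x):
    return [max(0.0, v) for v in x]  # ReLU-like


def old_head(x):
    return [sum(x)]  # single output for the source task


def new_head(x):
    return [sum(x), -sum(x)]  # two outputs for the target task (untrained)


pretrained = [early_features, activation, old_head]

# Reuse everything except the last layer, then attach the new head.
model = replace_head(pretrained, new_head, n_removed=1)
print(run(model, [1.0, -3.0, 2.0]))
```

Changing `n_removed` is a cheap way to experiment with how much of the pre-trained network to keep.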

To learn more about Transfer Learning:

# Where to Find Pretrained Models

Now that it's clear how useful pre-trained models can be for solving your problem, let's see where on the Internet you can find them, plus some guides and tutorials on how to fine-tune them.

Some general-purpose places where you will find pre-trained models are:

It's very likely that you will find a model that fits your needs in these places. If you can't, try to google

# Pre-Trained Models: Computer Vision

# Pre-Trained Models: Natural Language Processing and Understanding

# Transfer Learning Tutorials

# Conclusions

In this guide, you've seen what Transfer Learning is, why it's so useful, and why it's a hot topic in research today. You also now know where to find a model that fits your needs!
