- What is Transfer Learning
- Transfer Learning Magical Properties
- How Does Transfer Learning Work
- Where to Find Pretrained Models
# What is Transfer Learning
Transfer Learning is a research problem in Machine Learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.
This is done by using an existing model, pre-trained on another dataset (usually bigger and with generic content, like ImageNet), and then fine-tuning the model on a smaller and similar dataset.
For example, you can take a model pre-trained on ImageNet and fine-tune it on your much smaller dataset of images, in order to benefit from the pre-training and obtain a series of almost-magical properties and results.
For a comprehensive overview of the why, what, and how of Transfer Learning, plus possible applications, read this awesome article:
# Transfer Learning Magical Properties
The theory of Transfer Learning suggests that when using a pre-trained model on a similar dataset:
You need less:
- Data: you need much smaller datasets to obtain the same (or better) model performance
- Computational Power / Computational Time: empirical results have shown that pre-trained models reach top performance after far fewer epochs of training than models trained from scratch
You get better:
- Out-of-sample generalization: when predicting on test data you often obtain better accuracy (or whatever metric you're using), which means less overfitting
- Robustness: the model becomes more "robust" to real-world, low-quality data
So, the first thing to do every time you frame a new problem is to ask yourself:
Can I leverage Transfer Learning in solving this problem?
If the answer is yes, make sure that:
- the model you're using is robust and produces sound results
- its results are reproducible
- its input is compatible with your data
- the source dataset (on which the model has been pre-trained) and the target dataset (yours) are similar enough
- Finding Similarities in Datasets
- How to measure similarity or dissimilarity between two data set?
- Three Similarity Measures between One-Dimensional Data Sets
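As a rough illustration of the "similar enough" check, here is a minimal sketch that compares two one-dimensional samples with the two-sample Kolmogorov-Smirnov statistic. This is just one possible measure, chosen here for illustration and not necessarily the one used in the resources above. The feature (per-image mean brightness) and all the values are hypothetical; on real image datasets you would more likely compare richer features.

```python
from bisect import bisect_right


def ks_statistic(source, target):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of two 1-D samples (0 = identical, 1 = disjoint)."""
    if not source or not target:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(source), sorted(target)
    gap = 0.0
    # Both CDFs are step functions, so the largest gap is reached
    # at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical per-image mean brightness for three datasets.
source_brightness = [0.42, 0.48, 0.51, 0.55, 0.60, 0.63]
similar_target = [0.44, 0.47, 0.53, 0.58, 0.61]
different_target = [0.05, 0.08, 0.10, 0.12, 0.15]

print("similar target:  ", ks_statistic(source_brightness, similar_target))
print("different target:", ks_statistic(source_brightness, different_target))
```

A value close to 0 suggests the two samples follow a similar distribution for that feature, while a value close to 1 suggests they barely overlap. It only looks at one feature at a time, so treat it as a quick sanity check rather than a verdict.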
# How Does Transfer Learning Work
Transfer Learning lets you reuse most of the parameters (weights) of a neural network previously trained on a problem similar to the one you have to solve. You then train only the last layers, which are usually the ones dedicated to classification and/or regression on top of the features extracted by the previous layers.
This allows us to obtain two key results:
- reuse a model that is already trained to effectively extract features from input data
- limit training to a significantly smaller number of parameters (those of the last layers), which speeds up training by a big margin
For example, if we were to classify apple varieties from an image, we could start from a neural network already trained to classify images of planes, cars, dogs, cats, eggs, etc. This works because the greater variety of the training dataset ensures a better ability to extract features of various kinds from images.
Of the pre-trained neural network, we would keep only the initial layers, because they extract lower-level features (for example lines and edges, which are common to virtually all 2D images), and we would redefine only the last classification layers.
The reused layers would be marked as "read-only" or "frozen", so that only the parameters of the last layers are trained. This speeds up training, reduces the processing power required, and generally improves accuracy.
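The freezing step can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions: the tiny `backbone` below is a stand-in with random weights (in practice you would load real pre-trained weights), the 5-class head and the random batch are hypothetical placeholders, and the layer sizes are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for a pre-trained network (assumption: a real project would
# load actual pre-trained weights instead of random ones).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the reused layers: no gradients are computed for them.
for param in backbone.parameters():
    param.requires_grad = False

# New classification head for a hypothetical 5-class target problem.
head = nn.Linear(8, 5)
model = nn.Sequential(backbone, head)

# Only the parameters that still require gradients go to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

# One training step on a random batch (placeholder for real images).
images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 5, (4,))
frozen_before = backbone[0].weight.clone()
head_before = head.weight.clone()

loss = nn.functional.cross_entropy(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {n_trainable} of {n_total}")
```

After the step, the backbone weights are unchanged and only the head has moved. If the reused layers contain batch normalization or dropout, you would typically also call `backbone.eval()` so that those layers behave as they did at the end of pre-training.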
In general, the set of layers that are reused from a pre-trained network is called a backbone or feature extractor.
In practice, adapting a pre-trained model to a new dataset in this way is called "fine-tuning".
Often, when using a pre-trained model, it's good practice to reuse all the layers except the last one, but you can experiment with removing the last N layers (usually not more than 3-4).
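Experimenting with the number of removed layers can be sketched like this. It is a minimal sketch that assumes the pre-trained network is a plain `nn.Sequential`; real architectures are often structured differently, so the slicing would need adapting. The helper `strip_last_layers`, the layer sizes, and the 3-class target problem are all hypothetical.

```python
import torch
from torch import nn


def strip_last_layers(network: nn.Sequential, n: int) -> nn.Sequential:
    """Return a new Sequential without the last n child modules.

    The kept modules are shared with the original network, not copied.
    """
    if not 0 < n < len(network):
        raise ValueError("n must be between 1 and len(network) - 1")
    return nn.Sequential(*list(network.children())[:-n])


# Stand-in for a pre-trained classifier (random weights here).
pretrained = nn.Sequential(
    nn.Linear(32, 16),
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 10),  # original 10-class output layer
)

# Drop only the last layer and attach a new head for 3 classes.
backbone = strip_last_layers(pretrained, 1)
model = nn.Sequential(backbone, nn.Linear(16, 3))

out = model(torch.randn(2, 32))
print(out.shape)
```

Removing more layers changes the size of the features coming out of the backbone, so the input size of the new head has to match whatever the last kept layer produces.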
To learn more about Transfer Learning:
- A Comprehensive Survey on Transfer Learning
- How transferable are features in deep neural networks?
- Using Pre-Training Can Improve Model Robustness and Uncertainty
- What makes ImageNet good for transfer learning?
# Where to Find Pretrained Models
Now that it's clear how useful pre-trained models can be for solving your problem, let's see where on the Internet you can find them, plus some guides and tutorials on how to fine-tune them.
Some general-purpose places where you will find pre-trained models are:
It's very likely that you will find a model that fits your needs in these places. If you can't, try to google
# Pre-Trained Models: Computer Vision
- Computer Vision pre-trained models - GitHub
- PyTorch Image Models
- PyTorch Segmentation Models
- Classification Models (TF and Keras)
# Pre-Trained Models: Natural Language Processing and Understanding
- NLP Pre-Trained Models
- Pre-Trained Models for Natural Language Processing: A Survey
- Hugging Face Website
- Hugging Face Repository
- NLP Recipes - Microsoft
# Transfer Learning Tutorials
- A Comprehensive Hands-on Guide to Transfer Learning
- BigTransfer (BiT): State-of-the-art transfer learning for computer vision
- Transfer Learning in Practice with Keras
- Transfer learning with a pretrained ConvNet
- Fine-tuning a BERT model
- PyTorch Model Transfer Learning for Computer Vision
In this guide, you've seen what Transfer Learning is, why it's so useful, and why it's a hot topic in research today. You also now know where to find a model that fits your needs!