# Take your Bible with You

In this guide you will learn the theory behind Deep Learning, its most recent applications and challenges, and how to develop solid DL skills.

Virgilio has arranged for you a comprehensive list of links and resources, but he strongly recommends you use the following awesome book along the way:

This book represents our attempt to make Deep Learning approachable, teaching you the concepts, the context, and the code (in PyTorch, TensorFlow, and MXNet).

A more theoretical book is the following, and Virgilio used it to structure the content of this guide.

Armed with these two books, you should have no problem on your Deep Learning journey!

# What is Deep Learning

Deep Learning is a subcategory of Machine Learning and refers to the branch based on algorithms called artificial neural networks. These algorithms are inspired by the structure and function of the biological brain.

The similarities between biological and artificial neural networks are much more limited than the name suggests. In fact, artificial neural networks (pieces of software) are not even comparable to their biological counterparts in terms of complexity, functions, and the vastness of the processes that take place in a brain!

Despite this, neural networks are surprisingly useful at detecting hidden patterns in large amounts of data, quantities much larger than a single human could even see in the course of a lifetime, let alone make sense of!

Deep Learning is part of a wider family of Machine Learning methods based on learning data representations, as opposed to algorithms for the execution of specific tasks (like more traditional ML algorithms).

Neural networks (through which the concept of Deep Learning has also reached the attention of the general public) have been applied, for example, in computer vision, automatic speech recognition, natural language processing, audio recognition, and bioinformatics (the use of computational tools to describe, from a numerical and statistical point of view, biological phenomena such as gene sequences, protein composition and structure, and biochemical processes in cells).

The main characteristics of Deep Learning are the following:

  • The learning algorithm always involves at least one neural network (more precisely, an acyclic graph) composed of several layers of neurons. These neurons are connected across layers, and each connection has an adjustable weight, represented by a floating-point number. These weights are randomly initialized and then tweaked through an iterative process (the training phase) that shows the network a large number of data examples (see the sketch after this list). The output of the training phase is the trained model.

  • Neural networks are part of the broader class of algorithms for learning data representation within Machine Learning.

  • Neural networks use various levels of cascading non-linear units to perform feature extraction and transformation tasks. Each successive level uses the output of the previous level as input.

  • Neural networks learn multiple levels of representation, corresponding to different levels of abstraction, and these levels form a hierarchy of concepts. This is why Deep Learning is also called Representation Learning.
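
To make the first point concrete, here is a minimal training-loop sketch in PyTorch (one of the frameworks used by the book above); the layer sizes, data, and hyperparameters are toy placeholders, not a recommendation:

```python
import torch
import torch.nn as nn

# A minimal feedforward network: layers of neurons whose connections
# are adjustable floating-point weights, randomly initialized by default.
model = nn.Sequential(
    nn.Linear(10, 32),  # 10 input features -> 32 hidden neurons
    nn.ReLU(),          # non-linear unit between layers
    nn.Linear(32, 1),
)

loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Toy data standing in for "a large number of data examples".
X, y = torch.randn(100, 10), torch.randn(100, 1)

# The iterative training phase: show examples, measure the error,
# tweak the weights, repeat. The result is the trained model.
for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()   # compute gradients (backpropagation)
    optimizer.step()  # adjust the weights
```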

As the 2010s draw to a close, it's worth taking a look back at the monumental progress that has been made in Deep Learning in this decade.

Read: The Decade of Deep Learning

# Why Deep Learning

In traditional Machine Learning techniques, most of the features need to be identified by a domain expert, in order to reduce the complexity of the data and make patterns visible enough for learning algorithms to work.

For example, in a time-series dataset, an expert would traditionally separate the day from the month and the year, giving each its own column in the dataset. Maybe they would also build a "season" column, or flag columns to mark the Christmas holidays, for example.

So the expert is helping the ML algorithm "disentangle" the several features it has to deal with.
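
As a minimal illustration of this kind of hand-crafted feature engineering, here is a pandas sketch on a hypothetical daily time series (the column names and the holiday rule are made up for the example):

```python
import pandas as pd

# Hypothetical daily sales data with a raw timestamp column.
df = pd.DataFrame({"date": pd.date_range("2021-01-01", periods=365),
                   "sales": range(365)})

# Hand-crafted features, as the expert would build them:
df["day"] = df["date"].dt.day
df["month"] = df["date"].dt.month
df["year"] = df["date"].dt.year
df["season"] = df["month"] % 12 // 3  # 0=winter, 1=spring, 2=summer, 3=autumn
df["is_christmas"] = ((df["month"] == 12) & (df["day"].between(20, 31))).astype(int)
```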

One of the biggest advantages of Deep Learning algorithms is that they try to learn features from data in an automatic and incremental manner.

This reduces the need for domain expertise and manual feature extraction.

See: What is the difference between handcrafted and learned features?

Although the demand for huge computational capabilities may represent a limit, the scalability of Deep Learning with the increase in available data and algorithms is what differentiates it from Machine Learning: Deep Learning systems improve their performance as the data increases, while Machine Learning applications, once they reach a certain level of performance, are no longer scalable even by adding more examples and training data.

This is because in Machine Learning systems the features of a given object (in the case of visual recognition systems) are extracted and selected manually, and are used to create a model capable of categorizing objects (based on the classification and recognition of those features).

In Deep Learning systems, on the other hand, feature extraction is done automatically: the neural network learns autonomously how to analyze raw data and how to perform a task (for example, classifying an object by recognizing its features on its own).

BUT!

We'll see that even though Deep Learning is such a powerful approach, this doesn't mean that every problem should be tackled with it.

Data scarcity and computational requirements often suggest that more traditional Machine Learning algorithms should be considered way before Deep Learning algorithms.

Basically, you should avoid Deep Learning in the following cases:

  • If there is a simpler approach that provides an adequate solution.

  • If you need to know why the network produced the output it did, and this is critical to the application (Deep Learning is hardly explainable).

  • If you can't define a loss function.

  • If you don't have the resources to train the network (data or GPU power).

  • If you don't have the resources to tune the hyperparameters (i.e., training multiple models with different configurations and choosing the best one).

See the discussion and the article:

When not to use Deep Learning

From the point of view of potential, Deep Learning may seem more "fascinating" and useful than Machine Learning, but it should be pointed out that the computation it requires has a real impact, also from an economic point of view: the most advanced CPUs and top-of-the-range GPUs needed to handle the workloads of a Deep Learning system still cost thousands of dollars.

Using computational capabilities in the Cloud only partially mitigates the problem, because training a deep neural network often requires processing large amounts of data on clusters of high-end GPUs for many, many hours (so buying the necessary computing capacity "as a service" is not necessarily cheap).

Of course, this is valid for enterprise-level requirements; with a good local GPU you can still experiment and obtain good results on most small/medium-sized datasets, and these will be enough to learn and become comfortable training neural networks.

See the Workspace Setup and Cloud Computing Guide for a complete overview of the opportunities that Cloud Computing offers you to train your deep learning models, or how to set up your local workspace.

Advanced approach: In the last couple of years, two further hardware options have emerged: TPUs and clustered FPGAs. These use much less power than CPUs and GPUs, can do the low- and mixed-precision calculations especially suited to neural network computations, and outperform CPUs and GPUs while using less energy.

# Neural Networks

So, it's time to understand what Neural Networks are and how you can use them in tackling real-world problems.

From now on we'll follow a track that will take us from zero knowledge about Neural Networks to fully understanding them, thanks to the Stanford University Deep Learning course and tutorials. Some of them come from Google, others from Stanford or Cambridge, and you will learn to leverage neural networks for several kinds of Deep Learning tasks.

We'll focus on the 4 main types of Neural Networks, even if novel architectures come out on a daily basis (most of the time revisiting existing ones, or mixing them in more complex ways).

It is not easy to understand the theory and application of Neural Networks at first glance. You will need to go through tutorials repeatedly to fully comprehend the topic, and you should expect to spend a good amount of time grasping all the related concepts.

Virgilio proposes the following proven learning strategy, but you can tweak it as you prefer, because every brain is different.

3-step iterative cycle:

  • 1 Get an idea of the main concepts through an entire pass of this Stanford course; don't worry too much about the math explanations, focus on the what and why.
  • 2 Deeply explore one kind of network, with theory + tutorials + examples (e.g. RNN theory + RNN tutorials + RNN examples) with the links and resources of the topic section of the guide.
  • 3 After iterating the 2nd step for each topic, go through the entire Stanford course again. This time you can fully understand all the formulae, connecting them and picking up the "math flow" of the course.

This iterative process (1-2-2-2-2.....-3) can be repeated as many times as you want, and will probably build a nice general schema of things in your mind. In each complete iteration, you can drop one or more topics and focus on the ones that are more interesting to you or not so clear.

In each section, there is some content for the first time you arrive there (during the first complete iteration), and some content for the next time you arrive there (after the 3rd step).

The structure follows the track proposed by the awesome Stanford course. You can find the slides here and the videos here.

The course leans a bit more on the Convolutional Neural Network type, but it gives you a good understanding of all the other types.

This is an alternative course from MIT; it has slightly different content.

Of course, you should also consider the Deep Learning Specialization, from the same authors as the course recommended in the Machine Learning Theory Guide.

It's worth watching all of them, to compare and get different points of view on the things you are learning, besides listening to some of the best professors in the world exploring each topic.

Once you're done, you should have a complete panoramic view of Deep Learning, even if some concepts or passages are not clear, so it's time to dive deep into each Neural Network type to understand it fully. Without diving deep into them, you won't be able to use them effectively, because as you've already seen they have a lot of parameters and tricky configurations to consider.

This is the Deep Learning Book we refer to in each of the next sections.

# Feedforward Neural Networks

The basic kind of Neural Network. Best suited for classification and regression tasks where the input is a tabular dataset, or very simple images or text.
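
A minimal PyTorch sketch of such a network for a hypothetical tabular task (20 features, 3 classes; all the sizes are placeholders to tune for your dataset):

```python
import torch.nn as nn

# A feedforward network ("multilayer perceptron") for tabular classification.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 3),  # one output per class (logits for nn.CrossEntropyLoss)
)
```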

First look (in order):

Second pass:

Play with a Neural Network: Deep Learning Playground

Tips & Best practices: 1, 2, 3, 4, 5, 6, 7, 8.

# Convolutional Neural Networks

The most used kind of Neural Network for dealing with images and videos. They are designed to be less computationally expensive than vanilla Neural Networks (they share weights among layers) and have other useful features that help in dealing with 2D images or videos.
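
A minimal PyTorch sketch showing the weight sharing at work, assuming hypothetical 32x32 RGB images and 10 classes:

```python
import torch.nn as nn

# The convolutional filters are the shared weights: the same small kernel
# slides over the whole image, keeping the parameter count low.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3 channels -> 16 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                   # classifier head
)
```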

First look (in order):

Second pass:

Dive Deeper: Use the latest "survey paper" to choose which papers are worth exploring for your case and get the big picture of the state of the art: A Survey of the Recent Architectures of Deep Convolutional Neural Networks

Tips & Best practices: 1, 2, 3, 4, 5, 6, 7, 8.

# Recurrent Neural Networks

Recurrent Neural Networks (RNNs) outperform other kinds when dealing with text and time-series data, because they implement the concept of "memory" inside the network.
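
A minimal PyTorch sketch of this "memory": an LSTM whose hidden state is carried across the steps of a toy, randomly generated sequence (all the sizes are placeholders):

```python
import torch
import torch.nn as nn

# The hidden state passed from step to step is the network's "memory".
rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)  # e.g. predict the next value of a time series

x = torch.randn(4, 50, 8)          # batch of 4 sequences, 50 steps, 8 features
outputs, (h_n, c_n) = rnn(x)       # h_n: final hidden state (the "memory")
prediction = head(outputs[:, -1])  # use the last step's output
```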

First look (in order):

Second pass:

Dive Deeper: Use the latest "survey paper" to choose which papers are worth exploring for your case and get a nice panorama of the state of the art: Fundamentals of Recurrent Neural Network and Long Short-Term Memory Network

Tips & Best practices: 1, 2, 3, 4, 5, 6, 7.

# Generative Adversarial Networks

Generative Adversarial Networks (GANs) are an approach to generative modeling using deep learning methods, such as convolutional neural networks.

Generative modeling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate or output new examples that plausibly could have been drawn from the original dataset.

GANs are a smart way of training a generative model by framing the problem as a supervised learning problem with two sub-networks: the generator network that we train to generate new examples, and the discriminator network that tries to classify examples as either real (from the domain) or fake (generated). The two networks are trained together in a zero-sum ("adversarial") game, until the discriminator model is fooled about half the time, meaning the generator model is generating plausible examples.
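
A minimal PyTorch sketch of this two-player training loop, on made-up 2-D data (architectures and hyperparameters are placeholders, not a recipe):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))                # generator
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())  # discriminator
loss = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(1000):
    real = torch.randn(64, 2) + 3.0   # stand-in for the real dataset
    fake = G(torch.randn(64, 8))      # generator maps noise to samples

    # Train D: label real samples 1, generated samples 0.
    opt_d.zero_grad()
    d_loss = (loss(D(real), torch.ones(64, 1))
              + loss(D(fake.detach()), torch.zeros(64, 1)))
    d_loss.backward()
    opt_d.step()

    # Train G: try to make D classify the fakes as real.
    opt_g.zero_grad()
    g_loss = loss(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()
```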

First look (in order):

Dive Deeper: Use the latest "survey paper" to choose which papers are worth exploring for your case and get a nice panorama of the state of the art: Generative Adversarial Networks: recent developments

Second pass: Generative Models Chapter.

# AutoEncoders

Autoencoders are a particular class of Deep Learning algorithms that, given an input, compress it in the internal layers and then reconstruct it approaching the output layer, so that the final output of the network is as similar as possible to the input. They can be used for a variety of tasks: for example, denoising data, compressing data in order to visualize it, and reusing some trained layers of an Autoencoder to bootstrap the learning process of another Neural Network.

Autoencoders are based on the same principles as compression/decompression algorithms (think of zipping/unzipping an archive).
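
A minimal PyTorch sketch of the compress-then-reconstruct idea, assuming 28x28 images flattened to 784 values (the bottleneck size is arbitrary):

```python
import torch.nn as nn

# The 32-dimensional bottleneck is the "compressed" internal representation.
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid())
autoencoder = nn.Sequential(encoder, decoder)

# Training minimizes the reconstruction error, e.g. nn.MSELoss() between
# autoencoder(x) and x itself: the input is also the target.
```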

First look (in order):

Second pass: AutoEncoders Chapter.

Dive Deeper: Use the latest "survey paper" to choose which papers are worth exploring for your case and get a nice panorama of the state of the art: Recent Advances in Autoencoder-Based Representation Learning

Tips & Best practices: 1, 2, 3, 4, 5

# Training Neural Networks Effectively

The Deep Learning field, even if rapidly developing and producing awesome results, often lacks a theoretical understanding of the phenomena that happen under the hood (especially during training).

There are some efforts toward understanding the effect of hyperparameters (like the number of neurons per layer, the number of layers, etc.), but unluckily a large number of concepts in Deep Learning remain somewhat of a magical black box that we haven't cracked yet.

Does this mean that you won't be able to train a neural net well? Not really! In fact, you'll probably get better results than you could ever get with any other method, and the hard thing will be understanding why you get them!

With experience, you'll develop an insight into which approaches work well and which don't, how to fit existing architectures to your specific problems, and so on.

Here are some of the best interactive explanations on the Internet about training Neural Networks (parameter initialization and optimization):

Some other super-useful resources:

# Common Issues

You already know that training a Neural Network involves using a training dataset to update the model weights to create a good mapping of inputs to outputs.

This training process is solved using an optimization algorithm that searches through the space of possible values for the neural network model weights, for a set of weights that results in good performance on the training dataset.

Training Neural Networks can be hard, and some of the reasons are listed here:

37 Reasons why your Neural Network is not working

Of course, your small CNN performs well when you test it on a benchmark dataset like MNIST, but how does it fare against a real-life dataset, maybe with millions of images?

Training Neural Networks effectively requires experience, and even the most expert researchers try a lot of different neural architectures before getting the results they want.

It's hard to find a list on the internet that includes expert knowledge and best practices, and that's what Virgilio tries to do in this section.

First things first:

Must read:

It's strongly recommended to refer to this page from Stanford and go through all of Modules 1 and 2.

# Understanding Backpropagation

Backpropagation is the mathematical "magical trick" that powers the training of neural networks.

Basically, it allows the iterative Gradient Descent algorithm to find the optimal weights of the network (the connections among the neurons).

How?

Backpropagation computes the gradient of the loss function with respect to the weights of the network for a single input/output example, and does so efficiently, unlike a naive direct computation of the gradient with respect to each weight individually.

Understanding backpropagation is simple if you have the necessary math basics (you can find them in the Math Fundamentals Guide).

The basic reference you need to understand backpropagation is the following chapter of the book Neural Networks and Deep Learning:

How the backpropagation algorithm works

Once you're done with this reading, you can play with some other explanations in order to deepen your understanding (remember that visualization is awesome for learning, as stated in Virgilio's Teaching Strategy Guide).

Additional resources:

Reinvent the wheel: A good way to understand backpropagation is to code it from scratch! Try to do it without help, but these tutorials will give you the solution.
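
As a starting point, here is one possible from-scratch sketch in plain NumPy: a two-layer network with a sigmoid hidden layer and mean squared error, where the backward pass is just the chain rule applied layer by layer (all sizes and the learning rate are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 examples, 3 features
y = rng.normal(size=(100, 1))
W1, W2 = rng.normal(size=(3, 5)), rng.normal(size=(5, 1))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(1000):
    # Forward pass: store intermediate activations for reuse.
    h = sigmoid(X @ W1)                # hidden activations
    y_hat = h @ W2                     # network output

    # Backward pass: chain rule, from the loss back to each weight.
    d_out = 2 * (y_hat - y) / len(X)   # dLoss/dy_hat for MSE
    dW2 = h.T @ d_out                  # dLoss/dW2
    d_h = (d_out @ W2.T) * h * (1 - h) # through W2, then sigmoid'
    dW1 = X.T @ d_h                    # dLoss/dW1

    W1 -= 0.1 * dW1                    # gradient descent step
    W2 -= 0.1 * dW2
```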

# Transfer Learning

This is probably one of the most useful and beautiful ideas in the world of machines that learn from data.

Transfer learning is a learning strategy in which a model developed for one task is reused as a starting point for a model on a second task.

It is a popular approach in deep neural network learning, where pre-trained models are used as a starting point for computer vision and natural language processing tasks, given the vast computing resources and time needed to develop neural models for these problems.

Learn the theory behind Transfer Learning in this lecture:

Transfer Learning brings with it very useful and powerful features:

  • It allows you to train models in a fraction of the expected time.

  • It requires much less data than training "from scratch".

  • The diversity of the starting dataset (on which the pre-training was done) compared to the dataset on which the fine-tuning is done (the second task) helps the model's ability to generalize.

Even if Transfer Learning may sometimes work with models that were trained on a different domain (e.g., image models for text analysis), most of the time it works best within the same domain.

When you approach a new problem, the first thing you should do is look for a similar public dataset, in order to find pre-trained models and fine-tune them on your task. Of course, the dataset and the problem need to be similar!

For example, if you want to train a neural network for image classification, you should try to fine-tune a model pre-trained on the ImageNet dataset. Or, if you are trying some text classification, you should try to fine-tune the BERT model!
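
For the image case, here is a minimal fine-tuning sketch with torchvision (assuming a recent torchvision version and a hypothetical 5-class problem):

```python
import torch.nn as nn
from torchvision import models

# Start from a ResNet pre-trained on ImageNet, freeze its feature
# extractor, and retrain only a new classifier head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False        # freeze the pre-trained weights

model.fc = nn.Linear(model.fc.in_features, 5)  # new, trainable head (5 classes)
# Now train as usual, optimizing only model.fc.parameters().
```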

There are a lot of models you can find at the following links:

Never forget to try Transfer Learning on your problem!

# Full Stack Deep Learning Course

This is the most useful set of resources about Deep Learning in production you can find on the Internet; be sure to take it!

# Conclusions

In this guide, you found a path to a theoretical understanding of Deep Learning, the reasons to use it, and the main kinds of neural networks you can find in the literature. You have also seen how to forge a solid understanding of the math happening behind a neural network, and that Transfer Learning is often the go-to tool when facing a new problem.

In the next guides, you will get your hands dirty and start training models on real-world problems.