# Index

Let's dive right in!

# Motivation

The field of statistics is the science of learning from data.

  • Statistics is a crucial process behind how we make discoveries in science, make decisions based on data, and make predictions.

  • Statistical knowledge helps you use the proper methods to collect the data, employ the correct analyses, and effectively present the results.

Almost every fancy term you read related to the field of Data Science, or the even more misunderstood Artificial Intelligence, ultimately rests on statistical models trained with mathematical algorithms.

That's why learning the basics of statistics is fundamental in order to tackle real-world Data Science problems and build powerful predictive models upon them.

The two fundamental subjects to know in order to develop successful Data Science projects are descriptive statistics and probability theory.

We'll explore an additional one, Bayesian Statistics, which is pretty powerful but not mandatory for a first pass of the statistics topics.

Play with this interactive website to get a taste of the topics that this guide covers.

It's awesome.

# Descriptive Statistics

Descriptive statistics aim to analyze and summarize the data collected during an experiment.

The set of methods and techniques used in descriptive statistics makes it possible to express the information contained in a set of data by means of graphs and particular numerical indicators.

Moreover, the techniques of descriptive statistics make it possible to check how well experimental data fit a given theoretical model, as well as to carry out comparative analyses between datasets.

Descriptive statistics are widely used in economics, demography, medicine, and all the natural sciences.
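To make the idea of "numerical indicators" a little more concrete, here is a minimal sketch using Python's standard `statistics` module. The `scores` list and the `summarize` helper are invented for illustration; they are not taken from any of the recommended books.

```python
import statistics

# Hypothetical sample: exam scores invented purely for illustration.
scores = [62, 75, 75, 81, 88, 90, 94]

def summarize(data):
    """Return a few common descriptive indicators for a list of numbers."""
    return {
        "mean": statistics.mean(data),      # arithmetic average
        "median": statistics.median(data),  # middle value of the sorted data
        "mode": statistics.mode(data),      # most frequent value
        "stdev": statistics.stdev(data),    # sample standard deviation (n - 1)
        "range": max(data) - min(data),     # spread between the extremes
    }

summary = summarize(scores)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A handful of numbers like these already tells you where the data is centered and how spread out it is, which is usually the first step before plotting or modeling.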

First of all, watch these videos; they are truly inspirational:


To understand the concepts of descriptive statistics, the book Virgilio recommends is the following:

Statistics in Plain English - Timothy C. Urdan

This book is fantastic because it focuses more on the reasoning behind the concepts than on formulas or calculations, which are easily found elsewhere.

The book is mandatory for a full understanding of descriptive statistics. It is certainly not the only statistics book out there, but it is the best in terms of clarity of explanation and completeness of the basic concepts.

Once finished, you can choose whether to go deeper with a more technical manual, like the following free book:

Elements of Statistics

This book covers the various topics in more depth, but you don't need to tackle it immediately in order to start doing small experiments in Data Science.

Keep it as a reference!

# Check your knowledge

Test your knowledge with some of these exercises.

You can find plenty more online. Remember: the more you practice, the better you will get!

# Probability Theory

Probability theory governs every daily event, from microscopic to macroscopic, even if we hardly realize it!

Still, you have probably reasoned about odds before, for example when thinking about your chances of winning a game of dice, or when estimating the probability of having passed an exam!

Probability theory is so intertwined with statistical and mathematical concepts that it is hard to isolate why it is fundamental: simply consider it the basic building block on which more advanced statistical concepts rest.

Not to mention that knowledge of probability theory can be very useful in everyday life, since it can suggest how to behave in certain situations in order to make the best choice or the one with the least risk of failure.
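The dice example above can be turned into a small experiment. The sketch below, which assumes two fair six-sided dice and uses hypothetical helper names, computes the exact probability of rolling a given sum by enumerating every outcome, then compares it with an estimate obtained by simulating many rolls.

```python
import random
from fractions import Fraction
from itertools import product

def exact_probability_of_sum(target, n_dice=2, sides=6):
    """Exact probability that n_dice fair dice sum to target, by enumeration."""
    outcomes = list(product(range(1, sides + 1), repeat=n_dice))
    favorable = sum(1 for roll in outcomes if sum(roll) == target)
    return Fraction(favorable, len(outcomes))

def simulated_probability_of_sum(target, n_dice=2, sides=6, trials=100_000, seed=0):
    """Estimate the same probability by simulating many independent rolls."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        total = sum(rng.randint(1, sides) for _ in range(n_dice))
        if total == target:
            hits += 1
    return hits / trials

exact = exact_probability_of_sum(7)
estimate = simulated_probability_of_sum(7)
print(f"exact: {exact} = {float(exact):.4f}")
print(f"simulated: {estimate:.4f}")
```

The exact value for a sum of 7 is 6 favorable outcomes out of 36, that is 1/6, and the simulated estimate should land close to it. Seeing the two agree is a nice first intuition for why probabilities can be read as long-run frequencies.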

Fortunately, you can take what is perhaps the best course on probability theory available to date, offered as usual by MIT.

It's good to have a first-class education from home for free, huh?

What an incredible time to be alive!

Take the course at:

Introduction to Probability - Video Lectures

In addition, keep this free complete book as a reference:

Introduction to Probability - Book

# Check your knowledge

Test your knowledge with some of these exercises.

You can find plenty more online. Remember: the more you practice, the better you will get!

# Bayesian Statistics

This last topic is not fundamental for a first pass, but it's highly recommended, since some Machine Learning algorithms are based on it or leverage concepts from Bayesian theory.

The Bayesian approach is often distinguished from the "classical" approach, which you have studied in the Descriptive Statistics section, and which is called "frequentist" when compared to the "Bayesian" approach.

Read:

The Bayesian approach to statistics is based upon the famous Bayes' theorem, which describes the probability of an event based on prior knowledge of conditions that might be related to the event.

For example, if cancer is related to age, then, using Bayes' theorem, a person's age can be used to assess the probability that they have cancer more accurately than an assessment made without knowledge of the person's age.
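The age and cancer example can be written out numerically. Bayes' theorem states that P(A | B) = P(B | A) * P(A) / P(B). The sketch below applies it with a hypothetical `posterior` helper; all three input probabilities are made up for illustration and are not real medical statistics.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    if not 0 < evidence <= 1:
        raise ValueError("evidence must be a probability in (0, 1]")
    return likelihood * prior / evidence

# Invented numbers, for illustration only (NOT real medical statistics).
p_cancer = 0.01              # P(cancer): prior belief, ignoring age
p_older_given_cancer = 0.5   # P(age > 65 | cancer)
p_older = 0.2                # P(age > 65) in the whole population

p_cancer_given_older = posterior(p_cancer, p_older_given_cancer, p_older)
print(f"P(cancer | age > 65) = {p_cancer_given_older:.3f}")
```

With these invented inputs, learning that the person is older than 65 raises the estimated probability from 1% to 2.5%. That update from a prior to a posterior is the core move of the Bayesian approach.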

First, get comfortable with the Bayesian approach with:

Then you can dive deep into the Bayesian approach to statistics, along with many more concepts, by taking the following course:

Statistical Rethinking - Video Lectures

You can find the complete book from the same author at:

Statistical Rethinking - Book

# Ask Questions

A good rule of thumb for learning fast and effectively is to ask questions and to read others' questions and answers.

Join communities of people interested in the topic (e.g. Reddit): there you can find discussions, search by keywords (e.g. "standard deviation"), and ask questions that experts will answer.

Some tips regarding questions:

  • Try to form specific, well-written questions, to minimize the time the respondent has to spend.
  • Do not ask a question whose answer can be found with a quick search on Google.
  • If the questions are too general or show laziness, they'll likely remain unanswered.

Some subreddits you can subscribe to are:

Two other good places to post (well-structured) questions are:

# Conclusions

With the courses suggested above, you should be equipped with most of the statistics knowledge you need to start working on simple Data Science projects.

A very good book we suggest you read is "How To Lie With Statistics".

The book is a brief, breezy, illustrated volume outlining common errors in the interpretation of statistics, and how these errors can lead to incorrect conclusions. Highly recommended!

The next guide will focus on learning the Python programming language, the real tool that will allow you to put into practice the knowledge you've gained with your hard study.