# Index
Disclaimer!
This guide is not intended to be exhaustive of everything you can do with time series data, but to offer you a good overview of the possible approaches you can take or things you should check when dealing with them.
That said, let's begin!
# Time Is Money
Whether it is product sales, the estimated yield of an agricultural field, or a forecast of any company activity, time series are an indispensable type of data. They have been used for thousands of years to decide what to do today based on what is expected to happen tomorrow.
A good estimate of the future data, for a company, can bring a great saving of time (and money)!
Ask any manager and they will tell you that time is even more important than money, because you can convert time into money, but not the other way around!
Learning to work with time series is a very valuable skill in the real world, and applicable to many situations and problems.
It is also considered one of the most "difficult" types of data to manage since it has additional complexities compared to more traditional data such as tabular data or text.
We will see these additional complexities in the section Time Series Additional Challenges.
Now that you know why it's important to know how to work with time series, it's time to start seeing them in practice!
# Introduction To Time Series
# Know the Time Series Data Structure
The first resources Virgilio recommends to get you started are the following:
- Tutorial: Time Series Analysis with Pandas
- Working with Time Series
Together they give a thorough introduction to time-indexed data and how it is manipulated with Pandas (indexing, transformation, visualization).
After you've gone through them, you should understand how to write "idiomatic Pandas" for time series; for a tutorial, see section 7 of Modern Pandas.
Common time-series data manipulation steps:
- Train-Test Split
- Resampling
- Shift
- Lag
- Autocorrelation
After studying these resources, you will have a better understanding of what "time series" means and what time series data is about!
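The manipulation steps listed above can be sketched in a few lines of Pandas. This is a minimal illustration on synthetic data (the series, the 80/20 split ratio, and the lag values are assumptions chosen for the example), not a recipe taken from the linked tutorials.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: a linear trend plus a weekly cycle (illustrative data only).
idx = pd.date_range("2023-01-01", periods=120, freq="D")
values = 0.1 * np.arange(120) + np.sin(2 * np.pi * np.arange(120) / 7)
series = pd.Series(values, index=idx, name="y")

# Train-test split: keep chronological order, never shuffle.
split_point = int(len(series) * 0.8)
train, test = series.iloc[:split_point], series.iloc[split_point:]

# Resampling: aggregate daily observations into weekly means.
weekly_mean = series.resample("W").mean()

# Shift / lag: align each observation with the values from 1 and 7 days earlier.
frame = pd.DataFrame({
    "y": series,
    "lag_1": series.shift(1),
    "lag_7": series.shift(7),
})

# Autocorrelation: correlation of the series with a lagged copy of itself.
acf_lag_7 = series.autocorr(lag=7)
```

The split is done by position rather than at random, so every test observation comes after every training observation, which is what prevents information from the future leaking into the model.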
# Time Series Problems Overview
In the next sections, we will see the main problems that can be solved with time series data.
WARNING
Be aware that this is not an exhaustive list, and you should consider it only a guideline!
Once you have identified a problem that you are interested in solving, you need to find papers that talk about the state of the art of that problem and go deeper!
There is no silver bullet that solves every problem: remember the No Free Lunch Theorem.
Learning to manipulate time-series data with Pandas is mandatory, and the Pandas DataFrame structure is the natural landing place for this type of data. However, due to the "sequential" nature of the data type, the DataFrame has some structural limitations, because it is designed for more classic tabular data.
To overcome these limitations, the Alan Turing Institute has developed sktime, a library with a scikit-learn-compatible interface, which uses a specific .ts file format to load time series data into Pandas DataFrames.
From the docs:
sktime is a Python machine learning toolbox for time series with a unified interface for multiple learning tasks. sktime provides dedicated time series algorithms and scikit-learn compatible tools for building, tuning, and evaluating composite models.
We currently support:
- Forecasting
- Time series classification
- Time series regression
For deep learning methods, see our companion package: sktime-dl.
To get started with sktime, go through the following resources:
- SKTime - How to get started
- Loading data in SKTime
- SKTime - Examples
Also, be aware of TSLearn, a very good machine learning toolkit for time series analysis in Python.
These are the basic tools you need to know to work with time series in Python. Let's now have a look at what types of problems you can solve with time series data.
# Time Series Analysis
Imagine you are working as a Data Scientist, and you are asked to do "something useful" without any further specification. Your first instinct must be, as with any type of data, to understand the dataset and the type of information it carries.
This process, often called "analysis", is fundamental before any other, for example before trying to make predictions about future values.
The path Virgilio suggests is the following:
First of all, take these free and very complete courses that introduce you to time series analysis:
- Introduction to Time Series Analysis
- https://www.youtube.com/playlist?list=PL3N9eeOlCrP5cK0QRQxeJd6GrQvhAtpBK
Once you're done with this, you can expand your knowledge with the following (extremely detailed) resources.
- Eberly College of Science - Applied Time Series Analysis
- Statistical forecasting: Notes on regression and time series analysis
- Modern Time Series Analysis
Finally, you can use this very deep book as a reference:
TIP
Be sure to check the Matrix Profile section, you won't be disappointed!
This extremely useful method gives you invaluable insights about patterns in your time series data!
Virgilio is pretty sure that these 3 resources and the book can give you a very detailed preparation on the topic, so be sure to take the time that's needed to digest them very well!
Some educational videos can be found:
- Aileen Nielsen - Time Series Analysis - PyCon 2017
- Time Series Analysis - Georgia Tech
- Time Series Talk
Some other links that can help you are the following:
- Time series analysis - Python
- Time Series Analysis in Python - Getting Started
- Learning Time Series Analysis
# Time Series Forecasting
The most classic of the problems related to time series is that of predicting the future values of the series.
Whether it is the price of a stock, the number of products sold, or the electricity needs of a part of the city, the topic of forecasting is fundamental in any aspect of human society.
An interesting read:
TIP
Knowing how to predict a time series, with enough historical data behind it, is an invaluable skill in the modern data market, and every kind of company can benefit from it!
As usual, Virgilio has collected for you the best free resources available, let's see!
The first and most important resource in time series forecasting is the following textbook:
The textbook uses R examples throughout the lessons, but it's not mandatory to learn R to use it!
Virgilio's advice is to follow the lessons while translating the R code into Python, with the help of Google and all the other resources at your disposal!
In this way, you'll learn a lot about how to use Python for time series forecasting, and you'll have a lot of reusable code for your future projects!
Some other useful resources are:
- Applying Statistical Modeling & Machine Learning to Perform Time-Series Forecasting
- A Worked Example of Using Neural Networks for Time Series Prediction
- Reliably forecasting time-series in real-time
- Time Series Forecasting using Statistical and Machine Learning Models
TIP
It's important to understand that no single method will always outperform the others. Time series forecasting is a hot research topic, so you should always try to stay aware of new techniques and approaches!
Be sure to read some survey papers once in a while, like this one!
Also check out the Prophet project, from Facebook's Core Data Science team!
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
It works best with time series that have strong seasonal effects and several seasons of historical data.
Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
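The additive model described above is usually written as a sum of components. The symbols below follow the notation commonly used in the Prophet paper and are not defined elsewhere in this guide, so treat them as assumptions introduced for illustration:

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$

Here $g(t)$ is the (possibly non-linear) trend, $s(t)$ collects the periodic seasonal effects (yearly, weekly, daily), $h(t)$ captures holiday effects, and $\varepsilon_t$ is the error term not explained by the model.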
TIP
Also check out NeuralProphet, a neural-network-based time series model inspired by Facebook Prophet and AR-Net, built on PyTorch.
# Time Series Forecasting as a Classification Problem
A very useful approach to keep in mind when working with time series forecasting is to treat the problem of forecasting as a classification problem.
Treated this way, the problem becomes "simpler" to solve.
For example, predicting the exact price of Apple shares tomorrow could be a very difficult challenge, but fortunately, it is a problem that can be "simplified"!
An effective way to simplify this problem is to divide the space of tomorrow's possible prices (in terms of % variation) into bins!
For example, with 20 bins the classes would be: [-100%, -90%], [-90%, -80%], ..., [+80%, +90%], [+90%, +100%].
This type of multiclass classification can be simplified "at-will", even going so far as to classify whether tomorrow the price will be higher or lower than today (binary classification, higher or lower).
If you think about it, if you struggle to get a decent model for the simplest (binary) classification, you don't have much hope to predict the exact value of the shares the next day!
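The binning idea described above can be sketched with Pandas as follows. The price values are made up for illustration, and the variable names (`multiclass_target`, `binary_target`) are hypothetical; the snippet only shows how targets could be built, not how to train a model on them.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only.
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 104.0, 110.0])

# Daily percentage change: 100 * (p_t - p_{t-1}) / p_{t-1}.
returns = prices.pct_change().dropna() * 100

# Multiclass target: 20 bins of width 10 between -100% and +100%.
# labels=False returns the integer index of the bin each return falls into.
bin_edges = np.linspace(-100, 100, 21)
multiclass_target = pd.cut(returns, bins=bin_edges, labels=False, include_lowest=True)

# Binary target: 1 if the price went up compared to the previous day, 0 otherwise.
binary_target = (returns > 0).astype(int)
```

Note that a price can rise by more than 100% in a day, in which case `pd.cut` would return a missing value for that observation; in practice you may prefer open-ended outer bins or quantile-based bins.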
This brings us to the next section of the guide, where you will learn how to deal with time series classification problems.
To read more about this approach:
- Forecasting vs Classification
- Forecasting to Classification: Predicting the direction of stock market price
# Time Series Classification
The problem of time series classification is of primary importance in the world of data mining, and over the last two decades, countless methods have been proposed to solve it.
Knowing how to predict the best choices in the near future is vital in a variety of industrial scenarios, and even in critical ones such as aircraft safety systems!
Some examples of time series classification problems are:
- Predict whether a machine might break or not
- Predict whether a customer will leave a service or not
- Classify a patient's type of disease from the time series of their heartbeat
- Classify an animal according to the sound it makes
- Predict anomalies and trend changes in quantities measured by sensors
- and many more...
In this section of the guide, you can find a logical collection of all the resources that can be useful to take advantage of the power of modern time series classification methods.
First of all, the site you must **refer to** to find all the latest datasets, techniques, papers, and code that you can use to solve your time series classification problems:
This website is an ongoing project to develop a comprehensive repository for research into time series classification.
For a complete comparison of all available useful methods and their tradeoffs, read:
- The Great Time Series Classification Bakeoff: An Experimental Evaluation
- Deep learning for time series classification: a review
A very hot (and very recent) method that has been proposed and you should be aware of:
Another extremely effective method that is always worth trying is that of the matrix profile (next section).
TIP
Some methods are more precise, others are faster, others require less data, others can find complex relationships in the data (neural networks). A fundamental point is that it is rarely worth choosing complex methods (such as a complicated recurrent neural network) over "simpler" ones, such as BOSS or DTW.
Equipped with the methods considered here, you are very likely to find a good fit for the requirements of your problem.
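To give a feeling for how simple one of these baselines can be, here is a minimal sketch of Dynamic Time Warping (DTW) combined with a 1-nearest-neighbour classifier. It is an unoptimized textbook version written for readability; the function names `dtw_distance` and `predict_1nn` and the toy data are hypothetical, and dedicated libraries such as TSLearn provide faster implementations.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(
                cost[i - 1, j],      # a[i-1] aligned again with the current b point
                cost[i, j - 1],      # b[j-1] aligned again with the current a point
                cost[i - 1, j - 1],  # both series advance together
            )
    return float(cost[n, m])

def predict_1nn(train_series, train_labels, query):
    """Return the label of the training series closest to the query under DTW."""
    distances = [dtw_distance(query, s) for s in train_series]
    return train_labels[int(np.argmin(distances))]

# Toy data: two labelled shapes and a time-shifted copy of the first one.
train_series = [[0, 0, 1, 2, 1, 0, 0], [0, 1, 0, -1, 0, 1, 0]]
train_labels = ["bump", "wave"]
query = [0, 1, 2, 1, 0, 0, 0]
predicted = predict_1nn(train_series, train_labels, query)
```

Because DTW is allowed to stretch and compress the time axis, the shifted bump in `query` is still matched to the "bump" class, which a point-by-point Euclidean comparison would penalize.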
# The Matrix Profile
From the UCR Matrix Profile Page website:
The Matrix Profile (and the algorithms to compute it: STAMP, STAMPI, STOMP, SCRIMP, SCRIMP++, SWAMP, and GPU-STOMP), has the potential to revolutionize time series data mining because of its generality, versatility, simplicity, and scalability. In particular, it has implications for:
- time series motif discovery
- time series joins
- shapelet discovery (classification)
- density estimation
- semantic segmentation
- visualization
- rule discovery
- clustering
To learn how to use the Matrix Profile for your time series classification problems, go through these invaluable resources:
- Time Series Data Mining Using the Matrix Profile: A Complete Tutorial
- 100 Time Series Data Mining Questions and Answers
- The Matrix Profile - How Does It Work?
The Python package Virgilio recommends for working with the Matrix Profile is Stumpy (docs here), which implements the latest and most efficient methods to calculate the Matrix Profile for your time series.
The author of the package (Sean Law) contributed to this guide too, reviewing it on GitHub and helping Virgilio collect all the resources. You can find a lot of detailed tutorials about using Stumpy for the matrix profile on his Medium Page, or get in touch with him on his LinkedIn Page. In addition, he's very active in answering questions and giving tips about Stumpy in the "Issues" section of the GitHub project!
One of the (very many) advantages of the Matrix Profile is that you can calculate it to feed it to a more traditional method of supervised classification.
In this sense, the Matrix Profile is also a method of automatic feature extraction!
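To make the idea concrete, here is a deliberately naive, brute-force sketch of a matrix profile: for every subsequence of length `m`, it stores the z-normalized Euclidean distance to its nearest neighbour elsewhere in the series. This is not how Stumpy computes it (Stumpy's `stumpy.stump` uses far more efficient algorithms); the function names, the exclusion zone of `ceil(m / 4)`, and the synthetic series are assumptions made for illustration.

```python
import numpy as np

def znorm(x):
    """Z-normalize a subsequence; constant subsequences map to all zeros."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)

def naive_matrix_profile(ts, m):
    """Brute-force matrix profile, O(n^2 * m): for illustration only."""
    ts = np.asarray(ts, dtype=float)
    n_sub = len(ts) - m + 1
    subs = np.array([znorm(ts[i:i + m]) for i in range(n_sub)])
    profile = np.full(n_sub, np.inf)
    excl = int(np.ceil(m / 4))  # assumed exclusion zone to skip trivial self-matches
    for i in range(n_sub):
        dists = np.linalg.norm(subs - subs[i], axis=1)
        lo, hi = max(0, i - excl), min(n_sub, i + excl + 1)
        dists[lo:hi] = np.inf  # ignore the subsequence itself and its close neighbours
        profile[i] = dists.min()
    return profile

# Synthetic series: a repeated sine pattern with a flat anomaly injected.
pattern = np.sin(np.linspace(0, 2 * np.pi, 20))
ts = np.tile(pattern, 6)
ts[50:60] = 0.0
profile = naive_matrix_profile(ts, m=20)

# Low values mark repeated patterns (motifs), high values mark anomalies (discords).
discord_start = int(np.argmax(profile))
```

Summary statistics of `profile` (for example its minimum, maximum, or mean) could then serve as input features for a traditional supervised classifier, which is the feature-extraction use mentioned above.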
Other useful links are:
- Modern Time Series Analysis With STUMPY
- Stumpy Tutorials on Binder
- Stumpy: unleashing the power of the matrix profile for time series analysis
# Automatic Time Series Feature Extraction
The extraction and engineering of features from data is fundamental in the Data Science process, and time series are no different.
If anything, they have historically posed additional challenges and require some knowledge of signal theory in order to be fully understood and exploited!
Luckily, there are clever methods that help with this by extracting the most important features automatically!
Say thanks to the creators of Tsfresh:
TSFRESH automatically extracts 100s of features from time series. Those features describe basic characteristics of the time series such as the number of peaks, the average or maximal value, or more complex features such as the time-reversal symmetry statistic.
The set of features can then be used to construct machine learning models on the time series to be used for example in regression or classification tasks. To avoid extracting irrelevant features, the TSFRESH package has a built-in filtering procedure. This filtering procedure evaluates the explaining power and importance of each feature for the regression or classification tasks at hand.
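To show what such features look like, here is a small hand-rolled sketch that computes three of the basic characteristics mentioned above (number of peaks, average, and maximal value). It does not use the TSFRESH API, and its peak definition (a point strictly greater than both immediate neighbours) is a simplifying assumption that may differ from the one TSFRESH applies; the function names are hypothetical.

```python
import numpy as np

def count_peaks(x):
    """Count points strictly greater than both of their immediate neighbours."""
    x = np.asarray(x, dtype=float)
    return int(np.sum((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])))

def basic_features(x):
    """Summarize one time series as a small dictionary of scalar features."""
    x = np.asarray(x, dtype=float)
    return {
        "mean": float(x.mean()),
        "maximum": float(x.max()),
        "number_of_peaks": count_peaks(x),
    }

# One row of features per series; stacking such rows yields a tabular dataset.
features = basic_features([0, 2, 1, 3, 1, 0, 4, 0])
```

Each time series collapses into one fixed-length row, so any standard regression or classification model can be trained on the resulting table.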
Important links:
# Additional Resources
Check out these additional resources:
- Time Series Topic - awesome-ai-ml-dl
- Understanding LSTM Networks
- Visualizing memorization in RNNs
- Attention and Augmented Recurrent Neural Networks
# Conclusions
This guide is long and detailed, and you can use it as a clear path to becoming very proficient at working with time series, or as a reference for important resource links to keep in mind.
Becoming comfortable with time series data takes commitment and dedication. In particular, it is highly recommended to experiment with the proposed methods on datasets you can find online, as well as to build your own dataset and work with it.
Remember that only by tackling small real projects will you consolidate the theoretical knowledge acquired with the resources Virgilio provides!