# What you will learn
The purpose of this guide is to show you the importance of data visualization and why it's so useful when working with data. We'll show you best practices and reasons for using them, along with the "storytelling" approach to data science.
# Index
- Data Visualization
- Legolas, how do your elf eyes see?
- The Importance of Context
- The Data / Ink Ratio
- Choose an Effective Visual
- Focus your Audience’s Attention
- Think like a Designer
- Exploring Model Visuals
- Data Visualization tools
- Take Inspiration
- Storytelling with Data
- Common Visualization Mistakes
- Additional Resources
- Conclusions
# Data Visualization
It was hard for Homo sapiens to survive in the African savannah: a human or an animal could kill you at any time.
The human brain evolved in this wild and unpredictable context, and evolution has "coincidentally" chosen to devote a great deal of computing power to capturing and understanding the world through sight (more than 60%).
So it's no surprise that clear and effective data visualization is one of your best weapons in the Data Science world.
The inspiration for this guide is the must-buy book Storytelling with Data. It is by far the best data visualization book I've ever read.
You can find the free PDF here.
Another piece of dense knowledge, exceptionally concise and the "father" of every data visualization book, is The Visual Display of Quantitative Information.
I assume you know basic Python.
None of the content listed here is tool-specific (apart from the "tools" section, who would have guessed?).
# Legolas, how do your elf eyes see?
What do I mean by Data Visualization?
Let's consider the Tableau definition:

> Data visualization is a graphical representation of information. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. In the world of Big Data, data visualization tools and technologies are essential to analyze massive amounts of information and make data-driven decisions.

And according to Wikipedia:

> Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable and usable. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic should follow the task.

So, the goal of Data Visualization is to communicate data facts to drive wise business decisions. Often these decisions are made by executives, councils, or managers who may not know all the technical details behind the data!
Another interesting concept you should be familiar with is the Data-Driven company, a business model that more and more organizations are embracing.
Here you find a nice definition of a Data-Driven company and here an interesting article about it.
As a data scientist, you are the interface among several business functions (product, research, techies, and managers), and your main goal is to persuade people to make the right decisions, based on data.
Often you want to abstract the representation of the data from the underlying technical details and make it available to others. As usual, your target audience is fundamental in deciding what data to communicate, and how.
The natural consequence of this statement is that you need to consider the importance of context.
# The Importance of Context
As in any other field of communication, knowing your audience is critical to understanding what you need to communicate.
Here you find an article with some tips to get to know your audience.
Basically, the more you know about your audience's interests, jobs, and individual situations, the better you can address their business needs and desires.
The more specific you can be about who your audience is, the better positioned you are to communicate successfully.
Avoid a general audience, such as "external stakeholders" or "anyone in the product department". By trying to communicate with too many different individuals with different needs at once, you risk not communicating with any of them as effectively as you would if you narrowed your target audience.
If you must remain general for some reason, simplify as much as you can, and check here for some useful tips.
Here you have some other reasons why your data presentation should be driven by the target audience.
Once you have your target clearly in mind, you can start developing the content you want to present.
# The Data / Ink Ratio
The human brain has limited resources, and overloading it with numbers and notions can only have negative effects. People get bored easily, especially if your charts are hard to read or offer too much information.
As with most of the concepts I taught you in the Impactful presentation guide, Less Is More is one of the principles you need to follow strictly. Tufte's book stresses this mercilessly, calling it the "Data-Ink Ratio".
Here you find an interesting journey of a chart, which takes it from unreadable to state-of-the-art minimalism.
The general lesson here is to get rid of everything that is not needed to communicate the core of your data: extra lines, numbers, legends, names, points, and so on.
The more noise you can avoid, the more smoothly your information will flow to your audience and the more they'll remember it.
Data / Ink Ratio = Amount of ink used on data / Total amount of ink used in the graphic
Some additional resources to learn how to optimize the Data / Ink Ratio:
- 1, 2, 3, 4, 5, 6
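
To make the ratio more concrete, here is a minimal Matplotlib sketch (Matplotlib is covered in the tools section) that strips some non-data ink from a default bar chart. The `declutter` helper and the sales numbers are made up for illustration, and which elements count as "non-data ink" is a judgment call of this sketch, not a rule quoted from Tufte's book.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt


def declutter(ax):
    """Hypothetical helper: hide the top/right spines, grid lines and tick marks."""
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    ax.grid(False)
    ax.tick_params(length=0)
    return ax


# Illustrative numbers only, not real measurements
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [12, 15, 11, 18]

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(months, sales, color="steelblue")
ax.set_title("Monthly sales")
declutter(ax)
```

Calling `fig.savefig("chart.png")` afterwards writes the result to disk. Comparing the chart with and without the `declutter(ax)` call is a quick way to see how much of the default ink carries no data.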
# Choose an Effective Visual
Just as a warrior chooses a weapon depending on the context, you have to choose wisely the chart that represents each number you want to communicate.
Here is a list of the most common shapes and ideas to present data.
As you can see, there are many different graphs and other types of visual displays of information, but a handful will work for the majority of your needs (please don't use pie charts!).
Here and here you have detailed, easy-to-follow checklists to help you decide which type of chart best suits your case.
# Focus your Audience’s Attention
Within the brain, there are three types of memory that are important to understand as we design visual communications: iconic memory, short-term memory, and long-term memory. The one we need to leverage well for our presentations is iconic memory. In fact, it is responsible for most of our first impression of what we see, and has by far the most important impact on our perception.
Here you find a good explanation of how to leverage iconic memory.
Here is another good read on this topic.
# Think like a Designer
The most important principle in design is that "the design of _____ should be driven by its function".
Imagine a gladius, the bread-and-butter weapon of the Roman army: you can easily understand what its purpose is, even if no one told you!
Read a gentle introduction to design theory here; it's highly recommended!
Here you find useful design guidelines, and here how to design an effective dashboard.
# Exploring Model Visuals
## Line Graph
Despite its simplicity, it is the most effective chart you can show (remember, less is more!). Most of the data you have can probably be presented through a line graph.
Here you find how to use its power wisely.
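
As a rough illustration, a plain line graph takes only a few lines of Matplotlib (see the tools section). The weekly sign-up numbers, labels, and styling choices below are invented for this sketch.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

# Made-up data: sign-ups per week
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
signups = [120, 135, 150, 149, 170, 190, 210, 230]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(weeks, signups, color="tab:blue", linewidth=2)
ax.set_xlabel("Week")
ax.set_ylabel("Sign-ups")
ax.set_title("Weekly sign-ups")

# Less is more: hide the frame lines that carry no data
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
```

One line, two axis labels, and a title are often enough; anything more should earn its place.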
## Annotated Line Graph
Like the previous one, but with annotations that can help readability.
Here you find only 88 examples of that 😃
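
A minimal sketch of an annotated line graph, using Matplotlib's `Axes.annotate`. The data, the "Campaign launch" event, and the label position are all assumptions made up for the example.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

# Made-up data: sign-ups per week
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
signups = [120, 135, 150, 149, 170, 190, 210, 230]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(weeks, signups, color="tab:blue", linewidth=2)
ax.set_xlabel("Week")
ax.set_ylabel("Sign-ups")

# Point at the data value (xy) and place the label where there is free space (xytext)
ax.annotate(
    "Campaign launch",
    xy=(5, 170),
    xytext=(1.5, 215),
    arrowprops={"arrowstyle": "->", "color": "gray"},
)
```

The annotation tells the reader why the curve changes, so they don't have to guess.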
## Stacked Bars
Probably the most effective chart to compare quantities, they were used more than 270 years ago!
Here you find complete guidelines to use them.
Here you can understand why it is important to keep them as simple as possible, without 3D effects. A really interesting and in-depth read.
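
In Matplotlib, one common way to stack bars is to draw the second series with `bottom=` set to the values of the first. The sketch below assumes two invented revenue series; the names and numbers are illustrative only.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

# Made-up revenue per quarter, split by channel
quarters = ["Q1", "Q2", "Q3", "Q4"]
online = [30, 35, 40, 50]
retail = [20, 18, 22, 25]

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(quarters, online, label="Online", color="tab:blue")
# bottom=online makes each retail bar start where the online bar ends
ax.bar(quarters, retail, bottom=online, label="Retail", color="tab:orange")
ax.set_ylabel("Revenue")
ax.legend(frameon=False)
```

Keep the number of stacked series small: only the bottom series shares a common baseline, so the upper ones get harder to compare as you add more.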
## Positive and Negative Stacked Bars
With negative values, you can easily show bad-vs-good performance or in-vs-out flows.
Here is a detailed explanation of how and when to use them.
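
A possible sketch of an in-vs-out chart in Matplotlib: inflows are plotted as positive bars and outflows as negative bars on the same categories, with a zero line between them. The cash-flow figures and color choices are invented for the example.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

# Made-up cash flows per month; outflows are stored as negative numbers
months = ["Jan", "Feb", "Mar", "Apr"]
inflow = [50, 60, 55, 70]
outflow = [-30, -45, -20, -65]

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(months, inflow, color="tab:green", label="Inflow")
ax.bar(months, outflow, color="tab:red", label="Outflow")
ax.axhline(0, color="black", linewidth=0.8)  # the baseline separating in from out
ax.set_ylabel("Cash flow")
ax.legend(frameon=False)
```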
## Horizontal Stacked Bars
You don't need to be a fan of the Flat Earth "theory" to use horizontal bar charts! They're similar to their vertical cousins, but orienting the chart horizontally means the category names along the left are easy to read as horizontal text.
Here is a guide to using them.
Here is an interesting article that explains when to choose horizontal or vertical bars.
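
The horizontal variant can be sketched in Matplotlib with `barh`, where `left=` plays the role that `bottom=` plays for vertical bars. The team names and ticket counts are assumptions for the example.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

# Made-up ticket counts; long category names are why we go horizontal
teams = ["Customer support", "Engineering", "Marketing"]
done = [12, 30, 8]
pending = [5, 10, 4]

fig, ax = plt.subplots(figsize=(6, 2.5))
ax.barh(teams, done, label="Done", color="tab:blue")
# left=done makes each pending bar start where the done bar ends
ax.barh(teams, pending, left=done, label="Pending", color="lightgray")
ax.invert_yaxis()  # keep the first category at the top, like a written list
ax.set_xlabel("Tickets")
ax.legend(frameon=False)
```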
# Storytelling with Data
When you see a great play, watch a captivating movie, or read a fantastic book, you’ve experienced the magic of the story. A good story grabs your attention and takes you on a journey, evoking an emotional response. In the middle of it, you find yourself not wanting to turn away or put it down. After finishing it—a day, a week, or even a month later—you could easily describe it to a friend.
If you achieve this with your audience, you've made it: you have won first prize!
- Find a subject you care about. It is this genuine caring, and not your games with language, which will be the most compelling and seductive element in your style.
- Keep it simple. Great masters wrote sentences which were almost childlike when their subjects were most profound. “To be or not to be?” asks Shakespeare’s Hamlet. The longest word is three letters.
- Choose who to leave behind. If a sentence or a chart, no matter how excellent, does not illuminate your subject in some new and useful way, scratch it out.
- Don't fool people with data. These are clear examples of what I'm saying.
- Be clear. If I broke the rules of punctuation, or bent the meaning of words (technical or not), I simply wouldn't be understood.
- Pity the readers. Our audience requires us to be sympathetic and patient teachers, ever willing to simplify and clarify.
- Be suggestive. Try to summon pictures, sounds, and feeling during your stories.
- Have a great ending. Leave your audience with a sentence that will be the reminder of your presentation, the innermost core of your topic: the thing you want your audience to think about when they remember your presentation.
For other tips and suggestions about storytelling, check my other Impactful presentation guide.
Sorry, I'm a hopeless fan of the DRY principle.
# Data Visualization tools
In this section I introduce you to the most accessible and well-known tools, which will give you a marketable skill in Data Visualization.
## Microsoft Excel
Do yourself a favor: learn Excel now!
Excel is the Swiss Army knife for a lot of basic data management, computation, and representation.
Despite its scalability limits, it's still one of the tools that support companies today.
Take this course about data visualization with Microsoft Excel.
Here you have another good one.
Here you have some exercises to test your skill.
Here is a list of cool websites about Excel visualizations.
## Matplotlib
Matplotlib is one of the most used libraries for graphical representation in Python, and a lot of other libraries are built on top of it.
My personal opinion is that it's not the easiest library to understand and use, but it's still relevant today, if only to get the most out of the tutorials on the Internet. You can also find a lot of examples on Stack Overflow.
The official beginner's guide is really complete and contains everything you need to get started and then become proficient with the library.
Here you have the complete documentation.
Here is another bunch of chart-specific tutorials.
Here is a collection of the 50 most useful visualizations with code.
Here you find advanced charts and the code to create them.
Here is a handy cheat sheet.
Challenge yourself:
Best Practices
## Seaborn
Just as your brain is fascinated by beauty in humans, art, or cute puppies, it is fascinated by beautiful visualizations. A common library built on top of Matplotlib is Seaborn. It's used to enhance Matplotlib charts, so you need to become comfortable with the "mother library" first.
Follow this YouTube tutorial; it covers most of what you need to get started.
Then read this long and complete blog post.
Here you find another long tutorial for beginners.
Challenge yourself: 1, 2, 3, 4
Best practices: 1, 2, 3
Additional examples: 1, 2, 3, 4
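
To give a feel for how Seaborn sits on top of Matplotlib, here is a small sketch that draws a box plot on a regular Matplotlib `Axes`. It assumes Seaborn is installed; the two variants and their session lengths are invented data.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up session lengths (in minutes) for two page variants
group = ["A"] * 6 + ["B"] * 6
minutes = [4.1, 5.0, 4.7, 5.3, 4.9, 9.8, 6.2, 6.8, 7.1, 6.5, 7.4, 6.9]

fig, ax = plt.subplots(figsize=(4, 3))
# Seaborn summarizes each group's distribution and draws it on the Matplotlib axes
sns.boxplot(x=group, y=minutes, ax=ax)
ax.set_xlabel("Variant")
ax.set_ylabel("Session length (minutes)")
```

Because the result is an ordinary Matplotlib `Axes`, everything you know about labels, titles, and decluttering still applies.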
## Bokeh
From the Bokeh documentation:

> Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.

Bokeh prides itself on being a library for interactive data visualization.
Unlike popular counterparts in the Python visualization space, like Matplotlib and Seaborn, Bokeh renders its graphics using HTML and JavaScript. This makes it a great candidate for building interactive web-based dashboards and applications.
But what's the real difference among Bokeh, Matplotlib, and Seaborn?
As a comment in this Reddit thread says:

> Each library has its own distinct purpose:
>
> Matplotlib is for basic plotting -- bars, pies, lines, scatter plots, etc.
>
> Seaborn is for statistical visualization -- use it if you're creating heatmaps or somehow summarizing your data and still want to show the distribution of your data
>
> Bokeh is for interactive visualization -- if your data is so complex (or you haven't yet found the "message" in your data), then use Bokeh to create interactive visualizations that will allow your viewers to explore the data themselves.

Here you have the official tutorial. It covers pretty much everything you need to know, so go through it. It contains exercises too.
Here you have the official user guide.
Another list of useful additional tutorials: 1, 2, 3
Additional examples: 1, 2, 3, 4, 5
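
As a rough sketch of the "HTML and JavaScript" point, the snippet below builds a small interactive line chart with Bokeh and renders it to a standalone HTML string. It assumes Bokeh is installed; the data and the chosen tool list are illustrative, not a recommended setup.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Made-up data: sign-ups per week
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
signups = [120, 135, 150, 149, 170, 190, 210, 230]

# The tools string enables panning, zooming and hover tooltips in the browser
p = figure(
    title="Weekly sign-ups",
    x_axis_label="Week",
    y_axis_label="Sign-ups",
    tools="pan,wheel_zoom,hover,reset",
)
p.line(weeks, signups, line_width=2)

# Render the plot to a self-contained HTML page (BokehJS is loaded from the CDN)
html = file_html(p, CDN, "Weekly sign-ups")
```

Writing `html` to a file and opening it in a browser gives a chart your viewers can pan, zoom, and hover over.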
## Power BI
Power BI is a super cool tool from Microsoft, used mostly in Business Intelligence to build relationships among data, and to clean and visualize it in wonderful interactive dashboards. The thing that I love about Power BI is that it's free for personal use and very cheap for enterprise purposes. It's also super easy to use.
Check this tutorial for beginners and then explore the official Guided Learning; it has a lot of step-by-step tutorials and side projects to challenge yourself.
Good additional resources to follow: 1, 2, 3, 4, 5
Best practices: 1, 2, 3, 4, 5
# Take Inspiration
The best way to become confident with data visualization is to watch, watch, and watch data visualizations. I've put here plenty of resources you can take inspiration and ideas from.
Websites: 1, 2, 3, 4, 5, 6, 7, 8
Bonus point! Try Google Facets, a super useful web tool for fast visualizations. It's really EASY to use: you can upload your dataset and get the first insights from it. It's also awesome for showing data to non-technical people.
# Storytelling with Data
I can't stress this point enough. When you prepare data visualizations, focus on a story to tell your audience.
This approach has several proven and positive effects.
Definitely check this: it is the best resource I've ever found on this concept applied to data visualization.
Here you find a good article that explains why.
Here's a great presentation about storytelling with data.
Here is another interesting read.
# Common Visualization Mistakes
As an old Chinese saying goes:

> Look at others' mistakes, and correct your own.

Knowing the most frequent mistakes is fundamental to mastering a skill, so I list here a bunch of resources that will make you aware of the "don'ts" in data visualization:
- 1, 2, 3, 4, 5
# Additional Resources
I really love data visualization, and over the last few years I've collected a lot of cool websites and "need-to-bookmark" places. I've already given you a lot of them; here I list everything that remains.
- Data is Beautiful SubReddit
- Analytics SubReddit
- The Pudding
- Flow Data
- Small Multiples
- Awesome Interactive Journalism
- EdwardTufte Twitter account
- FiveThirtyEight
- List of super cool websites
- Every line of Hamilton
- Storytelling with Data blog
# Conclusions
In this guide we've tried to draw a map of the most useful resources about data visualization (after searching for and comparing a lot of them), to give you a reference point for the subject.
You know that the only way to become really comfortable with something is to tackle it first-hand. So the best tip I can give you is "find your project".
- Choose a topic that interests you in some way. You can find a lot of free public datasets to experiment with. Check your country's websites or visit Kaggle or UCI to find a lot of them.
- Plot the data in every way you can think of, applying the techniques you have seen.
- Get inspired by looking at how people have visualized similar datasets. Search Kaggle for "Visualization" and you'll be stunned by the number of examples.
It's better to be proficient in one tool and barely know the others than to be a jack of all trades and master of none. So I suggest you choose the tool that inspires you the most and dive deep into it. After all, the tools we've seen overlap with each other in many ways, but they differ in scale and approach.