# What you will learn
The purpose of this guide is to show you the importance of data visualization and why it's so useful when working with data. We'll show you best practices and reasons for using them, along with the "storytelling" approach to data science.
- Data Visualization
- Legolas, how do your elf eyes see?
- The Importance of Context
- The Data / Ink Ratio
- Choose an Effective Visual
- Focus your Audience’s Attention
- Think like a Designer
- Exploring Model Visuals
- Data Visualization tools
- Take Inspiration
- Storytelling with Data
- Common Visualization Mistakes
- Additional Resources
# Data Visualization
It was hard for Homo sapiens to survive in the African savannah: another human or an animal could kill you at any time.
The human brain evolved in this wild and unpredictable context, and evolution has "coincidentally" chosen to devote a great deal of its computing power (more than 60%) to capturing and understanding the world through sight.
So it's no surprise that clear and effective data visualization is one of your best weapons in the Data Science world.
The main inspiration for this guide is the must-buy book Storytelling with Data. It is by far the best data visualization book I've ever read.
You can find the free PDF here.
Another dense piece of knowledge, exceptionally concise and the "father" of every data visualization book, is The Visual Display of Quantitative Information.
I assume you know basic Python.
Nothing listed here is tool-specific (apart from the "tools" section, did you ever imagine that?).
# Legolas, how do your elf eyes see?
What do I mean by Data Visualization?
Let's consider Tableau's definition:
Data visualization is a graphical representation of information. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. In the world of Big Data, data visualization tools and technologies are essential to analyze massive amounts of information and make data-driven decisions.
And according to Wikipedia:
Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable and usable. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic should follow the task.
So, the goal of Data Visualization is to communicate facts about data in order to drive wise business decisions. Often these decisions have to be made by executives, boards or managers, who may not know all the technical details behind the data!
Another interesting concept you should be familiar with is the data-driven company, a business model that more and more organizations are embracing.
As a data scientist, you are the interface between several business functions (product, research, engineering and management), and your main goal is to persuade people to make the right decisions, based on data.
Often you want to abstract the representation of the data from the underlying technical details and make it available to others. As usual, your target audience is fundamental in deciding what data to communicate, and how.
The natural consequence is that you need to consider the importance of context.
# The Importance of Context
As in any other field of communication, knowing your audience is critical to understanding what you need to communicate.
Here you find an article with some tips to know your audience.
Basically, the more you know about your audience's interests, jobs, and individual situations, the better you can address their business needs and desires. The more specific you can be about who your audience is, the better positioned you are to communicate successfully.
Avoid a general audience, such as "external stakeholders" or "anyone in the product department". By trying to communicate to too many different individuals with different needs at once, you risk not communicating to any of them as effectively as you would if you narrowed your target audience.
If you must remain general for some reason, simplify as much as you can, and check here for some useful tips.
Here you have some other reasons why your data presentation should be driven by the target audience.
Once you have your target clearly in mind, you can start developing the content you want to present.
# The Data / Ink Ratio
The human brain has limited resources, and overloading it with numbers and notions can only have negative effects. People get bored easily, especially if your charts are hard to read or offer too much information. As with most of the concepts I taught you in the Impactful presentation guide, Less Is More is one of the principles you need to follow strictly. Tufte's book stresses this mercilessly, calling it the "Data-Ink Ratio". Here you find an interesting journey of a chart, which takes it from unreadable to state-of-the-art minimalism.
The general lesson here is to get rid of everything that is not needed to communicate the core of your data: extra lines, numbers, legends, names, points and so on.
The more noise you avoid, the more smoothly your information will flow to your audience and the more they'll remember it.
Data/Ink Ratio = Amount of Ink used on Data / Total Amount of Ink used
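As a rough illustration of the ratio above, the sketch below uses Matplotlib to strip non-data ink from a simple line chart. The figures and the `declutter` helper are hypothetical, made up for this example; treat it as one possible starting point rather than a recipe.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical monthly figures, invented for illustration only.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [12, 15, 14, 18, 21, 24]
x = list(range(len(months)))


def declutter(ax):
    """Remove ink that carries no data: top/right borders, grid, tick marks."""
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    ax.grid(False)
    ax.tick_params(length=0)
    return ax


fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, sales, color="tab:blue", linewidth=2)
ax.set_xticks(x)
ax.set_xticklabels(months)
ax.set_title("Monthly sales (hypothetical data)", loc="left")
declutter(ax)
fig.savefig("decluttered_line.png", dpi=100)
```

Everything left on the chart (the line, the month labels, the title) says something about the data, which is the spirit of a high Data/Ink Ratio.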
Some additional resources to learn how to optimize the Data / Ink Ratio:
# Choose an Effective Visual
Just as a warrior chooses his weapon depending on the context, you have to wisely choose the chart that represents each number you want to communicate.
Here is a list of the most common shapes and ideas to present data.
As you can see, there are many different graphs and other types of visual displays of information, but a handful will work for the majority of your needs (please don't use pie charts!).
Here and here you have detailed, easy-to-follow checklists to help you decide which type of chart best suits your case.
# Focus your Audience’s Attention
Within the brain, there are three types of memory that are important to understand as we design visual communications: iconic memory, short-term memory, and long-term memory. The one we need to leverage well in our presentations is iconic memory. In fact, it is responsible for most of our first impression of what we see, and it has by far the greatest impact on our perception.
Here you find a good explanation of how to leverage iconic memory.
Here is another good read on this topic.
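One common way to guide that first impression is to mute everything except the series your story is about. The sketch below assumes Matplotlib and uses invented product figures; the choice of orange versus gray is just one option.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical quarterly values for four products, invented for illustration.
quarters = ["Q1", "Q2", "Q3", "Q4"]
series = {
    "Product A": [10, 11, 13, 14],
    "Product B": [9, 10, 10, 11],
    "Product C": [8, 12, 17, 23],
    "Product D": [11, 10, 9, 8],
}
focus = "Product C"  # the one series the story is about
x = list(range(len(quarters)))

fig, ax = plt.subplots(figsize=(6, 3))
for name, values in series.items():
    highlighted = name == focus
    ax.plot(
        x,
        values,
        color="tab:orange" if highlighted else "lightgray",
        linewidth=2.5 if highlighted else 1.0,
        zorder=3 if highlighted else 2,  # draw the focus line on top
    )
    # Label each line at its right end instead of using a separate legend.
    ax.text(
        x[-1] + 0.1,
        values[-1],
        name,
        color="tab:orange" if highlighted else "gray",
        va="center",
    )
ax.set_xticks(x)
ax.set_xticklabels(quarters)
fig.savefig("focus_attention.png", dpi=100, bbox_inches="tight")
```

The gray lines remain available as context, but the eye lands on the highlighted one first.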
# Think like a Designer
The most important principle in design is that "the design of _____ should be driven by its function".
Imagine a gladius, the bread-and-butter weapon of the Roman army: you can easily understand what its purpose is, even if no one told you!
Read here a gentle introduction to design theory, highly recommended!
Here you find useful design guidelines, and here how to design an effective dashboard.
# Exploring Model Visuals
# Line Graph
Despite its simplicity, it is the most effective chart you can show (remember, less is more!). Most of the data you have can probably be presented through a line graph.
Here you find how to use its power wisely.
# Annotated Line Graph
Like the previous one, but with annotations that can help readability.
Here you find only 88 examples of that 😃
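A minimal sketch of an annotated line graph, assuming Matplotlib. The ticket counts and the "Release shipped" event are hypothetical, invented only to show where an annotation helps.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical weekly support-ticket counts, invented for illustration.
weeks = list(range(1, 11))
tickets = [40, 42, 41, 45, 44, 70, 52, 47, 45, 43]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(weeks, tickets, color="tab:blue")

# Find the highest point and explain it right on the chart.
peak = max(range(len(tickets)), key=lambda i: tickets[i])
note = ax.annotate(
    "Release shipped",                        # hypothetical event
    xy=(weeks[peak], tickets[peak]),          # the data point being explained
    xytext=(weeks[peak] - 5, tickets[peak]),  # where the label text sits
    arrowprops=dict(arrowstyle="->", color="gray"),
    va="center",
)
ax.set_xlabel("Week")
ax.set_ylabel("Support tickets")
fig.savefig("annotated_line.png", dpi=100)
```

The annotation answers the question the spike would otherwise raise, so the reader does not have to guess.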
# Stacked Bars
Probably the most effective chart for comparing quantities; they were already in use more than 270 years ago!
Here you find complete guidelines to use them. Here you can understand why it is important to keep them as simple as possible, without 3D effects. A really interesting and in-depth read.
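As a small sketch of the idea, assuming Matplotlib: the second series is stacked on the first through the `bottom` argument of `Axes.bar`. The regions and numbers are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical sales per region and channel, invented for illustration.
regions = ["North", "South", "East", "West"]
online = [30, 25, 40, 20]
retail = [20, 30, 15, 25]
x = list(range(len(regions)))

fig, ax = plt.subplots(figsize=(6, 3))
bottom_bars = ax.bar(x, online, label="Online")
# `bottom=` lifts the second series so it sits on top of the first.
top_bars = ax.bar(x, retail, bottom=online, label="Retail")
ax.set_xticks(x)
ax.set_xticklabels(regions)
ax.legend(frameon=False)
fig.savefig("stacked_bars.png", dpi=100)
```

Only the bottom series shares a common baseline, so put the series you most want people to compare there.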
# Positive and Negative Stacked Bars
With negative values, you can easily show bad-vs-good performance or in-vs-out flows.
Here is a detailed explanation of how and when to use them.
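A minimal sketch of an in-vs-out chart, assuming Matplotlib. Outflows are stored as negative numbers so they grow downward from a zero line; all figures are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical cash flows, invented for illustration.
months = ["Jan", "Feb", "Mar", "Apr"]
inflow = [120, 135, 110, 150]
outflow = [-80, -95, -130, -90]  # outflows stored as negative numbers
net = [i + o for i, o in zip(inflow, outflow)]
x = list(range(len(months)))

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(x, inflow, color="tab:blue", label="In")
ax.bar(x, outflow, color="tab:red", label="Out")
ax.axhline(0, color="black", linewidth=0.8)  # the shared baseline
ax.plot(x, net, color="black", marker="o", linestyle="none", label="Net")
ax.set_xticks(x)
ax.set_xticklabels(months)
ax.legend(frameon=False)
fig.savefig("positive_negative_bars.png", dpi=100)
```

The black markers show the net result per month, which makes the one negative month easy to spot.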
# Horizontal Stacked Bars
You don't need to be a fan of the Flat Earth "theory" to use horizontal bar charts! They're similar to their vertical cousins, but orienting the chart horizontally means the category names along the left are easy to read as horizontal text.
Here is a guide to using them. Here is an interesting article that explains when to choose horizontal or vertical bars.
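The same stacking idea works sideways. This sketch assumes Matplotlib and uses `Axes.barh` with its `left` argument; the team names and task counts are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical task counts per team, invented for illustration.
teams = ["Customer Support", "Research and Development", "Sales and Marketing"]
done = [35, 20, 28]
pending = [10, 25, 12]
y = list(range(len(teams)))

fig, ax = plt.subplots(figsize=(6, 3))
done_bars = ax.barh(y, done, label="Done")
# `left=` starts the second series where the first one ends.
pending_bars = ax.barh(y, pending, left=done, label="Pending")
ax.set_yticks(y)
ax.set_yticklabels(teams)  # long names stay readable as horizontal text
ax.invert_yaxis()          # first team at the top, like a written list
ax.legend(frameon=False)
fig.savefig("horizontal_stacked_bars.png", dpi=100, bbox_inches="tight")
```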
# Storytelling with Data
When you see a great play, watch a captivating movie, or read a fantastic book, you’ve experienced the magic of the story. A good story grabs your attention and takes you on a journey, evoking an emotional response. In the middle of it, you find yourself not wanting to turn away or put it down. After finishing it—a day, a week, or even a month later—you could easily describe it to a friend.
If you achieve this with your audience, you've made it: you have won first prize!
- Find a subject you care about. It is this genuine caring, and not your games with language, which will be the most compelling and seductive element in your style.
- Keep it simple. Great masters wrote sentences which were almost childlike when their subjects were most profound. “To be or not to be?” asks Shakespeare’s Hamlet. The longest word is three letters.
- Choose what to leave behind. If a sentence or a chart, no matter how excellent, does not illuminate your subject in some new and useful way, scratch it out.
- Don't fool people with data. These are clear examples of what I'm saying.
- Be clear. If I broke the rules of punctuation, or bent the meaning of words (technical or not), I would simply not be understood.
- Pity the readers. Our audience requires us to be sympathetic and patient teachers, ever willing to simplify and clarify.
- Be suggestive. Try to summon pictures, sounds, and feelings during your stories.
- Have a great ending. Leave your audience with a sentence that serves as the reminder of your presentation, the innermost core of your topic: the thing you want your audience to think about when they remember your presentation.
For other tips and suggestions about storytelling, check my other guide, Impactful presentation.
Sorry, I'm a hopeless fan of the DRY principle.
# Data Visualization tools
In this section I introduce you to the most accessible and well-known tools, which will give you a marketable skill in Data Visualization.
# Microsoft Excel
Do yourself a favor: learn Excel now!
Excel is the Swiss Army knife for a lot of basic data management, computation, and representation.
Despite its scalability limits, it's still one of the tools that companies rely on today.
Take this course about data visualization with Microsoft Excel.
Here you have another good one.
Here you have some exercises to test your skill.
Here is a list of cool websites about Excel visualizations.
# Matplotlib
Matplotlib is one of the most used libraries for graphical representation in Python, and a lot of other libraries are built on top of it.
My personal opinion is that it's not too easy to understand and use, but it's still worth learning today to get the most out of the tutorials on the Internet. You can also find a lot of examples on Stack Overflow.
The official beginner's guide is really complete and contains everything you need to get started and then become proficient with the library.
Here you have the complete documentation.
Here is another bunch of chart-specific tutorials.
Here is a collection of the 50 most useful visualizations with code.
Here you find advanced charts and the code to build them. Here is a handy cheat sheet.
# Seaborn
Just as your brain is fascinated by beauty in humans, art, or cute puppies, so it is by beautiful visualizations. A common library built on top of Matplotlib is Seaborn. It's used to enhance Matplotlib charts, so you need to become comfortable with the "mother library" first.
Follow this YouTube tutorial; it covers most of what you need to get started.
Then read this long and complete blog post.
Here you find another long tutorial for beginners.
# Bokeh
From the Bokeh documentation:
Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.
Bokeh prides itself on being a library for interactive data visualization.
But what's the real difference between Bokeh, Matplotlib and Seaborn?
As a comment in this Reddit thread says:
Each library has its own distinct purpose:
Matplotlib is for basic plotting -- bars, pies, lines, scatter plots, etc.
Seaborn is for statistical visualization -- use it if you're creating heatmaps or somehow summarizing your data and still want to show the distribution of your data
Bokeh is for interactive visualization -- if your data is so complex (or you haven't yet found the "message" in your data), then use Bokeh to create interactive visualizations that will allow your viewers to explore the data themselves.
# Power BI
Power BI is a super cool tool from Microsoft, used mostly in Business Intelligence to build relationships among data, then clean and visualize it in wonderful interactive dashboards. What I love about Power BI is that it's free for personal use and very cheap for enterprise purposes. It's also super easy to use.
Check this tutorial for beginners and then explore the official Guided Learning: it has a lot of step-by-step tutorials and side projects to challenge yourself.
# Take Inspiration
The best way to gain confidence with data visualization is to look at, look at, and look at data visualizations. I've put here plenty of resources from which you can take inspiration and ideas.
Bonus point! Try Google Facets, a super useful web tool for fast visualizations. It's really EASY to use: you can upload your dataset and get the first insights from it. It's also awesome for showing data to non-technical people.
# Storytelling with Data
I can't stress this point enough. When you prepare data visualizations, focus on a story to tell your audience.
This approach has several proven and positive effects.
Definitely check this: it is the best resource I've ever found on this concept applied to data visualization.
Here you find a good article that explains why.
Here's a great presentation about storytelling with data.
Here is another interesting read.
# Common Visualization Mistakes
From an old Chinese saying:
Look at others' mistakes, and correct your own.
Knowing the most frequent mistakes is fundamental to mastering a skill, so I list here a bunch of resources that will make you aware of the "don'ts" in data visualization:
# Additional Resources
I really love data visualization, and over the last few years I've collected a lot of cool websites and "need-to-bookmark" places. I've already given you a lot of them; here I list everything that remains.
- Data is Beautiful SubReddit
- Analytics SubReddit
- The Pudding
- Flow Data
- Small Multiples
- Awesome Interactive Journalism
- EdwardTufte Twitter account
- List of super cool websites
- Every line of Hamilton
- Storytelling with Data blog
In this guide we've tried to draw a map of the most useful resources about data visualization (after searching for and comparing a lot of them), to give you a reference point on the subject.
You know that the only way to become really comfortable with something is to experience it first-hand. So the best tip I can give you is "find your project".
- Choose a topic that interests you in some way. You can find a lot of free public datasets to experiment with. Check your country's websites or go to Kaggle or UCI to find a lot of them.
- Plot the data in every way you can think of, applying the techniques you have seen.
- Get inspired by looking at how people visualized similar datasets. Search Kaggle for "Visualization" and you'll be stunned by the number of examples.
It's better to be proficient in one tool and barely know the others than to be a jack of all trades but master of none. So, I suggest you choose the tool that inspires you most and dive deep into it. In fact, the tools we've seen overlap with each other in many ways, but they differ in scale and approach.