# Index
- Why Python
- Computer Science Fundamentals
- Learn Python
- Develop Small Projects
- Learn Git and GitHub
- Ask Questions
- Conclusions
Let's dive right in!
# Why Python
According to Sun Tzu:
If you don't know Python, learn it yesterday!
There are hundreds of programming languages: mature ones such as C and C++, more recent ones such as Ruby, C# or Lua, and corporate giants like Java.
Choosing a programming language to learn is difficult.
No language can solve every possible problem out there (there is no one-size-fits-all solution), but Python is a good choice in many cases and is also well suited to those who are learning to program.
Python was born from the idea "I want a programming language that is as close as possible to plain English". So, most of the time, when you don't know the name of something, just try the plain, literal English name of that thing: the Python name will probably be that one!
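As a small illustration of that readability, here is a minimal sketch (the shopping list and variable names are made up for this example). Membership tests and loops read almost like English sentences:

```python
# Hypothetical data, used only to illustrate how readable the syntax is.
shopping_list = ["milk", "bread", "apples"]

# Conditions are written with plain English words: "in", "not in", "and".
if "bread" in shopping_list and "eggs" not in shopping_list:
    message = "We have bread but no eggs"
else:
    message = "Nothing to report"

print(message)

# Read it aloud: "for each item in the sorted shopping list, print the item".
for item in sorted(shopping_list):
    print(item)
```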
Python is used by hundreds of thousands of programmers around the world and their numbers are growing all the time.
There are many reasons for this success.
Python is intuitive: you think of a way to solve a problem, you express it that way, and most of the time it works.
Python works everywhere, whether Windows, Linux/UNIX, Mac or other, from supercomputers to mobile phones.
It allows you to develop small applications and fast prototypes but is structured for creating large programs.
It is equipped with easy-to-use libraries for graphical user interfaces and web programming. Best of all, it's free.
There is a large and ever-growing community of Python developers, ranging from academia and research all the way to business and everyday hobbyists.
You can create and prototype things very quickly due to the ease of the language and availability of free libraries, packages and frameworks.
Why do you need to learn Python for doing Data Science?
Python is necessary and sufficient for doing Data Science.
Python is simple to understand, simple to read, powerful and flexible, and it can help you in everyday tasks (even if you're not a programmer!) by automating a lot of boring stuff.
Moreover, it is the core Data Science tool, and most of the frameworks we'll need in the next guides are written in Python or have rich Python wrappers.
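To give an idea of what "automating boring stuff" can look like, here is a minimal sketch that renames a batch of text files. The function name `add_prefix`, the file names, and the `backup_` prefix are all hypothetical; the demo runs in a temporary folder so it does not touch your real files.

```python
import tempfile
from pathlib import Path


def add_prefix(folder, prefix):
    """Rename every .txt file in `folder` so that its name starts with `prefix`."""
    renamed = []
    # sorted() builds the full list first, so renaming does not disturb the loop.
    for path in sorted(folder.glob("*.txt")):
        if not path.name.startswith(prefix):
            new_path = path.with_name(prefix + path.name)
            path.rename(new_path)
            renamed.append(new_path.name)
    return renamed


# Demo in a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("notes.txt", "todo.txt", "photo.jpg"):
        (folder / name).write_text("example")
    result = add_prefix(folder, "backup_")
    print(result)
```

Only the two `.txt` files are renamed; the `.jpg` file is left alone because it does not match the pattern.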
Note on the R programming language:
In many Internet guides, you will find the programming language R recommended for Data Science.
R is a language designed primarily for statistical computing (not a general-purpose scripting language like Python).
Now, the opinion shared by Virgilio's collaborators is as follows:
IF:
- you are a beginner in programming
- you are a beginner in data science
THEN:
It makes no sense to learn two programming languages at the same time.
It's a waste of time and energy, and it just creates confusion.
On top of that, Python is the undisputed king of scientific computing in general, and a Data Scientist cannot afford not to know it. R is not as widespread, and it does not have the same level of support or the same large collection of useful libraries.
What does this mean? That R is useless? Certainly not!
Indeed, its data visualization and statistical capabilities are recognized as useful and powerful. But really, if you're starting your path in Data Science today, start with Python; you can learn R later!
# Computer Science Fundamentals
Virgilio targets mainly those who have programming basics but have never touched the field of Data Science.
However, if you have never programmed in your life, don't worry!
Python is easy to learn (though mastering it takes a long time, like all things), and this MIT course introduces you to the main concepts of programming:
Introduction to Computer Science and Programming
If you want a wider curriculum (comparable to a first university degree), you can have a look here:
Free self-taught Path - Computer Science
That said, at Virgilio, we don't think you need to know the whole spectrum of computer science concepts to start getting your hands dirty in Data Science.
But it's true that the more you know, the better!
On the other hand, remember that Data Science is nothing more than mathematics and statistics, applied through programming!
So, it's worth spending time becoming very comfortable with Python.
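As a tiny example of "statistics applied through programming", the sketch below computes a mean and a sample standard deviation by hand and compares them with Python's built-in `statistics` module. The sample values are made up for illustration.

```python
import math
import statistics

# Made-up sample of heights in centimeters.
heights_cm = [160, 172, 168, 181, 175]

# Mean: the sum of the values divided by how many there are.
mean = sum(heights_cm) / len(heights_cm)

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by (n - 1).
squared_deviations = [(x - mean) ** 2 for x in heights_cm]
std_dev = math.sqrt(sum(squared_deviations) / (len(heights_cm) - 1))

print("mean:", mean)
print("standard deviation:", std_dev)

# The standard library implements the same formulas.
print("same mean:", math.isclose(mean, statistics.mean(heights_cm)))
print("same std:", math.isclose(std_dev, statistics.stdev(heights_cm)))
```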
# Learn Python
But how can you learn Python?
Virgilio hates to reinvent the wheel, and for our purposes the free book Automate the Boring Stuff is the perfect track to follow.
You can buy it too.
This free book is meant for total beginners.
The first chapter of the book will explain to you how to install Python (the interpreter of the code you will write) and the Python IDLE (a development environment that will simplify your coding life).
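Once Python is installed, a quick way to check that the interpreter works is to run a few lines like the following (a minimal sketch; the exact version and path printed depend on your machine):

```python
import platform
import sys

# Which Python version is running this code?
print("Python version:", platform.python_version())

# Where is the interpreter installed?
print("Interpreter location:", sys.executable)

# sys.version_info can be compared with a tuple of numbers.
uses_python_3 = sys.version_info >= (3, 0)
print("Running Python 3:", uses_python_3)
```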
After reading a chapter, do the exercises, trying to look for alternative solutions.
Once you finish a chapter, go to the W3Schools website and try to solve as many exercises as you can on the topic you just learned in the book (for example, after Chapter 4 - Lists in Python, you want to tackle the Exercises on Lists on W3Schools).
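To give a taste of the kind of list operations those exercises cover, here is a short sketch with made-up data:

```python
# A made-up list of pets.
pets = ["cat", "dog", "parrot"]

pets.append("hamster")      # add an item at the end of the list
first_pet = pets[0]         # indexes start at 0, so this is "cat"
last_two = pets[-2:]        # a slice with the last two items

# A list comprehension builds a new list from an existing one.
name_lengths = [len(name) for name in pets]

print(pets)           # ['cat', 'dog', 'parrot', 'hamster']
print(first_pet)      # cat
print(last_two)       # ['parrot', 'hamster']
print(name_lengths)   # [3, 3, 6, 7]
```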
For additional Python resources, check this link!
# Coding Challenges
We suggest you get into Codecademy: tackling coding challenges daily will improve your coding and problem-solving skills in general!
Here you find a detailed list of similar coding challenges (pick your favorite!).
# IDE
Even though the official Python IDLE is great for starting out with Python, we suggest you approach one of the following IDEs, which come with many more functionalities (and are better supported):
Virgilio tip: the former was built around Python and may offer more advanced functionality, but VSCode is faster and simpler to use.
Try both, and read here a very detailed Reddit discussion about which to choose.
# Navigate the Official Docs
In order to become proficient with Python (as with every programming language or technology), you must become comfortable with the official Python documentation.
Working through the documentation as a beginner is a really good practice, even if a lot of things will be unclear. In fact, being able to explore the documentation of something is the key to learning it autonomously.
Here you have some tips for reading documentation effectively.
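Part of the documentation is also available from inside Python itself. The sketch below uses the standard `inspect` module and the built-in `dir()` to look things up; `statistics.mean` is picked only as an example function. In an interactive session, `help(statistics.mean)` displays similar information.

```python
import inspect
import statistics

# inspect.getdoc returns the documentation string of an object.
doc = inspect.getdoc(statistics.mean)
first_line = doc.splitlines()[0]
print(first_line)

# inspect.signature shows which parameters a function accepts.
signature = inspect.signature(statistics.mean)
print(signature)

# dir() lists the names available on an object; names starting with
# an underscore are internal, so we filter them out here.
public_str_methods = [name for name in dir(str) if not name.startswith("_")]
print(public_str_methods[:5])
```

This does not replace the official documentation, but it is a handy way to check a function quickly while you are coding.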
# Stack Overflow
You should fall in love with Stack Overflow, a question-and-answer website about programming in general. Read How can I use Stack Overflow effectively as a beginner? and be ready to be grateful to the awesome people around the world who answer questions daily on the website.
# Hold your Cheatsheet
Here you can find a very good Python Cheatsheet: keep it at hand!
Python Cheatsheet
# Develop Small Projects
First of all, read this post from r/LearnProgramming:
TIP
The biggest lesson in the world of programming is the following: the best way to learn is getting your hands dirty!
Of course, to start with it makes sense to follow tutorials and guides (better still, a structured book), but you need to experiment, make mistakes, and repeat this process to really improve your coding skills.
Once you are done with the book Automate the Boring Stuff, start a small project and develop it by yourself!
The best case is that you find a topic or a task in which you have an interest, and tackle it!
If you lack ideas, don't worry: here you have a list of 1000+ project ideas you can develop in Python 😃
Having a project keeps you motivated; don't underestimate it!
# Learn Git and GitHub
Git is a version control system that lets you keep every change to your code under control, go back to earlier versions, and be sure that your code will never be lost!
Git is defined as a Distributed Version Control System. What does that mean?
From this article:
Control System: This basically means that Git is a content tracker. So Git can be used to store content — it is mostly used to store code due to the other features it provides.
Version Control System: The code which is stored in Git keeps changing as more code is added. Also, many developers can add code in parallel. So Version Control System helps in handling this by maintaining a history of what changes have happened. Also, Git provides features like branches and merges, which I will be covering later.
Distributed Version Control System: Git has a remote repository that is stored in a server and a local repository that is stored on your local machine. This means that the code is not just stored in a central server, but the full copy of the code is present in all the developers’ computers. Git is a Distributed Version Control System since the code is present in every developer’s computer.
Any existing software project that is not under version control is considered a dead project, and the responsible developers are considered crazy.
Data Science projects (which make heavy use of software) are no different, indeed!
They also have the additional problem of versioning the data, which is the raw material you work with most.
Here you can find a simple guide to Git.
Learn it: it's absolutely worth it (and necessary).
Documenting your work well with Git is crucial: read How to Write a Git Commit Message.
Must read: Ten Simple Rules for Taking Advantage of Git and GitHub
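As a Python exercise tied to commit messages, the sketch below checks a message against a few widely cited conventions: a subject line of at most 50 characters, a capitalized subject, no trailing period, and a blank line before the body. The function `check_commit_message` is a hypothetical helper written for this guide, not part of Git, and the rules it checks are an assumed subset of common practice.

```python
def check_commit_message(message):
    """Return a list of style problems found in a commit message."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""

    if not subject:
        problems.append("subject line is empty")
        return problems
    if len(subject) > 50:
        problems.append("subject line is longer than 50 characters")
    if not subject[0].isupper():
        problems.append("subject line is not capitalized")
    if subject.endswith("."):
        problems.append("subject line ends with a period")
    # If there is a body, the second line should be blank.
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between subject and body")
    return problems


# Made-up messages for the demo.
good = "Add data loading script\n\nLoad the raw CSV files and drop empty rows."
bad = "fixed stuff.\nchanged some things"

print(check_commit_message(good))  # → []
print(check_commit_message(bad))
```

The second message triggers three problems: it is not capitalized, it ends with a period, and it has no blank line before the body.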
# Ask Questions
A rule of thumb for learning fast and effectively is to ask questions and read others' questions and answers.
Join communities of people interested in the topic (e.g. Reddit): here you can find discussions, search by keywords (e.g. "matrix multiplication"), and ask questions, with experts who will answer and help you.
Some tips regarding questions:
- Try to form specific, well-written questions, to minimize the time used by the respondent.
- Do not ask a question whose answer can be found with a quick search on Google.
- If the questions are too general or show laziness they'll likely remain unanswered...
Some subreddits you can subscribe to are:
- r/Python
- r/LearnPython
- r/LearnProgramming
- r/Programming
- r/ComputerScience
Other good places to post and read (well structured) questions are:
# Conclusions
The path to becoming a good programmer is long and requires commitment and dedication, but it will give you satisfaction (and value) like very few other things in life!
Programming is creativity, problem-solving, art!
If you follow the advice in this guide, you won't have any problem becoming proficient with Python and, moreover, you will have learned the main concepts of programming!
This means that when you learn another language, the biggest obstacle will be the syntax, while the concepts will remain more or less the same!
In the next guide we will see how to use Jupyter Notebooks, a tool designed specifically for Data Science and experimental programming!