# Index

This guide does not try to collect every existing best practice in Data Science (a very difficult task). Instead, it gives you a method for finding new best practices and putting them into practice.

It also lists several resources that should be more than enough to develop advanced and robust practices for your Data Science projects.

Interesting read:

# The Mutant Runner

While exploring the Purgatorio, you've come across many links and websites that list good practices for Data Science, but those lists are often contradictory or incomplete.

Why is it so difficult to build good and consistent practices for Data Science?

  • Knowledge is fragmented among many researchers, professors, and practitioners
  • Data Science development best practices often stay inside skilled teams at top companies and are rarely shared
  • Data Science problems are rarely similar, and never the same
  • Algorithms that improve the state of the art are continuously published at conferences
  • New methods to evaluate algorithms are regularly proposed
  • Tools and libraries are developed and improved, and new ones are created for every need

So, developing and adopting good Data Science practices is not trivial, but there are ways to get around the obstacle.


# Producing Good Data Science Software

First, doing Data Science means applying programming to statistics and mathematics.

This can take various forms, such as data visualization, statistical analysis, or building predictive models (and more...).

The only certainty is that you are almost always writing software!

Now, software design, coding, and maintenance pose challenges that software engineering has faced throughout the last 40-50 years, and mature best practices exist to address the biggest challenges posed by the complexity of modern software.

To learn software development best practices, check these links:

These Reddit threads:

And consider buying one of these books; they are definitely worth the price:

With these resources, you should be well equipped to understand and tackle the challenges of modern software and, most importantly, to transfer these concepts to Data Science problems (which are software problems too).

# How to Discover and Adopt Data Science Best Practices

On top of all the challenges of traditional software, Data Science brings additional ones, caused by the reasons listed in The Mutant Runner section.

Virgilio's suggestion for discovering and applying the specific good practices of Data Science is simply to seek them out!

Virgilio was born as a place to collect all these kinds of resources and concepts, but it obviously can't contain everything!

So, when dealing with a specific problem, Google for best practices about it, optionally adding the name of the website you want to search.


For example, if you are dealing with an image classification project, you should search:

  • "Image classification best practices"
  • "Image classification best practices Reddit"
  • "Image classification best practices StackExchange"
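
If you run this kind of search often, the pattern above can be sketched as a small helper. This is only an illustrative sketch: the function names `build_queries` and `to_search_url` are hypothetical, the default list of sources simply mirrors the example above, and the search URL format is an assumption about Google's standard `q` parameter.

```python
from urllib.parse import urlencode


def build_queries(topic, sources=("", "Reddit", "StackExchange")):
    """Return one search query per source.

    An empty source string means a plain web search without a site name.
    """
    base = f"{topic} best practices"
    # strip() removes the trailing space left by the empty source
    return [f"{base} {source}".strip() for source in sources]


def to_search_url(query):
    """Turn a query into a search URL (assumed format: Google's `q` parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    for query in build_queries("Image classification"):
        print(query, "->", to_search_url(query))
```

Swapping in another topic, or adding other sources to the tuple, gives you the same set of searches for any new problem you face.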

This approach, especially skimming Reddit threads for hidden gems in the comments, can give you invaluable insights from experts!

And if you can't find anything about it, just post a question!

# Data Science Best Practices

This is a good, though certainly not definitive, list of the best links Virgilio has found on the best practices currently widespread in the field of Data Science.

Be sure to check the points of the Automation and Reproducibility Virgilio Guide!

# General Rules

# Deep Learning Best Practices

This is the most useful set of resources about Deep Learning in production you can find on the Internet. Be sure to check it out!

# Deliver Successful Projects

# Interesting Reads

# Conclusions

In this guide, you've seen that Data Science problems are, at their core, software problems. You've also learned that there is no such thing as a well-defined and stable set of best practices: they keep evolving over time.