# Trusting is Good, Not Trusting is Better

Application monitoring is a key part of running software in production.

Without it, the only way of finding an issue is through sheer luck or because a client has reported it.

Both are less than ideal, to say the least!

You wouldn't deploy an application without monitoring, so why do it for Machine Learning models?

So let's start!

The first resource you should go through is the following:

It's a very detailed and comprehensive blog post and it addresses these topics:

  • What makes ML systems monitoring hard?
  • How can we monitor the usage and behavior of the model?
  • What are the key metrics to track and alert on?
  • What are the key principles for monitoring the ML system?

Once you are done with the previous blog post, you can refer to the related chapter of the book Machine Learning Engineering for a more detailed guide (buying it is strongly recommended, but you can read it for free online).

See also:

With these two resources, you should broadly understand the reasons for and the challenges of monitoring Machine Learning models in production, and you should have reasonable strategies to tackle them.

As you know, the two main challenges in model monitoring are:

  • How does the system behave?
  • How is the system used?

Let's see some specific resources for these two challenges.

# Monitoring the Behavior

When monitoring the behavior of the ML model in production, you should consider many aspects, including data drift, input data checks, and output data checks.
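
Data drift is one of the technical challenges of behavior monitoring. As a rough illustration, the sketch below computes the Population Stability Index (PSI) between a reference sample of one numeric feature (for example, training data) and a live sample. It is not taken from the resources in this section: the function name `psi`, the equal-width binning, and the `1e-6` floor for empty bins are all assumptions chosen for this example.

```python
import math
from typing import Sequence


def psi(reference: Sequence[float], live: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 when the
    # reference feature is constant, to avoid a zero bin width.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Assumed floor of 1e-6 keeps the logarithm finite for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, live_p = proportions(reference), proportions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_p, live_p))


# Hypothetical data: the live feature has shifted toward higher values.
reference_sample = [i / 100 for i in range(100)]
shifted_sample = [0.5 + i / 200 for i in range(100)]
drift_score = psi(reference_sample, shifted_sample)
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right alerting threshold depends on your feature and use case.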

# Monitoring the Usage

Monitoring the behavior of the model can be technically hard, but you should also make sure that your users are using the model in the correct way.

By "users" we mean anything that consumes the output of the ML model: a system, a human, or an ensemble of systems and humans.


If you're serving your model through an API (the recommended way), you can refer to general API monitoring best practices, which are not specific to Machine Learning.
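
As a minimal sketch of what generic API monitoring can look like, the decorator below records the latency and the success or error outcome of each call to a prediction handler. Everything here is hypothetical and for illustration only: the `monitored` decorator, the in-memory `counts` and `latencies` stores, and the `predict` stand-in are not part of any specific framework. A real service would export these values to its own metrics backend.

```python
import time
from collections import Counter, defaultdict
from functools import wraps

# In-memory stores (assumption); a real service would use a metrics backend.
counts: Counter = Counter()
latencies: defaultdict = defaultdict(list)


def monitored(endpoint: str):
    """Record latency and outcome of every call to the wrapped handler."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = handler(*args, **kwargs)
                counts[f"{endpoint}.success"] += 1
                return result
            except Exception:
                counts[f"{endpoint}.error"] += 1
                raise  # let the caller still see the failure
            finally:
                # Latency is recorded for successful and failed calls alike.
                latencies[endpoint].append(time.perf_counter() - start)
        return wrapper
    return decorator


@monitored("predict")
def predict(features: dict) -> float:
    # Hypothetical stand-in for a real model call.
    return 0.5 * features["x"]
```

Error rate and latency percentiles computed from these records are the kind of signals you would typically alert on, regardless of whether the endpoint serves a model.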


The major issue you can encounter when dealing with people is that they choose not to use the ML model.

This can happen for a variety of reasons: maybe they don't have enough confidence in the system, or they don't understand how to use it.
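
One way to notice that users are bypassing the model is to log what the model suggested next to what the user finally did, and track how often the two agree. The sketch below assumes such a log exists; the `Decision` record and the `adoption_rate` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    model_suggestion: str  # what the model proposed
    final_action: str      # what the user (human or system) actually did


def adoption_rate(decisions: list[Decision]) -> float:
    """Fraction of decisions where the final action matched the model's suggestion."""
    if not decisions:
        return 0.0  # assumption: report 0.0 when there is nothing to measure
    followed = sum(d.model_suggestion == d.final_action for d in decisions)
    return followed / len(decisions)


# Hypothetical log: the user overrode the model once out of four decisions.
log = [
    Decision("approve", "approve"),
    Decision("reject", "reject"),
    Decision("approve", "reject"),
    Decision("approve", "approve"),
]
rate = adoption_rate(log)
```

A falling adoption rate does not explain why users stopped trusting the model, but it tells you when to start asking.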

Take a look at this awesome guide from Google's engineers:

The People + AI Guidebook was written to help user experience (UX) professionals and product managers follow a human-centered approach to AI.

This detailed resource can get you started on the following topics:

  • User Needs + Defining Success
  • Data Collection + Evaluation
  • Mental Models
  • Explainability + Trust
  • Feedback + Control
  • Errors + Graceful Failure

General Tips

  • None of the above techniques is a silver bullet
  • Use only the things that work for you and apply to your use cases
  • Don't follow any of these ideas literally; try them out and see how they work for you

# Conclusions

After walking through the resources listed here, you should be comfortable with the challenges and caveats of monitoring your Machine Learning model in production.

As you've seen, there are both technical challenges (data drift, input data checks, output data checks) and "human-related" challenges. In particular, Google's People + AI Guidebook will show you most of the human-related ones.

In the next section, Now Go Build, we'll give you a list of tips, best practices, and suggestions on how to put into practice everything you've learned in the Purgatorio!