You got your model into production.
It’s making predictions and serving them to stakeholders.
The pipeline is automated.
Now it’s time to kick back and relax: your work is done.
I like dreaming too.
Alright, back to reality. Let’s discuss model drift: what it is, why it occurs, how to detect it, and how to address it before it secretly destroys performance and the stakeholders’ trust in the model along with it.
What is Model Drift?
Model drift is the deterioration in performance of a predictive model over time, and even the most powerful, accurate models are at risk of it. Model drift is not a reflection of poor training techniques or bad data gathering, but rather something that all data scientists must keep a watchful eye out for.
Let’s look at an example. A binary classifier model is trained on two years of historical data. Performance is good, AUC in the low 0.9s, precision and recall both sufficiently high. The model passes the peer review stage and makes it into the production environment. Here, it starts making predictions live. After 90 days, the data scientist queries the predictions the model has made in production and runs them through a validation script that calculates performance metrics. Performance is right on par with expectations from POC (proof of concept), and is relayed to stakeholders: “The model is performing as expected. Predictions are accurate.”
Fast forward two years. A request comes in to investigate the model. It is reported to be consistently making incorrect predictions, and the stakeholders are losing trust in the model. There is even mention of potentially returning to their old Excel spreadsheet method if things keep up this way. The data scientist queries the past 6 months of data and runs it through the validation script. The data scientist rubs their eyes, checks their notes, and is flabbergasted. AUC is sitting at 0.6, precision and recall both dramatically low. “How could this be? I trained a good model. I even validated the model after it went live! What happened?” the data scientist questions. Model drift is what happened. It snuck in, went undetected for months, and wreaked havoc on predictions.
This is the harsh reality many predictive models face in production. Let’s talk about why it happens.
Why Does Model Drift Happen?
Boiled down, model drift occurs because models live in the real world. The model was trained on one reality, and that reality has shifted in some way since the model was deployed into production.
One of the most common causes of model drift is a change in how data is recorded. When data was initially gathered for training, predictive features and the target looked one way, and now, they are different. The algorithm learned the specific relationship between them, but now, that relationship has changed. The model hasn’t learned how to handle the new relationship, so it carries on making predictions the best it can given how it was trained.
Model drift typically falls into two categories:
Data Drift (features change)
Concept Drift (relationships change/population shift)
Let’s look at some examples.
Example #1: Data Drift
Height and weight are used to predict risk of diabetes. The data scientist gathered two years of patient data, making sure to pull each patient’s height in inches, weight in pounds, and whether or not that patient ended up getting diabetes a year after being measured. Two years later, a new measurement process requires nurses to document height in centimeters and weight in kilograms, and the model begins making wildly inaccurate predictions as a result. For example, a patient who is 6 feet tall used to have height documented at 72 inches, but now has height documented at 183 centimeters. This patient weighs 200 pounds, which is now documented as 91 kilograms. The model does not know a conversion needs to happen in order to account for the change in units. It is expecting to be supplied the features in the units in which it was trained, so it predicts as if the person is 183 inches (over 15 feet) tall, and 91 pounds. No wonder the prediction makes no sense!
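To make the units problem concrete, here is a minimal sketch, using toy numbers rather than real patient data, of how comparing a production feature’s distribution against training-time statistics exposes this kind of data drift:

```python
import statistics

# Toy training data: heights recorded in inches
train_height_in = [65, 68, 70, 72, 74, 66, 69, 71]

# Toy production data after the process change: the same kinds of
# patients, but heights now arrive in centimeters
prod_height = [165, 173, 178, 183, 188, 168, 175, 180]

train_mean = statistics.mean(train_height_in)
train_sd = statistics.stdev(train_height_in)
prod_mean = statistics.mean(prod_height)

# Crude drift alarm: flag when the production mean sits far outside
# the range the training data would make plausible
drifted = abs(prod_mean - train_mean) > 3 * train_sd
```

The production mean lands around 176 against a training mean near 69, an unmistakable shift. Real monitoring would use a proper statistical test and more data, but even a check this simple would have caught the inches-to-centimeters change immediately.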
Example #2: Concept Drift
A risk of readmission model is built for a hospital system by their team of data scientists. Three years post go-live, their system acquires four large hospitals in the neighboring state. These hospitals have a markedly different patient demographic from the original population the model was trained on. When the model is rolled out to the new hospitals, providers notice it is making many false positive and false negative predictions. The model should be retrained to include data from these new hospitals.
How to Detect and Fix Model Drift
Model drift can occur gradually, with performance degrading slowly over a long period of time, or it can happen quickly, with performance dropping off suddenly and obviously. This variable nature can make it difficult to prepare for and even harder to detect without the right tools.
Monitoring performance in production regularly is the best way to detect model drift.
If you’re not monitoring your model in production, you won’t notice drift until stakeholders do.
A quick dashboard or notebook that can be run every couple of weeks can be a simple way to visualize model performance and catch any deterioration over time. Simply plot precision, recall, AUC, MAE, MSE, or any other appropriate performance metrics for your model on the y-axis, and the date on the x-axis. Expect slight variation week to week; large deviations from the average signal that something has changed and drift could be occurring. A feature missingness plot and a feature distribution plot can also help you do a deep dive into the individual predictors, helping you determine the cause of the drift. This could look like the count of NA or NULL values per feature over time, or the average value per feature over time.
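As a sketch of that idea, the check below uses made-up weekly precision values and flags any week that falls well below the running average; the same pattern works for AUC, recall, or per-feature NULL counts:

```python
# Hypothetical weekly precision values pulled from a validation script
weekly_precision = {
    "2024-01-07": 0.91,
    "2024-01-14": 0.89,
    "2024-01-21": 0.90,
    "2024-01-28": 0.92,
    "2024-02-04": 0.74,  # sudden drop
}

values = list(weekly_precision.values())
baseline = sum(values[:-1]) / len(values[:-1])  # average of earlier weeks

# Flag any week more than 10 points of precision below the baseline
flagged = [wk for wk, p in weekly_precision.items() if baseline - p > 0.10]
```

Here `flagged` contains only the final week. In practice you would plot these values rather than eyeball a dictionary, but the core of the monitoring job is exactly this: compare each period against history and alert on large deviations.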
I actually caught model drift in one of my models using the above method. I noticed a drop off in precision in my Difficult IV Access model. After a few weeks of consistently lower-than-expected precision values, I became suspicious. My manager suggested looking into feature missingness as a potential cause. Lo and behold, the third-most important feature, history of malnutrition, had a huge uptick in NULL values the very same week my model’s performance began to deteriorate. We discovered the SQL driving the creation of the feature in production had been adjusted, and a join was not behaving as intended. We updated the SQL and precision returned to normal levels from that day on.
This brings me to my final point: how to fix model drift. There are several ways to fix drift, each one appropriate in different scenarios. As you saw above, one way to fix drift is to restore the inputs to the format they had during model training. This is the simplest, quickest way to fix drift, and should be the default if possible. This can be done anywhere in the data load process, from the database ETL, to the downstream notebook code where predictions are made. If height is recorded in centimeters, and your model is expecting it to be in inches, a conversion can be made prior to predictions.
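For the height-and-weight example above, the repair is a small conversion step before prediction. This sketch assumes the model was trained on inches and pounds, and that new data arrives in centimeters and kilograms:

```python
CM_PER_INCH = 2.54
KG_PER_LB = 0.453592

def prepare_features(height_cm, weight_kg):
    """Convert the new recording units back to the training units."""
    height_in = height_cm / CM_PER_INCH
    weight_lb = weight_kg / KG_PER_LB
    return height_in, weight_lb

# The 6-foot, 200-pound patient under the new recording process:
height_in, weight_lb = prepare_features(183, 91)
# The model now sees roughly 72 inches and 200 pounds, as it did in
# training, instead of a 183-inch, 91-pound patient
```

The conversion can live in the ETL, in the feature pipeline, or immediately before the predict call; what matters is that it happens before the model ever sees the drifted values.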
Sometimes, though, the data cannot be changed. Perhaps data governance has defined a data point more formally, units are now standardized, and those units are different from the ones your model was trained on. Or, a workflow prevents data from being loaded in the same format. Another solution, though it requires slightly more effort, is to retrain the model. Retraining the model on new data allows it to re-learn the relationship between the variables, establishing a model that performs reliably on the new data it is being supplied. Changes in the population almost always require model retraining.
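To illustrate why retraining works, here is a toy sketch: a single-feature threshold classifier, not a real clinical model. Refit it on data from the new population and the decision boundary moves with it:

```python
def train_threshold(xs, ys):
    """Pick the cutoff on a single feature that maximizes accuracy."""
    best_t, best_acc = None, -1.0
    for t in xs:
        acc = sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Original population: the label flips around a feature value of 8
old_xs, old_ys = [1, 2, 3, 8, 9, 10], [0, 0, 0, 1, 1, 1]
t_old = train_threshold(old_xs, old_ys)

# New population: the same relationship, but shifted much higher
new_xs, new_ys = [10, 12, 14, 30, 32, 34], [0, 0, 0, 1, 1, 1]
t_new = train_threshold(new_xs, new_ys)
```

The retrained cutoff lands near 30 instead of 8. The old model, applied to the new population, would have called every patient positive; retraining lets it re-learn where the boundary actually sits. Real models have many features and real training loops, but the principle is the same.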
Wrapping Up
Model drift can sneak up on any unsuspecting data scientist. Let it go on long enough and it can destroy performance and user trust. But, it isn’t something to fear. With the right tools, detecting drift is possible, and fixing it is attainable. Being able to recognize when model drift is occurring, and having the know-how to identify the cause and determine the fix, is what separates the data scientists who are just happy to get a model into production from those who know how to build a model that can have a lasting impact.

