The Good, the Bad and the Misleading Predictive Models

Noah Sultan, PhD
4 min read · Apr 9, 2024


Michael Scott’s famous quote from The Office

Michael Scott, regional manager at Dunder Mifflin, asked his analytics guy, Oscar, to build a predictive model to improve ad performance.

The company’s ad strategy up to that point was “spray and pray”: a method developed by Michael in which the company spreads the ad as widely as possible and prays that it resonates with some portion of the audience. Michael believes that “you miss 100% of the shots you don’t take”.

Now that corporate wants to reduce ad spend, the strategy has to become more targeted: pick the clients and ads with the highest potential click-through rate (CTR).

Oscar decided to build a machine learning (ML) model that predicts whether a client will click on an ad after seeing it. The model is trained on data from previous campaigns, where we know which profiles clicked after seeing an ad. Through training, the model learns which profiles tend to click and which ads are more effective than others. This is called supervised learning.

Supervised vs. Unsupervised learning

Supervised learning is when we know exactly what we want the model to learn and we train it on labeled examples. For instance, a model that classifies emails as spam or non-spam is a supervised classification model: it is trained on examples of spam and non-spam emails. In the same manner, a model that predicts whether a client will click is also a supervised classification model, trained on examples of clients who clicked and clients who did not. Unsupervised learning is when we let the algorithm search for patterns or clusters in the data without giving it any labeled examples.
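
To make this concrete, here is a minimal sketch of what such a supervised setup could look like, assuming scikit-learn and a synthetic campaign dataset (the features, numbers, and click rate below are illustrative, not from any real campaign):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical campaign data: one row per ad impression, with made-up
# numeric features describing the visitor profile and the ad.
rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 3))
# Label: 1 if the visitor clicked, 0 otherwise (~1% of impressions, as in the story).
y = (rng.random(10_000) < 0.01).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Supervised learning: fit the model on labeled examples (clicked / did not click).
model = LogisticRegression()
model.fit(X_train, y_train)
print(model.predict(X_test[:5]))  # hard click/no-click predictions for five unseen visitors
```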

Oscar built the predictive model and told Michael, “we have a model that predicts with 99% accuracy whether a visitor will click after seeing an ad!” Michael, with his wit, realized that this accuracy is not as good as it sounds. Roughly 99% of the people who see one of the company’s ads do not click on it. So even the most basic predictive model, one that always predicts ‘no click’ without ever looking at the visitor’s profile, would reach 99% accuracy.
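
This trap is easy to reproduce. Continuing the sketch above, a “model” that always answers no-click, without ever looking at the profile, scores about 99% accuracy:

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# A baseline that ignores the features entirely and always predicts the
# majority class ("no click").
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)

print(accuracy_score(y_test, baseline.predict(X_test)))  # ~0.99, yet the model is useless
```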

The correct way to measure the model’s performance is with precision and recall (a small sketch in code follows this list):

  • Precision: when the model predicts that someone will click, how often is it right?
  • Recall: out of all the actual clickers, how many does the model correctly identify?
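
In confusion-matrix terms, precision = TP / (TP + FP) and recall = TP / (TP + FN), where TP, FP, and FN are true positives, false positives, and false negatives. Reusing the always-no-click baseline from above shows why these two metrics expose it immediately:

```python
from sklearn.metrics import precision_score, recall_score

y_pred = baseline.predict(X_test)

# Precision = TP / (TP + FP): of the predicted clickers, how many really clicked.
# Recall    = TP / (TP + FN): of the real clickers, how many the model found.
print(precision_score(y_test, y_pred, zero_division=0))  # 0.0: it never predicts "click"
print(recall_score(y_test, y_pred))                      # 0.0: it finds no clickers at all
```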

Deploying a bad model can hurt the company more than it helps (if it helps at all). A model with high accuracy but very low recall is one of the worst possible outcomes, because it will fail to show the ad to most of the few visitors who would actually have clicked. Michael learned about precision and recall from a data scientist while attending the Scranton ML conference.

The Data Scientist

One of the best definitions of a data scientist is “someone who knows more statistics than any software engineer and more software engineering than most statisticians”. Compared to other data roles, statistical knowledge is the most defining characteristic of a data scientist. It is also what allows a data practitioner to select the right metric and the right model for each scenario.

Using the right metric: accuracy, precision and recall

In the ad-click example, 99% of people do not click on the company’s ads, so accuracy is very misleading even though it seems intuitive. A good model is one with high recall and high precision scores. There is always a trade-off between the two: if we want to catch most actual clickers (high recall), we are forced to accept lower precision, and vice versa. Depending on the context we might value one metric over the other, but both should stay reasonably high.
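
One way to see the trade-off is to sweep the decision threshold and watch the two metrics pull against each other. A sketch, again reusing the fitted model from the first snippet (the threshold values here are arbitrary):

```python
from sklearn.metrics import precision_score, recall_score

# Predicted probability of a click for each test visitor.
probs = model.predict_proba(X_test)[:, 1]

# Lowering the threshold catches more real clickers (recall up) but admits
# more false alarms (precision down); raising it does the opposite.
for threshold in (0.005, 0.01, 0.02):
    y_pred = (probs >= threshold).astype(int)
    p = precision_score(y_test, y_pred, zero_division=0)
    r = recall_score(y_test, y_pred)
    print(f"threshold={threshold}  precision={p:.2f}  recall={r:.2f}")
```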

Using the right model: Regression vs. Classification

With this precision-recall trade-off in mind, we might prefer a regression model over a classification one. The difference between classification and regression lies in the output format: in classification we predict a discrete value (click, no-click, etc.), while in regression we predict a continuous value (for example, the probability that the client clicks when they see the ad). Instead of a hard click/no-click prediction, the model outputs the probability of a click, and the business side gains the freedom to choose the threshold above which the ad is shown (for instance, show the ad when the probability of a click is > 0.8).
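
A sketch of that idea, continuing with the same fitted model. In practice this continuous output is often simply a classifier’s predicted probability (predict_proba in scikit-learn) rather than a separate regression model; the 0.8 cut-off is the article’s example and should be set together with the business side:

```python
# Instead of a hard click/no-click label, expose the probability and let the
# business choose the operating threshold.
probs = model.predict_proba(X_test)[:, 1]  # P(click | visitor profile, ad)

SHOW_AD_THRESHOLD = 0.8  # a business decision, not a modeling one
show_ad = probs > SHOW_AD_THRESHOLD

print(f"Ad shown to {show_ad.sum()} of {len(show_ad)} visitors")
```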

Outro

For organizations that want to adopt data-driven approaches, it is very important to avoid these mistakes, which seem intuitive but cost a lot. It is better to build no model at all than to build a bad or misleading predictive model. It is also necessary to identify the data profile that best suits the company’s needs: the skills required to automate data pipelines are different from the ones required to drive business value from your data. Going back to Michael Scott, he decided to hire a data scientist, a guy called Jim. Jim built a classification model with both high recall and high precision, resulting in better sales for Dunder Mifflin. Michael rewarded him with a Dundie Award: the “Fine Work” award.

Written by Noah Sultan, PhD

LinkedIn Top Data Voice | Data Scientist | Creating AI apps, 1 per weekend | PhD in Machine Learning | 📍 Paris | linkedin.com/in/eisultan
