Fooled by Correlation. Why High Correlation is Not Enough to Predict the Future
For most people, a high correlation between two variables is reason enough to make a prediction about the future, make a business decision, or even draw a scientific conclusion. Recently, Nassim Taleb showed in a YouTube video that correlation measures should not be used in the presence of nonlinearities, and nonlinearities are present most of the time.
As an example, when two variables are associated only half the time (as shown in the next figure), the correlation is not 50% but roughly 90%.
You can check this for yourself with this simple Python code:
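import numpy as np
import matplotlib.pyplot as plt
# x runs from 0 to 10; y follows x below 5, then stays flat at 5
x = np.linspace(0, 10, 11)
y = np.piecewise(x, [x < 5, x >= 5], [lambda x: x, lambda x: 5])
# Pearson correlation coefficient between x and y
print("R =", np.corrcoef(x, y)[0, 1])
plt.plot(x, y)
plt.show()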
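To see what the single correlation number hides, one option is to look at each half of the data separately. The sketch below is only an illustration under stated assumptions: it rebuilds the same eleven points with the standard library, and the helper names `pearson_r` and `ls_slope` are hypothetical, not part of any library.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ls_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Same data as above: y follows x below 5, then stays flat at 5
xs = [float(i) for i in range(11)]
ys = [x if x < 5 else 5.0 for x in xs]

r_all = pearson_r(xs, ys)                # one number for the whole range
slope_left = ls_slope(xs[:5], ys[:5])    # x = 0..4, where y tracks x
slope_right = ls_slope(xs[5:], ys[5:])   # x = 5..10, where y is constant

print("R over the whole range:", round(r_all, 3))
print("slope for x < 5:", slope_left)
print("slope for x >= 5:", slope_right)
```

The slope is 1 in the first half and 0 in the second, so in the second half x tells you nothing about y, even though the overall correlation is close to 0.9. Pearson's coefficient is not defined on the flat half by itself (y has zero variance there), which is why the sketch compares slopes instead.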
How to avoid falling into the correlation trap?
Visual examination is one fast solution. The image below was created specifically to show the importance of visualisation and that numerical calculations alone are not enough.
Nowadays, when there are many predictive models that are far from accurate and many scientific papers that contradict each other, we need to familiarise ourselves with these kinds of biases.
Next time, before using the correlation value to make a decision, visualise your data. Then, decide if you want to change your mind or not.
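One possible way to carry out that visual check is to plot the data next to the residuals of a straight-line fit. This is a minimal sketch, not a prescribed method: the function names `fit_line`, `residuals`, and `plot_fit_and_residuals` are made up for illustration, and the data is the same half-linear, half-flat example used earlier. If the relationship were truly linear, the residuals would scatter around zero without any shape.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def residuals(xs, ys):
    """Vertical distance of each point from the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def plot_fit_and_residuals(xs, ys):
    """Draw the data with its fitted line (left) and the residuals (right)."""
    # Imported here so the numeric helpers above work without matplotlib
    import matplotlib.pyplot as plt

    slope, intercept = fit_line(xs, ys)
    fig, (ax_data, ax_res) = plt.subplots(1, 2, figsize=(9, 4))
    ax_data.scatter(xs, ys)
    ax_data.plot(xs, [slope * x + intercept for x in xs])
    ax_data.set_title("Data and fitted line")
    ax_res.axhline(0, linewidth=0.8)
    ax_res.scatter(xs, residuals(xs, ys))
    ax_res.set_title("Residuals")
    plt.show()


# Example data: y follows x below 5, then stays flat at 5
xs = [float(i) for i in range(11)]
ys = [x if x < 5 else 5.0 for x in xs]
res = residuals(xs, ys)

# plot_fit_and_residuals(xs, ys)  # uncomment to draw the two panels
```

For this data the residuals are negative at both ends and positive in the middle, an arch shape that signals the straight line is the wrong model despite the high correlation.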