Cointegration vs Spurious Correlation: Understand the Difference for Accurate Analysis | by Egor Howell | Jul, 2023

Why correlation doesn’t equal causation for time series

Picture by Wance Paleri on Unsplash

In time series analysis, it’s valuable to know whether one series influences another. For example, commodity traders want to know whether an increase in commodity A leads to an increase in commodity B. Originally, this relationship was measured using linear regression. However, in 1974 Clive Granger and Paul Newbold showed that this approach yields misleading results, particularly for non-stationary time series. Building on that finding, Granger, later working with Robert Engle, developed the concept of cointegration in the 1980s, work that earned Granger a share of the 2003 Nobel Prize in economics. In this post, I want to discuss the need for and application of cointegration and why it is an important concept Data Scientists should understand.


Before we discuss cointegration, let’s discuss the need for it. Historically, statisticians and economists used linear regression to determine the relationship between different time series. However, Granger and Newbold showed that this approach is unreliable for non-stationary series and leads to something called spurious correlation.

A spurious correlation is where two time series look correlated but in fact lack a causal relationship. It’s the classic ‘correlation doesn’t imply causation’ statement. It’s dangerous because even statistical tests may well indicate that there is a causal relationship.
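To make the danger concrete, here is a minimal sketch (not taken from the original post) that simulates pairs of completely independent series and counts how often they appear strongly correlated. The helper names, the sample sizes, and the 0.5 threshold are all illustrative assumptions. The point is only that independent random walks, a simple kind of non-stationary series, cross the threshold far more often than independent stationary noise does.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def random_walk(n, rng):
    """Non-stationary series: cumulative sum of Gaussian shocks."""
    level = 0.0
    path = []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def white_noise(n, rng):
    """Stationary series: independent Gaussian draws."""
    return [rng.gauss(0, 1) for _ in range(n)]


def share_strongly_correlated(make_series, trials=500, n=200,
                              threshold=0.5, seed=0):
    """Fraction of independent pairs whose |correlation| exceeds threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = make_series(n, rng)
        b = make_series(n, rng)  # generated independently of a
        if abs(pearson(a, b)) > threshold:
            hits += 1
    return hits / trials


walk_share = share_strongly_correlated(random_walk)
noise_share = share_strongly_correlated(white_noise)
print(f"independent random walks with |r| > 0.5: {walk_share:.1%}")
print(f"independent white noise with |r| > 0.5:  {noise_share:.1%}")
```

None of these pairs has any causal link by construction, so every "strong" correlation among the random walks is spurious.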


An example of a spurious relationship is shown in the plots below:

Plot generated by author in Python.

Here we have two time series A(t) and B(t) plotted as a function of time (left) and plotted against each other (right). Notice from the plot on the right that there is some correlation between the series, as shown by the regression line. However, by looking at the left plot, we see this correlation is spurious because B(t) consistently increases while A(t) fluctuates erratically. Furthermore, the average distance between the two time series is also increasing…
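The widening gap can be checked numerically. The sketch below uses hypothetical stand-ins for A(t) and B(t), since the data behind the plots is not given: A is a driftless random walk and B is a noisy upward trend, with the seed, length, and slope chosen arbitrarily. It compares the average distance between the series in the first and second halves of the sample.

```python
import random

rng = random.Random(42)  # arbitrary seed, for reproducibility only
n = 200

# Hypothetical A(t): fluctuates erratically with no trend (random walk).
a, level = [], 0.0
for _ in range(n):
    level += rng.gauss(0, 1)
    a.append(level)

# Hypothetical B(t): rises steadily, with a little noise on top.
b = [1.0 * t + rng.gauss(0, 1) for t in range(n)]


def mean_gap(x, y):
    """Average absolute distance between two equal-length series."""
    return sum(abs(p - q) for p, q in zip(x, y)) / len(x)


half = n // 2
early_gap = mean_gap(a[:half], b[:half])
late_gap = mean_gap(a[half:], b[half:])
print(f"mean gap, first half:  {early_gap:.1f}")
print(f"mean gap, second half: {late_gap:.1f}")
```

A gap that keeps growing like this suggests the two series are drifting apart rather than moving together in the long run.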
