Two fundamental ways to look at the relationship between two (or more) variables:
Correlation
Two variables have co-movement. If we know the value of one, we know something about the value of the other one.
Causation
There is a “cause-effect” link between the two and, as a result, they display co-movement.
Both are useful, but for different purposes
Causation implies correlation but not the other way around
It is vital to keep this distinction in mind for meaningful and credible analysis
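One common way to quantify co-movement is the Pearson correlation coefficient. This is an assumption for illustration, since the slides do not commit to a specific measure. For paired observations $(x_i, y_i)$, $i = 1, \dots, n$, with sample means $\bar{x}$ and $\bar{y}$:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The coefficient ranges from -1 to 1 and is symmetric in $x$ and $y$, so on its own it says nothing about which variable, if either, causes the other.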
Positive or negative correlation? Causal link?
Take a guess (2mins)…
[Source]
Positive or negative correlation? Causal link?
Take a guess (2mins)…
Positive or negative correlation? Causal link?
Take a guess (2mins)…
Essentially when the core interest is to find out if something causes something else
Exploratory analysis
Causal questions can be distracting when you do not yet know enough about the dataset
Predictive settings
The interest is not in understanding the underlying mechanisms but in obtaining the best possible estimates of a variable you do not have by combining others you do have (e.g. Kriging)
Causation implies Correlation
Correlation does not imply Causation
Why?
There is a causal link between the two variables, but it either runs in the opposite direction to the one we assume, or runs in both directions
E.g. Education and income
Two variables are correlated because they are both determined by other, unobserved, variables (factors) that confound the effect
E.g. Ice cream and cold beverage consumption
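The confounding case can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: temperature is taken as the hypothetical common driver of both ice cream and cold beverage sales, and all coefficients and noise levels are invented for illustration. Neither variable affects the other, yet they come out strongly correlated; once the part explained by temperature is removed from each, the correlation largely disappears.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def residuals(y, x):
    """Residuals from a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000

# Hypothetical confounder: daily temperature (made-up mean and spread).
temperature = [rng.gauss(20, 8) for _ in range(n)]

# Both outcomes depend on temperature only, never on each other.
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
cold_drinks = [3.0 * t + rng.gauss(0, 15) for t in temperature]

# Raw correlation: strong, despite there being no causal link.
raw_corr = pearson(ice_cream, cold_drinks)

# Partial correlation: correlate what is left after regressing
# each variable on the confounder.
partial_corr = pearson(residuals(ice_cream, temperature),
                       residuals(cold_drinks, temperature))

print(f"raw correlation:     {raw_corr:.2f}")
print(f"partial correlation: {partial_corr:.2f}")
```

Controlling for the confounder is only possible here because the simulation observes it; when the confounder is unobserved, as in the case described above, this adjustment is not available.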
Is there any way to overcome reverse causality and confounding factors to recover causal effects?
The key is to get an “exogenous source of variation”
Randomized Control Trials
Treated vs. control groups. The probability of treatment is independent of everything else
Quasi-natural experiments
Like an RCT, but one that just “happens to occur naturally” (natural disasters, exogenous law changes…)
Econometric techniques
For the interested reader: space-time regression, instrumental variables, propensity score matching, differences-in-differences, regression discontinuity…
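Why random assignment provides an exogenous source of variation can be sketched with a simulation. Everything here is assumed for illustration: the true effect size, the hypothetical unobserved `confounder`, and the self-selection rule are all made up. When units select into treatment based on the confounder, a naive comparison of treated and control means is biased; when treatment is assigned by coin flip, the same comparison lands close to the true effect.

```python
import random


def mean(values):
    return sum(values) / len(values)


def diff_in_means(treated, outcome):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for d, y in zip(treated, outcome) if d]
    c = [y for d, y in zip(treated, outcome) if not d]
    return mean(t) - mean(c)


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 20000
TRUE_EFFECT = 2.0  # assumed causal effect of the treatment

# Hypothetical unobserved characteristic that also raises the outcome.
confounder = [rng.gauss(0, 1) for _ in range(n)]


def simulate_outcome(treated):
    """Outcome = treatment effect + confounder effect + noise."""
    return [TRUE_EFFECT * d + 3.0 * u + rng.gauss(0, 1)
            for d, u in zip(treated, confounder)]


# Observational setting: units with a high confounder value are more
# likely to take the treatment (self-selection).
self_selected = [u + rng.gauss(0, 1) > 0 for u in confounder]

# RCT setting: a coin flip decides treatment, independent of everything else.
randomized = [rng.random() < 0.5 for _ in range(n)]

naive_estimate = diff_in_means(self_selected, simulate_outcome(self_selected))
rct_estimate = diff_in_means(randomized, simulate_outcome(randomized))

print(f"true effect:          {TRUE_EFFECT:.2f}")
print(f"naive (self-selected): {naive_estimate:.2f}")
print(f"randomized (RCT):      {rct_estimate:.2f}")
```

The difference in means is the simplest possible estimator; the econometric techniques listed above address settings where randomization is not available.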
Establishing causality is much harder than identifying correlation, but sometimes it’s needed to move forward!
Correlation precedes causation and, in some cases, it is all that is needed.
It is important to always draw conclusions based on analysis, know what the data can and cannot tell, and stay honest.
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">Geographic Data Science'18</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="http://darribas.org" property="cc:attributionName" rel="cc:attributionURL">Dani Arribas-Bel</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.