Cause-and-effect relationships. Sounds simple enough, right? If you water a plant, it grows. In digital marketing, we can use statistics to understand cause-and-effect relationships in our data. Quantitative analysis can help us reach conclusions about marketing performance or A/B tests. But we must be careful not to fall into a common trap: confusing correlation with causation.
So what’s the difference, and why should you care? Let’s get a few definitions out of the way. In your A/B tests you’ll have an independent variable (usually unique visitors) and a dependent variable (the thing you’re trying to improve, like clicks on a button). Your dependent variable is affected by, or depends on, the independent variable: you won’t get any clicks on the button without unique visitors.
Correlation is the predictable movement of a dependent variable with a corresponding independent variable. If every set of 100 unique visitors generates exactly one click on that button, we’d say there’s a strong correlation: we can predict that the next 100 unique visitors will produce one more click. Correlations can also be negative and still be predictive, meaning that when the independent variable increases by a given increment, there is a corresponding and predictable decrease in the dependent variable.
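To make the definition concrete, here is a minimal Python sketch that computes the Pearson correlation coefficient (r), one common way to measure correlation, for hypothetical visitor and click numbers like the ones above. The function name `pearson_r` and the data are illustrative assumptions, not from a real test. A value of +1 is a perfect positive correlation and -1 a perfect negative one.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-movement of x and y around their means, plus each one's own spread.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: exactly one click for every 100 unique visitors.
visitors = [100, 200, 300, 400, 500]
clicks = [1, 2, 3, 4, 5]
print(pearson_r(visitors, clicks))  # 1.0 (perfect positive correlation)

# Hypothetical negative case: clicks fall predictably as visitors rise.
falling_clicks = [5, 4, 3, 2, 1]
print(pearson_r(visitors, falling_clicks))  # -1.0 (perfect negative correlation)
```

Note that r only measures how tightly the points follow a straight line. It says nothing about which variable, if either, is driving the other.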
So this sounds cool, right? I do x, I get y. But correlation doesn’t imply causation: two things often look related because they’re correlated, even though one doesn’t actually cause the other. Take the example below. The dependent variable is life expectancy, and the independent variable seems to predict it pretty well. Maybe it’s hospitals per capita? Doctors per capita? Nope, it’s people per television.
It makes sense once you think about it. As countries become more developed and wealthier, people can afford TVs more easily, so the number of people per TV goes down. But are more TVs actually causing people to live longer? Of course not. Development and wealth also bring better healthcare and medicine. Wealth is a third factor, often called a confounding variable, that moves both numbers at once. So while people per TV and life expectancy are correlated, this is not a cause-and-effect relationship: more TVs don’t increase life expectancy.
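A small simulation can show this confounding pattern. The Python sketch below is a hypothetical model, not the real dataset: a made-up "wealth" score drives both people per TV and life expectancy, and TVs have no effect on lifespan at all. People per TV still correlates strongly (and negatively) with life expectancy, but once wealth is held constant with a partial correlation, the relationship all but disappears. The coefficients, noise levels, and the `partial_r` helper are illustrative assumptions.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_r(xs, ys, zs):
    """Correlation between xs and ys after removing the linear effect of zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


random.seed(42)  # fixed seed so the run is reproducible
n_countries = 500

# Hypothetical model: wealth (0 = poorest, 1 = richest) drives BOTH variables.
wealth = [random.uniform(0, 1) for _ in range(n_countries)]
# Richer countries have fewer people per TV...
people_per_tv = [50 - 45 * w + random.gauss(0, 3) for w in wealth]
# ...and longer life expectancy. TVs do not appear in this formula at all.
life_expectancy = [50 + 30 * w + random.gauss(0, 3) for w in wealth]

raw = pearson_r(people_per_tv, life_expectancy)
controlled = partial_r(people_per_tv, life_expectancy, wealth)
print(f"raw correlation:        {raw:.2f}")         # strongly negative
print(f"controlling for wealth: {controlled:.2f}")  # close to zero
```

In real data the confounder is rarely measured this cleanly, which is why a strong correlation on its own can’t settle the question of causation.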
Here’s a similar look at life expectancy with people per physician as the independent variable. The R-squared value is actually lower here than for people per TV, meaning less of the variability in life expectancy can be explained by this independent variable.
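Assuming the charts use an ordinary least-squares straight-line fit, R-squared is the share of the variance in the dependent variable that the fitted line explains: 1 - (residual sum of squares / total sum of squares). With a single independent variable, it also equals the square of the Pearson correlation r. The Python sketch below checks that on a small made-up dataset; the function names and numbers are illustrative assumptions, not values from the TV or physician data.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys):
    """Share of the variance in ys explained by the fitted straight line."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                   # total
    return 1 - ss_res / ss_tot


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data with a loose upward trend.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_squared(x, y), 3))       # 0.6
print(round(pearson_r(x, y) ** 2, 3))  # 0.6, the same value
```

A lower R-squared for people per physician only says the line fits those points less tightly. Like r, it says nothing about cause and effect.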
Crazy, right? Based on this dataset, life expectancy is better predicted by people per TV than people per physician. But again, this doesn’t mean the relationship is causal. TVs are not responsible for longer lifespans.
So during your analysis, always ask yourself: is this relationship truly causal?
Data source: Televisions, Physicians, and Life Expectancy.