You remember the old jingle, "step on a crack, break your momma's back." But clearly you stepping on a crack doesn't cause your mother's back to break. It's just a simple example of correlation without causation.
You would think by now that we could say unequivocally what causes what. But the question of causation vs. correlation, which has haunted science and philosophy from their earliest days, still dogs our heels for numerous reasons.
Humans are evolutionarily predisposed to see patterns, and psychologically inclined to gather information that supports preexisting views, a trait known as confirmation bias. We confuse coincidence with correlation, and correlation with causality.
What's the Difference Between Causation and Correlation?
The difference between causation and correlation is that in a causal relationship, one event is directly responsible for another, while in a correlation, two events exist simultaneously, but their relationship may be due to a third variable.
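It's incorrect to say that correlation implies causation. For A to cause B, we tend to say that, at a minimum, A must precede B, the two must covary (vary together), and no competing explanation can better account for the correlation of A and B.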
Taken alone, however, these three requirements cannot prove cause; they are, as philosophers say, necessary but not sufficient. In any case, not everyone agrees with them.
Speaking of philosophers, David Hume argued that causation doesn't exist in any provable sense [source: Cook]. Karl Popper and the Falsificationists maintained that we cannot prove a relationship, only disprove it, which explains why statistical analyses do not try to prove a correlation; instead, they pull a double negative and disprove that the data are uncorrelated, a process known as rejecting the null hypothesis [source: McLeod].
With such considerations in mind, scientists must carefully design and control their experiments to weed out bias, circular reasoning, self-fulfilling prophecies and confounding variables. They must respect the requirements and limitations of the methods used, draw from representative samples where possible and not overstate their results.
Instead of undertaking the difficult (and maybe impossible) task of establishing causality, most scientific research focuses on the strength of correlations. Correlations can be positive or negative, weak or strong. The statistical correlation coefficient, which ranges from -1 to 1, shows the strength and direction of the correlation.
If you plot data points on a graph where one variable occupies the X-axis and another occupies the Y-axis, the variables correlate if they have a linear relationship.
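To make the coefficient concrete, here is a minimal Python sketch of the Pearson correlation coefficient, one common measure of linear correlation. The function name `pearson_r` and the sample numbers are illustrative assumptions, not data from any study cited here. The last example hints at why linearity matters: a perfectly predictable but curved relationship can still score zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariation: do the two variables move away from their means together?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up values for illustration only.
xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]    # straight line sloping up
falling = [-2 * x + 1 for x in xs]  # straight line sloping down
curved = [x ** 2 for x in xs]       # exact relationship, but not linear

r_rising = pearson_r(xs, rising)    # strongest positive correlation
r_falling = pearson_r(xs, falling)  # strongest negative correlation
r_curved = pearson_r(xs, curved)    # no linear correlation at all
print(r_rising, r_falling, r_curved)
```

A coefficient near zero therefore means "no linear relationship," not "no relationship," which is one reason plotting the data first is usually worthwhile.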
10 Examples of Correlation and Causation
Because the human brain tends to seek out causal relationships, scientists are extra careful about creating highly controlled experiments — but they still make mistakes. Here are ten examples illustrating how hard it is to identify causation.
Researchers investigating worker productivity on the factory floor in the early 20th century discovered the Hawthorne effect, or the idea that participant knowledge of an experiment can influence its results.
People are a pain to research. They react not only to the stimulus being studied, but also to the experiment itself. Researchers today try to design experiments to control for such factors, but that wasn't always the case.
Take the Hawthorne Works in Cicero, Illinois. In a series of experiments from 1924 to 1932, researchers studied worker productivity effects associated with altering the Illinois factory's environment, including changing light levels, tidying up the place and moving workstations around.
Just when they thought they were on to something, they noticed a problem: The observed increases in productivity dropped almost as soon as the researchers left the works, indicating that the workers' knowledge of the experiment, not the researchers' changes, had fueled the boost. Researchers still call this phenomenon the Hawthorne effect [source: Obrenović].
A related concept, the John Henry effect, occurs when members of a control group try to beat the experimental group by kicking their efforts into overdrive. They need not know about the experiment; they need only see one group receive new tools or additional instruction. Like the steel-driving man of legend, they want to prove their capabilities and earn respect [sources: Saretsky; Vogt].
If the pill lands on black 26 times in a row on the roulette wheel, would you be more likely to bet on red or black on that 27th turn?
The titular characters of Tom Stoppard's film "Rosencrantz and Guildenstern Are Dead" begin the film baffled and finally frightened as each of 157 consecutive flips of a coin comes up heads. Guildenstern's explanations of this phenomenon range from time loops to "a spectacular vindication of the principle that each individual coin, spun individually, is as likely to come down heads as tails."
Evolution wired humans to see patterns, and our ability to properly process that urge seems to short-circuit the longer we spend gambling. We can rationally accept that independent events like coin flips keep the same odds no matter how many times you perform them.
But we also view those events, less rationally, as streaks, making false mental correlations between randomized events. Viewing the past as prelude, we keep thinking the next flip ought to be tails.
Statisticians call this the gambler's fallacy, aka the Monte Carlo fallacy, after a particularly illustrative example that occurred in that famed Monaco resort town.
During the summer of 1913, bettors watched in increasing amazement as a casino's roulette wheel landed on black 26 times in a row. Inflamed by the certainty that red was "due," the punters kept plunking down their chips. The casino made a mint [sources: Lehrer; Oppenheimer and Monin; Vogt].
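The independence at the heart of the gambler's fallacy can be checked with a short simulation. The sketch below assumes a single-zero wheel with 18 black pockets out of 37 (the article does not specify the wheel layout), and the function names are made up for illustration. It computes the exact chance of a streak and then estimates how often black comes up immediately after a run of blacks.

```python
import random
from fractions import Fraction

# Assumption: a single-zero wheel with 18 black pockets out of 37.
P_BLACK = Fraction(18, 37)

def streak_probability(length):
    """Exact probability that `length` independent spins all land on black."""
    return P_BLACK ** length

def black_rate_after_streak(streak, spins, seed=0):
    """Estimate P(black) on spins that directly follow `streak` or more blacks."""
    rng = random.Random(seed)
    run = 0        # length of the current run of blacks
    followups = 0  # spins that came right after a long-enough run
    blacks = 0     # how many of those follow-up spins were black
    for _ in range(spins):
        is_black = rng.randrange(37) < 18
        if run >= streak:
            followups += 1
            blacks += int(is_black)
        run = run + 1 if is_black else 0
    return blacks / followups if followups else float("nan")

print("chance of 26 blacks in a row:", float(streak_probability(26)))
rate = black_rate_after_streak(streak=5, spins=400_000)
print("black rate right after 5 blacks:", round(rate, 3))
print("black rate on any spin:         ", round(float(P_BLACK), 3))
```

Under these assumptions the two rates should come out nearly identical: a long streak is rare before it happens, but once it has happened, the wheel has no memory of it.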
Superstitions take all forms in sports. Here we see Boston Bruins defenseman Zdeno Chara kissing the back of his helmet for good luck during Game 7 of the Stanley Cup Finals against the St. Louis Blues June 12, 2019, at TD Garden in Boston. Chara's luck wore out, though, and the Blues beat the Bruins 4-1 to win the Stanley Cup that night.
No discussion of streaks, magical thinking or false causation would be complete without a flip through the sports pages. Stellar sports seasons arise from such a mysterious interplay of factors (natural ability, training, confidence, the occasional X factor) that we imagine patterns in performance, even though studies repeatedly reject streak shooting and "successful" superstitions as anything more than imaginary.
The belief in streaks or slumps implies that success "causes" success and failure "causes" failure or, perhaps more reasonably, that variation in some common factor, such as confidence, causes both. But study after study fails to bear this out [source: Gilovich, et al].
The same holds true for superstitions, although that never stopped retired NBA guard Jason Terry, who played for the Dallas Mavericks, from sleeping in the opposing team's game shorts before each game, or retired NHL center Bruce Gardiner, who played for the Ottawa Senators, from dunking his hockey stick in the toilet to break the occasional slump [source: Exact Sports].
The sophomore slump, too, typically arises from a too-good first year. Performance swings tend to even out in the long run, a phenomenon statisticians call regression toward the mean [source: Barnett, et al]. In sports, this averaging out is aided by the opposition, which adjusts to counter the new player's successful skill set.
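Regression toward the mean is easy to reproduce with a toy model. The sketch below assumes each player's season score is a fixed skill plus independent luck, both drawn from a standard normal distribution. These are simplifying assumptions for illustration, not a model taken from the cited study. It picks the top 10 percent of first-season performers and compares their two seasons.

```python
import random
import statistics

def simulate_two_seasons(players=10_000, seed=1):
    """Return two lists of season scores: fixed skill plus fresh luck each year."""
    rng = random.Random(seed)
    skills = [rng.gauss(0, 1) for _ in range(players)]
    season1 = [skill + rng.gauss(0, 1) for skill in skills]
    season2 = [skill + rng.gauss(0, 1) for skill in skills]
    return season1, season2

season1, season2 = simulate_two_seasons()

# The "rookie sensations": the top 10 percent of first-season scores.
cutoff = sorted(season1)[int(0.9 * len(season1))]
stars = [i for i, score in enumerate(season1) if score >= cutoff]

star_year1 = statistics.mean(season1[i] for i in stars)
star_year2 = statistics.mean(season2[i] for i in stars)
league_year2 = statistics.mean(season2)

print("stars, season 1: ", round(star_year1, 2))
print("stars, season 2: ", round(star_year2, 2))
print("league, season 2:", round(league_year2, 2))
```

In this model nobody's skill changes between seasons, yet the stars' average drops in year two while staying above the league average. Part of what put them at the top was luck, and luck does not carry over.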
The story of hormone replacement therapy, once widely used to treat symptoms of menopause, turned out not to be so straightforward after all.
Randomized controlled trials are the gold standard in statistics, but sometimes (in epidemiology, for example) ethical and practical considerations force researchers to analyze available cases.
Unfortunately, such observational studies risk bias, hidden variables and, worst of all, study groups that might not accurately reflect the population. Studying a representative sample is vital; it allows researchers to apply results to people outside of the study, like the rest of us.
A case in point: hormone replacement therapy (HRT) for women. Beyond treating symptoms associated with menopause, it was once hailed for potentially reducing coronary heart disease (CHD) risk, thanks to a much-ballyhooed 1991 observational study [source: Stampfer and Colditz].
But later randomized controlled studies, including the large-scale Women's Health Initiative, revealed either a negative relationship, or a statistically insignificant one, between HRT and CHD [source: Lawlor, et al.].
Why the difference? For one thing, women who use HRT tend to come from higher socioeconomic strata and to have better diets and exercise habits, a hidden explanatory relationship for which the observational study failed to fully account [source: Lawlor, et al].
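A hidden variable of this kind can manufacture an apparent benefit out of nothing. The sketch below is a toy simulation with made-up rates, not data from the studies cited: socioeconomic status (the hypothetical `high_ses` flag) raises the chance of using HRT and independently lowers the chance of CHD, while the treatment itself is given no effect at all.

```python
import random

def simulate(n=200_000, seed=2):
    """Return (high_ses, uses_hrt, chd) rows. All rates are invented."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        high_ses = rng.random() < 0.5
        # The confounder drives treatment uptake...
        uses_hrt = rng.random() < (0.6 if high_ses else 0.2)
        # ...and disease risk. Note that uses_hrt plays no role here.
        chd = rng.random() < (0.05 if high_ses else 0.15)
        rows.append((high_ses, uses_hrt, chd))
    return rows

def chd_rate(rows, uses_hrt, high_ses=None):
    """CHD rate among users or non-users, optionally within one stratum."""
    group = [chd for ses, hrt, chd in rows
             if hrt == uses_hrt and (high_ses is None or ses == high_ses)]
    return sum(group) / len(group)

rows = simulate()

# Naive comparison, ignoring socioeconomic status.
naive_gap = chd_rate(rows, False) - chd_rate(rows, True)
# Comparison within each stratum, holding the confounder fixed.
gap_high = chd_rate(rows, False, True) - chd_rate(rows, True, True)
gap_low = chd_rate(rows, False, False) - chd_rate(rows, True, False)

print("naive gap:          ", round(naive_gap, 3))
print("gap within high SES:", round(gap_high, 3))
print("gap within low SES: ", round(gap_low, 3))
```

The naive comparison suggests users have noticeably less heart disease, but the gap all but vanishes once users and non-users are compared within the same stratum. Randomization achieves a similar balance by design, including for confounders nobody thought to measure.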
You can follow the NFL and you can follow the stock market. But using the 16 original NFL teams' winning streak to pick your stocks probably isn't a winning strategy.
In 1978, sports reporter and columnist Leonard Koppett mocked the causation-correlation confusion by wryly suggesting that Super Bowl outcomes could predict the stock market. It backfired: Not only did people believe him, but it worked, with frightful frequency.
The proposal, now commonly known as the Super Bowl Indicator, went as follows: If one of the 16 original National Football League teams — those in existence before the NFL's 1966 merger with the American Football League — won the Super Bowl, the stock market would rise throughout the rest of the year. If a former AFL team won, it would go down [source: Bonsal].
From 1967 to 1978, Koppett's system went 12 for 12; up through 1997, it boasted a 95 percent success rate. It stumbled during the dot-com era (1998–2001) and notably in 2008, when the Great Recession hit, despite a win by the New York Giants (NFC). Still, as of 2022, the indicator had a 73 percent success rate [source: Chen].
Some have argued that the pattern exists, driven by belief; it works, they say, because investors believe it does, or because they believe that other investors believe it.
This notion, though clever in a regressive sort of way, hardly explains the 12 years of successful correlations predating Koppett's article. Others argue that a more relevant pattern lies in the stock market's large-scale upward trend, barring some short-term major and minor fluctuations [source: Johnson].
Given enough data, patience and methodological leeway, correlations are almost inevitable. That's how big data works.
Big data (the process of looking for patterns in data sets so large they resist traditional methods of analysis) rates big buzz in the boardroom [source: Arthur]. But is bigger always better?
It's a rule that's drummed into most researchers in their first stats class: When encountering a sea of data, resist the urge to go on a fishing expedition. Given enough data, patience and methodological leeway, correlations are almost inevitable, if unethical and largely useless.
After all, the mere correlation between two variables does not imply causation; nor does it, in many cases, point to much of a relationship.
For one thing, researchers cannot use statistical measures of correlation willy-nilly; each contains certain assumptions and limitations that fishing expeditions too often ignore, to say nothing of the hidden variables, sampling problems and flaws in interpretation that can gum up a poorly designed study.
But big data is increasingly being used and hailed for its invaluable contributions to areas such as customized learning programs, wearable devices that feed real-time data to your electronic health records, and music streaming services that give you targeted recommendations [source: IntelliPaat]. Just don't expect too much out of big data in the causality department.
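The fishing-expedition problem described above can be demonstrated with pure noise. The sketch below generates one random "target" series and many unrelated random "candidate" series, then reports the strongest correlation found. The function names, the series length and the number of candidates are arbitrary choices for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def strongest_spurious_correlation(candidates=200, points=20, seed=3):
    """Largest |r| between one noise series and many unrelated noise series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(points)]
    best = 0.0
    for _ in range(candidates):
        noise = [rng.gauss(0, 1) for _ in range(points)]
        best = max(best, abs(pearson_r(target, noise)))
    return best

best = strongest_spurious_correlation()
print("strongest correlation found in pure noise:", round(best, 2))
```

Every series here is random by construction, so any sizable coefficient the search turns up is an artifact of trying many comparisons. The more variables a data set holds, the more such artifacts it will contain.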
For every person rallying on Capitol Hill to raise the minimum wage, there's a congressperson on the Hill who disagrees there's a need for that change.
Any issue dealing with money is bound to be deeply divisive and highly politicized, and minimum wage increases are no exception. The arguments are varied and complex, but essentially one side contends that a higher minimum wage hurts businesses, which drives down job availability, which hurts the poor.
The other side responds that there's little evidence for this claim, and that Americans working at or below the minimum wage (a small share of the country's roughly 76 million hourly workers), which some argue is not a living wage, would benefit from such an increase. They argue that the federal minimum wage for covered, nonexempt employees ($7.25 per hour as of September 2023) has lost more than 20 percent of its purchasing power [sources: U.S. Department of Labor; Cooper, et al].
As playwright George Bernard Shaw reportedly quipped, "If all the economists were laid end to end, they'd never reach a conclusion," and the minimum-wage debate seems to bear that out [source: Quote Investigator]. For every analyst who says minimum wage increases drive jobs away, there is another who argues against such a correlation.
In the end, both sides share a fundamental problem: namely, the abundance of anecdotal evidence many of their talking heads rely on for support. Secondhand stories and cherry-picked data make for weak tea in any party, even when presented in pretty bar charts.
The family that eats dinner together stays off drugs together. Um, sounds good, but it's not quite true.
Between fitness apps, drugs and surgeries, weight loss in the United States is a $78 billion-per-year industry, with millions of Americans bellying up to the weight-loss bar annually [source: Research and Markets]. Not surprisingly, weight loss studies (good, bad or ugly) get a lot of press in the U.S.
Take the popular idea that eating breakfast beats obesity, a sugar-frosted nugget derived from two main studies: One, a 1992 Vanderbilt University randomized controlled study, showed that reversing normal breakfast habits, whether by eating or not eating, correlated with weight loss; the other, a 2002 observational study by the National Weight Control Registry, correlated breakfast-eating with successful weight-losers — which is not the same as correlating it with weight loss [sources: Brown, et al.; Schlundt, et al.; Wyatt, et al.].
Unfortunately, the NWCR study failed to control for other factors — or, indeed, establish any causal connection from its correlation. For example, a person who wants to lose weight might work out more, eat breakfast or go whole-hog protein, but without an experimental design capable of dialing in causal links, such behaviors amount to nothing more than commonly co-occurring characteristics [source: Brown, et al].
A similar problem plagues the numerous studies linking family dinners with a decreased risk of drug addiction for teens. Although attractive for their simple, appealing strategy, these studies frequently fail to control for related factors, such as strong family connections or deep parental involvement in a child's life [source: Miller, et al].
Researchers studying suicide across genders have to be aware that suicidal men and women often use different methods, so outcomes vary widely.
We often hear that men, especially young men, are more likely to commit suicide than are women. In truth, such statements partake of empirical generalization (the act of making a broad statement about a common pattern without attempting to explain it) and mask several known and potential confounding factors.
Take, for example, the 2021 Youth Risk Behavior Survey, which found that female students in grades 9-12 attempted suicide almost twice as often as male students (13 percent vs. 7 percent) [source: American Foundation for Suicide Prevention].
How, then, can suicide deaths correlate more strongly with the opposite sex? The answer lies in the methods used: While the most common method of suicide for both sexes in 2020 was by firearm (57.9 percent for men and 33.0 percent for women), women were almost equally likely to die by poisoning or suffocation [source: National Institute of Mental Health].
Even if we could dispose of such confounding factors, the fact would remain that maleness, per se, is not a cause. To explain the trend, we need to instead identify factors common to men, or at least suicidal ones.
The same point applies to the comparatively high rates of suicide reported among divorced men. Divorce doesn't cause men to commit suicide; if anything, it's more indicative of an underlying causal relationship with factors such as male role inflexibility, their social networks, the increasing importance of child care and men's desire for control in relationships [source: Scourfield and Evans].
People have been protesting vaccine mandates for decades. But with the outbreak of COVID-19, the divide took on new significance.
No correlation/causation list would be complete without discussing parental concerns over vaccination safety. Before the COVID-19 pandemic hit the world in 2020, the main issue was a fear among some parents that the measles, mumps and rubella vaccination was causally linked to autism spectrum disorders. This notion was popularized by celebrities like Jenny McCarthy.
Despite the medical community debunking the 1998 Andrew Wakefield paper that inspired the falsehood, and despite subsequent studies showing no causal link, some parents remain fearful of an autism connection or other vaccine-related dangers [sources: Park; Sifferlin; Szabo].
Then COVID-19 arrived, and to date it has killed millions around the globe. Scientists raced to create an effective vaccine and they succeeded; the first U.S. COVID-19 vaccine was available in December 2020 under the FDA's emergency use authorization [source: FDA]. But the vaccine also quickly became entangled in the extreme polarization of U.S. politics and in misinformation.
Many parents, especially Republicans, feared the vaccines were unsafe because they were developed so quickly, and because there might be as-yet-unknown long-term side effects. There were also incorrect fears about the vaccine affecting future fertility. Those have now been proven false [source: Kelen and Maragakis].
As of January 2022, just 28 percent of 5- to 11-year-olds had received at least one dose of the vaccine, disappointing many in the medical field [sources: Hamel, Kates]. The number of vaccinated children is growing; by May 2023, 40 percent of 5- to 11-year-olds had received at least one dose [source: CDC].
These are no harmless misunderstandings. Despite the debunking of any link between autism and childhood vaccines, many parents remain leery of the shots. In 2019, there were 1,282 cases of measles in 31 states, the highest number in the U.S. since 1992. The majority of these cases were among the unvaccinated [source: CDC].
Whether that correspondence is coincidental, correlative or causal is well worth considering. And the effects of the current COVID-19 vaccination hesitation remain to be seen.
Lots More Information