So What Is a p-Value, and Why Is It Judging Me?
And, for that matter, what's a 95% confidence interval?
Let’s say you’re scrolling through a news story about a brand-new health treatment or educational trick, and buried in the text is a sentence like: “The results were statistically significant, p < 0.05.” For many readers, that’s where the brain quietly exits the chat. But don’t worry—we’re going to unpack what that p-value actually means, why it’s not the boss of you, and what confidence intervals have to do with it.
Let’s start with a fictional example. Imagine a school trying a new study technique to boost student test scores. They split students into two groups: one sticks with the usual methods, and the other tries this shiny new approach. At the end of the term, the average score in the traditional group is 75, while the new-method group pulls in an 82. That’s a 7-point gap, and everyone’s wondering—was it the method, or just dumb luck?
That’s where the p-value comes in. It tells us how likely we’d be to see a gap this big, or bigger, if in reality the new method doesn’t do anything special. A p-value of 0.03 says that, in a world where the method is useless, a 7-point-or-larger difference would show up by chance only about 3% of the time. Since 3% is below the usual 5% cut-off researchers use, they’d call this result “statistically significant.”
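If you like to poke at ideas like this in code, here’s a minimal sketch in Python. The numbers are invented to echo the 75-versus-82 example (not data from any real school), and it leans on SciPy’s standard two-sample t-test:

```python
# A minimal sketch with invented scores, not data from any real study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
traditional = rng.normal(loc=75, scale=10, size=40)  # the usual methods
new_method = rng.normal(loc=82, scale=10, size=40)   # the shiny new approach

res = stats.ttest_ind(new_method, traditional)       # two-sample t-test
gap = new_method.mean() - traditional.mean()
print(f"observed gap: {gap:.1f} points")
print(f"p-value: {res.pvalue:.3f}")  # how often chance alone would serve up a gap this big
```

Re-run it with a different seed and both the gap and the p-value will wobble a bit. That wobble is exactly the randomness the p-value is trying to account for.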
Sounds great, right? Maybe. Maybe not.
See, the p-value is like a door bouncer at the science club. It decides if a finding gets into the VIP section of “important results.” But it doesn’t tell us how strong the finding is, how big the effect is, or whether it actually matters in the real world. It just tells us whether we can reject the idea that nothing’s going on.
But What’s a Confidence Interval, and Why Is It Louder Than the P-Value?
Now imagine that, along with our 7-point bump, researchers report a 95% confidence interval from 2 to 12. That’s a range of plausible values for the real effect. It says: “Hey, if we repeated this study 100 times and built an interval each time, about 95 of those intervals would capture the true average difference.”
Importantly, that “95% confident” phrase is about the method, not the specific interval. It doesn’t mean there’s a 95% chance the real effect is in this one particular range—it means the process used to calculate it is reliable, like a well-calibrated measuring tape.
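That “about the method” idea is easier to trust after a quick simulation. Here’s a minimal sketch (invented data again, with the true effect pinned at 7 points purely by assumption) that builds a thousand 95% intervals and counts how many of them catch the truth:

```python
# A minimal sketch: fix a "true" effect of 7 points (our assumption), then
# build 1,000 95% confidence intervals and count how many capture it.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect, n, trials, hits = 7.0, 40, 1_000, 0

for _ in range(trials):
    control = rng.normal(75, 10, n)
    treated = rng.normal(75 + true_effect, 10, n)
    diff = treated.mean() - control.mean()
    se = np.sqrt(control.var(ddof=1) / n + treated.var(ddof=1) / n)
    margin = stats.t.ppf(0.975, df=2 * n - 2) * se  # half-width of the 95% CI
    if diff - margin <= true_effect <= diff + margin:
        hits += 1

print(f"intervals that captured the true effect: {hits / trials:.1%}")  # roughly 95%
```

Any single interval either contains the true effect or it doesn’t; the 95% describes how often the recipe succeeds across repeated studies.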
And this is where confidence intervals outshine p-values: they show the possible size of the effect. A narrow interval means we’ve got a pretty precise estimate. A wide one means there’s more uncertainty. An interval of 6 to 8 points tells a very different story than one stretching from 1 to 13—even if both are “statistically significant.”
Think of confidence intervals as the “how much” and “how sure are we” tools in your stats toolbox. They let you picture not just whether something works, but how well it might work. And if you’re a doctor, a policymaker, or just someone trying to decide whether to try that new study technique, that’s the info you want.
Let’s Put Them Together and See What Happens
Here’s how these two play together: when a 95% confidence interval doesn’t include zero (or whatever value means “no effect,” such as 1 for a ratio), the p-value from the matching test will be less than 0.05. If the interval does include zero, the p-value will be 0.05 or higher. They’re doing a statistical duet; maybe not Grammy-winning, but at least in harmony.
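If you want to watch that duet on stage, here’s a minimal sketch (hypothetical scores again) showing why they always agree: the p-value and the 95% interval are built from the same t-statistic.

```python
# A minimal sketch: the two-sided p-value and the 95% CI come from the same
# t-statistic, so the interval excludes zero exactly when p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(75, 10, 40)  # control scores (hypothetical)
group_b = rng.normal(82, 10, 40)  # treatment scores (hypothetical)

n_a, n_b = len(group_a), len(group_b)
diff = group_b.mean() - group_a.mean()

# Pooled standard error and degrees of freedom for the classic two-sample t-test
pooled_var = ((n_a - 1) * group_a.var(ddof=1) +
              (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)
se = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
df = n_a + n_b - 2

t_stat = diff / se
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p-value
margin = stats.t.ppf(0.975, df) * se       # half-width of the 95% CI

print(f"difference: {diff:.1f} points, p-value: {p_value:.3f}")
print(f"95% CI: ({diff - margin:.1f}, {diff + margin:.1f})")
```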
Take a fictional example from medical research. A study finds a new medication reduces hospital stays by two days on average. The confidence interval runs from 0.5 to 3.5 days, and the p-value is 0.02. The p-value tells us this result probably isn’t just luck, and the confidence interval tells us what kind of improvement to expect (somewhere between barely noticeable and genuinely helpful).
And while we’re at it, let’s be honest: the smaller the study, the more chaotic these numbers can be. A bigger study usually means tighter confidence intervals, and (when a real effect exists) smaller p-values, because there’s more data to go on. Fewer wobbles, clearer answers.
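To see that in action, here’s one last minimal sketch: the same assumed 7-point effect (standard deviation of 10), measured with bigger and bigger samples.

```python
# A minimal sketch: same assumed effect, growing sample sizes, tightening intervals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
for n in (10, 40, 160, 640):
    control = rng.normal(75, 10, n)
    treated = rng.normal(82, 10, n)
    res = stats.ttest_ind(treated, control)
    diff = treated.mean() - control.mean()
    se = np.sqrt(control.var(ddof=1) / n + treated.var(ddof=1) / n)
    margin = stats.t.ppf(0.975, df=2 * n - 2) * se
    print(f"n={n:4d}  diff={diff:5.1f}  "
          f"95% CI = ({diff - margin:5.1f}, {diff + margin:5.1f})  p = {res.pvalue:.4f}")
```

As n grows, the interval narrows around the true effect and the p-value shrinks, which is exactly the clarity more data buys you.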
When One Number Isn’t Enough
If you only look at p-values, you might end up making big decisions based on effects too tiny to matter in practice. If you only look at confidence intervals, you might miss whether those effects are statistically reliable at all. That’s why good research reports both.
Think of it like dating: the p-value is the first impression (Are they even interesting?), while the confidence interval is the deeper conversation (Okay, but are they worth it?). You need both.
Regulators, like the FDA, often want quick yes-or-no answers, which is where p-values shine. But a physician trying to decide which drug to recommend will care more about the size of the benefit and how certain that estimate is. That’s where confidence intervals earn their keep.
The same goes for educators and business analysts. If you’re deciding whether to roll out a new curriculum or revamp your entire website, knowing that “it probably works” isn’t enough. You want to know whether it’s a small bump or a game-changer—and how confident you can be in those numbers.
So What Should You Do With All This?
The next time you read a study—whether it’s about vaccines, school reforms, or miracle vegetables—don’t stop at the p-value. Look for that confidence interval. Ask: How big is the effect? How sure are they about it? Could the benefit be tiny, or even nonexistent?
Science is a flashlight, not a crystal ball. P-values and confidence intervals don’t give us perfect answers. But together, they shine a brighter light on the questions that matter.
Especially in public health, where lives, policies, and lots of funding ride on our best guesses, we owe it to ourselves to understand the tools behind those guesses.