One of the underlying mathematical principles in biostatistics (and statistics in general) is that you’re comparing what you observed in a trial/test/study to what you would expect “if all things were equal.” The tests answer the question, “If all things really were equal, what is the probability that I would see a result at least this extreme just by chance?”
In previous posts (and again in some future ones), I explain how we use different distributions based on the type of data we’re analyzing. But I also explain that the Central Limit Theorem kicks in when we’re looking at samples that are big enough: the sampling distribution of a mean or a proportion becomes approximately normal.
And that is exactly what you’ll see in this example. We have 965 students in a school (our sample), and 90 of them are sick with a vaccine-preventable disease. Of those 90, 45 are vaccinated and 45 are unvaccinated. In the whole school, 845 students are vaccinated, and 120 are not.
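To make the “if all things were equal” idea concrete, here is a minimal sketch in Python that lays the numbers out as a 2x2 table and works out the counts we would expect if illness had nothing to do with vaccination status. The variable names are my own choices for illustration, and this is not necessarily how the notes in the video lay it out.

```python
# Counts taken directly from the example above.
total = 965
sick_total = 90
group_totals = {"vaccinated": 845, "unvaccinated": 120}
sick = {"vaccinated": 45, "unvaccinated": 45}

well_total = total - sick_total  # students who did not get sick

observed = {}
expected = {}
for group, n in group_totals.items():
    # Observed: the sick count is given; the well count is the remainder.
    observed[group] = {"sick": sick[group], "well": n - sick[group]}
    # Expected "if all things were equal": each group gets sick at the
    # overall rate, so expected = row total * column total / grand total.
    expected[group] = {
        "sick": n * sick_total / total,
        "well": n * well_total / total,
    }

for group in group_totals:
    rounded = {k: round(v, 1) for k, v in expected[group].items()}
    print(group, "observed:", observed[group], "expected:", rounded)
```

If vaccination made no difference, we would expect about 79 sick students among the vaccinated and about 11 among the unvaccinated, instead of the 45 and 45 we actually observed.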
Using this information, we will perform three main tests of the data to check whether the observed distribution of disease is associated with the vaccination status of the students:
We will calculate the chi-square statistic and use the chi-square distribution to find the probability of getting a statistic at least that large if illness and vaccination status were unrelated. If that probability is less than 5% (0.05), then we can say that the distribution of illness by vaccination status is statistically significant. In other words, there is an association between vaccination status and getting sick.
We will calculate the attack rates, relative risk, and odds ratio, and test whether the probability (and odds) of illness could be the same in vaccinated and unvaccinated students.
We will calculate the difference in proportions and test whether the two proportions are really different (the null hypothesis being that they are equal).
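The three tests above can be sketched in a few lines of Python using only the standard library. This is a sketch under some assumptions of mine, not a transcript of the video: I use the chi-square statistic without a continuity correction, 95% confidence intervals built on the log scale for the relative risk and odds ratio, and a pooled standard error for the difference in proportions. The cell labels `a`, `b`, `c`, `d` are just a common naming convention.

```python
import math

# 2x2 table from the example.
a, b = 45, 845 - 45   # vaccinated: sick, well
c, d = 45, 120 - 45   # unvaccinated: sick, well
n1, n2 = a + b, c + d  # group sizes (845 and 120)
n = n1 + n2            # 965 students in total

# 1) Chi-square statistic (no continuity correction), 1 degree of freedom.
observed = [a, b, c, d]
expected = [n1 * (a + c) / n, n1 * (b + d) / n,
            n2 * (a + c) / n, n2 * (b + d) / n]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# For 1 degree of freedom, P(X >= chi2) equals erfc(sqrt(chi2 / 2)).
p_chi2 = math.erfc(math.sqrt(chi2 / 2))

# 2) Attack rates, relative risk, and odds ratio (vaccinated vs. unvaccinated).
ar_vax, ar_unvax = a / n1, c / n2
rr = ar_vax / ar_unvax
odds_ratio = (a * d) / (b * c)
# Standard errors on the log scale, then 95% confidence intervals.
se_ln_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z_crit = 1.96
rr_ci = (math.exp(math.log(rr) - z_crit * se_ln_rr),
         math.exp(math.log(rr) + z_crit * se_ln_rr))
or_ci = (math.exp(math.log(odds_ratio) - z_crit * se_ln_or),
         math.exp(math.log(odds_ratio) + z_crit * se_ln_or))

# 3) Difference in proportions, z test with a pooled standard error.
p_pooled = (a + c) / n
se_diff = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
diff = ar_vax - ar_unvax
z = diff / se_diff
p_z = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

print(f"chi-square = {chi2:.2f}, p = {p_chi2:.3g}")
print(f"attack rates: vaccinated {ar_vax:.3f}, unvaccinated {ar_unvax:.3f}")
print(f"RR = {rr:.3f}, 95% CI {rr_ci[0]:.3f} to {rr_ci[1]:.3f}")
print(f"OR = {odds_ratio:.3f}, 95% CI {or_ci[0]:.3f} to {or_ci[1]:.3f}")
print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p_z:.3g}")
```

With these counts, the sketch gives a chi-square statistic of about 128.6, a relative risk of about 0.14, an odds ratio of about 0.094, and a z of about -11.3. Notice that z squared equals the chi-square statistic: with a 2x2 table and a pooled standard error, the first and third tests are the same test written two ways. Both confidence intervals sit entirely below 1, which is the “could they be the same?” check for the relative risk and odds ratio.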
Throughout, pay attention to the formulas. See what is similar and different between the formulas, and how we have to account for the fact that samples have natural variability to them (through the standard error).
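For reference, here are the standard textbook versions of those formulas side by side, written in LaTeX. The notation is my assumption (a and b are the sick and well vaccinated students, c and d the sick and well unvaccinated students, n_1 = a + b, n_2 = c + d), and the notes in the video may write them a little differently.

```latex
% O = observed count, E = expected count, \hat{p} = pooled proportion sick
\begin{align*}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
z &= \frac{\hat{p}_1 - \hat{p}_2}
          {\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \\
\mathrm{SE}\left[\ln(\mathrm{RR})\right] &= \sqrt{\frac{1}{a} - \frac{1}{n_1} + \frac{1}{c} - \frac{1}{n_2}} \\
\mathrm{SE}\left[\ln(\mathrm{OR})\right] &= \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}
\end{align*}
```

The pattern is the same each time: take what you observed, subtract what you would expect if all things were equal, and scale that gap by a measure of how much samples naturally vary.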
The PDF of the notes you see me write in the video can be downloaded by clicking here. It helps you see the whole thing when the video cuts off what I’m writing.