
7 Estimating The Mean Of A Population

Sampling Distributions and the Central Limit Theorem

When we think about estimating the mean of a population, we might think about how a random sample is similar to the population. If we could sample almost all of the population, we would expect the distribution of the sample to be very much like the distribution of the entire population. However, sampling an entire population is not feasible unless it's a very small population. If the population is that small, we don't need to estimate its mean; we can simply compute the mean of whatever quantity we're interested in.

Consider a large population, such as the crows in the state of Washington in the US. If we wanted to estimate the mean weight of an adult female crow, we might randomly sample 20 of them and look at the distribution of their weights. We could use the sample mean as a rough estimate of the population mean. However, we know that if we randomly selected a different group of 20 adult female crows, we probably wouldn't get the same mean weight.
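To make the sample-to-sample variation concrete, here is a minimal sketch that builds a made-up population and draws two random samples of 20 from it. The population size (10,000), mean (450 g), and standard deviation (40 g) are hypothetical values chosen only for illustration, not real crow data, and `sample_mean` is a helper introduced just for this example.

```python
import random
import statistics

def sample_mean(population, n, rng):
    """Draw a simple random sample of size n (without replacement) and return its mean."""
    return statistics.fmean(rng.sample(population, n))

# Hypothetical population: 10,000 made-up adult female crow weights in grams.
# The mean of 450 g and SD of 40 g are illustrative assumptions, not real data.
rng = random.Random(7)
population = [rng.gauss(450, 40) for _ in range(10_000)]

first = sample_mean(population, 20, rng)
second = sample_mean(population, 20, rng)
print(f"population mean: {statistics.fmean(population):.1f} g")
print(f"sample 1 mean:   {first:.1f} g")
print(f"sample 2 mean:   {second:.1f} g")
```

Both sample means land near the population mean, but they are not equal to it or to each other.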

Think back to the previous chapter, when we looked at the sampling simulation (you might want to go back and try that interactive again). The sample means varied from sample to sample, and when we created a histogram of the sample means, we saw that the outline of that histogram looked more and more Normal-shaped as we kept including more sample means. We also noticed that this effect was more pronounced as the sample size grew larger: the larger the sample size, the fewer means we needed to make the histogram of the means appear Normal-shaped. This is a fundamental result in statistics. In fact, it has a name: the Central Limit Theorem.
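The effect of sample size on shape can be sketched with a quick simulation. The population here is assumed to be exponential with mean 1, picked only because it is strongly right-skewed (its skewness is 2). Skewness, the average cubed z-score, is used as one simple measure of how far the histogram of sample means is from a symmetric, Normal-like shape; the helper names `skewness` and `sample_means` are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values, m)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_means(n, reps, rng):
    """Means of `reps` independent samples of size n from an assumed exponential(mean 1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

rng = random.Random(42)
for n in (1, 5, 30):
    print(f"n = {n:2d}: skewness of sample means = {skewness(sample_means(n, 5000, rng)):.2f}")
```

For this particular population, theory says the skewness of the sample mean is 2/√n (about 2.00, 0.89, and 0.37 for these sample sizes), so the simulated values should shrink toward 0 as n grows, matching the increasingly Normal outline seen in the interactive.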

The Central Limit Theorem

The Central Limit Theorem states that, when we take independent random samples of size n from a population with mean μ and standard deviation σ, the sampling distribution of the sample mean approaches a Normal model as n increases. That Normal model has the same mean as the population, μ, and a standard deviation equal to σ/√n. This standard deviation of the sampling distribution is usually called the standard error of the mean.
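A simulation can check both parts of this statement: the mean of many sample means should be close to μ, and their standard deviation should be close to σ/√n. This sketch assumes a Uniform(0, 10) population, whose mean is 5 and whose standard deviation is 10/√12, and uses samples of size 25; `simulate_means` is a hypothetical helper.

```python
import math
import random
import statistics

# Assumed population: Uniform(0, 10), whose mean is 5 and SD is 10/sqrt(12).
MU = 5.0
SIGMA = 10 / math.sqrt(12)

def simulate_means(n, reps, rng):
    """Return the means of `reps` independent random samples of size n."""
    return [statistics.fmean(rng.uniform(0, 10) for _ in range(n)) for _ in range(reps)]

rng = random.Random(1)
n = 25
means = simulate_means(n, 10_000, rng)
print(f"mean of sample means: {statistics.fmean(means):.3f}  (population mean {MU})")
print(f"SD of sample means:   {statistics.stdev(means):.3f}  (sigma/sqrt(n) = {SIGMA / math.sqrt(n):.3f})")
```

The simulated values will not match the theoretical ones exactly, because 10,000 sample means is still a finite simulation, but they should agree closely.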

Note that the mean of the sampling distribution is the same as the mean of the population from which we sampled. However, the standard deviation of the sampling distribution is not the same as the standard deviation of that population; the two are related by the expression σ/√n.

If we think about the expression σ/√n, we see that the standard deviation of the sampling distribution of the mean (the standard error of the mean) gets smaller as n gets larger, because σ is divided by √n. For example, quadrupling the sample size doubles √n and so cuts the standard error in half. This is what we saw in the sampling simulation at the end of the previous chapter: as the sample size got larger, the sampling distribution had less spread. In other words, the Normal model for the sampling distribution (the outline of its histogram) has a smaller standard deviation for larger samples.
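The arithmetic behind σ/√n is easy to tabulate. This sketch uses a hypothetical population standard deviation of 40 (an illustrative value only) and sample sizes that each quadruple the previous one; `standard_error` is a helper named for this example.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 40.0  # hypothetical population SD, chosen only for illustration
for n in (5, 20, 80, 320):
    print(f"n = {n:3d}  SE = {standard_error(sigma, n):.2f}")
# prints SE values 17.89, 8.94, 4.47, 2.24
```

Each fourfold increase in n halves the standard error, so shrinking the spread of the sampling distribution gets progressively more expensive in sample size.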

This is a very powerful and useful theorem because it lets us use the properties of the Normal model to answer questions about the sampling distribution in the same way we used it for the distribution of the population from which we sampled.
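As an illustration, the Normal model N(μ, σ/√n) can be used to approximate the probability that a sample mean falls in a given range. This sketch reuses hypothetical crow numbers (population mean 450 g, population standard deviation 40 g, samples of 20), which are illustrative assumptions rather than measured data. It relies on Python's standard-library `statistics.NormalDist`, and the helper `prob_sample_mean_between` is named only for this example. The result is an approximation whose quality depends on n being large enough for the population's shape.

```python
import math
from statistics import NormalDist

def prob_sample_mean_between(lo, hi, mu, sigma, n):
    """Approximate P(lo < sample mean < hi) using the CLT's Normal model N(mu, sigma/sqrt(n))."""
    sampling_model = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))
    return sampling_model.cdf(hi) - sampling_model.cdf(lo)

# Hypothetical crow numbers (illustrative assumptions, not measured data):
# population mean 450 g, population SD 40 g, samples of 20 crows.
p = prob_sample_mean_between(440, 460, mu=450, sigma=40, n=20)
print(f"P(440 g < sample mean < 460 g) is about {p:.3f}")
```

The same calculation with the population standard deviation in place of σ/√n would answer a different question, about a single crow rather than the mean of 20 crows, which is exactly the distinction between the population and the sampling distribution.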