Okay, we've seen that a point estimate, like the sample mean x̄, gives us a single number as our best guess for an unknown population parameter, like the population mean μ. For instance, if we calculate the average height from a sample of people to be 175 cm, that's our point estimate for the average height of all people in the population we're interested in.
But how much faith should we put in that single number? If we took a different sample, we'd likely get a slightly different sample mean, maybe 173 cm or 176 cm. A point estimate doesn't tell us anything about this uncertainty or how much the estimate might vary from sample to sample. This is where interval estimation comes in.
Instead of providing just one number, an interval estimate gives us a range of plausible values for the population parameter. This range is called a confidence interval.
A confidence interval is a range calculated from sample data that is likely to contain the true value of an unknown population parameter. It's defined by two numbers: a lower bound and an upper bound. For example, instead of just saying the average height is 175 cm, we might say we are "95% confident that the true average height of the population is between 172 cm and 178 cm."
This interval [172 cm, 178 cm] is the confidence interval. The "95%" part is the confidence level.
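To make this concrete, here is a minimal sketch of how a 95% confidence interval for a mean can be computed with NumPy and SciPy. The height values are made-up example data, not from the text, and the t-distribution is used because the population standard deviation is assumed unknown.

```python
# A minimal sketch: 95% t-based confidence interval for a mean.
# The heights below are illustrative, made-up sample values.
import numpy as np
from scipy import stats

heights = np.array([171.2, 176.8, 174.5, 178.1, 169.9, 175.3, 177.0, 172.6])

mean = heights.mean()
sem = stats.sem(heights)      # standard error of the mean (uses ddof=1)
n = len(heights)

# Interval: mean +/- t_critical * standard error
lower, upper = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
print(f"95% CI for the mean height: [{lower:.1f} cm, {upper:.1f} cm]")
```

The same interval can be built by hand as mean ± t_critical × SEM; the library call simply wraps that calculation.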
The confidence level (commonly 90%, 95%, or 99%) describes how reliable the process used to generate the interval is, not any single interval on its own. It's crucial to interpret this correctly: a 95% confidence level means that if we repeated the sampling process many times and computed an interval from each sample, about 95% of those intervals would contain the true parameter. It does not mean there is a 95% probability that the true parameter lies inside the one specific interval we calculated; that interval either contains the parameter or it doesn't.
Think of it like trying to toss rings onto a fixed peg (the true population parameter). The confidence level tells you the success rate of your ring-tossing method. If you have a 95% success rate, it means that over many tosses, 95% of your rings will land around the peg. For any single toss (any single calculated interval), the ring either landed on the peg or it didn't.
The true population mean (dashed red line) is fixed but unknown. Each vertical line represents a 95% confidence interval calculated from a different random sample. Most intervals (gray lines) contain the true mean, but occasionally (like sample 16) an interval misses it due to random sampling variation. With a 95% confidence level, we expect about 1 out of 20 intervals to miss the true value purely by chance.
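A short simulation can reproduce this behaviour. The sketch below assumes a hypothetical population with a known mean and standard deviation (something we never have in practice) purely so we can check how often the computed intervals cover the true value.

```python
# Simulation sketch: repeatedly sample from a hypothetical population and
# count how often the 95% interval covers the (assumed) true mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mean, true_sd = 175.0, 7.0        # hypothetical population parameters
n_repeats, sample_size = 1000, 30

covered = 0
for _ in range(n_repeats):
    sample = rng.normal(true_mean, true_sd, size=sample_size)
    lo, hi = stats.t.interval(0.95, df=sample_size - 1,
                              loc=sample.mean(), scale=stats.sem(sample))
    covered += lo <= true_mean <= hi   # True counts as 1

print(f"Coverage: {covered / n_repeats:.1%}")   # expect a value close to 95%
```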
The width of the confidence interval gives us a sense of the precision of our estimate. A narrower interval suggests a more precise estimate, while a wider interval indicates more uncertainty. Three main factors influence the width: the confidence level (a higher level, such as 99% instead of 95%, produces a wider interval), the sample size (larger samples yield narrower intervals), and the variability in the data (a larger standard deviation leads to a wider interval).
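The sketch below, using simulated data with assumed parameter values, illustrates each of these effects by comparing interval widths.

```python
# Rough illustration of how confidence level, sample size, and variability
# affect the width of a t-based confidence interval. All data is simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def ci_width(sample, confidence=0.95):
    """Width of a t-based confidence interval for the mean of `sample`."""
    lo, hi = stats.t.interval(confidence, df=len(sample) - 1,
                              loc=sample.mean(), scale=stats.sem(sample))
    return hi - lo

base = rng.normal(175, 7, size=30)

print(f"99% vs 95% level : {ci_width(base, 0.99):.2f} vs {ci_width(base, 0.95):.2f}")
print(f"n=30 vs n=300    : {ci_width(base):.2f} vs {ci_width(rng.normal(175, 7, 300)):.2f}")
print(f"sd=7 vs sd=14    : {ci_width(base):.2f} vs {ci_width(rng.normal(175, 14, 30)):.2f}")
```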
Confidence intervals provide a much richer picture than point estimates alone. They give us a plausible range for the true parameter value and explicitly communicate the level of uncertainty associated with our sample-based estimate. This concept is fundamental for interpreting results not just in statistics, but also when evaluating the reliability of performance metrics for machine learning models trained or tested on sample data.
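As one example of that connection, the following sketch estimates a 95% interval for a classifier's test accuracy using a bootstrap percentile method; the labels and predictions here are synthetic placeholders rather than output from a real model.

```python
# Sketch: bootstrap percentile confidence interval for test-set accuracy.
# y_true and y_pred are synthetic stand-ins for real labels and predictions.
import numpy as np

rng = np.random.default_rng(7)
y_true = rng.integers(0, 2, size=200)                          # hypothetical labels
y_pred = np.where(rng.random(200) < 0.85, y_true, 1 - y_true)  # ~85% accurate

correct = (y_true == y_pred).astype(float)
boot_acc = [rng.choice(correct, size=len(correct), replace=True).mean()
            for _ in range(2000)]

lower, upper = np.percentile(boot_acc, [2.5, 97.5])
print(f"Accuracy: {correct.mean():.3f}, 95% bootstrap CI: [{lower:.3f}, {upper:.3f}]")
```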