Hypotheses in statistical testing consist of the null hypothesis (H0), representing the default state or no effect, and the alternative hypothesis (H1), representing the condition for which you are trying to find evidence. When sample data is used to decide between the two, the judgment rests on incomplete information (a sample rather than the whole population), so errors can occur. These errors manifest in two specific ways, known as Type I and Type II errors. Understanding them is fundamental to interpreting test results correctly.
A Type I error occurs when you reject the null hypothesis (H0) when it is actually true. Think of it as a "false positive" or a false alarm.
Example: Suppose H0 is "The new website design does not increase the conversion rate" and H1 is "The new design does increase the conversion rate". A Type I error would mean concluding the new design is better (rejecting H0) when, in reality, it provides no improvement or is even worse, and the observed difference in your sample was just due to chance. The consequence might be wasting resources implementing an ineffective design.
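To make the false alarm rate concrete, here is a minimal simulation sketch: it assumes both designs share the same true conversion rate (so H0 is true) and repeatedly runs a one-sided two-proportion z-test. The conversion rate, sample size, and significance level are illustrative values, not figures from this scenario.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
alpha = 0.05          # significance level
p_true = 0.10         # assumed true conversion rate for BOTH designs (H0 is true)
n = 2000              # visitors per variant
n_experiments = 10_000

false_positives = 0
for _ in range(n_experiments):
    old = rng.binomial(n, p_true)   # conversions under the old design
    new = rng.binomial(n, p_true)   # conversions under the new design
    p_pool = (old + new) / (2 * n)  # pooled conversion rate under H0
    se = np.sqrt(p_pool * (1 - p_pool) * (2 / n))
    z = (new / n - old / n) / se    # two-proportion z statistic
    p_value = norm.sf(z)            # one-sided test: H1 says "new is better"
    if p_value < alpha:
        false_positives += 1        # rejected H0 even though it is true

print(f"Observed Type I error rate: {false_positives / n_experiments:.3f}")
```

Because H0 is true in every simulated experiment, each rejection is a Type I error, and the observed rejection rate should sit near the chosen α.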
A Type II error occurs when you fail to reject the null hypothesis (H0) when it is actually false (meaning the alternative hypothesis, H1, is true). This is like a "false negative" or failing to detect an effect that is really there.
Example: Using the same website design scenario (H0: no increase, H1: increase). A Type II error would mean concluding the new design is not better (failing to reject H0) when, in fact, it does improve the conversion rate. The consequence here is a missed opportunity to implement a beneficial change.
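The same style of simulation, now with H1 actually true, shows how often the test misses a real improvement. This sketch assumes the new design genuinely lifts the conversion rate from 10% to 11%; the fraction of experiments that fail to reject H0 estimates β, and 1−β estimates the test's power. All numbers are again illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
alpha = 0.05
p_old, p_new = 0.10, 0.11   # the new design really is better (H1 is true)
n = 2000                    # visitors per variant
n_experiments = 10_000

misses = 0
for _ in range(n_experiments):
    old = rng.binomial(n, p_old)
    new = rng.binomial(n, p_new)
    p_pool = (old + new) / (2 * n)
    se = np.sqrt(p_pool * (1 - p_pool) * (2 / n))
    z = (new / n - old / n) / se
    p_value = norm.sf(z)            # one-sided: H1 says "new is better"
    if p_value >= alpha:
        misses += 1                 # failed to reject H0 despite a real effect

beta = misses / n_experiments
print(f"Estimated Type II error rate (beta): {beta:.3f}")
print(f"Estimated power (1 - beta): {1 - beta:.3f}")
```

With a small true effect and a modest sample size, β can be surprisingly large, which is exactly the missed-opportunity risk described above.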
When conducting a hypothesis test with a fixed sample size, there is an inherent trade-off between α, the probability of committing a Type I error (the significance level), and β, the probability of committing a Type II error: making the test stricter by lowering α generally increases β, and vice versa.
Figure: Decision outcomes in hypothesis testing, illustrating Type I (α) and Type II (β) errors.
Choosing the significance level α often depends on the relative consequences of making each type of error in a specific context. If a Type I error is very costly (e.g., approving a faulty medical device), you might choose a very small α. If a Type II error is more concerning (e.g., missing a potentially effective treatment), you might accept a slightly higher α to increase the test's power (1−β).
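As a rough sketch of this trade-off at a fixed sample size, the snippet below uses a simplified normal approximation for a one-sided two-proportion test (assumed true rates of 10% vs. 11% and 2,000 visitors per variant) and shows how β grows as α shrinks.

```python
import numpy as np
from scipy.stats import norm

p_old, p_new = 0.10, 0.11   # assumed true conversion rates (H1 is true)
n = 2000                    # visitors per variant (fixed)

# Simplified approximation: standard error of the difference under H1
se = np.sqrt(p_old * (1 - p_old) / n + p_new * (1 - p_new) / n)
effect = p_new - p_old

for alpha in (0.10, 0.05, 0.01, 0.001):
    z_crit = norm.ppf(1 - alpha)              # one-sided critical value
    power = 1 - norm.cdf(z_crit - effect / se)
    beta = 1 - power
    print(f"alpha={alpha:<6} beta={beta:.3f} power={power:.3f}")
```

Collecting more data is the usual way to reduce both error probabilities at the same time, which is the subject of power analysis.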
In machine learning, this framework translates directly to model evaluation and feature selection: a Type I error corresponds to a false positive (for example, flagging a legitimate transaction as fraudulent, or keeping a feature that has no real relationship with the target), while a Type II error corresponds to a false negative (missing actual fraud, or dropping a feature that genuinely carries signal).
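As a minimal sketch of that correspondence (with made-up labels and predictions), a classifier's false positive rate plays the role of a Type I error rate and its false negative rate plays the role of a Type II error rate:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth labels (1 = positive class) and model predictions
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# False positives mirror Type I errors, false negatives mirror Type II errors
print(f"False positive (Type I) rate:  {fp / (fp + tn):.2f}")
print(f"False negative (Type II) rate: {fn / (fn + tp):.2f}")
```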
Understanding this framework is essential before we proceed to calculate p-values and perform specific tests like t-tests and Chi-squared tests using Python libraries in the upcoming sections. These errors quantify the risks associated with making decisions based on statistical evidence.