Practice calculating and interpreting probabilities, including basic probability, conditional probability, and independence. Working through these examples helps solidify understanding. Probability provides the foundation for quantifying uncertainty, which is fundamental in machine learning.

### Problem 1: Rolling a Fair Die

Imagine you roll a standard, fair six-sided die once. The sample space, representing all possible outcomes, is $S = \{1, 2, 3, 4, 5, 6\}$.

Let's define two events:

- Event A: Rolling an even number. $A = \{2, 4, 6\}$.
- Event B: Rolling a number greater than 4. $B = \{5, 6\}$.

Calculate the following probabilities:

1. $P(A)$: The probability of rolling an even number.
2. $P(B)$: The probability of rolling a number greater than 4.
3. $P(A \cap B)$: The probability of rolling a number that is both even AND greater than 4.
4. $P(A \cup B)$: The probability of rolling a number that is either even OR greater than 4 (or both).

**Solution:**

**Calculating $P(A)$:** Event A has 3 favorable outcomes, $\{2, 4, 6\}$, out of 6 total outcomes.

$$P(A) = \frac{\text{Number of outcomes in A}}{\text{Total number of outcomes}} = \frac{3}{6} = 0.5$$

**Calculating $P(B)$:** Event B has 2 favorable outcomes, $\{5, 6\}$, out of 6 total outcomes.

$$P(B) = \frac{\text{Number of outcomes in B}}{\text{Total number of outcomes}} = \frac{2}{6} = \frac{1}{3} \approx 0.333$$

**Calculating $P(A \cap B)$:** We need the outcomes that are in both A and B. Looking at the sets $A = \{2, 4, 6\}$ and $B = \{5, 6\}$, the only outcome they share is 6. So the intersection is $A \cap B = \{6\}$, which has 1 favorable outcome.

$$P(A \cap B) = \frac{\text{Number of outcomes in } A \cap B}{\text{Total number of outcomes}} = \frac{1}{6} \approx 0.167$$

**Calculating $P(A \cup B)$:** We can use the formula for the probability of a union, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$:

$$P(A \cup B) = \frac{3}{6} + \frac{2}{6} - \frac{1}{6} = \frac{3 + 2 - 1}{6} = \frac{4}{6} = \frac{2}{3} \approx 0.667$$

Alternatively, we can find the union set $A \cup B = \{2, 4, 5, 6\}$, which has 4 outcomes:

$$P(A \cup B) = \frac{\text{Number of outcomes in } A \cup B}{\text{Total number of outcomes}} = \frac{4}{6} = \frac{2}{3} \approx 0.667$$

### Problem 2: Drawing Balls from a Bag (Without Replacement)

A bag contains 8 balls: 5 are red (R) and 3 are blue (B). You draw two balls from the bag, one after the other, without putting the first ball back in.

Calculate the following probabilities:

1. $P(B_2 | R_1)$: The probability that the second ball drawn is blue, given that the first ball drawn was red.
2. $P(R_1 \cap R_2)$: The probability that both the first ball and the second ball are red.

**Solution:**

**Calculating $P(B_2 | R_1)$:** "Given that the first ball drawn was red ($R_1$)" means we assume $R_1$ has already happened. When we go to draw the second ball, there are now only 7 balls left in the bag. Since the first was red, 4 red balls and 3 blue balls remain. The probability of drawing a blue ball as the second ball ($B_2$), given this situation, is:

$$P(B_2 | R_1) = \frac{\text{Number of blue balls remaining}}{\text{Total number of balls remaining}} = \frac{3}{7} \approx 0.429$$

**Calculating $P(R_1 \cap R_2)$:** This asks for the probability that the first ball is red AND the second ball is red. We can use the multiplication rule for conditional probability: $P(R_1 \cap R_2) = P(R_1) \times P(R_2 | R_1)$.

First, find $P(R_1)$: initially, there are 5 red balls out of 8 total.

$$P(R_1) = \frac{5}{8}$$

Next, find $P(R_2 | R_1)$: this is the probability the second ball is red, given the first was red. If the first was red, there are 7 balls left, and 4 of them are red.

$$P(R_2 | R_1) = \frac{4}{7}$$

Now, multiply these probabilities:

$$P(R_1 \cap R_2) = P(R_1) \times P(R_2 | R_1) = \frac{5}{8} \times \frac{4}{7} = \frac{20}{56} = \frac{5}{14} \approx 0.357$$
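To double-check these hand calculations, here is a minimal Python sketch using only the standard library's `fractions` module. It enumerates the die's sample space for Problem 1 and applies the multiplication rule for Problem 2; the helper `prob` and all variable names are illustrative, not part of the original problems.

```python
from fractions import Fraction

# Problem 1: enumerate the sample space of a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}
A = {s for s in S if s % 2 == 0}   # even numbers: {2, 4, 6}
B = {s for s in S if s > 4}        # greater than 4: {5, 6}

def prob(event, space):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(space))

print(prob(A, S))       # 1/2
print(prob(B, S))       # 1/3
print(prob(A & B, S))   # 1/6  (intersection)
print(prob(A | B, S))   # 2/3  (union)

# Problem 2: two draws without replacement from 5 red and 3 blue balls.
p_R1 = Fraction(5, 8)            # first draw is red
p_B2_given_R1 = Fraction(3, 7)   # 3 blue among the 7 remaining balls
p_R2_given_R1 = Fraction(4, 7)   # 4 red among the 7 remaining balls

print(p_B2_given_R1)             # 3/7
print(p_R1 * p_R2_given_R1)      # 5/14, via the multiplication rule
```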
### Problem 3: Spam Filter Analysis

Imagine a simple analysis of 100 emails based on whether they were classified as Spam (S) or Not Spam (NS), and whether they contained the word "discount" (D) or not (ND). The results are summarized below:

|               | Contains "discount" (D) | Does Not Contain "discount" (ND) | Total |
|---------------|-------------------------|----------------------------------|-------|
| Spam (S)      | 20                      | 10                               | 30    |
| Not Spam (NS) | 5                       | 65                               | 70    |
| Total         | 25                      | 75                               | 100   |

Using this data, calculate the following:

1. $P(S)$: The overall probability that an email in this dataset is Spam.
2. $P(D)$: The overall probability that an email contains the word "discount".
3. $P(S|D)$: The probability that an email is Spam, given that it contains the word "discount".
4. Are the events "Email is Spam" (S) and "Email contains 'discount'" (D) independent in this dataset? Explain why or why not.

**Solution:**

**Calculating $P(S)$:** From the table, 30 out of 100 emails are Spam.

$$P(S) = \frac{\text{Total Spam Emails}}{\text{Total Emails}} = \frac{30}{100} = 0.3$$

**Calculating $P(D)$:** From the table, 25 out of 100 emails contain the word "discount".

$$P(D) = \frac{\text{Total Emails with 'discount'}}{\text{Total Emails}} = \frac{25}{100} = 0.25$$

**Calculating $P(S|D)$:** This is the probability of an email being Spam given that it contains "discount". We focus only on the column where emails contain "discount" (25 emails in total). Within that group, 20 are Spam.

$$P(S|D) = \frac{\text{Number of Spam emails with 'discount'}}{\text{Total emails with 'discount'}} = \frac{20}{25} = 0.8$$

Alternatively, using the formula $P(S|D) = P(S \cap D) / P(D)$, where $P(S \cap D)$ is the probability of an email being both Spam AND containing "discount", i.e. $20/100 = 0.2$:

$$P(S|D) = \frac{0.2}{0.25} = \frac{20}{25} = 0.8$$

**Checking for Independence:** Two events S and D are independent if $P(S|D) = P(S)$. We calculated $P(S|D) = 0.8$ and $P(S) = 0.3$. Since $0.8 \neq 0.3$, the events S (Email is Spam) and D (Email contains 'discount') are not independent in this dataset. Knowing that an email contains "discount" significantly increases the probability that it is Spam (from 30% up to 80%). This dependence is exactly what spam filters try to learn and exploit.

These exercises cover calculating simple probabilities, applying the union rule, understanding conditional probability through sequential events (drawing balls) and contingency tables (email analysis), and testing for independence. Being comfortable with these calculations is a necessary step before moving on to more complex probabilistic models used in machine learning.
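As a final sanity check, the Problem 3 quantities can be recomputed directly from the contingency table. The sketch below assumes an illustrative `counts` dictionary keyed by (spam label, "discount" label); it is one possible encoding, not part of the original problem.

```python
from fractions import Fraction

# Contingency table for Problem 3 (counts out of 100 emails).
counts = {
    ("S", "D"): 20, ("S", "ND"): 10,    # Spam row
    ("NS", "D"): 5, ("NS", "ND"): 65,   # Not Spam row
}
total = sum(counts.values())                                       # 100

p_S = Fraction(counts[("S", "D")] + counts[("S", "ND")], total)    # 30/100
p_D = Fraction(counts[("S", "D")] + counts[("NS", "D")], total)    # 25/100
p_S_and_D = Fraction(counts[("S", "D")], total)                    # 20/100

p_S_given_D = p_S_and_D / p_D   # conditional probability P(S|D)

print(p_S, p_D, p_S_given_D)    # 3/10 1/4 4/5

# Independence would require P(S|D) == P(S); here 4/5 != 3/10.
print(p_S_given_D == p_S)       # False -> S and D are dependent
```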