Introduction to Chi-Square Tests
Goodness of Fit Test
Statistical tests are formulated in terms of a null hypothesis, $H_0$, and an alternative hypothesis, $H_1$. For a $\chi^2$ goodness-of-fit test, the null hypothesis states that the model is appropriate, and the alternative hypothesis states that it is not. The $\chi^2$ value defined next is computed from the data and is used to decide whether to reject the null hypothesis and discard the model.
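Formula:
Chi-Square Goodness of Fit Test
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$ where $O_i$ is the observed frequency and $E_i$ is the expected frequency under the model for category $i$.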
The number of degrees of freedom for this test is $df=k-m-1$, where $k$ is the number of categories (cells) and $m$ is the number of model parameters estimated from the data. When the model is fully specified in advance, as for a fair die, $m=0$.
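The statistic, the degrees of freedom, and the $P$-value can be sketched in Python using only the standard library. The names `chi2_sf` and `chi_square_gof` are illustrative assumptions, not functions from any library, and the three-category data at the end are made up purely to show the call. The tail probability uses the closed form that the chi-square distribution admits for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
        terms = (half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * sum(terms)
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=0}^{m-1} (x/2)^(j + 1/2) / Gamma(j + 3/2)
    terms = (half ** (j + 0.5) / math.gamma(j + 1.5) for j in range(df // 2))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * sum(terms)


def chi_square_gof(observed, expected, m=0):
    """Return (statistic, df, p_value) for a chi-square goodness-of-fit test.

    m is the number of parameters estimated from the data (0 if none).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - m - 1
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical data: 60 observations in three equally likely categories.
statistic, df, p_value = chi_square_gof([18, 22, 20], [20, 20, 20])
print(f"chi2 = {statistic:.3f}, df = {df}, P-value = {p_value:.4f}")
```

A large $P$-value means the observed frequencies are close to what the model predicts, so the null hypothesis is not rejected.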
Example 1
A die is rolled 60 times and the following frequencies are obtained: $$\begin{array}{|c|c|c|c|c|c|c|} \hline \text{Face} & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline \text{Frequency} & 8 & 10 & 12 & 9 & 11 & 10 \\ \hline \end{array}$$ Test the hypothesis that the die is fair at the $10\%$ level of significance.
Solution
If the die is fair, then each face occurs with probability $\frac{1}{6}$. Since the die is rolled $60$ times, the expected frequency is $60\cdot \frac{1}{6}=10$ for each face. The test statistic is $\chi^2 = \frac{(8-10)^2}{10} + \frac{(10-10)^2}{10} + \frac{(12-10)^2}{10} + \frac{(9-10)^2}{10} + \frac{(11-10)^2}{10} + \frac{(10-10)^2}{10} = \frac{4+0+4+1+1+0}{10} = 1$. No parameters are estimated from the data, so the degrees of freedom are $6-0-1=5$. The $P$-value is $P(\chi^2 > 1) \approx 0.96$. Since $0.96 > 0.10$, we fail to reject the null hypothesis: the data provide no evidence that the die is unfair.
Example 2
A candy company produces bags with candies in four colors: red, green, blue, and yellow. They claim each color appears equally often. To verify this, a consumer group randomly selects $200$ candies and observes: $$\begin{array}{|c|c|c|c|c|} \hline \text{Color} & R & G & B & Y \\ \hline \text{Frequency} & 55 & 45 & 50 & 50 \\ \hline \end{array}$$
At a $5\%$ significance level, test whether there is any evidence that the distribution of candy colors differs from the company's claim.
Solution
Under the company's claim, each color has probability $\frac{1}{4}$, so the expected frequency is $200\cdot \frac{1}{4}=50$ for each color. The test statistic is $\chi^2 = \frac{(55-50)^2}{50} + \frac{(45-50)^2}{50} + \frac{(50-50)^2}{50} + \frac{(50-50)^2}{50} = \frac{25+25+0+0}{50} = 1$. The degrees of freedom are $4-1=3$. The $P$-value is $P(\chi^2 > 1) \approx 0.80$. Since $0.80 > 0.05$, we fail to reject the null hypothesis: the data provide no evidence that the distribution of candy colors differs from the company's claim.
Long Example 1
A teacher suspects a die may be biased and asks students to roll it $120$ times. The observed outcomes are: $$\begin{array}{|c|c|c|c|c|c|c|} \hline \text{Face} & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline \text{Frequency} & 15 & 25 & 20 & 18 & 22 & 20 \\ \hline \end{array}$$
Assuming the die is fair, each outcome should have an equal probability. Conduct a chi-square goodness-of-fit test at a $5\%$ significance level.
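One way to carry out this test is to compare the statistic with a critical value rather than computing a $P$-value. The Python sketch below assumes the standard table value $\chi^2_{0.05,\,5} \approx 11.070$, which is not given in the text above; the variable names are illustrative.

```python
# Observed counts for faces 1..6 from the table above.
observed = [15, 25, 20, 18, 22, 20]

n = sum(observed)                      # 120 rolls in total
k = len(observed)                      # 6 categories
expected = [n / k] * k                 # 20 per face if the die is fair

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1                             # no parameters estimated, so df = k - 0 - 1

# Assumption: upper 5% critical value for df = 5, taken from a standard table.
CRITICAL_VALUE_5PCT_DF5 = 11.070

reject = statistic > CRITICAL_VALUE_5PCT_DF5
print(f"chi2 = {statistic:.2f}, df = {df}, reject H0: {reject}")
```

Here the statistic is $\frac{25+25+0+4+4+0}{20} = 2.9$, well below the critical value, so the null hypothesis that the die is fair is not rejected at the $5\%$ level.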
Long Example 2
A pet store surveys 100 customers to find out which type of pet they prefer. The store expects preferences to be evenly distributed across four categories: cats, dogs, fish, and birds. However, the survey results are as follows: $$ \begin{array}{|c|c|c|c|c|} \hline \text{Pet} & \text{Cats} & \text{Dogs} & \text{Fish} & \text{Birds} \\ \hline \text{Frequency} & 30 & 20 & 25 & 25 \\ \hline \end{array} $$
At a $5\%$ significance level, test whether there is any evidence that the distribution of pet preferences differs from the store's claim.
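This test can be sketched in Python with a $P$-value computed directly. The sketch relies on the closed form of the chi-square upper tail for $3$ degrees of freedom, $P(\chi^2 > x) = \operatorname{erfc}\left(\sqrt{x/2}\right) + \sqrt{2x/\pi}\,e^{-x/2}$, which is valid only for $df=3$; the variable names are illustrative.

```python
import math

# Survey counts from the table above.
observed = {"Cats": 30, "Dogs": 20, "Fish": 25, "Birds": 25}

n = sum(observed.values())             # 100 customers
k = len(observed)                      # 4 categories
expected = n / k                       # 25 per category under an even split

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
statistic = sum((o - expected) ** 2 / expected for o in observed.values())
df = k - 1                             # 3 degrees of freedom

# Upper-tail probability for df = 3 only (closed form, see the lead-in).
p_value = (math.erfc(math.sqrt(statistic / 2))
           + math.sqrt(2 * statistic / math.pi) * math.exp(-statistic / 2))

alpha = 0.05
reject = p_value < alpha
print(f"chi2 = {statistic:.2f}, df = {df}, P-value = {p_value:.3f}, reject H0: {reject}")
```

The statistic is $\frac{25+25+0+0}{25} = 2$ and the $P$-value is about $0.57$, which exceeds $0.05$, so the data provide no evidence against an even distribution of preferences.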