Introduction to Chi-Square Tests
Goodness of Fit Test
Definition:
Coefficient of Dispersion
The coefficient of dispersion $CD$ for a sample of $n$ observations is given by $$CD = \frac{s}{\bar{x}}$$ where $s$ is the sample standard deviation and $\bar{x}$ is the sample mean.
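As a quick illustration of the arithmetic, the sketch below computes $CD = s/\bar{x}$ for a small sample. The data values are hypothetical and chosen only for the example; `statistics.stdev` uses the $n-1$ denominator, which is assumed here to be the intended sample standard deviation.

```python
import statistics

# Hypothetical sample, chosen only to illustrate the arithmetic.
sample = [4, 8, 6, 5, 7]

x_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
cd = s / x_bar                    # CD = s / x_bar

print(f"mean = {x_bar}, s = {s:.4f}, CD = {cd:.4f}")
```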
Remark
The $CD$ value can be interpreted as follows:
As a rough guide, a $CD$ value below $0.2$ suggests a normal distribution, while a $CD$ value above $0.2$ suggests a non-normal distribution.
Chi-Square Goodness of Fit Test
Statistical tests are formulated in terms of a null hypothesis $H_0$ and an alternative hypothesis $H_1$. For a $\chi^2$ goodness-of-fit test, the null hypothesis states that the model is appropriate, and the alternative hypothesis states that it is not. The $\chi^2$ value defined next is computed from the data and is used to decide whether to reject the null hypothesis and discard the model.
Formula:
Chi-Square Goodness of Fit Test
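In the usual notation (assumed here, since these symbols are not defined elsewhere in the notes), $O_i$ is the observed frequency in category $i$ and $E_i$ is the frequency expected under $H_0$. The test statistic then takes the standard Pearson form:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

Large values of $\chi^2$ indicate a poor fit, so $H_0$ is rejected when $\chi^2$ exceeds the critical value for the chosen significance level.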
The number of degrees of freedom for this test is $df = k - m - 1$, where $k$ is the number of categories (cells) and $m$ is the number of parameters estimated in the model.
Long Example 1
A teacher suspects a die may be biased and asks students to roll it $120$ times. The observed outcomes are: $$\begin{array}{|c|c|c|c|c|c|c|} \hline \text{Face} & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline \text{Frequency} & 15 & 25 & 20 & 18 & 22 & 20 \\ \hline \end{array}$$
Assuming the die is fair, each outcome should have an equal probability. Conduct a chi-square goodness-of-fit test at a $5\%$ significance level.
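One possible way to carry out the computation is sketched below. It assumes the usual Pearson statistic, the sum of $(O - E)^2 / E$ over the six faces, and takes the $5\%$ critical value for $df = 5$ from a standard chi-square table rather than computing it. The variable names are illustrative only.

```python
# Observed frequencies for faces 1..6, taken from the table above.
observed = [15, 25, 20, 18, 22, 20]

n = sum(observed)                 # 120 rolls
k = len(observed)                 # 6 categories
expected = [n / k] * k            # fair die: 20 per face

# Pearson statistic: sum over categories of (O - E)^2 / E.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = k - 0 - 1                    # m = 0: no parameters are estimated
CRITICAL_5_PERCENT_DF5 = 11.070   # from a standard chi-square table

reject_h0 = chi_square > CRITICAL_5_PERCENT_DF5
print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```

Because the statistic ($2.9$) is well below the critical value, the data give no evidence at the $5\%$ level that the die is biased.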
Long Example 2
A pet store surveys 100 customers to find out which type of pet they prefer. The store expects preferences to be evenly distributed across four categories: cats, dogs, fish, and birds. However, the survey results are as follows: $$ \begin{array}{|c|c|c|c|c|} \hline \text{Pet} & \text{Cats} & \text{Dogs} & \text{Fish} & \text{Birds} \\ \hline \text{Frequency} & 30 & 20 & 25 & 25 \\ \hline \end{array} $$
At a $5\%$ significance level, test whether there is any evidence that the distribution of pet preferences differs from the store's claim.
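A worked sketch of the calculation, assuming the usual Pearson statistic with $O_i$ the observed and $E_i$ the expected frequencies: under the store's claim each of the four categories has expected frequency $E_i = 100/4 = 25$, so

$$\chi^2 = \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(25-25)^2}{25} + \frac{(25-25)^2}{25} = 1 + 1 + 0 + 0 = 2$$

No parameters are estimated, so $df = 4 - 0 - 1 = 3$. The $5\%$ critical value from a standard chi-square table is $7.815$. Since $2 < 7.815$, the null hypothesis is not rejected: the survey gives no evidence that preferences differ from an even split.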
Example 3
A die is rolled 60 times and the following frequencies are obtained: $$\begin{array}{|c|c|c|c|c|c|c|} \hline \text{Face} & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline \text{Frequency} & 8 & 10 & 12 & 9 & 11 & 10 \\ \hline \end{array}$$
Example 4
A candy company produces bags with candies in four colors: red, green, blue, and yellow. They claim each color appears equally often. To verify this, a consumer group randomly selects $200$ candies and observes: $$\begin{array}{|c|c|c|c|c|} \hline \text{Color} & R & G & B & Y \\ \hline \text{Frequency} & 55 & 45 & 50 & 50 \\ \hline \end{array}$$
Example 5
A car company claims that its car colors are equally distributed among five colors: red, black, blue, white, and brown. To verify this, a consumer group randomly selects $100$ cars and observes: $$\begin{array}{|c|c|c|c|c|c|} \hline \text{Color} & \text{Red} & \text{Black} & \text{Blue} & \text{White} & \text{Brown} \\ \hline \text{Frequency} & 10 & 25 & 20 & 30 & 15 \\ \hline \end{array}$$
Example 6
A beverage company claims that customer preferences for its five drink flavors (Cola, Lemon, Orange, Grape, and Mango) are equally likely. A marketing researcher surveys $200$ customers and records the following responses: $$\begin{array}{|c|c|c|c|c|c|} \hline \text{Flavor} & \text{Cola} & \text{Lemon} & \text{Orange} & \text{Grape} & \text{Mango} \\ \hline \text{Frequency} & 60 & 25 & 45 & 30 & 40 \\ \hline \end{array}$$
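The example does not state a significance level, so the sketch below assumes the usual Pearson statistic and compares it against two common levels, $5\%$ and $1\%$. The critical values for $df = 4$ are taken from a standard chi-square table. Printing each flavor's contribution shows which categories drive the result.

```python
# Observed responses from the table above.
observed = {"Cola": 60, "Lemon": 25, "Orange": 45, "Grape": 30, "Mango": 40}

n = sum(observed.values())        # 200 customers
k = len(observed)                 # 5 flavors
expected = n / k                  # 40 per flavor under the company's claim

# Contribution of each flavor to the statistic, (O - E)^2 / E.
contributions = {flavor: (o - expected) ** 2 / expected
                 for flavor, o in observed.items()}
chi_square = sum(contributions.values())

df = k - 1                        # no parameters are estimated
# Critical values for df = 4 from a standard chi-square table
# (the significance levels are assumptions, not given in the example).
critical = {0.05: 9.488, 0.01: 13.277}

for flavor, c in contributions.items():
    print(f"{flavor:<7}{c:6.3f}")
print(f"chi-square = {chi_square:.2f}, df = {df}")
for alpha, cutoff in critical.items():
    print(f"alpha = {alpha}: reject H0 = {chi_square > cutoff}")
```

Cola and Lemon account for most of the statistic, and the total exceeds both critical values, so under these assumptions the claim of equal preference would be rejected.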
Chi-Square Test of Independence
Formula:
Chi-Square Test of Independence
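In the usual notation (assumed here), $O_{ij}$ is the observed count in row $i$ and column $j$ of a contingency table with $r$ rows and $c$ columns, $R_i$ and $C_j$ are the row and column totals, and $n$ is the grand total. Under the null hypothesis that the two variables are independent, the expected counts and the standard Pearson statistic are:

$$E_{ij} = \frac{R_i \, C_j}{n}, \qquad \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad df = (r-1)(c-1)$$

The null hypothesis is rejected when $\chi^2$ exceeds the critical value for the chosen significance level.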
Remark
Example 1
The following table shows the distribution of bison in Yellowstone National Park by age and location. $$\begin{array}{|c|c|c|c|} \hline \text { Age } & \text { North } & \text { South } & \text { Total } \\ \hline 0-1 & 10 & 20 & 30 \\ \hline 2-3 & 15 & 25 & 40 \\ \hline 4-5 & 20 & 30 & 50 \\ \hline \text { Total } & 45 & 75 & 120 \\ \hline \end{array} $$
Example 2
A public health researcher wants to investigate whether there is an association between age group and usage of a new over-the-counter pain medication. A sample of 150 individuals is surveyed and categorized as follows: $$\begin{array}{|c|c|c|c|} \hline \text { Age Group } & \text { Used Medication } & \text { Did Not Use Medication } & \text { Total } \\ \hline 18-35 & 30 & 10 & 40 \\ \hline 36-55 & 25 & 25 & 50 \\ \hline 56+ & 10 & 50 & 60 \\ \hline \text { Total } & 65 & 85 & 150 \\ \hline \end{array} $$
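A minimal sketch of the computation follows, assuming the standard approach: expected counts are (row total × column total) / grand total, and the statistic sums $(O - E)^2 / E$ over all cells. The function name `chi_square_independence` is hypothetical, and the $5\%$ level is an assumption because the example does not state one.

```python
def chi_square_independence(table):
    """Return (statistic, df, expected) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum((o - e) ** 2 / e
                    for obs_row, exp_row in zip(table, expected)
                    for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Rows: 18-35, 36-55, 56+; columns: used, did not use (totals omitted).
observed = [[30, 10], [25, 25], [10, 50]]
statistic, df, expected = chi_square_independence(observed)

CRITICAL_5_PERCENT_DF2 = 5.991    # from a standard chi-square table
reject_h0 = statistic > CRITICAL_5_PERCENT_DF2
print(f"chi-square = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```

The statistic (about $34.6$) is far above the critical value, so at the assumed $5\%$ level the data indicate an association between age group and medication use.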
Example 3
An auto industry analyst wants to know whether car type (SUV, Sedan, or Truck) is associated with a preferred fuel type (Gasoline or Electric). A random sample of 120 car buyers is surveyed, and the results are shown below: $$\begin{array}{|c|c|c|c|} \hline \text { Car Type } & \text { Gasoline } & \text { Electric } & \text { Total } \\ \hline \text { SUV } & 30 & 10 & 40 \\ \hline \text { Sedan } & 25 & 15 & 40 \\ \hline \text { Truck } & 20 & 20 & 40 \\ \hline \text { Total } & 75 & 45 & 120 \\ \hline \end{array} $$
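A worked sketch of the calculation, assuming the standard expected-count rule (row total × column total / grand total) and a $5\%$ significance level, which the example does not state: every row total is $40$, so each car type has expected counts $40 \cdot 75 / 120 = 25$ for Gasoline and $40 \cdot 45 / 120 = 15$ for Electric. Then

$$\chi^2 = \frac{(30-25)^2}{25} + \frac{(10-15)^2}{15} + \frac{(25-25)^2}{25} + \frac{(15-15)^2}{15} + \frac{(20-25)^2}{25} + \frac{(20-15)^2}{15} = \frac{16}{3} \approx 5.33$$

With $df = (3-1)(2-1) = 2$, the $5\%$ critical value from a standard chi-square table is $5.991$. Since $5.33 < 5.991$, the null hypothesis of independence is not rejected at this level, although the statistic is close to the cutoff.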