Introduction to Hypothesis Testing

Many experiments require us to decide which of two competing claims about a parameter is true.

To decide which one is correct, a hypothesis test is carried out to ascertain whether or not there is enough statistical evidence in favour of a certain belief or hypothesis about a population parameter.

A hypothesis test consists of: collecting data from a sample, evaluating the data, and deciding if there is sufficient evidence in the data to reject the null hypothesis.

Definition:

Null Hypothesis ($H_0$)

The null hypothesis, denoted as $H_0$, is the commonly accepted fact, historical value, or the claimed value of a parameter.

Definition:

Alternate Hypothesis ($H_1$)

The alternate hypothesis, denoted as $H_1$, is the opposite of the null hypothesis. It is used to discredit the null hypothesis.

Statistical tests operate in a manner similar to US criminal trials, where a person is presumed innocent until proven guilty. In a statistical test, the null hypothesis, $H_0$, is presumed to be true until it is shown to be false with statistical evidence. When that happens, the alternate hypothesis, $H_1$, is favoured over the null hypothesis.

The statistical evidence needed to discredit the null hypothesis consists of a test statistic and its sampling distribution.

Definition:

Test Statistic

A test statistic is a value calculated from the sample data that is used to determine whether or not the null hypothesis should be rejected.

Depending on the parameter that is being tested, the test statistic and the test-distribution will be different. The table below shows the test statistic and distribution that need to be consulted for different parameters when considering single populations. In each formula, $k$ is the hypothesized value of the parameter under $H_0$.

$$\begin{array}{lll}
\text{Parameter} & \text{Test Statistic} & \text{Distribution} \\ \hline
\text{Population mean, } \mu \ (\sigma \text{ known}) & Z=\frac{\bar{x}-k}{\sigma / \sqrt{n}} & \text{Normal (} Z \text{-table)} \\
\text{Population mean, } \mu \ (\sigma \text{ unknown}) & T=\frac{\bar{x}-k}{s / \sqrt{n}} & \text{Student's (} t \text{-table)} \\
\text{Population proportion, } p & Z=\frac{\hat{p}-k}{\sqrt{\frac{k(1-k)}{n}}} & \text{Normal (} Z \text{-table)}
\end{array}$$

The test statistic is central to hypothesis testing because it quantifies how far the sample statistic is from the hypothesized parameter, relative to the expected variability. By comparing it to a threshold (critical value or $P$-value), we make a decision about the plausibility of the null hypothesis.
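The three test statistics in the table can be sketched as small Python functions. This is a minimal illustration rather than a definitive implementation: the function names and the sample numbers are assumptions chosen for the example, and the proportion statistic uses the hypothesized value $k$ in its standard error.

```python
from math import sqrt

def z_mean(x_bar, k, sigma, n):
    """Z statistic for a mean when the population sigma is known."""
    return (x_bar - k) / (sigma / sqrt(n))

def t_mean(x_bar, k, s, n):
    """T statistic for a mean when only the sample standard deviation s is known."""
    return (x_bar - k) / (s / sqrt(n))

def z_proportion(p_hat, k, n):
    """Z statistic for a proportion; the standard error uses the hypothesized k."""
    return (p_hat - k) / sqrt(k * (1 - k) / n)

# Illustrative inputs (assumed values, not data from a real study)
print(round(z_mean(x_bar=505, k=500, sigma=10, n=30), 2))
print(round(t_mean(x_bar=78, k=75, s=5, n=25), 2))
print(round(z_proportion(p_hat=0.48, k=0.40, n=100), 2))
```

Each function measures the distance between the sample statistic and $k$ in units of the standard error, which is why larger samples produce larger statistics for the same raw difference.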

Good statistical practice requires that we stipulate in advance how much evidence against $H_0$ will be required in order to reject it. One way to stipulate a rejection criterion is to set a level of significance.

Definition:

Level of Significance

The level of significance, denoted as $\alpha$, is the probability of rejecting the null hypothesis when it is true.

As with parameter estimation, the level of significance is used to set up a rejection zone: a region where the test-statistic is unlikely to be found if the null hypothesis is true. The rejection zone serves as a visual guide for when to reject the null hypothesis in favour of the alternate one. In general, the level of significance, $\alpha$, is small and the most commonly used ones are: $1 \%, 2 \%$, and $5 \%$.

Alternatively, a $P$-value can be used to make a decision about the null hypothesis.

Definition:

$P$-value

The $P$-value is the probability of observing a test statistic at least as extreme as the one computed from the sample data, assuming that the null hypothesis is true.

In other words, the $P$-value measures how compatible the sample data are with the null hypothesis.

Thus, a high $P$-value indicates that data like those observed are likely when the null hypothesis is true, and a low $P$-value indicates that such data are unlikely when the null hypothesis is true.

Furthermore, a low $P$-value suggests that the sample provides enough evidence to reject the null hypothesis for the entire population.

So if the $P$-value is less than the significance level, $\alpha$, then we can reject the null hypothesis in favour of the alternate hypothesis. On the other hand, if the $P$-value is greater than the level of significance, $\alpha$, then we fail to reject the null hypothesis.
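The decision rule above can be sketched in Python for a $Z$ statistic. This is only an illustration: the function names `p_value_z` and `decide` and the input $z=-1.20$ are assumptions, and the standard normal distribution comes from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

def p_value_z(z, tail):
    """P-value of a Z statistic; tail is 'left', 'right', or 'two'."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "two":
        # Area in both tails beyond |z|
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

def decide(p_value, alpha):
    """Apply the rule: reject H0 only when the P-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Assumed example: left-tailed test with z = -1.20 at the 5% level
p = p_value_z(-1.20, "left")
print(round(p, 4), decide(p, 0.05))
```

Note that the same $z$ gives a different $P$-value depending on the tail chosen, so the alternate hypothesis has to be fixed before the $P$-value is computed.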

Guidelines On How To Conduct A Hypothesis Test

There are a number of population parameters that we can test for (e.g. $\mu, \sigma, p, \dots$), but at the end of the day, all hypothesis tests follow the same structure. Here's how they are carried out in general.

Let $\theta=$ the population parameter of interest
$k=$ the historical value of the parameter.

1. Identify the parameter of interest.

2. Set up the null hypothesis, $H_0$.

The null hypothesis always follows the form: $H_0: \theta=k$

3. Set up the alternate hypothesis, $H_1$.

Depending on how the problem is framed, choose exactly one of the following alternate hypotheses to be $H_1$:
$$ \begin{array}{lll} H_1: \theta < k & \Rightarrow & \text { left tail test } \\ H_1: \theta > k & \Rightarrow & \text { right tail test } \\ H_1: \theta \neq k & \Rightarrow & \text { double tail test } \end{array} $$

After choosing the $H_1$, draw the rejection zone if the level of significance, $\alpha$, is given.

4. Using the information obtained from the experiment, calculate the test statistic.

The test-statistic for each population parameter will be calculated differently. For the purpose of this course, the parameters of interest will be the mean when the population variance is known, the mean when the population variance is unknown, and the population proportion. Each will be treated in its own section.

5. If the test statistic is not in the rejection zone (blue), then there is insufficient evidence to reject the null hypothesis.

If the test statistic is in the rejection zone, then there is sufficient evidence to reject the null hypothesis in favour of the alternate hypothesis.

6. Calculate the $P$-value and make a statistical decision.

If the $P$-value $>\alpha \quad \Rightarrow \quad$ Fail to reject $H_0$.

If the $P$-value $<\alpha \quad \Rightarrow \quad$ Reject $H_0$ in favour of $H_1$.

Geometrically, the $P$-value corresponds to the size of the area specified by the test-statistic. So the rejection or non-rejection of the null hypothesis comes down to comparing the size of the region specified by the test-statistic against the size of the region(s) set by the level of significance.

If the area specified by the test-statistic (red) is larger than the one set by the level of significance (blue), then we fail to reject the null hypothesis. Conversely, if the area generated by the test-statistic is smaller than the prescribed level of significance, then we reject the null hypothesis in favour of the alternate.

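In principle, we can determine whether or not the null hypothesis will be rejected without drawing a graph. This is done by comparing the absolute value of the test-statistic against the absolute value of the critical value set by $\alpha$ (for a single tail test, this assumes that the test-statistic lies on the same side as the rejection zone) as follows: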

If $|\text{test-statistic}|<|\text{critical value}| \quad \Rightarrow \quad$ Fail to reject $H_0$.

If $|\text{test-statistic}|>|\text{critical value}| \quad \Rightarrow \quad$ Reject $H_0$ in favour of $H_1$.

7. State the conclusion in the context of the problem.
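The critical-value comparison in the steps above can be sketched as follows for tests that use the standard normal distribution. This is a hedged illustration: the function names are hypothetical, and the critical values are computed with `statistics.NormalDist().inv_cdf` instead of being read from a $Z$-table.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Critical value(s) of the standard normal for a given alpha and tail."""
    std_normal = NormalDist()
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        c = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split between both tails
        return (-c, c)
    raise ValueError("tail must be 'left', 'right', or 'two'")

def in_rejection_zone(stat, alpha, tail):
    """True when the test statistic falls in the rejection zone."""
    crit = critical_values(alpha, tail)
    if tail == "left":
        return stat < crit[0]
    if tail == "right":
        return stat > crit[0]
    return stat < crit[0] or stat > crit[1]

# Assumed inputs for illustration
print(in_rejection_zone(2.74, 0.05, "two"))    # statistic far in the right tail
print(in_rejection_zone(-1.20, 0.05, "left"))  # statistic short of the left critical value
print(in_rejection_zone(2.00, 0.05, "left"))   # wrong side: not rejected despite |2.00| > 1.645
```

The last call shows why the sketch compares signed values for single tail tests: a large statistic in the tail opposite to $H_1$ is not evidence for $H_1$.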

Example

A company claims that the average lifetime of its light bulbs is 1,200 hours. A consumer protection group suspects that the average lifetime is actually less than 1,200 hours. What is the appropriate alternative hypothesis?

A. $H_1: \mu=1200$

B. $H_1: \mu \neq 1200$

C. $H_1: \mu<1200$

D. $H_1: \mu>1200$

Solution

C. $H_1: \mu<1200$

Example

A pharmaceutical company claims that its new drug has no effect on blood pressure, and the average systolic blood pressure remains 120 mmHg. A doctor wants to test if the drug has any effect, either increasing or decreasing blood pressure. What is the appropriate alternative hypothesis?

A. $H_1: \mu=120$

B. $H_1: \mu \neq 120$

C. $H_1: \mu<120$

D. $H_1: \mu>120$

Solution

B. $H_1: \mu \neq 120$

Example

A researcher believes that the mean weight of apples in a certain orchard has increased compared to last year's mean of $150$ grams. What is the appropriate alternative hypothesis?

A. $H_1: \mu=150$

B. $H_1: \mu \neq 150$

C. $H_1: \mu<150$

D. $H_1: \mu>150$

Solution

D. $H_1: \mu>150$

Example

The average test score of a math class is believed to be 75 points, but a teacher hypothesizes that the new teaching method will result in scores that are higher than 75 points. What is the appropriate alternative hypothesis?

A. $H_1: \mu=75$

B. $H_1: \mu \neq 75$

C. $H_1: \mu<75$

D. $H_1: \mu>75$

Solution

D. $H_1: \mu>75$

Example

The average daily sales in a small café are $\$ 800$, but the manager suspects that sales have decreased since a new competitor opened nearby. What is the appropriate alternative hypothesis?

A. $H_1: \mu=800$

B. $H_1: \mu \neq 800$

C. $H_1: \mu<800$

D. $H_1: \mu>800$

Solution

C. $H_1: \mu<800$

Example

A political poll states that $60 \%$ of voters support a new policy. A researcher believes that the actual proportion of voters who support the policy is different from $60 \%$. What is the appropriate alternative hypothesis?

A. $H_1: p=0.6$

B. $H_1: p \neq 0.6$

C. $H_1: p<0.6$

D. $H_1: p>0.6$

Solution

B. $H_1: p \neq 0.6$

Example

A manufacturing company claims that $90 \%$ of its products meet quality standards. An auditor suspects that the actual proportion of products meeting quality standards is less than $90 \%$. What is the appropriate alternative hypothesis?

A. $H_1: p=0.9$

B. $H_1: p \neq 0.9$

C. $H_1: p<0.9$

D. $H_1: p>0.9$

Solution

C. $H_1: p<0.9$

Example

A machine fills cereal boxes with an average of $500$ g per box. The standard deviation is known to be 10 g. A random sample of 30 boxes has a mean weight of $505$ g. The test is conducted at a $5 \%$ significance level to determine if the mean weight is different from 500 g.

The calculated test statistic is $z=2.74$, and the critical values are $\pm 1.96$. What decision should be made?

A. Reject the null hypothesis because $|z|>1.96$.

B. Fail to reject the null hypothesis because $|z|<1.96$.

C. Reject the null hypothesis because $z>1.96$.

D. Fail to reject the null hypothesis because $z<1.96$.

Solution

A. Reject the null hypothesis because $|z|>1.96$.

Example

A researcher claims that the average score on a math test is 75. A random sample of $25$ students has a mean score of 78 and a standard deviation of 5. The test is conducted at a $1 \%$ significance level to determine if the average score is higher than 75.

The calculated test statistic is $t=3.00$, and the critical value is $t_{0.01,24}=2.492$. What decision should be made?

A. Reject the null hypothesis because $t>2.492$.

B. Fail to reject the null hypothesis because $t<2.492$.

C. Reject the null hypothesis because $|t|>2.492$.

D. Fail to reject the null hypothesis because $|t|<2.492$.

Solution

A. Reject the null hypothesis because $t>2.492$.

Example

A survey reports that $40 \%$ of adults prefer online shopping. A random sample of 100 adults finds that $48 \%$ prefer online shopping. The test is conducted at a $5 \%$ significance level to determine if the true proportion is different from $40\%$.

The calculated test statistic is $z=1.63$, and the critical values are $\pm 1.96$. What decision should be made?

A. Reject the null hypothesis because $z>1.96$.

B. Fail to reject the null hypothesis because $|z|<1.96$.

C. Reject the null hypothesis because $z<-1.96$.

D. Fail to reject the null hypothesis because $|z|>1.96$.

Solution

B. Fail to reject the null hypothesis because $|z|<1.96$.

Example

A factory claims that the mean lifetime of a type of battery is 300 hours. The standard deviation is known to be 50 hours. A random sample of 36 batteries has a mean lifetime of 290 hours. The test is conducted at a $5 \%$ significance level to determine if the mean lifetime is less than 300 hours.

The calculated test statistic is $z=-1.20$, and the critical value is $z_{0.05}=-1.645$. What decision should be made?

A. Reject the null hypothesis because $z<-1.645$.

B. Fail to reject the null hypothesis because $z>-1.645$.

C. Reject the null hypothesis because $|z|>1.645$.

D. Fail to reject the null hypothesis because $|z|<1.645$.

Solution

B. Fail to reject the null hypothesis because $z>-1.645$.

Example

It is claimed that $70\%$ of students regularly use the library. A random sample of $200$ students finds that 150 students ($75 \%$) regularly use the library. The test is conducted at a $1 \%$ significance level to determine if the proportion is higher than $70 \%$.

The calculated test statistic is $z=1.54$, and the critical value is $z_{0.01}=2.33$. What decision should be made?

A. Reject the null hypothesis because $z>2.33$.

B. Fail to reject the null hypothesis because $z<2.33$.

C. Reject the null hypothesis because $|z|>2.33$.

D. Fail to reject the null hypothesis because $|z|<2.33$.

Solution

B. Fail to reject the null hypothesis because $z<2.33$.

Example

A nutritionist tests whether the average calorie content of a type of snack bar is less than 200 calories. The $P$-value of the test is 0.02, and the significance level is $\alpha=0.05$. What decision should be made?

A. Reject the null hypothesis because $p<\alpha$.

B. Fail to reject the null hypothesis because $p>\alpha$.

C. Reject the null hypothesis because $p>\alpha$.

D. Fail to reject the null hypothesis because $p<\alpha$.

Solution

A. Reject the null hypothesis because $p<\alpha$.

Example

A factory claims that the mean diameter of its bolts is 5 mm. A test yields a $P$-value of 0.045. The significance level is $\alpha=0.01$. What decision should be made?

A. Reject the null hypothesis because $p<\alpha$.

B. Fail to reject the null hypothesis because $p>\alpha$.

C. Reject the null hypothesis because $p>\alpha$.

D. Fail to reject the null hypothesis because $p<\alpha$.

Solution

B. Fail to reject the null hypothesis because $p>\alpha$.

Example

A researcher tests whether the proportion of students who own a tablet is greater than $40 \%$. The sample yields a $P$-value of 0.18. The significance level is $\alpha=0.05$. What decision should be made?

A. Reject the null hypothesis because $p<\alpha$.

B. Fail to reject the null hypothesis because $p>\alpha$.

C. Reject the null hypothesis because $p>\alpha$.

D. Fail to reject the null hypothesis because $p<\alpha$.

Solution

B. Fail to reject the null hypothesis because $p>\alpha$.

Example

A study tests whether the mean weight of a certain fish species is 20 kg. A random sample produces a $P$-value of 0.008. The test is conducted at a significance level of $\alpha=0.01$. What decision should be made?

A. Reject the null hypothesis because $p<\alpha$.

B. Fail to reject the null hypothesis because $p>\alpha$.

C. Reject the null hypothesis because $p>\alpha$.

D. Fail to reject the null hypothesis because $p<\alpha$.

Solution

A. Reject the null hypothesis because $p<\alpha$.

Example

A poll claims that $50 \%$ of voters support a new policy. A random sample produces a test statistic with a $P$-value of 0.12. The test is conducted at a significance level of $\alpha=0.10$. What decision should be made?

A. Reject the null hypothesis because $p<\alpha$.

B. Fail to reject the null hypothesis because $p>\alpha$.

C. Reject the null hypothesis because $p>\alpha$.

D. Fail to reject the null hypothesis because $p<\alpha$.

Solution

B. Fail to reject the null hypothesis because $p>\alpha$.

Example

A company claims that the average processing time for an application is 15 minutes. A random sample yields a $P$-value of 0.03 in a two-tailed test. The significance level is $\alpha=0.05$. What decision should be made?

A. Reject the null hypothesis because $p<\alpha$.

B. Fail to reject the null hypothesis because $p>\alpha$.

C. Reject the null hypothesis because $p>\alpha$.

D. Fail to reject the null hypothesis because $p<\alpha$.

Solution

A. Reject the null hypothesis because $p<\alpha$.

Errors

Since we are making a statistical decision based on a random variable, no decision is guaranteed to be correct. As such, there is always a chance of reaching an incorrect conclusion. There are two types of errors: Type I and Type II.

Definition:

Type I Error

A Type I error occurs when the null hypothesis is rejected when it is actually true. The probability of making a Type I error is equal to the level of significance, $\alpha$.

Definition:

Type II Error

A Type II error occurs when the null hypothesis is not rejected when it is actually false. The probability of making a Type II error is denoted as $\beta$.

The probability of making a Type II error is dependent on the sample size, the level of significance, and the effect size. The effect size is the difference between the hypothesized parameter and the true parameter value. The larger the effect size, the smaller the probability of making a Type II error.

Here is a table that summarises the two types of errors along with their associated probabilities:

$$\begin{array}{l|ll}
\text{Decision based on sample} & H_0 \text{ is true} & H_0 \text{ is false} \\ \hline
\text{Fail to reject } H_0 & \begin{array}{l} \text{Correct decision} \\ P=1-\alpha \end{array} & \begin{array}{l} \text{Type II error} \\ P=\beta \end{array} \\
\text{Reject } H_0 & \begin{array}{l} \text{Type I error} \\ P=\alpha \end{array} & \begin{array}{l} \text{Correct decision} \\ P=1-\beta \end{array}
\end{array}$$
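The statement that a Type I error occurs with probability $\alpha$ can be checked with a small simulation. The sketch below is illustrative only: the function name, the population values ($\mu_0=50$, $\sigma=10$), the sample size, and the random seed are all assumptions. It repeatedly samples from a population in which $H_0$ is true and counts how often a double tail $Z$ test rejects.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_one_error_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Fraction of samples that reject H0: mu = mu0 when H0 is actually true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # double tail critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > crit:
            rejections += 1
    return rejections / trials

rate = type_one_error_rate(mu0=50, sigma=10, n=30, alpha=0.05, trials=4000)
print(rate)  # expected to land near alpha, up to simulation noise
```

Because every sample is drawn with the null hypothesis true, each rejection counted here is a Type I error.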

The risks of these two errors are inversely related and are determined by the level of significance and the power of the test.

Definition:

Power

The power of a test is the probability of rejecting the null hypothesis when it is false. The power of a test is equal to $1-\beta$.

Factors Influencing the Power of a Test

To lower the risk of committing a Type I error, we can lower the level of significance, $\alpha$. However, lowering the level of significance means that we would be less likely to detect a true difference if one really exists.

To lower the risk of committing a Type II error, we need to ensure that the test has enough power. This is achieved by using a sample size that is large enough to detect a practical difference when one truly exists.

In addition to sample size, other factors which affect power are:

Level of significance, $\alpha$

Variability or variance in the measured response variable.

Magnitude of the effect (the effect size).

Power is increased when a researcher increases sample size, as well as when a researcher increases effect sizes and significance levels. In reality, a researcher wants both Type I and Type II errors to be small. In terms of significance level and power, this means we want a small significance level (close to 0 ) and a large power (close to 1 ).
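The way sample size, effect size, and significance level drive power can be sketched for one specific case: a right tail $Z$ test with known $\sigma$. The function name and all numbers below are assumptions for illustration. Under a true mean `true_mu`, the $Z$ statistic is normal with mean $(\mu_{\text{true}}-k)/(\sigma/\sqrt{n})$ and unit variance, so the power is the area to the right of the critical value under that shifted curve.

```python
from math import sqrt
from statistics import NormalDist

def power_right_tailed_z(k, true_mu, sigma, n, alpha):
    """Power of a right tail Z test of H0: mu = k when the true mean is true_mu."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)       # rejection starts here
    shift = (true_mu - k) / (sigma / sqrt(n))    # mean of Z under the true mean
    return 1 - std_normal.cdf(z_crit - shift)    # P(Z > z_crit) under the true mean

# Assumed scenario: k = 50, sigma = 10
print(round(power_right_tailed_z(50, 53, 10, n=25, alpha=0.05), 3))   # baseline
print(round(power_right_tailed_z(50, 53, 10, n=100, alpha=0.05), 3))  # larger sample
print(round(power_right_tailed_z(50, 56, 10, n=25, alpha=0.05), 3))   # larger effect
print(round(power_right_tailed_z(50, 53, 10, n=25, alpha=0.10), 3))   # larger alpha
```

The probability of a Type II error for each scenario is one minus the printed power.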

Example

An industrial engineer tests whether the average time to assemble a product differs from the standard time of 45 minutes. The null hypothesis is $H_0: \mu=45$, and the alternative hypothesis is $H_1: \mu\neq 45$. What is a Type II error in this context?

A. Concluding that the average assembly time is different from 45 minutes when it is not.

B. Concluding that the average assembly time is 45 minutes when it actually differs.

C. Concluding that the average assembly time is shorter than 45 minutes when it is longer.

D. Concluding that the average assembly time is longer than 45 minutes when it is shorter.

Solution

B. Concluding that the average assembly time is 45 minutes when it actually differs.

Example

A public health official tests whether the proportion of households with access to clean water in a region is different from $70 \%$. The null hypothesis is $H_0: p=0.70$, and the alternative hypothesis is $H_1: p \neq 0.70$. What is a Type I error in this context?

A. Concluding that the proportion of households with access to clean water is different from $70 \%$ when it is not.

B. Concluding that the proportion of households with access to clean water is $70 \%$ when it actually differs.

C. Concluding that the proportion of households with access to clean water is less than $70 \%$ when it is higher.

D. Concluding that the proportion of households with access to clean water is higher than $70 \%$ when it is lower.

Solution

A. Concluding that the proportion of households with access to clean water is different from $70 \%$ when it is not.

Example

A clinical trial tests whether a new drug reduces blood pressure below the current average of 120 mmHg. The null hypothesis is $H_0: \mu=120$, and the alternative hypothesis is $H_1: \mu<120$. What is a Type I error in this context?

A. Concluding that the drug reduces blood pressure when it does not.

B. Concluding that the drug does not reduce blood pressure when it does.

C. Failing to detect that the drug reduces blood pressure.

D. Not conducting enough trials to determine the effect.

Solution

A. Concluding that the drug reduces blood pressure when it does not.

Example

A manufacturer claims that $95 \%$ of its products meet quality standards. The null hypothesis is $H_0: p=0.95$, and the alternative hypothesis is $H_1: p<0.95$. What is a Type II error in this context?

A. Rejecting $H_0$ when $p=0.95$.

B. Concluding that the proportion meeting quality standards is $95 \%$ when it is actually less.

C. Concluding that the proportion meeting quality standards is less than $95 \%$ when it is not.

D. Failing to detect a decrease in the proportion meeting quality standards.

Solution

D. Failing to detect a decrease in the proportion meeting quality standards.

Example

A researcher tests whether the average temperature in a region has increased above the historical average of $15^{\circ} C$. The null hypothesis is $H_0: \mu=15$, and the alternative hypothesis is $H_1: \mu>15$. What is a Type I error in this context?

A. Concluding that the average temperature has increased when it has not.

B. Concluding that the average temperature has not increased when it has.

C. Failing to detect an increase in the average temperature.

D. Misreporting the sample size for the study.

Solution

A. Concluding that the average temperature has increased when it has not.

Example

A health department tests whether the proportion of people vaccinated in a community is less than the required $80 \%$ for herd immunity. The null hypothesis is $H_0: p=0.80$, and the alternative hypothesis is $H_1: p<0.80$. What is a Type II error in this context?

A. Concluding that the vaccination rate is less than $80 \%$ when it is not.

B. Concluding that the vaccination rate is $80 \%$ when it is actually lower.

C. Failing to detect that the vaccination rate is lower than $80 \%$.

D. Failing to conduct a sufficiently large survey.

Solution

C. Failing to detect that the vaccination rate is lower than $80 \%$.

Example

A biologist tests whether the proportion of a bird species migrating each year is different from $60 \%$. The null hypothesis is $H_0: p=0.60$, and the alternative hypothesis is $H_1: p \neq 0.60$. What is the Type I error in this context?

A. Concluding that the migration proportion differs from $60 \%$ when it does not.

B. Concluding that the migration proportion is $60 \%$ when it actually differs.

C. Failing to detect a difference in the migration proportion.

D. Miscalculating the proportion of migrating birds.

Solution

A. Concluding that the migration proportion differs from $60 \%$ when it does not.

Example

An agricultural scientist tests whether a new fertilizer increases crop yield above the standard average of 50 bushels per acre. The null hypothesis is $H_0: \mu=50$, and the alternative hypothesis is $H_1: \mu>50$. If the test leads to rejecting $H_0$ when $H_0$ is actually true, what type of error is this?

A. Type I error

B. Type II error

C. Sampling error

D. Measurement error

Solution

A. Type I error

Tests on the Mean When Population Variance is Known

The population mean, $\mu$, is a parameter that is often of interest to researchers.

As with interval estimation, assumptions for normality and implications from the Central Limit Theorem hold:

That is

1. $\bar{X} \sim N\left(\mu_{\bar{x}}, \sigma_{\bar{x}}\right) \quad ; \quad$ provided that $n \geq 30$

2. If $n<30$, then the underlying population from which the sample is drawn needs to be normally distributed.

Since the sampling distribution of the means follows a normal distribution, the standard normal distribution ($Z$-table) will serve as the null (statistical) distribution for determining critical values and computing $P$-values.

How To Conduct A Test on The Mean When $\sigma$ Is Known

Let $k=$ the claimed or historical value of the population mean, $\mu$

1. State the null hypothesis
$H_0:\mu=k$

2. State the alternate hypothesis
$H_1: \mu < k $
$H_1: \mu > k $ or
$H_1: \mu \neq k $

3. Draw the rejection zone if the level of significance, $\alpha$, is given, and determine the critical value(s), $Z_c$, associated with the rejection region.

4. Using the information obtained from the sample, calculate the test statistic, $Z_t$, $$Z_t=\frac{\bar{x}-k}{\sigma / \sqrt{n}}$$

5. Make a statistical decision. Do this with a graph or by comparing the test statistic to the critical value(s).

If the test statistic falls in the rejection region, reject the null hypothesis.

If the test statistic falls in the non-rejection region, do not reject the null hypothesis.

6. Make a conclusion in the context of the problem.

7. Calculate the $P$-value and make a decision based on the $P$-value.

If the $P$-value is less than $\alpha$, reject the null hypothesis.

If the $P$-value is greater than $\alpha$, do not reject the null hypothesis.

Remark

In the case of a double tailed test, the tail area found from the test-statistic needs to be doubled to obtain the $P$-value, because there are two rejection zones (and hence two areas of opportunity) into which the test-statistic could fall.

Rule of Thumb

As with parameter estimation, the conclusion to a hypothesis test needs to be carefully worded. Here are two generally accepted ways to word the conclusion.

`` At the _____ level of significance, there is sufficient/insufficient evidence to indicate that the mean is _____ ``

`` At the _____ level of significance, there is/is not enough evidence to indicate that the mean is _____ ``

** Write the alternate hypothesis in English, and in the context of the problem, in the last blank.

Remark

It is important not to say that ``the true mean is $\bar{x}$``. This is because the value of the mean used to discredit the null hypothesis is a sample statistic and hence a random variable. Thus, its value is liable to change every time we draw a new sample from the population.

Remark

It is generally considered bad practice to say that we ``accept the null hypothesis`` because doing so implies a level of certainty that the null hypothesis is true, which is not warranted by the hypothesis testing framework. Instead, we say we ``fail to reject the null hypothesis.`` Here's why:

Hypothesis testing starts with the assumption that the null hypothesis $\left(H_0\right)$ is true.

The goal is to determine whether there is enough evidence in the sample data to reject $H_0$.

If we fail to reject $H_0$, it does not prove that $H_0$ is true; it simply means there isn't sufficient evidence against it.

Failing to reject $H_0$ could occur for several reasons, such as an insufficient sample size or a small effect size.

Example

Consider a hypothesis test where $H_0: \mu=29$ and $H_1: \mu \neq 29$. A random sample of $25$ observations taken from a population produced a sample mean of $25.3$. The population is normally distributed with $\sigma=8$.
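As a hedged sketch of how the procedure plays out for this setup ($k=29$, $\bar{x}=25.3$, $\sigma=8$, $n=25$, double tail test), the snippet below computes the test statistic and the $P$-value. The example does not state a level of significance, so $\alpha=0.05$ is an assumption made purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the example; alpha is NOT given there and is assumed here
k, x_bar, sigma, n = 29, 25.3, 8, 25
alpha = 0.05  # assumed level of significance

# Test statistic Z_t = (x_bar - k) / (sigma / sqrt(n))
z_t = (x_bar - k) / (sigma / sqrt(n))

# Double tail test: double the area beyond |z_t|
p_value = 2 * NormalDist().cdf(-abs(z_t))

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z_t, 4), round(p_value, 4), decision)
```

With a stricter assumed level such as $\alpha=0.01$ the same data would lead to the opposite decision, which is why $\alpha$ has to be stipulated before the test is run.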

Example

Consider a hypothesis test where $H_0: \mu=30$ and $H_1: \mu<30$. A random sample of $36$ observations taken from a population produced a sample mean of $27.6$. The population has a standard deviation of $\sigma=10$.

Example

Consider a hypothesis test where $H_0: \mu=54$ and $H_1: \mu>54$. A random sample of 40 observations taken from a population produced a sample mean of $56.78$. The population has a standard deviation of $\sigma=5.25$.

Example

A certain colleague of mine who teaches Differential Equations suspects that the $10$ ounce bag of fancy Swiss cheese he gets at the supermarket actually weighs less than $10$ ounces. He took a random sample of $20$ such packages and found that the mean weight for the sample was $9.955$ ounces. The population follows a normal distribution with a standard deviation of $0.15$ ounces.

Example

A study claims that senior citizens living in Mirabel spend an average of $14$ hours gardening during the weekend. A random sample of $200$ people showed that these senior citizens spend an average of $14.65$ hours on gardening during the weekend. Suppose that the standard deviation is known to be $3$ hours.

Example

The life in hours of a battery is known to be normally distributed with standard deviation $\sigma=1.25$ hours. A random sample of 10 batteries has a mean life of $\bar{x}=40.5$ hours.

Example

A melting point test of $n=10$ samples of a binder used in manufacturing a rocket propellant resulted in $\bar{x}=154.2^{\circ} F$. Assume that the melting point is normally distributed with $\sigma=1.5^{\circ} F$.

Example

An engineer who is studying the tensile strength of a steel alloy intended for use in golf club shafts knows that tensile strength is approximately normally distributed with $\sigma=60$ psi. A random sample of 12 specimens has a mean tensile strength of $\bar{x}=3450$ psi.

Example

A phone company claims that the mean duration of long-distance calls made by its residential customers is 10 minutes. A random sample of 100 long-distance calls made by its residential customers found that the mean duration for these calls was 10.20 minutes. Suppose that the standard deviation is known to be 3.80 minutes.

Example

At a dairy farm, a machine is set to fill 32-ounce cartons with milk. However, the machine does not put exactly 32 ounces into each carton; the amount varies from carton to carton but the volume is known to be normally distributed. When the machine is working correctly, the mean volume dispensed into each carton is 32 ounces, with a standard deviation of 1.5 ounces. A quality control inspector takes 25 cartons and finds that the average volume of milk in the containers is 31.93 ounces.

Tests on the Mean When Population Variance is Unknown

The procedure for conducting a hypothesis test on the mean when the population variance is unknown is similar to a test when the population variance is known. The only difference is that the test statistic changes from a $Z$-statistic to a $t$-statistic.

$$Z_t=\frac{\bar{x}-k}{\sigma / \sqrt{n}} \Rightarrow T_t=\frac{\bar{x}-k}{s / \sqrt{n}} $$

Consequently, the $t$-distribution serves as the null (statistical) distribution, and the $P$-values are estimated differently than when the population variance is known.

How To Conduct A Test on The Mean When $\sigma$ Is Unknown

Let $k=$ the claimed or historical value of the population mean, $\mu$

1. State the null hypothesis
$H_0:\mu=k$

2. State the alternate hypothesis
$H_1: \mu < k $
$H_1: \mu > k $ or
$H_1: \mu \neq k $

3. Draw the rejection zone if the level of significance, $\alpha$, is given, and determine the critical value(s), $t_{c, n-1}$, associated with the rejection region. The number of degrees of freedom associated with this test is $df=n-1$.

4. Using the information obtained from the sample, calculate the test statistic, $T_t$, $$T_t=\frac{\bar{x}-k}{s / \sqrt{n}}$$

5. Make a statistical decision. Do this with a graph or by comparing the test statistic to the critical value(s).

If the test statistic falls in the rejection region, reject the null hypothesis.

If the test statistic falls in the non-rejection region, do not reject the null hypothesis.

6. Make a conclusion in the context of the problem.

7. Estimate the $P$-value and make a decision based on the $P$-value.

If the $P$-value is less than $\alpha$, reject the null hypothesis.

If the $P$-value is greater than $\alpha$, do not reject the null hypothesis.
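
Steps 4 and 5 of the procedure above can be sketched in Python. This is only a minimal illustration under stated assumptions: the function names `t_statistic` and `reject_null` are hypothetical, the critical value is still read by hand from the $t$-table, and the significance level $\alpha=0.05$ is assumed here because the example that supplies the summary statistics ($H_0: \mu=205$, $H_1: \mu>205$, $n=14$) does not state one.

```python
import math

def t_statistic(xbar, k, s, n):
    """Test statistic T_t = (xbar - k) / (s / sqrt(n))."""
    return (xbar - k) / (s / math.sqrt(n))

def reject_null(t_stat, t_crit, tail):
    """Compare the test statistic with a positive critical value from the t-table.

    tail: 'left' for H1: mu < k, 'right' for H1: mu > k, 'two' for H1: mu != k.
    For a two-tailed test, t_crit must be the table value for alpha / 2 in each tail.
    """
    if tail == "left":
        return t_stat < -t_crit
    if tail == "right":
        return t_stat > t_crit
    if tail == "two":
        return abs(t_stat) > t_crit
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Summary statistics from the example with H0: mu = 205 and H1: mu > 205.
n = 14
t_stat = t_statistic(xbar=212.37, k=205, s=16.35, n=n)

# Assumed significance level alpha = 0.05 (not given in the example);
# 1.771 is the t-table value for a one-tail area of 0.05 with df = n - 1 = 13.
t_crit = 1.771
decision = reject_null(t_stat, t_crit, tail="right")

print(f"T_t = {t_stat:.3f}, df = {n - 1}, reject H0: {decision}")
# prints: T_t = 1.687, df = 13, reject H0: False
```

Under the assumed $\alpha$, the test statistic falls in the non-rejection region, so the null hypothesis is not rejected.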

Remark

To estimate the $P$-value for the test, go to the row of the $t$-table for the degrees of freedom associated with the hypothesis test. Next, find the two table values between which the test statistic is sandwiched. Once you have isolated these two values, move up each column until you reach the row labelled $\alpha$. Read off the two values of $\alpha$; these serve as the lower and upper bounds for the $P$-value.

Remark

In the case of a two-tailed test, the $P$-value needs to be doubled because there are two rejection zones (and hence two areas of opportunity) into which the test statistic could fall.
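
The table lookup described in the two remarks above can be sketched as follows. This is an illustrative sketch, not a replacement for the $t$-table: the name `bracket_p_value` is hypothetical, only the table row for $df=13$ is hard-coded, and the code assumes the test statistic falls on the side indicated by $H_1$.

```python
# One row of a t-table for df = 13: (one-tail area alpha, critical value) pairs,
# ordered so that the critical values increase along the row.
T_ROW_DF13 = [(0.10, 1.350), (0.05, 1.771), (0.025, 2.160), (0.01, 2.650), (0.005, 3.012)]

def bracket_p_value(t_stat, row, two_tailed=False):
    """Return (lower, upper) bounds on the P-value read from one t-table row.

    A bound of None means the table cannot bound the P-value on that side.
    """
    t = abs(t_stat)               # the table lists positive critical values only
    lower, upper = None, None
    for alpha, crit in row:
        if crit <= t:
            upper = alpha         # tail area beyond t is at most this alpha
        else:
            lower = alpha         # first critical value beyond t
            break
    factor = 2 if two_tailed else 1   # double the areas for a two-tailed test
    scale = lambda a: None if a is None else factor * a
    return scale(lower), scale(upper)

# A test statistic of 1.687 with df = 13 sits between 1.350 and 1.771.
low, high = bracket_p_value(1.687, T_ROW_DF13)
print(f"{low} < P-value < {high}")
# prints: 0.05 < P-value < 0.1
```

For a two-tailed test with the same statistic, `bracket_p_value(1.687, T_ROW_DF13, two_tailed=True)` doubles both bounds, so the $P$-value lies between 0.10 and 0.20.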

Rule of Thumb

As with parameter estimation, the conclusion of a hypothesis test needs to be carefully worded. Here are two generally accepted forms for wording the conclusion.

`` At the _____ level of significance, there is sufficient/insufficient evidence to indicate that the mean is _____ ``

`` At the _____ level of significance, there is/is not enough evidence to indicate that the mean is _____ ``

** Write the null hypothesis in English, and in the context of the problem, in the last blank.

Example

Consider a hypothesis test where $H_0: \mu=205$ and $H_1: \mu>205$. A random sample of 14 observations taken from a population that is normally distributed produced a sample mean of 212.37 and a standard deviation of 16.35.

Example

Consider a hypothesis test where $H_0: \mu=50$ and $H_1: \mu<50$. A random sample of 8 observations taken from a population that is normally distributed produced a sample mean of 44.98 and a standard deviation of 6.77.

Example

Consider a hypothesis test where $H_0:\mu=10.70$ and $H_1: \mu \neq 10.70$. A random sample of 47 observations taken from a population produced a sample mean of 12.025 and a standard deviation of 4.90.

Example

The President of a university claims that the mean time spent partying by students at this university is less than 11 hours per week. A random sample of 40 students taken from this university showed that they spent an average of 10.5 hours partying, with a standard deviation of 2.3 hours.

Example

A team of physicists is studying the vibration frequency of a newly designed tuning fork. The manufacturer claims that the tuning fork vibrates at an average frequency of 256 Hz. The physicists suspect that the actual mean frequency differs from the claimed value. A random sample of 15 tuning forks produced a sample mean of 253 Hz and a standard deviation of 3.5 Hz.

Example

The body temperatures for 25 female subjects resulted in a sample average of $\bar{x}=98.264^{\circ} F$ and a standard deviation of $s=0.4821^{\circ}F$.

Example

A manufacturer of running shoes knows that the average lifetime for a particular model of shoes is 15 months. Someone in the research and development division of the shoe company claims to have developed a longer-lasting product. This new product was worn by 36 individuals and lasted on average for 17 months. The variability of the original shoe is estimated based on the standard deviation of the new group, which is 5.5 months.

Example

A company claims that its cookies have a shelf life of 5 years. A random sample of 200 cookies taken from the warehouse found that the average shelf life of the sample was 58 months, with a standard deviation of 4.5 months. Assume that the population is normally distributed.

Tests on the Population Proportion

The null distribution for the test on the population proportion is the standard normal distribution. The test statistic is the $Z$-score, calculated as follows:

$$Z=\frac{\hat{p}-k}{\sqrt{\frac{k(1-k)}{n}}}$$

where $\hat{p}$ is the sample proportion, $k$ is the claimed or historical value of the population proportion, and $n$ is the sample size.

How To Conduct A Test On The Population Proportion

Let $k=$ the claimed or historical value of the population proportion, $p$

1. State the null hypothesis
$H_0:p=k$

2. State the alternate hypothesis
$H_1: p < k$, $H_1: p > k$, or $H_1: p \neq k$

3. Draw the rejection zone if the level of significance, $\alpha$, is given, and determine the critical value(s), $Z_{c}$, associated with the rejection region.

4. Using the information obtained from the sample, calculate the test statistic, $Z_t$, $$Z_t=\frac{\hat{p}-k}{ \sqrt{\frac{k(1-k)}{n}}} \quad ;\quad \hat{p}=\frac{x}{n}$$

5. Make a statistical decision. Do this with a graph or by comparing the test statistic to the critical value(s).

If the test statistic falls in the rejection region, reject the null hypothesis.

If the test statistic falls in the non-rejection region, do not reject the null hypothesis.

6. Make a conclusion in the context of the problem.

7. Calculate the $P$-value and make a decision based on the $P$-value.

If the $P$-value is less than $\alpha$, reject the null hypothesis.

If the $P$-value is greater than $\alpha$, do not reject the null hypothesis.
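
The procedure above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the function name `proportion_z_test` is hypothetical, the significance level $\alpha=0.05$ is assumed because the source example does not state one, and the numbers come from the frog example in this section ($k=0.40$, $n=150$, $x=72$, $H_1: p>0.40$).

```python
import math
from statistics import NormalDist

def proportion_z_test(x, n, k, tail, alpha):
    """One-sample Z test on a proportion.

    x: number of successes, n: sample size, k: claimed proportion under H0.
    tail: 'left' for H1: p < k, 'right' for H1: p > k, 'two' for H1: p != k.
    Returns (z, p_value, reject_h0).
    """
    p_hat = x / n
    z = (p_hat - k) / math.sqrt(k * (1 - k) / n)
    std_normal = NormalDist()          # standard normal: mean 0, sd 1
    if tail == "left":
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))   # doubled for two tails
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p_value, p_value < alpha

# Frog example: 72 of 150 frogs carry the gene; alpha = 0.05 is an assumption.
z, p_value, reject = proportion_z_test(x=72, n=150, k=0.40, tail="right", alpha=0.05)
print(f"Z_t = {z:.3f}, P-value = {p_value:.4f}, reject H0: {reject}")
# prints: Z_t = 2.000, P-value = 0.0228, reject H0: True
```

Here $\hat{p}=72/150=0.48$ and the standard error is $\sqrt{0.40(0.60)/150}=0.04$, so $Z_t=0.08/0.04=2$. The decision based on the $P$-value agrees with comparing $Z_t$ to the critical value $1.645$ for a right-tailed test at the assumed level.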

Remark

The sample size must be sufficiently large for the sampling distribution of the sample proportion to be approximately normal. This is checked using the success-failure condition: $nk \geq 5$ and $n(1-k) \geq 5$.
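
The success-failure condition can be checked before running the test. The helper below is a small sketch; the name `success_failure_ok` is hypothetical, and the second call uses a made-up sample size purely to show a failing case.

```python
def success_failure_ok(n, k, threshold=5):
    """Check n*k >= threshold and n*(1-k) >= threshold under the claimed proportion k."""
    expected_successes = n * k
    expected_failures = n * (1 - k)
    return expected_successes >= threshold and expected_failures >= threshold

# Frog example from this section: n = 150, k = 0.40 gives 60 and 90 expected counts.
print(success_failure_ok(150, 0.40))   # True

# Hypothetical small sample: n = 50, k = 0.98 gives only 1 expected failure.
print(success_failure_ok(50, 0.98))    # False
```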

Rule of Thumb

As with parameter estimation, the conclusion of a hypothesis test needs to be carefully worded. Here are two generally accepted forms for wording the conclusion.

`` At the _____ level of significance, there is sufficient/insufficient evidence to indicate that the population proportion is _____ ``

`` At the _____ level of significance, there is/is not enough evidence to indicate that the population percentage is _____ ``

** Write the null hypothesis in English, and in the context of the problem, in the last blank.

Example

A food company is planning to market a new type of frozen yoghurt. However, before marketing this yoghurt, the company wants to find the percentage of people who like it. The company's management has decided to market this yoghurt only if at least $35\%$ of people like it. The company's research team selected a random sample of 400 people and asked them to taste this yoghurt. Of these, 150 said that they liked it.

Example

A study in 2015 claimed that $11\%$ of all children in the US currently live with at least one grandparent. In 2020, a random sample of 1600 children found that 180 did currently live with at least one grandparent.

Example

A company that sells computer parts claims that more than $90\%$ of its orders are mailed within 72 hours of being received. The quality control department took a random sample of 150 orders and found that 140 were mailed within 72 hours of the order being placed.

Example

A biologist is studying the population of a specific species of frogs in a wetland area. Previous research suggests that $40 \%$ of frogs in this region carry a gene that makes them resistant to a common fungal infection. The biologist hypothesizes that the proportion of resistant frogs has increased due to recent conservation efforts. A random sample of 150 frogs was taken and 72 were found to carry the gene.

Example

A biologist is studying a population of butterflies in a particular region. Historically, it is known that $35 \%$ of these butterflies carry a genetic marker that makes them resistant to a certain plant toxin. A random sample of 100 butterflies was taken and 30 were found to carry the genetic marker.

Example

A tech company claims that its new AI chatbot correctly answers $85 \%$ of user queries. The company has recently implemented an update, and the development team believes that the proportion of correct responses has improved. A random sample of 200 queries was taken, and 180 were found to be answered correctly.

Example

A chemical manufacturing company produces a catalyst that is supposed to speed up a reaction in $95 \%$ of trials. Due to a change in the production process, the company claims that the proportion of successful reactions has decreased. To test this claim, a random sample of $120$ trials using the new catalyst is taken, and the reaction is observed to be successful in $110$ trials.

Example

A company specializing in facial recognition software claims that their AI algorithm correctly identifies faces $98\%$ of the time. A recent update to the algorithm was released, and engineers are concerned that the update might have changed the accuracy of the system. A random sample of 500 faces was taken, and the algorithm correctly identified 480 of them.

Example

A researcher claims that at least $10 \%$ of all football helmets have manufacturing flaws that could potentially cause injury to the wearer. A sample of 200 helmets revealed that 24 helmets contained such defects.

Tests on the Mean When Population Variance is Known

Example

At the $5 \%$ level of significance, is there enough evidence to reject the null hypothesis?

Example

At the $1 \%$ level of significance, is there enough evidence to reject the null hypothesis?

Example

At the $5 \%$ level of significance, is there enough evidence to reject the null hypothesis?

Example

At the $\alpha=0.01$ level of significance, does the data indicate that the average weight of this type of packaged cheese is less than 10 ounces? Compute a $P$-value for your test and write a conclusion in the context of the problem.

Example

At the $0.025$ level of significance, does the data indicate that the average amount of time spent on gardening by seniors living in Mirabel is more than $14$ hours during the weekend? Compute a $P$-value and write a conclusion in the context of the problem.

Example

Is there evidence to support the claim that battery life exceeds 40 hours? Use $\alpha=0.05$. What is the $P$-value for this test?

What is the Type II error in the context of this problem?

What is the Type I error in the context of this problem?

Example

Test $H_0: \mu=155$ versus $H_1: \mu \neq 155$ using $\alpha=0.01$. What is the $P$-value for this test?

What is the Type II error in the context of this problem?

What is the Type I error in the context of this problem?

Example

Test the hypothesis that the mean strength is less than 3500 psi. Use $\alpha=0.01$.

Construct and explain how a one-sided confidence interval could be used to support the result obtained in (a).

Example

At the 0.02 level of significance, does the data indicate that the average long-distance call made by these residential customers is longer than 10 minutes? Compute the $P$-value for this test.

Construct and explain how a one-sided confidence interval could be used to support the result obtained in (a).

Example

At the $5 \%$ level of significance, does the data indicate that the average amount of milk dispensed into the cartons by the machine is different from 32 ounces?

Construct and explain how a two-sided confidence interval could be used to support the result obtained in (a).

Tests on the Mean When Population Variance is Unknown