The Normal Distribution

The normal distribution, also known as the Gaussian distribution, is one of the most widely used continuous probability distributions in statistics. Its importance lies in its central role in probability theory and statistics, particularly in the Central Limit Theorem (CLT), which we will explore in the section on Sampling Distributions.
The normal distribution is closely tied to the CLT, as it serves as the limiting distribution for the sum of a large number of independent and identically distributed random variables. The CLT states that, under certain conditions, the sum of these variables will approximate a normal distribution, regardless of their original distributions. This result is fundamental because many real-world phenomena can be modeled as the sum of numerous random variables. By virtue of the CLT, we can often assume that these sums will follow a normal distribution.
In the sciences, the normal distribution is indispensable for analyzing data and making predictions. In physics, for instance, measurement errors frequently follow a normal distribution, enabling scientists to quantify uncertainty and improve precision. In biology, traits such as human height and blood pressure often exhibit approximate normality, as do test scores in education, allowing researchers to study populations and draw meaningful conclusions. Moreover, the normal distribution underpins key statistical methods like hypothesis testing and regression analysis, which are widely used across disciplines, from environmental research to medicine, to uncover insights and drive discoveries.

Definition:

The Normal Distribution

A random variable, $X$, is said to be Gaussian/normally distributed if it has probability density $$f(x)=\frac{1}{\sigma\sqrt{2 \pi}} e^{-(x-\mu)^2 / 2 \sigma^2} \quad-\infty < x < \infty $$

where
$\mu =$ the mean
$\sigma=$ the standard deviation of the distribution.
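
For readers who want to experiment, the following Python sketch (not part of the text) evaluates this density directly and cross-checks it against `scipy.stats.norm.pdf`; the parameter values are illustrative assumptions.

```python
import math

from scipy.stats import norm

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Illustrative parameters (assumptions, not from the text).
mu, sigma = 0.0, 1.0
print(normal_pdf(0.0, mu, sigma))          # ~0.3989, the peak at x = mu
print(norm.pdf(0.0, loc=mu, scale=sigma))  # scipy's built-in density agrees
```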

Remark

If $X$ is a normally distributed random variable, it is often abbreviated as $X \sim N(\mu, \sigma^2)$, where $N$ denotes the normal distribution.
Theorem:
Expected Value and Variance of the Normal Distribution
For a normally distributed random variable $X \sim N(\mu, \sigma^2)$, the expected value (mean) is given by $$E(X) = \mu $$ and the variance is given by $$Var(X) = \sigma^2$$
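
A quick simulation makes the theorem concrete: the sketch below (with illustrative values of $\mu$ and $\sigma$, not taken from any example here) draws a large normal sample and checks that the sample mean and variance land near $\mu$ and $\sigma^2$.

```python
import numpy as np

# Illustrative assumption: X ~ N(5, 4^2).
mu, sigma = 5.0, 4.0
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=mu, scale=sigma, size=1_000_000)

print(sample.mean())  # close to E(X) = mu = 5
print(sample.var())   # close to Var(X) = sigma^2 = 16
```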

Properties of the Normal Distribution

The normal distribution is defined by two key parameters: the mean, $\mu$, and the standard deviation, $\sigma$. The mean determines the center of the distribution, where the function reaches its absolute maximum at $x = \mu$. The standard deviation controls the spread of the curve: a larger standard deviation results in a flatter, more spread-out curve, while a smaller standard deviation produces a narrower, more concentrated curve. Notably, the normal distribution is continuous and unbounded, extending infinitely in both directions, from negative to positive infinity.
The area underneath the curve over an interval corresponds to the probability that the random variable $X$ falls in that interval. Since the distribution is symmetric about the mean, $50\%$ of the area lies to the left of the mean and $50\%$ lies to the right.
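
In software, these areas are obtained from the cumulative distribution function (CDF) rather than by integrating the density by hand. The sketch below, with made-up parameters, confirms that half the area lies to the left of the mean and shows how the area over an interval yields a probability.

```python
from scipy.stats import norm

# Illustrative assumption: X ~ N(10, 2^2).
X = norm(loc=10.0, scale=2.0)

print(X.cdf(10.0))               # 0.5: half the area lies left of the mean
print(X.cdf(12.0) - X.cdf(8.0))  # P(8 < X < 12): the area over that interval
```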

The Empirical Rule and Chebyshev's Theorem

The standard deviation can be viewed as a measuring stick to determine how far the data, or values of $X$, deviate from the mean. As such, the Empirical Rule and Chebyshev's Theorem describe the proportion of the measurements that lie within one, two, or more standard deviations of the mean.
The Empirical Rule states that for a normal distribution,

  • approximately $68\%$ of the data falls within one standard deviation of the mean
  • $95\%$ within two standard deviations
  • $99.7\%$ within three standard deviations
The Empirical Rule only applies to data sets or distributions that are bell-shaped, and even then it is stated in terms of approximations. A stronger result that applies to every data set is known as Chebyshev's Theorem.
Theorem:
Chebyshev's Theorem
For any distribution, the proportion of measurements within $k$ standard deviations of the mean is at least $$1-\frac{1}{k^2}$$ for $k>1$.
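
The short sketch below compares the two results for $k = 1, 2, 3$: the Empirical Rule percentages recovered from the standard normal CDF against Chebyshev's guaranteed lower bound.

```python
from scipy.stats import norm

for k in (1, 2, 3):
    # Probability within k standard deviations for a normal variable.
    within = norm.cdf(k) - norm.cdf(-k)
    # Chebyshev's minimum (trivial at k = 1, meaningful for k > 1).
    bound = 1 - 1 / k**2
    print(f"k={k}: normal {within:.4f} vs. Chebyshev bound {bound:.4f}")
```

The printed values, roughly $0.68$, $0.95$, and $0.997$, recover the Empirical Rule, while the bounds $0$, $0.75$, and $0.89$ show how much weaker Chebyshev's guarantee is in exchange for its generality.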

Standardization

Often, we are interested in calculating the probability of events that are not exactly one, two, or three standard deviations away from the mean, so the Empirical Rule is of little use in situations like these.
Moreover, if we wanted to calculate the probability that $X$ takes a value in a given interval, we would be forced to integrate $$ f(x)=\frac{1}{\sigma\sqrt{2 \pi}} e^{-(x-\mu)^2 / 2 \sigma^2} $$ over that interval. This integral has no closed-form antiderivative, making direct evaluation a cumbersome task, especially if the interval is not symmetric about the mean.
Another common challenge arises when comparing two or more normally distributed random variables. Since the mean $\mu$ of each variable can be located anywhere along the $x$-axis and the standard deviation $\sigma$ can take any positive value, each random variable may have a distinct center and spread. These differences in means and standard deviations make direct comparisons between the distributions difficult.
To address these challenges, we can standardize the random variable $X$ by transforming it into a new variable $Z$, known as the standard normal variable.

Definition:

Standard Normal Variable

Let $X$ be a normally distributed random variable with mean $\mu$ and standard deviation $\sigma$. Then the standard value, more commonly known as the $Z$-score or $Z$-value of $X$, is $$Z = \frac{X-\mu}{\sigma}\quad \Rightarrow \quad X=Z\sigma + \mu $$ where

$X=$ the value of the random variable
$\mu =$ the mean
$\sigma=$ the standard deviation of the distribution.

Remark

Standardization does the following:
  • Relocates the mean, $\mu$, to $0$.
  • Rescales the standard deviation, $\sigma$, to $1$.
  • Takes all values of $X$ and reconfigures them into values of $Z$. Positive $Z$-scores indicate values above the mean, while negative $Z$-scores represent values below the mean.
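
The following sketch illustrates these three effects numerically; the exam-score parameters are invented for the example.

```python
import numpy as np

# Hypothetical exam scores (all numbers are assumptions): X ~ N(70, 8^2).
mu, sigma = 70.0, 8.0
rng = np.random.default_rng(seed=1)
x = rng.normal(loc=mu, scale=sigma, size=100_000)

z = (x - mu) / sigma      # Z = (X - mu) / sigma, applied to every value
print(z.mean(), z.std())  # ~0 and ~1: mean relocated, spread rescaled

print((86.0 - mu) / sigma)  # z = 2.0: a score of 86 sits 2 SDs above the mean
```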

Example

For each of the following questions, use the $Z$-table to find the probability of the given event.

Example

For each of the following questions, find the value of $Z$ which satisfies the following inequalities.

Example

Let $X$ be a continuous random variable that is normally distributed with a mean of $\mu=5$ and a standard deviation of $\sigma=4$. Calculate the following:

Example

A software company tracks the response times of its servers to user requests. The response time (in milliseconds) is a random variable, $X$, that is normally distributed with a mean of $120\,ms$ and a standard deviation of $15\,ms$. The company wants to ensure a fast user experience by analyzing the distribution of response times.
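
The specific questions are omitted here, but as a sketch of the typical calculation, the code below uses the stated parameters ($\mu = 120\,ms$, $\sigma = 15\,ms$) with hypothetical thresholds of $100\,ms$ and $150\,ms$ that are not from the text.

```python
from scipy.stats import norm

# Stated parameters: mean 120 ms, standard deviation 15 ms.
X = norm(loc=120.0, scale=15.0)

# The 100 ms and 150 ms thresholds below are hypothetical.
print(X.cdf(100.0))            # P(X < 100): proportion of very fast responses
print(1 - X.cdf(150.0))        # P(X > 150): proportion of slow responses
print((100.0 - 120.0) / 15.0)  # the z-score behind the first probability
```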

Example

In a physics laboratory, researchers are studying the speeds of particles traveling through a medium. The speed of the particles, $X$, is normally distributed with a mean of $2{,}500\,m/s$ and a standard deviation of $200\,m/s$.

Example

In a biology lab, scientists are studying the lengths of a specific type of leaf on a plant. The lengths, $X$, are normally distributed with a mean of $15\,cm$ and a standard deviation of $2.5\,cm$.

Example

An AI company measures the time it takes for its machine learning model to process an image. The processing time, $X$, is normally distributed with a mean of 0.8 seconds and a standard deviation of 0.1 seconds.

The Normal Approximation to the Binomial Distribution

The Normal Approximation to the Binomial Distribution is a technique that simplifies the calculation of probabilities for large binomial distributions. Recall that the binomial distribution, which models the number of successes, $X$, in $n$ independent trials of a binary event, becomes increasingly difficult to compute directly as $n$ grows.

However, under certain conditions, the binomial distribution closely resembles a normal distribution. This similarity allows us to leverage the properties of the normal distribution to estimate binomial probabilities more efficiently.

Conditions for the Normal Approximation

The Normal Approximation to the Binomial Distribution is most accurate when the number of trials, $n$, is sufficiently large and the probability of success, $p$, is not too close to $0$ or $1$. Specifically, the following conditions should be met:

1. $\quad n \cdot p \geq 5$
2. $\quad n \cdot (1-p) \geq 5$

These conditions ensure that the binomial distribution is approximately symmetric and bell-shaped, which is a characteristic of the normal distribution.
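
As a small illustration, the helper below (a sketch, not a standard library routine) checks both conditions at once; the first test values echo the LED example later in this section, and the second is an invented failing case.

```python
def normal_approx_ok(n: int, p: float) -> bool:
    """Check the rule-of-thumb conditions n*p >= 5 and n*(1 - p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

print(normal_approx_ok(1000, 0.98))  # True:  np = 980 and n(1-p) = 20
print(normal_approx_ok(50, 0.02))    # False: np = 1 falls below 5
```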

Correction Factors and Calculating Probabilities with the Normal Approximation

Since the binomial distribution is discrete and the normal distribution is continuous, a continuity correction is required to improve accuracy. This involves adjusting the boundaries of the binomial range by adding or subtracting 0.5 before converting $X$ into a $Z$-value.

The table below outlines the appropriate correction factors based on the type of binomial probability being calculated. $$\begin{array}{|c|c|} \hline \text { Condition } & \text { Correction Factor } \\ \hline P(X=a) & P(a-0.5 < X < a+0.5) \\ P(X > a) & P(X > a+0.5) \\ P(X \geq a) & P(X > a-0.5) \\ P(X < a) & P(X < a-0.5) \\ P(X \leq a) & P(X < a+0.5) \\ \hline \end{array}$$
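
To see the correction in action, the sketch below approximates $P(X \leq a)$ for an illustrative binomial variable and compares the result with the exact value; $n = 100$, $p = 0.3$, and $a = 35$ are assumptions chosen for demonstration.

```python
import math

from scipy.stats import binom, norm

# Illustrative assumptions: X ~ Binomial(n = 100, p = 0.3), target P(X <= 35).
n, p, a = 100, 0.3, 35
mu = n * p                          # mean of the binomial
sigma = math.sqrt(n * p * (1 - p))  # standard deviation of the binomial

exact = binom.cdf(a, n, p)                       # exact binomial probability
approx = norm.cdf(a + 0.5, loc=mu, scale=sigma)  # P(X < a + 0.5), per the table
print(exact, approx)                             # both are approximately 0.88
```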

Example

Suppose a biologist is studying a population of beetles, where the probability of a beetle having a particular genetic trait is $p=0.3$.

Example

A physicist is testing a batch of 1,000 light-emitting diodes (LEDs). Each LED has a probability $p=0.98$ of functioning correctly.

Example

A computer scientist is testing a large batch of 2,000 processors for reliability. Each processor has a probability $p=0.995$ of passing a reliability test. The scientist wants to determine:

Example

An AI company is testing a new speech recognition algorithm using a dataset of audio clips. Each clip is classified as either ``recognized correctly'' or ``not recognized.'' Based on prior testing, the algorithm has a $95\%$ chance ($p=0.95$) of correctly recognizing a clip.