Probability Distributions and Probability Mass Functions

In probability theory, discrete random variables are variables that can take on a finite or countably infinite number of distinct values. These variables often represent outcomes of experiments or events, such as the number of heads in coin tosses or the roll of a die.

Probability Distributions

To describe the behavior of a discrete random variable, we use a probability distribution, which assigns a probability to each possible value the variable can take.

Definition:

Probability Distribution

The probability distribution of a random variable, $X$, is a description of the probabilities associated with the possible values of $X$.

Probability Mass Functions

A key tool for describing discrete probability distributions is the probability mass function (PMF). The PMF is a function that specifies the probability of the random variable taking on each specific value. For example, if $X$ is the number of heads in two coin tosses, its PMF would show the probabilities of $X=0, X=1$, and $X=2$.

Definition:

Probability Mass Function (PMF)

The probability mass function (PMF) of a discrete random variable, $X$, is a function that gives the probability that $X$ is equal to a specific value, $x$. It is denoted by $P(X=x)$.
The PMF of a discrete random variable can be represented in various ways, such as a table, a graph, or a formula. The PMF must satisfy two properties:

Theorem:

Properties of a Probability Mass Function

For a discrete random variable $X$ with PMF $P(X=x)$, the following properties must hold:

  • $P(X=x) \geq 0$ for all values of $x$
  • $\displaystyle \sum_{i=1}^n P(X=x_i) = 1$, where $x_1, x_2, \dots, x_n$ are the possible values of $X$
The first property ensures that the probabilities are non-negative, while the second property ensures that the sum of all probabilities is equal to 1. These properties are essential for a function to be a valid PMF.
The PMF provides a complete description of the probability distribution of a discrete random variable. It allows us to calculate the probabilities of specific events and analyze the behavior of the random variable.
In practice, the PMF is used to calculate probabilities, expected values, and other statistical measures related to discrete random variables. It is a fundamental concept in probability theory and statistics, forming the basis for many important results and applications.
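The two defining properties are straightforward to check numerically. A minimal Python sketch, using a small hypothetical distribution:

```python
# Check the two defining properties of a PMF for a small example
# distribution (the values here are hypothetical).
import math

pmf = {-2: 0.2, -1: 0.4, 0: 0.1, 1: 0.2, 2: 0.1}

# Property 1: every probability is non-negative.
assert all(p >= 0 for p in pmf.values())

# Property 2: the probabilities sum to 1 (allowing for floating-point error).
assert math.isclose(sum(pmf.values()), 1.0)

print("both PMF properties hold")
```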

Example

Let $X$ be a random variable with the following probability distribution: $$\begin{array}{c|ccccc} x & -2 & -1 & 0 & 1 & 2 \\ \hline f(x) & 0.2 & 0.4 & 0.1 & 0.2 & 0.1\end{array}$$

Example

Let $X$ be a random variable with the following probability distribution:$$f(x)=\frac{2x+1}{25} \quad x=0,1,2,3,4$$
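Since this example defines the PMF by a formula, its two required properties can be confirmed exactly using Python's `fractions` module; a quick sketch:

```python
# Tabulate f(x) = (2x + 1)/25 for x = 0, 1, 2, 3, 4 and confirm that the
# values form a valid PMF.
from fractions import Fraction

f = {x: Fraction(2 * x + 1, 25) for x in range(5)}

assert all(p >= 0 for p in f.values())
assert sum(f.values()) == 1  # (1 + 3 + 5 + 7 + 9)/25 = 25/25
```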

Example

An AI system is monitoring user interactions with a new app feature to evaluate its usability. Each interaction is classified as either "successful" (the user completes the intended task) or "unsuccessful." Based on previous data, the probability of a single interaction being successful is $0.85$, and user interactions are independent.

Example

A fair six-sided die is rolled twice. Let $X$ be the random variable representing the sum of the two rolls. The probability mass function of $X$ can be calculated by considering all possible outcomes of the two rolls and their associated probabilities.

Example

In a biological research lab, scientists are studying the viability of two types of seeds in a controlled environment. Suppose the probability that a seed from species A germinates successfully is $0.85$ , and the probability that a seed from species B germinates successfully is $0.92$ . Assume that the germination of seeds from the two species is independent.

Expected Value of a Discrete Random Variable

For a random variable, $X$, two numbers are usually used to summarize its probability distribution: the mean and the variance. The mean is a measure of central tendency and the variance is a measure of the spread/dispersion.
The expected value of a discrete random variable is a measure of the center of the distribution. It is the weighted average of all possible values of the random variable, where the weights are the probabilities of the values. The expected value is also known as the mean of the random variable.

Definition:

Expected Value

Let $X$ be a discrete random variable with probability mass function, $P(X)$.

The expected value (mean) of $X$ is defined as $$\mathbb{E}[X] = \sum_{i=1}^n x_i\cdot P(X=x_i)$$

Example

Toss a die. What is the expected number of dots observed?

Solution

Let $X$ be the number of dots observed. The probability mass function of $X$ is

$$P(X) = \begin{cases} 1/6 & \text{if } X=1 \\ 1/6 & \text{if } X=2 \\ 1/6 & \text{if } X=3 \\ 1/6 & \text{if } X=4 \\ 1/6 & \text{if } X=5 \\ 1/6 & \text{if } X=6 \end{cases} $$ The expected number of dots observed is

$\begin{align}\mathbb{E}[X] &= \sum_{i=1}^6 x_i \cdot P(X=x_i)\\ &= 1\cdot \frac{1}{6} + 2\cdot \frac{1}{6} + 3\cdot \frac{1}{6} + 4\cdot \frac{1}{6} + 5\cdot \frac{1}{6} + 6\cdot \frac{1}{6}\\ &= 3.5\end{align}$

The average number of dots observed is $3.5$.
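The same weighted average is easy to reproduce in code; a minimal sketch using exact rational arithmetic:

```python
# Expected value of a fair six-sided die: each face 1..6 has probability 1/6.
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}
expected = sum(x * p for x, p in pmf.items())

print(expected)  # 7/2, i.e. 3.5
```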

Example

Toss two dice. What is the expected number of dots observed?

Solution

Let $X$ be the number of dots observed. The probability mass function of $X$ is $$\begin{array}{c|ccccccccccc} X=x & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ \hline P(X=x) & \frac{1}{36} & \frac{2}{36} & \frac{3}{36} & \frac{4}{36} & \frac{5}{36} & \frac{6}{36} & \frac{5}{36} & \frac{4}{36} & \frac{3}{36} & \frac{2}{36} & \frac{1}{36}\end{array}$$

The expected number of dots observed is,

$$\begin{align} \mathbb{E}[X] &=\sum_{i=1}^{11} x_i\cdot P(X=x_i)\\ &= 2\cdot \frac{1}{36} + 3\cdot \frac{2}{36} + 4\cdot \frac{3}{36} + 5\cdot \frac{4}{36}+ \cdots + 12\cdot \frac{1}{36}\\ &= 7\end{align}$$

The average number of dots observed is $7$.

Remark

Alternatively:

Let
$X_1=$ the number of dots observed on the first die
$X_2=$ the number of dots observed on the second die

$X=$ the sum of the dots observed on the two dice.

Then, $$\mathbb{E}[X] = \mathbb{E}[X_1]+\mathbb{E}[X_2] = 3.5 + 3.5 = 7$$
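Both routes to the answer can be checked by enumerating all 36 equally likely outcomes; a brief sketch:

```python
# Verify E[X1 + X2] = E[X1] + E[X2] for two fair dice by full enumeration.
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all 36 equally likely rolls
p = Fraction(1, len(outcomes))

e_sum = sum((d1 + d2) * p for d1, d2 in outcomes)      # E[X] directly
e_face = sum(d * Fraction(1, 6) for d in range(1, 7))  # E[X1] = E[X2] = 3.5

assert e_sum == 2 * e_face == 7
```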

Example

In the 1990s a nudity war broke out on Brazilian TV. One network skyrocketed to top ratings with prime-time soap operas featuring full-frontal nudity, while another countered with uncut nudity in films. Meanwhile, a third network decided to keep its clothes on and air high-quality literary adaptations instead, earning it the honor of having the worst ratings in TV history.

Let $X$ be the number of soap operas that a Brazilian person watches per week. Based on a sample survey of adults, the following probability distribution was prepared:

$$\begin{array}{c|cccccc} \hline x & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline P(X=x) & 0.36 & 0.24 & 0.18 & 0.10 & 0.07 & 0.05 \\ \hline \end{array}$$

Example

The Marquis de Favras was a French aristocrat and staunch supporter of the royal family during the French Revolution. Branded an enemy of the state, he was sent to the guillotine. Upon reading his death warrant, he quipped "I see that you have made three spelling mistakes". Let $X$ represent the number of spelling mistakes in a randomly selected document. The probability distribution of $X$ is given by:

$$\begin{array}{l|ccccccccc}\hline x & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline P(X=x) & 0.05 & 0.10 & 0.20 & 0.25 & 0.15 & 0.10 & 0.08 & 0.05 & 0.02 \\ \hline \end{array}$$

Example

In 1932, the 'Emu War' in Western Australia saw the army deploy mounted machine guns against thousand-strong herds of rampaging thirsty emus. The emus won.

The probability that a random emu will run away from a human is $0.7$

Key Results and Properties of the Expected Value

When dealing with the expected value of a discrete random variable, several important results and properties are frequently applied.

Expected Value of a Sum:

If $X$ and $Y$ are two random variables, the expected value of their sum is: $$\mathbb{E}[X+Y]=\mathbb{E}[X]+\mathbb{E}[Y]$$ This is true regardless of whether $X$ and $Y$ are independent.

For more variables: $$\mathbb{E}[X_1+X_2+\cdots+X_n]=\mathbb{E}[X_1]+\mathbb{E}[X_2]+\cdots+\mathbb{E}[X_n]$$

Linear Transformation:

If $X$ is a random variable and $a$ and $b$ are constants, then: $$\mathbb{E}(aX+b)=a\mathbb{E}[X]+b$$

Expected Value of a Constant:

If $X$ is a random variable and $k$ is a constant, then: $$\mathbb{E}[k]=k$$
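These properties can be illustrated numerically; the PMF below is a hypothetical example:

```python
# Illustrate E[aX + b] = a*E[X] + b and E[k] = k for a hypothetical PMF.
from fractions import Fraction

pmf = {1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(5, 10)}
a, b = 4, 7

e_x = sum(x * p for x, p in pmf.items())
e_axb = sum((a * x + b) * p for x, p in pmf.items())  # E[aX + b] term by term

assert e_axb == a * e_x + b

# A constant k takes a single value with probability 1, so E[k] = k.
k = 9
assert sum(v * p for v, p in {k: Fraction(1)}.items()) == k
```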

Example

A café sells two types of desserts: cookies and cakes. The random variable $X$ represents the revenue from cookie sales in a day (in dollars), while $Y$ represents the revenue from cake sales. Suppose the following information is given:

$\mathbb{E}[X]=50$ : The expected daily revenue from cookies is $\$ 50$.
$\mathbb{E}[Y]=80$ : The expected daily revenue from cakes is $\$ 80$.

$X$ and $Y$ are independent.

The café owner wants to analyze their daily revenue, which includes transformations and combinations of these random variables.

Example

A bakery sells two types of products: bread and cakes. The random variable $B$ represents the daily profit (in dollars) from bread, and $C$ represents the daily profit from cakes. The bakery manager has the following information:

$\mathbb{E}[B]=200$ : The expected daily profit from bread is $\$ 200$.
$\mathbb{E}[C]=150$ : The expected daily profit from cakes is $\$ 150$.

Variance and Standard Deviation of a Discrete Random Variable

The variance of a discrete random variable is a fundamental concept in probability and statistics that measures how much the values of the variable deviate from its expected value (mean). It provides a numerical representation of the spread or dispersion of the variable's possible outcomes, offering insights into the variability of the data.

Definition:

Variance of a Discrete Random Variable

Let $X$ be a discrete random variable with expected value, $\mathbb{E}[X]$. The variance of $X$ is defined as:$$\operatorname{Var}(X)=\sum_x(x-\mathbb{E}[X])^2\cdot P(X=x)$$ where

$\mathbb{E}[X]$ is the expected value of $X$
$P(X=x)$ represents the probability that $X$ takes on the value $x$.

Theorem:

Variance

The variance of a discrete random variable $X$ can also be expressed as:$$\operatorname{Var}(X)=\mathbb{E}[X^2]-(\mathbb{E}[X])^2$$

As we saw in the section on Numerical Measures of Dispersion, the standard deviation provides a more interpretable measure of dispersion.

Definition:

Standard Deviation of a Discrete Random Variable

The standard deviation of a discrete random variable $X$ is the square root of its variance, denoted as $\sigma_X=\sqrt{\operatorname{Var}(X)}$.
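Both formulas for the variance, and the standard deviation derived from it, can be checked on a small hypothetical PMF:

```python
# Compute Var(X) by the definition and by the shortcut E[X^2] - (E[X])^2,
# then take the standard deviation; the PMF below is hypothetical.
import math
from fractions import Fraction

pmf = {1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(5, 10)}

e_x = sum(x * p for x, p in pmf.items())
var_def = sum((x - e_x) ** 2 * p for x, p in pmf.items())   # definition
var_short = sum(x**2 * p for x, p in pmf.items()) - e_x**2  # shortcut

assert var_def == var_short
sd = math.sqrt(var_def)  # standard deviation = sqrt(variance)
```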

Example

Consider a discrete random variable $X$ with the following probability distribution:$$\begin{array}{c|c}x&P(X=x)\\\hline 1&0.2\\2&0.3\\3&0.5\end{array}$$

Example

Consider a discrete random variable $Y$ with the following probability distribution:$$\begin{array}{c|c}y&P(Y=y)\\\hline 0&0.1\\1&0.2\\2&0.3\\3&0.4\end{array}$$

Example

A random variable $X$ represents the number of successful tasks completed by a robot in a day. The probability distribution is as follows: $$\begin{array}{c|c}x&P(X=x)\\\hline 0&0.1\\1&0.2\\2&0.3\\3&0.2\\4&0.1\\5&0.1\end{array}$$

Key Results and Properties of the Variance

The variance of a discrete random variable has several key properties and results that are important in understanding its behavior and implications:

Theorem:

Variance of a Sum (Independent Variables)

If $X$ and $Y$ are independent random variables, the variance of their sum is the sum of their variances:$$\operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)$$ Independence ensures that there is no covariance term.

Theorem:

Variance of a Sum (Dependent Variables)

If $X$ and $Y$ are dependent random variables, the variance of their sum is:$$\operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)+2\operatorname{Cov}(X,Y)$$ where $\operatorname{Cov}(X,Y)$ is the covariance between $X$ and $Y$.

Theorem:

Variance of a Constant Times a Random Variable

For a random variable $X$ and a constant $a$, the variance of the product of $a$ and $X$ is:$$\operatorname{Var}(aX)=a^2\operatorname{Var}(X)$$

Theorem:

Variance of a Linear Combination

For random variables $X$ and $Y$, and constants $a$ and $b$, the variance of the linear combination $aX+bY$ is:$$\operatorname{Var}(aX+bY)=a^2\operatorname{Var}(X)+b^2\operatorname{Var}(Y)+2ab\operatorname{Cov}(X,Y)$$ where $\operatorname{Cov}(X,Y)$ is the covariance between $X$ and $Y$.

Theorem:

Variance of a Constant Random Variable

The variance of a constant random variable $X=k$ is zero:$$\operatorname{Var}(k)=0$$
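The linear-combination identity can be checked against a direct computation on a small hypothetical joint PMF:

```python
# Check Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
# against a direct computation on a small hypothetical joint PMF.
from fractions import Fraction

# joint[(x, y)] = P(X = x, Y = y); X and Y are deliberately dependent here
joint = {(0, 0): Fraction(3, 10), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10)}

def E(g):
    """Expected value of g(X, Y) under the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

a, b = 2, 3
ex, ey = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - ex) ** 2)
var_y = E(lambda x, y: (y - ey) ** 2)
cov = E(lambda x, y: (x - ex) * (y - ey))

lhs = E(lambda x, y: (a * x + b * y - (a * ex + b * ey)) ** 2)  # Var(aX + bY)
rhs = a**2 * var_x + b**2 * var_y + 2 * a * b * cov

assert lhs == rhs
```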

Example

A bakery sells cakes and cookies, and the sales for each are modeled as random variables:

$X$ : The daily revenue from cakes (in dollars), with $\mathbb{E}[X]=50$ and $\operatorname{Var}(X)=25$
$Y$ : The daily revenue from cookies (in dollars), with $\mathbb{E}[Y]=30$ and $\operatorname{Var}(Y)=16$.

Assume the revenues from cakes and cookies are independent.

Example

A café tracks its daily revenue from coffee and muffins, which are modeled as random variables:

$C$ : The daily revenue from coffee sales (in dollars), with $\mathbb{E}[C]=80$ and $\operatorname{Var}(C)=36$
$M$ : The daily revenue from muffin sales (in dollars), with $\mathbb{E}[M]=50$ and $\operatorname{Var}(M)=25$.

The sales of coffee and muffins are dependent, with a covariance of $\operatorname{Cov}(C,M)=12$.

Example

A biologist is studying the relationship between sunlight exposure and plant growth. The growth of a plant ($G$, in centimeters per week) depends on the hours of sunlight ($H$, in hours per day) it receives. The relationship is modeled as: $$ G=2 H+5$$

where
$H$ : Hours of sunlight per day, with $\mathbb{E}[H]=6$ and $\operatorname{Var}(H)=1.5$.
$G$ : Plant growth (in cm/week), which is dependent on $H$.

Binomial Distribution

The binomial distribution is a specific kind of discrete distribution. It models the probability of obtaining a specified number of successes over a fixed number of trials, where each trial is independent and has only two possible outcomes.

Definition:

Binomial Random Variable

A binomial random variable with parameters $n$ (number of trials) and $p$ (probability of success) is a discrete random variable with pmf $$P(x)=\, C^n_x\,p^x(1-p)^{n-x}\qquad\qquad x=0,1,2,\dots, n$$
Note:$$ \sum_{x=0}^{n} P(x)=\sum_{x=0}^{n}\, C^n_x\, p^{x}(1-p)^{n-x}=(p+(1-p))^{n}=1^{n}=1$$
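The pmf and the fact that it sums to 1 can be verified with `math.comb`; a short sketch using exact rational arithmetic:

```python
# Binomial PMF P(x) = C(n, x) p^x (1-p)^(n-x); the probabilities over
# x = 0, ..., n sum to 1, as the binomial theorem guarantees.
import math
from fractions import Fraction

def binom_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, Fraction(2, 5)  # e.g. n = 10 trials with success probability 0.4
probs = [binom_pmf(x, n, p) for x in range(n + 1)]

assert all(q >= 0 for q in probs)
assert sum(probs) == 1
```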

Features of a Binomial Experiment

  • The experiment consists of $n$ identical trials.
  • Each trial results in one of two outcomes: success or failure.
  • The probability of success, $p$, is the same for each trial.
  • The trials are independent.
  • The number of trials for the experiment, $n$, is fixed.
The Mean and Variance of a Binomial Random Variable

The expected value and variance of a binomial random variable provide important insights into its behavior.

Formula:

Mean of a Binomial Random Variable

If $X$ is a binomial random variable representing the number of successes in $n$ independent trials, where the probability of success in each trial is $p$, then the expected value of $X$ is given by: $$\mathbb{E}[X]=\mu=n\cdot p$$
The expected value indicates the average number of successes in $n$ trials.
The variance is a measure reflecting the variability in the number of successes.

Formula:

Variance and Standard Deviation of a Binomial Random Variable

The variance of a binomial random variable is given by: $$\sigma^2=V(X)=n\cdot p\cdot q$$ where $q=1-p$ is the probability of failure. The standard deviation of $X$ is the square root of the variance: $$\sigma=\sqrt{n\cdot p\cdot q}$$
The variance depends on both $n$ and $p$: larger $n$ increases variability, while probabilities closer to 0.5 maximize variance due to greater uncertainty in the outcomes.
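Both shortcut formulas can be verified against a direct computation from the pmf; a sketch with hypothetical parameters:

```python
# Verify the shortcut formulas E[X] = n*p and Var(X) = n*p*q for a binomial
# random variable by computing both directly from the PMF.
import math
from fractions import Fraction

n, p = 12, Fraction(3, 10)  # hypothetical parameters
q = 1 - p

pmf = {x: math.comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)}

mean = sum(x * px for x, px in pmf.items())
var = sum(x**2 * px for x, px in pmf.items()) - mean**2

assert mean == n * p   # 12 * 0.3 = 3.6
assert var == n * p * q  # 12 * 0.3 * 0.7 = 2.52
```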

Example 1

There's something about tall men that women find irresistible. An analysis of data obtained from Yahoo! revealed that for women, height is an important factor when it comes to decisions on dating. In fact, only $8\%$ of women would consider going on a date with a man who is shorter than $5'8''$.

Of the next 10 women that you encounter, what is the probability that exactly four of them would consider dating a man that is shorter than $5'8''$?

Solution

Let $X$ be the number of women who would consider dating a man who is shorter than $5'8''$.
The pmf is $$P(X=x)=C^{10}_x\cdot (0.08)^x\cdot (1-0.08)^{10-x}\quad;\quad x=0,1,\dots,10$$
The probability that exactly four of them would consider dating a man who is under $5'8''$ is $P(X=4)=C^{10}_4\cdot (0.08)^4\cdot (0.92)^6\approx 0.0052$
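The binomial probability for this example can be reproduced in a couple of lines of Python:

```python
# P(X = 4) for a binomial with n = 10 trials and success probability p = 0.08.
import math

p_4 = math.comb(10, 4) * 0.08**4 * 0.92**6
print(round(p_4, 4))  # 0.0052
```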

Example 2

A survey conducted by the National Sleep Foundation found that $40\%$ of Americans get less than the recommended amount of sleep. If 10 Americans are randomly selected, what is the probability that exactly 3 of them get less than the recommended amount of sleep?

Solution

Let $X$ be the number of Americans who get less than the recommended amount of sleep.
The pmf is $$P(X=x)=C^{10}_x\cdot (0.4)^x\cdot (0.6)^{10-x}\quad;\quad x=0,1,\dots,10$$
The probability that exactly 3 of them get less than the recommended amount of sleep is $P(X=3)=C^{10}_3\cdot (0.4)^3\cdot (0.6)^7\approx 0.215$

Long Example 1

On a trip to Finland in 2005, the Italian Prime Minister, Silvio Berlusconi, insulted the country by saying that the Finns ate nothing but "marinated reindeer" meat and that their cuisine was something to be "endured" and not to be enjoyed. Not ones to take the diss lying down, the Finns entered an international pizza competition and won first place! Their winning entry, which featured wild mushrooms and smoked reindeer, was called "Pizza Berlusconi". Revenge never tasted so good!

A recent poll found that $80\%$ of Finns regularly ordered Pizza Berlusconi when eating out. At a pizza restaurant in Helsinki, six people are about to sit down for lunch.

Example

Ark Encounter, a multi-million-dollar Noah's Ark replica/theme park in Williamstown, Kentucky, is all about surviving floods... theoretically. While the original Ark was designed to withstand 40 days and 40 nights of rain, the Kentucky version didn't quite live up to the hype. Turns out, after 40-50 inches of rain fell in May '24, the ark itself fell victim to flood damage.

In a twist so ironic that it feels divinely scripted, Ark Encounter's lawyers are now suing their insurers for refusing to cover water damage.

The probability that an insurance company will honour and compensate a policy holder whose property was damaged by flooding is $65\%$. In the next 30 claims where flooding was the cause of property damage, what is

Example

In 2017, Lee De Paauw from Queensland, Australia tried to impress a girl by jumping into a river after drinking a large quantity of wine. He was immediately mauled by a 3-metre crocodile. And despite managing to fight off the crocodile, he still failed to win a date with the girl of his interest.

Based on historical data, there's a 10% chance of encountering a crocodile when someone enters a specific river. In a bold and questionable move, a group of 15 daredevils has decided to jump into this crocodile-infested river to test their luck.