Presentation of Data

Once data has been collected, it needs to be organized and presented in a meaningful way so that trends and other pertinent information can be extracted from it. This chapter covers how to construct a frequency table, calculate cumulative frequencies, and create two types of graphical representations of data: the histogram and the bar chart.

Frequency Table

A frequency table is a table that shows how many times a particular value or range of values occurs in a data set. It is a useful way to organize data and identify patterns.

Definition:

Frequency Table

A frequency table organizes data into classes or intervals, and shows how many data points fall into each category.

How To Make A Frequency Table

Sort the data into ascending order.
Decide on the number of classes or intervals to be used (typically between $5$ and $10$ classes).
Calculate the class width and round up to the next whole number: $$\text{class width} = \frac{\text{largest value} - \text{smallest value}}{\text{number of classes}}$$ If the quotient is already a whole number, move up to the next whole number so that the largest value still falls inside the last class.
Work out the class limits.

To generate the lower limit of each class, start with the smallest value in the data set. Move down the table, adding the class width to the preceding lower limit, until every class has its lower limit.

To determine the upper limit of the first class, subtract $1$ from the lower limit of the second class (this assumes whole-number data). Move down the table and close off each remaining class by adding the class width to the preceding upper limit.
Tally the data.
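The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names `class_width`, `class_limits`, and `tally` are hypothetical, the data is assumed to consist of whole numbers, and an exact quotient is assumed to be bumped to the next whole number (the convention used in Example 3).

```python
def class_width(data, num_classes):
    """Range divided by the number of classes, rounded up to a whole number.

    Assumption: an exact quotient is bumped to the next integer so that the
    largest value still lands in the last class. Whole-number data only.
    """
    return (max(data) - min(data)) // num_classes + 1


def class_limits(data, num_classes):
    """Return one (lower limit, upper limit) pair per class."""
    width = class_width(data, num_classes)
    lowers = [min(data) + i * width for i in range(num_classes)]
    # Each upper limit sits one unit below the next class's lower limit.
    return [(low, low + width - 1) for low in lowers]


def tally(data, limits):
    """Count how many data points fall inside each class."""
    return [sum(low <= x <= high for x in data) for low, high in limits]


data = list(range(1, 11))       # the data set 1, 2, ..., 10 used in Example 1
limits = class_limits(data, 5)

print(class_width(data, 5))     # 2
print(limits)                   # [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
print(tally(data, limits))      # [2, 2, 2, 2, 2]
```

Because every class starts exactly where the previous one ends plus one unit, each data point is counted in one class only.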

Rule of Thumb

The number of classes is usually between $5$ and $10$.

Using fewer than five classes could result in the loss of too much information.
Using more than ten classes could leave the data insufficiently summarized.

Remark

The above technique for creating a frequency table ensures that:

Each data point is included in one and only one class.
The classes are mutually exclusive (i.e., none of the classes overlap each other).
The classes are exhaustive (i.e., all of the data is captured and accounted for).

Example 1

Consider the following data set: $$1, 2, 3, 4, 5, 6, 7, 8, 9, 10 $$ Suppose we decide to use $5$ classes. Determine the class width.

Solution

The class width is calculated as follows:

$$\text{class width}=\frac{\text{largest} - \text{smallest}}{\text{number of classes}}=\frac{10 - 1}{5} = 1.8 \quad \Rightarrow \quad 2$$

Therefore, the class width is $2$.

Example 2

Consider the following data set: $$27, 30, 33, 45, 45, 46, 49, 55, 56, 58, 64, 67, 70 $$ Suppose we decide to use $7$ classes. Determine the class limits.

Solution

To find the class limits, first calculate the class width:

$$\text{class width} = \frac{\text{largest} - \text{smallest}}{\text{number of classes}}=\frac{70 - 27}{7} \approx 6.14 \quad \Rightarrow \quad 7$$

Therefore, the class width is $7$.
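With a class width of $7$, the class limits for Example 2 follow from the steps in How To Make A Frequency Table. The table below is a worked sketch under that procedure: the lower limits start at the smallest value, $27$, and increase by $7$, and the first upper limit is $34 - 1 = 33$.

$$\begin{array}{|c|c|c|} \hline \text{Class} & \text{Lower Limit} & \text{Upper Limit} \\ \hline 1 & 27 & 33 \\ 2 & 34 & 40 \\ 3 & 41 & 47 \\ 4 & 48 & 54 \\ 5 & 55 & 61 \\ 6 & 62 & 68 \\ 7 & 69 & 75 \\ \hline \end{array}$$

The largest value, $70$, falls in the last class, so all of the data is accounted for.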

Example 3

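Consider the following data set: $$27, 31, 35, 46, 46, 49, 49, 53, 55, 58, 60, 77 $$ Suppose we decide to use $5$ classes. Determine the class limits.

Solution

To find the class limits, first calculate the class width:

$$\text{class width} = \frac{\text{largest} - \text{smallest}}{\text{number of classes}}=\frac{77 - 27}{5} = 10 \quad \Rightarrow \quad 11$$

The quotient is exactly $10$, but a width of $10$ would end the fifth class at $76$ and leave out the largest value, $77$. We therefore move up to the next whole number.

Therefore, the class width is $11$.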
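With a class width of $11$, the class limits for Example 3 can be worked out in the same way. The table below is a sketch that follows the steps in How To Make A Frequency Table: the lower limits start at $27$ and increase by $11$, and the first upper limit is $38 - 1 = 37$.

$$\begin{array}{|c|c|c|} \hline \text{Class} & \text{Lower Limit} & \text{Upper Limit} \\ \hline 1 & 27 & 37 \\ 2 & 38 & 48 \\ 3 & 49 & 59 \\ 4 & 60 & 70 \\ 5 & 71 & 81 \\ \hline \end{array}$$

The largest value, $77$, falls in the fifth class.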

Example 4

Consider the following data set: $$27, 30, 33, 45, 45, 46, 49, 55, 56, 58, 64, 67, 70 $$ Suppose we decide to use $6$ classes.

Example 5

Consider the following data set: $$12, 23, 33, 43, 44, 47, 49, 52, 56, 58, 60, 65, 75 $$ Suppose we decide to use $7$ classes.

Frequency Distributions

A table that just displays the frequency counts may not be the best way to present data. More meaningful information and trends can be revealed if the counts in each class are converted into proportions or percentages.

Definition:

Frequency Distribution

A frequency distribution is an extension of a frequency table. In addition to exhibiting the frequency counts in each class, a frequency distribution also includes midpoints for each class, class boundaries, relative frequencies, and cumulative frequencies.

The midpoint is the middle number of each class. It is obtained by adding the upper and lower limits together and then dividing by two. The midpoint is used to estimate the average or mean for grouped data.
The relative frequency displays the proportion of data that falls into each category. It can be represented in decimal format or in percentages.

Formula:

Relative Frequencies

For each class, the relative frequency is calculated by dividing the frequency of that class by the total number of observations. $$\begin{align} \text{Relative Frequency} &=\frac{f}{n} \end{align}$$ where

$f=$ the frequency of the class
$n=$ the total number of observations.

Example 6

Wayne Enterprises is a US-based conglomerate with interests in technology, manufacturing, and weapons research. The company's CEO, Bruce Wayne, likes to keep track of how long each of his employees is away on their lunch break. Here are the results for the 60 employees who work for him, rounded to the nearest minute. $$\begin{array}{lllllllllllllll} 35 & 35 & 35 & 35 & 35 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 \\ 36 & 37 & 37 & 38 & 38 & 38 & 39 & 40 & 40 & 40 & 40 & 40 & 40 & 40 & 40 \\ 41 & 41 & 41 & 41 & 41 & 41 & 42 & 42 & 43 & 43 & 43 & 43 & 43 & 43 & 44 \\ 44 & 44 & 44 & 44 & 45 & 45 & 45 & 45 & 47 & 48 & 49 & 50 & 50 & 50 & 55\end{array}$$
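As an illustration, the frequency distribution for this lunch-break data could be computed along the following lines. This is a sketch under stated assumptions: five classes are used, the exact quotient $(55 - 35)/5 = 4$ is bumped to a width of $5$, and the data is stored compactly as a dictionary (the name `counts` is hypothetical) mapping each time in minutes to the number of employees who recorded it.

```python
# Lunch-break times from Example 6 as {minutes: number of employees}.
counts = {35: 5, 36: 11, 37: 2, 38: 3, 39: 1, 40: 8, 41: 6, 42: 2,
          43: 6, 44: 5, 45: 4, 47: 1, 48: 1, 49: 1, 50: 3, 55: 1}
n = sum(counts.values())  # total number of observations (60)

num_classes = 5  # assumed number of classes
smallest, largest = min(counts), max(counts)
# 20 / 5 is exactly 4, so the width is bumped to 5 to keep 55 in the last class.
width = (largest - smallest) // num_classes + 1

rows = []
for i in range(num_classes):
    low = smallest + i * width
    high = low + width - 1
    midpoint = (low + high) / 2
    # Frequency f: how many observations fall between the class limits.
    f = sum(c for value, c in counts.items() if low <= value <= high)
    rows.append((low, high, midpoint, f, f / n))

for low, high, midpoint, f, rel in rows:
    print(f"{low}-{high}  midpoint={midpoint}  f={f}  "
          f"relative={rel:.4f} ({rel:.2%})")
```

The relative frequencies add up to $1$ (or $100\%$), which is a quick check that every observation was placed in a class.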

Cumulative Frequencies

The cumulative frequency is the sum of the frequencies of all classes up to and including the current class. It is used to determine the number of observations that fall below a certain value. The cumulative frequency for the last class should equal the total number of observations.

Definition:

Less-Than Cumulative Frequencies (LTCF)

The less-than cumulative frequencies report the number of data values that are less than or equal to the upper limit of each class.

Example 7

Consider the lunch break data for Wayne Enterprises again. $$\begin{array}{lllllllllllllll} 35 & 35 & 35 & 35 & 35 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 \\ 36 & 37 & 37 & 38 & 38 & 38 & 39 & 40 & 40 & 40 & 40 & 40 & 40 & 40 & 40 \\ 41 & 41 & 41 & 41 & 41 & 41 & 42 & 42 & 43 & 43 & 43 & 43 & 43 & 43 & 44 \\ 44 & 44 & 44 & 44 & 45 & 45 & 45 & 45 & 47 & 48 & 49 & 50 & 50 & 50 & 55\end{array}$$
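A short sketch of the less-than cumulative frequencies for this data is shown below. It assumes five classes of width $5$ starting at $35$ (that is, 35-39, 40-44, 45-49, 50-54, and 55-59), for which tallying the data gives the frequencies $22, 27, 7, 3, 1$. The variable names are hypothetical.

```python
from itertools import accumulate

# Assumed classes 35-39, 40-44, 45-49, 50-54, 55-59 and their tallied frequencies.
upper_limits = [39, 44, 49, 54, 59]
frequencies = [22, 27, 7, 3, 1]
n = sum(frequencies)  # 60 observations

# Running total from the first class downwards.
ltcf = list(accumulate(frequencies))

for upper, cf in zip(upper_limits, ltcf):
    print(f"{upper} minutes or less: {cf}  {cf / n:.4f}  {cf / n:.2%}")
```

The last entry equals the total number of observations, as the definition requires.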

Definition:

More-Than Cumulative Frequencies (MTCF)

The more-than cumulative frequencies report the number of data values that are greater than or equal to the lower limit of each class.

Example 8

Here is the lunch break data for Wayne Enterprises again. $$\begin{array}{lllllllllllllll} 35 & 35 & 35 & 35 & 35 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 \\ 36 & 37 & 37 & 38 & 38 & 38 & 39 & 40 & 40 & 40 & 40 & 40 & 40 & 40 & 40 \\ 41 & 41 & 41 & 41 & 41 & 41 & 42 & 42 & 43 & 43 & 43 & 43 & 43 & 43 & 44 \\ 44 & 44 & 44 & 44 & 45 & 45 & 45 & 45 & 47 & 48 & 49 & 50 & 50 & 50 & 55\end{array}$$
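The more-than cumulative frequencies can be sketched by accumulating from the last class backwards. As before, this assumes five classes of width $5$ starting at $35$, with tallied frequencies $22, 27, 7, 3, 1$; the variable names are hypothetical.

```python
from itertools import accumulate

# Assumed classes 35-39, 40-44, 45-49, 50-54, 55-59 and their tallied frequencies.
lower_limits = [35, 40, 45, 50, 55]
frequencies = [22, 27, 7, 3, 1]
n = sum(frequencies)  # 60 observations

# Accumulate from the last class backwards, then restore the original order.
mtcf = list(accumulate(reversed(frequencies)))[::-1]

for lower, cf in zip(lower_limits, mtcf):
    print(f"{lower} minutes or more: {cf}  {cf / n:.4f}  {cf / n:.2%}")
```

Here the first entry equals the total number of observations, since every value is at least as large as the lower limit of the first class.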

Example 9

The Presidential Daily Briefing (PDB) is a top-secret document which details the most pressing issues and developments from around the world. The report is prepared every day for the US President to review, but Donald Trump says that he doesn't need the Daily Briefing because he's "smart". Plus, he doesn't like reading, so he has someone read it to him while he listens. He also mentioned that he would like the PDBs to be shorter and contain more "killer graphics". The number of pages for $75$ PDBs are as follows: $$\begin{array}{ccccccccccccccc} 50 & 51 & 51 & 52 & 52 & 53 & 53 & 54 & 55 & 55 & 56 & 56 & 57 & 57 & 57 \\ 60 & 60 & 60 & 61 & 62 & 63 & 64 & 65 & 66 & 66 & 68 & 69 & 69 & 70 & 70 \\ 70 & 70 & 70 & 71 & 71 & 71 & 72 & 73 & 73 & 74 & 75 & 75 & 75 & 78 & 79 \\ 80 & 80 & 82 & 83 & 87 & 88 & 94 & 95 & 95 & 96 & 99 & 99 & 99 & 99 & 100 \\ 100 & 101 & 101 & 112 & 112 & 113 & 114 & 115 & 117 & 117 & 120 & 121 & 124 & 124 & 124\end{array}$$

Graphical Representation of Data

Once data has been collected, it needs to be organized and presented in a meaningful way so that trends and other pertinent information can be extracted from it.

While tables structure information so that the data is more useful to us, they are not the most efficient way of interpreting data. To analyse data effectively, a visual representation is needed. Charts, histograms, and graphs are a more powerful means of uncovering patterns and trends hidden within the data, ones which are not evident from simply looking at the values in a table.

Histograms and Bar Charts

Histograms and bar charts are two of the most common ways to visually represent data. They are used to display the frequency of data points in a data set.

In a histogram, the continuous data is grouped into intervals, and the height of each bar represents the frequency of data points in that interval. In a bar chart, the discrete or categorical data is grouped into categories, and the height of each bar represents the frequency of data points in that category.

Prior to selecting the type of graph to use, it is important to understand the data set and the type of data being represented.

Histograms are used to illustrate data that is continuous.

Bar charts are used to illustrate data that is categorical or discrete.

Remark

For both types of graphs, the class limits, class boundaries, or categories are placed on the x-axis, while the frequency (in counts, decimals, or percentages) is placed on the y-axis.

Bar Charts

Example 10

The Codfather is a restaurant located in South Carolina that serves fish and chips. According to Trip Advisor, it is the #1 eatery in town, and the data below shows the number of cod and fries lunch combos served up in 2021. $$\begin{array}{|c|c|} \hline \text{Month} & \text{Cod and Fries Lunches Served} \\ \hline \text{January} & 186 \\ \text{February} & 131 \\ \text{March} & 123 \\ \text{April} & 98 \\ \text{May} & 87 \\ \text{June} & 56 \\ \text{July} & 45 \\ \text{August} & 23 \\ \text{September} & 12 \\ \text{October} & 63 \\ \text{November} & 73 \\ \text{December} & 81 \\ \hline\end{array} $$

Number of Cod and Fries Lunches Served

Example 11

Life of Pie is a pie shop whose name is a pun on the novel and film "Life of Pi." The data below show how many pies of each filling were sold this past week. $$\begin{array}{|c|c|} \hline \text{Pie Filling} & \text{Pies Sold} \\ \hline \text{Apple} & 186 \\ \text{Blueberry} & 131 \\ \text{Custard} & 123 \\ \text{Key Lime} & 98 \\ \text{Pumpkin Spice} & 87 \\ \text{Coconut Cream} & 56 \\ \text{Strawberry} & 45 \\ \hline \end{array} $$

Pie Sales

Example 12

A die is rolled $87$ times, and the number of dots that appears on each roll is recorded in the table below. $$\begin{array}{|c|c|} \hline \text{Dots} & \text{Frequency} \\ \hline 1 & 18 \\ 2 & 13 \\ 3 & 13 \\ 4 & 15 \\ 5 & 12 \\ 6 & 16 \\ \hline \end{array} $$

Number of Dots on a Die
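To make the idea of a bar chart concrete without a plotting library, the die frequencies can be drawn as a simple text chart. This is only a sketch: the function name `text_bar_chart` is hypothetical, and each bar is laid out horizontally, with one `#` per recorded roll, rather than vertically as in a conventional bar chart.

```python
# Frequencies from the die table in Example 12, as {dots: frequency}.
die_counts = {1: 18, 2: 13, 3: 13, 4: 15, 5: 12, 6: 16}


def text_bar_chart(counts, symbol="#"):
    """Return one line per category; the bar length equals the frequency."""
    return [f"{category} | {symbol * frequency} {frequency}"
            for category, frequency in counts.items()]


chart = text_bar_chart(die_counts)
print("\n".join(chart))
```

Each category gets its own separate bar, which suits discrete values such as the faces of a die.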

Histograms

Remark

Prior to sketching a histogram, the gaps that occur between the classes need to be eliminated. To achieve this, each class limit is extended outwards by half a unit to create class boundaries. In other words, each lower limit is decreased by $0.5$ and each upper limit is increased by $0.5$.
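The conversion from class limits to class boundaries can be sketched as follows. The function name `class_boundaries` and its `gap` parameter are hypothetical; the gap is assumed to be $1$ unit, which is the case when the data is recorded in whole numbers. The class limits used are the wine price classes from Example 13.

```python
def class_boundaries(limits, gap=1):
    """Extend each class outwards by half the gap between adjacent classes."""
    half = gap / 2
    return [(low - half, high + half) for low, high in limits]


# Class limits for the wine prices in Example 13.
wine_limits = [(0, 50), (51, 100), (101, 150),
               (151, 200), (201, 250), (251, 300)]
boundaries = class_boundaries(wine_limits)

print(boundaries[:2])  # [(-0.5, 50.5), (50.5, 100.5)]
```

After the conversion, the upper boundary of each class coincides with the lower boundary of the next, so the bars of the histogram touch.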

Example 13

Planet of the Grapes is a wine store with a name inspired by the movie "Planet of the Apes." The data below shows the prices of the bottles of wine sold last week. $$\begin{array}{|c|c|} \hline \text{Price of Bottle} & \text{Frequency} \\ \hline 0-50 & 3 \\ 51-100 & 7 \\ 101-150 & 12 \\ 151-200 & 8 \\ 201-250 & 5 \\ 251-300 & 3 \\ \hline \end{array} $$

Solution

Since prices are continuous, the class limits are extended by $0.5$ to create class boundaries. The table and histogram below illustrate the frequency of wine bottles sold at Planet of the Grapes. $$\begin{array}{|c|c|c|} \hline \text{Price of Bottle} & \text{Class Boundaries} & \text{Frequency} \\ \hline 0-50 & -0.5-50.5 & 3 \\ 51-100 & 50.5-100.5 & 7 \\ 101-150 & 100.5-150.5 & 12 \\ 151-200 & 150.5-200.5 & 8 \\ 201-250 & 200.5-250.5 & 5 \\ 251-300 & 250.5-300.5 & 3 \\ \hline \end{array} $$

Price of Wine Bottles

Shapes and Attributes

A picture is worth a thousand words, and a graph is no exception. The shape and attributes of a histogram or bar chart can reveal whether the data is symmetrically distributed or loaded to one side, whether there are outliers in the data set, which intervals contain the most data, and how the data is spread out.

When data is skewed to the left in a histogram, the tail of the graph is on the left side. In other words, there are a small number of very low values pulling the mean toward the left.

Left (Negatively) Skewed Distribution

When data is skewed to the right in a histogram, the tail of the graph is on the right side. In other words, there are a small number of very high values pulling the mean toward the right.

Right (Positively) Skewed Distribution

A distribution is symmetric when its left and right sides are mirror images of each other around a central point.

Symmetric Distribution

A distribution is uniform when all the classes have equal frequencies.

Uniform Distribution

A distribution is bimodal when it has two peaks. This means that there are two values that occur most frequently in the data set.

Bimodal Distribution

Example 14

When it comes to choosing between losing a limb and living without the internet, some Americans who responded to a survey said they would rather cut off a finger than cut the internet. Americans spend more time consuming media online than they do sleeping, averaging about 10.5 hours per day connected to the internet. A random sample of 56 people in the U.S. were asked how many minutes they spend online per day. Here are the results: $$\begin{array}{llllllllllllll} 120 & 121 & 125 & 127 & 127 & 128 & 130 & 134 & 134 & 134 & 130 & 134 & 134 & 135 \\ 141 & 141 & 142 & 142 & 143 & 144 & 145 & 146 & 147 & 147 & 148 & 148 & 148 & 148 \\ 150 & 150 & 150 & 150 & 150 & 151 & 152 & 153 & 154 & 154 & 155 & 155 & 155 & 155 \\ 155 & 155 & 157 & 157 & 157 & 157 & 157 & 158 & 158 & 158 & 158 & 158 & 160 & 160 \end{array}$$

Below is the histogram that represents the data collected from the survey.

Number of Minutes Spent Online
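The shape of this distribution can also be checked numerically. The sketch below tallies the survey data into six classes of width $7$ (since $(160 - 120)/6 \approx 6.67$ rounds up to $7$) and then compares the mean with the median. The mean-versus-median comparison is a common rule of thumb for the direction of skew rather than a formal test, and the variable names are hypothetical.

```python
from statistics import mean, median

# Survey results from Example 14 as {minutes online: number of respondents}.
counts = {120: 1, 121: 1, 125: 1, 127: 2, 128: 1, 130: 2, 134: 5, 135: 1,
          141: 2, 142: 2, 143: 1, 144: 1, 145: 1, 146: 1, 147: 2, 148: 4,
          150: 5, 151: 1, 152: 1, 153: 1, 154: 2, 155: 6, 157: 5, 158: 5,
          160: 2}
data = [value for value, c in counts.items() for _ in range(c)]

# Six classes of width 7, starting at the smallest value: 120-126, ..., 155-161.
limits = [(120 + 7 * i, 126 + 7 * i) for i in range(6)]
frequencies = [sum(low <= x <= high for x in data) for low, high in limits]

print(frequencies)                # [3, 5, 6, 10, 14, 18]
print(mean(data) < median(data))  # True
```

The frequencies climb steadily towards the highest class and the mean sits below the median, which is consistent with a tail on the left side of the histogram.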

Exercises

Question 1

What is the primary difference between the types of data visualized by a bar chart and a histogram?

Solution

Bar charts display frequencies for discrete data, with gaps between bars, while histograms show frequencies for continuous data, with bars touching to represent intervals.

Question 2

You are given data on the number of students who prefer different types of pizza toppings (e.g., pepperoni, cheese, veggie). Would you use a bar chart or a histogram to visualize this data?

Solution

Bar chart, because pizza toppings are categorical and there is no need to group the data into intervals.

Question 3

A dataset represents the heights of 100 people measured in centimeters. Would a bar chart or a histogram be more appropriate for visualizing this data? Why?

Solution

Histogram, because height is a continuous variable and it is best visualized by grouping the data into intervals.

Question 4

Describe a scenario where using a bar chart might lead to confusion, and explain why a histogram would be more suitable.

Solution

If continuous data such as temperature is visualized with a bar chart, the gaps between the bars might mislead viewers into thinking the data is discrete. A histogram would clarify the distribution over intervals.

Question 5

For the random variable "Number of cars owned by households in a city," should you use a bar chart or a histogram?

Solution

Bar chart, because the "number of cars" is a discrete variable with specific values (e.g., 0, 1, 2).

Question 6

The sales of a store are tracked daily over a week, with each day categorized (Monday, Tuesday, etc.). Which type of chart should be used, and why?

Solution

Bar chart, because the days of the week are categorical data.

Question 7

A researcher collects data on the time (in seconds) it takes for participants to complete a task. If they want to display the frequency distribution of completion times, which visualization should they use and why?

Solution

Histogram, because completion times are continuous data that can be grouped into intervals.

Question 8

A teacher records the letter grades (A, B, C, D, F) of students in a class. Should the teacher use a bar chart or a histogram to represent this data? Explain.

Solution

Bar chart, because letter grades are categorical data with distinct categories.

Question 9

A survey collects data on the ages of participants in years. Should the data be visualized with a bar chart or a histogram?

Solution

Histogram, because age is a continuous variable that can be grouped into intervals.

Question 10

Discuss how grouping data into intervals (bins) in a histogram could affect the interpretation of a continuous random variable. Would this grouping be appropriate for discrete data?

Solution

Grouping data into bins for a histogram simplifies the visualization of a continuous random variable but may obscure details if the bin width is too large. For discrete data, grouping is unnecessary as each value is already distinct.

Question 11

In 1753, a London magazine listed the following popular wig styles: the cauliflower, the pigeon's wing, the rhinoceros, the staircase, and the wild boar's back.

The data below shows the weights of $40$ cauliflower wigs (in grams) from 1753.

$$\begin{array}{rrrrrrrrrr} \hline 570 & 572 & 573 & 575 & 576 & 578 & 580 & 583 & 591 & 595 \\ \hline 602 & 607 & 620 & 625 & 628 & 628 & 629 & 630 & 632 & 632 \\ \hline 633 & 633 & 634 & 638 & 639 & 640 & 644 & 645 & 646 & 646 \\ \hline 653 & 657 & 658 & 663 & 666 & 667 & 667 & 670 & 675 & 675 \\ \hline \end{array}$$

Question 12

Deimos, one of the moons of Mars, has a crater called Swift, and the other moon, Phobos, has an area called Laputa. That's because in Jonathan Swift's Gulliver's Travels, astronomers discover that Mars has 2 moons. The novel came out in 1726, 151 years before the actual discovery.

Forty students were asked to read Gulliver's Travels last week in one sitting. The time taken by each student to read the book is recorded in the table below.

$$\begin{array}{rrrrrrrrrr} \hline 100 & 100 & 101 & 101 & 102 & 103 & 103 & 105 & 105 & 106 \\ \hline 109 & 109 & 109 & 111 & 112 & 113 & 114 & 115 & 115 & 118 \\ \hline 118 & 119 & 122 & 124 & 125 & 126 & 127 & 127 & 127 & 128 \\ \hline 129 & 131 & 133 & 133 & 133 & 133 & 135 & 140 & 140 & 142 \\ \hline \end{array}$$

Question 13

A 1970s cartoon called Rascal the Raccoon sparked a craze in Japan for keeping raccoons as pets. Their feral offspring have since caused so much damage that Tokyo now has a hotline you can call to report sightings.

The weights of 30 pet raccoons, rounded to the nearest pound, are presented below. $$ \begin{array}{rrrrrrrrrr} \hline 9 & 9 & 10 & 10 & 10 & 10 & 10 & 11 & 11 & 11 \\ \hline 11 & 12 & 12 & 12 & 13 & 13 & 13 & 14 & 14 & 15 \\ \hline 15 & 16 & 16 & 17 & 17 & 17 & 18 & 19 & 19 & 20 \\ \hline \end{array} $$

Question 14

A recent poll in the UK found that $12\%$ of Britons love their pets more than they love their partner, and $9 \%$ love their pets more than they love their children. Forty Brits were asked how much time they spent on average with their pets each day. The results are shown in the table below.

$$\begin{array}{rrrrrrrrrr} \hline 120 & 121 & 121 & 123 & 124 & 127 & 128 & 130 & 130 & 132 \\ \hline 133 & 135 & 136 & 138 & 139 & 139 & 140 & 141 & 148 & 148 \\ \hline 150 & 150 & 151 & 154 & 155 & 162 & 162 & 163 & 163 & 163 \\ \hline 164 & 168 & 168 & 170 & 176 & 177 & 177 & 178 & 179 & 180 \\ \hline \end{array}$$

Question 15

In the United States, you can mail live scorpions if they are inside a box labeled "live scorpions", inside another box also labeled "live scorpions".

The life spans of 40 American scorpions, in months, are presented below. $$\begin{array}{rrrrrrrrrr} \hline 37 & 38 & 38 & 38 & 38 & 40 & 41 & 41 & 41 & 42 \\ \hline 43 & 44 & 45 & 47 & 47 & 47 & 48 & 49 & 51 & 51 \\ \hline 51 & 52 & 55 & 56 & 56 & 56 & 57 & 58 & 58 & 59 \\ \hline 60 & 60 & 62 & 63 & 65 & 65 & 66 & 66 & 68 & 68 \\ \hline \end{array}$$

Question 16

A Paris company has set up a "hospital" to help bring people's dying office pot plants back to life.

The number of days that 40 office plants spent recuperating at the hospital is shown below. $$\begin{array}{cccccccccc} \hline 6 & 9 & 9 & 9 & 10 & 10 & 11 & 12 & 12 & 13 \\ \hline 13 & 14 & 15 & 15 & 16 & 17 & 18 & 19 & 19 & 19 \\ \hline 21 & 21 & 21 & 22 & 22 & 25 & 25 & 25 & 25 & 26 \\ \hline 27 & 27 & 27 & 28 & 28 & 28 & 29 & 29 & 30 & 30 \\ \hline \end{array} $$

Example 4

Determine the class width.

Determine the class limits.

Tally the data.

Example 5

Determine the class width.

Determine the class limits.

Tally the data.

Example 6

Using five classes, calculate the class width.

Create a table that shows the class limits, midpoints, frequencies, and relative frequencies, both in decimal form and as percentages.

What is the proportion of workers whose lunch break lasted between 40 and 44 minutes?

How many workers took a lunch break that lasted between 50 and 54 minutes?

Example 7

Using five classes, create a table that shows the class limits and the less-than cumulative frequencies in both decimal and percentage form.

Using the table above, determine the number of workers who were on lunch for 54 minutes or less.

Calculate the proportion of workers who were away from their desks for 44 minutes or less.

What percentage of workers took a lunch break that lasted 39 minutes or less?

Example 8

Using five classes, create a table that shows the class limits and the more-than cumulative frequencies in both decimal and percentage form.

Using the table above, determine the percentage of workers who were on lunch for 35 minutes or more.

Calculate the proportion of workers who were away from their desks for 45 minutes or more.

How many workers took a lunch break that lasted 50 minutes or more?

Example 9

Using six classes, calculate the class width.

Create a table that shows the class limits, frequencies, relative frequencies in percentages, the less-than cumulative frequencies in percentages, and the more-than cumulative frequencies in percentages.

What is the proportion of PDBs that were between 76 and 88 pages?

How many PDBs contain 115 pages or more?

How many PDBs contain 101 pages or less?

What is the percentage of PDBs that contain 75 pages or less?

What is the probability that a randomly selected PDB contains between 89 and 100 pages?

What is the probability that a randomly selected PDB contains 63 pages or more?

What is the probability that a randomly selected PDB contains 101 pages or less?

What is the probability that a randomly selected PDB contains between 89 and 114 pages?

Example 14

Using six classes, organize the data into a table showing the class limits, class boundaries, frequencies, relative frequencies (in percentages), and the more-than cumulative frequencies (both in counts and in percentages).

What percentage of the respondents said that they spent at least 141 minutes per day online?

How many of the respondents said that they spent 155 minutes or more per day online?

How many respondents reported spending at most 126 minutes per day online?

By looking only at the frequencies in the table, determine if the distribution is symmetric or skewed. If skewed, which way?

If we wanted to graph the data, what type of graph would be most appropriate? Histogram or bar chart?