Presentation of Data
Frequency Table
Definition:
Frequency Table
How To Make A Frequency Table
- Sort the data into ascending order.
- Decide on the number of classes or intervals to be used (typically between $5$ and $10$ classes).
- Calculate the class width and round up to the next whole number: $$\text{class width} = \frac{\text{largest value} - \text{smallest value}}{\text{number of classes}}$$
- Work out the class limits.
To generate the lower limit of each class, start with the smallest value in the data set. Move down the table, adding the class width to each preceding value, until every class has its lower limit.
To determine the upper limit of the first class, subtract 1 from the lower limit of the second class. Move down the table and close off each class by adding the class width to the preceding upper limit.
- Tally the data.
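The steps above can be sketched in Python. This is a minimal sketch that assumes whole-number data; the function names `class_width`, `class_limits`, and `tally` are illustrative and do not come from any library. Taking the integer part of the quotient and adding one reproduces the rounding used in the examples that follow, including the case where the quotient is already a whole number.

```python
def class_width(values, num_classes):
    """Class width for whole-number data: (largest - smallest) / classes, rounded up."""
    spread = max(values) - min(values)
    # Integer part plus one: 1.8 -> 2, 6.14 -> 7, and an exact 10 -> 11,
    # so the largest value always fits inside the last class.
    return spread // num_classes + 1


def class_limits(values, num_classes):
    """Return one (lower limit, upper limit) pair per class."""
    width = class_width(values, num_classes)
    lowers = [min(values) + i * width for i in range(num_classes)]
    # Each upper limit sits one unit below the next class's lower limit.
    return [(low, low + width - 1) for low in lowers]


def tally(values, limits):
    """Count how many data points fall inside each class."""
    return [sum(low <= v <= high for v in values) for low, high in limits]


data = [27, 30, 33, 45, 45, 46, 49, 55, 56, 58, 64, 67, 70]
limits = class_limits(data, 7)
for (low, high), count in zip(limits, tally(data, limits)):
    print(f"{low}-{high}: {count}")
```

Because every class is built from the previous one by adding the same width, the classes cannot overlap, and the tallies should always add up to the number of data points.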
Rule of Thumb
- Using fewer than five classes could result in the loss of too much information.
- Using more than ten classes could result in poor data synthesis.
Remark
- Each data point is included in one and only one class.
- The classes are mutually exclusive (i.e., none of the classes overlap).
- The classes are exhaustive (i.e., all of the data is captured and accounted for).
Example 1
Consider the following data set: $$1, 2, 3, 4, 5, 6, 7, 8, 9, 10 $$ Suppose we decide to use $5$ classes. Determine the class width.
The class width is calculated as follows:
$$\text{class width}=\frac{Largest - Smallest}{Number\, of \, classes}=\frac{10 - 1}{5} = 1.8 \quad \Rightarrow \quad 2$$
Therefore, the class width is $2$.
Solution
Example 2
Consider the following data set: $$27, 30, 33, 45, 45, 46, 49, $$ $$55, 56, 58, 64, 67, 70 $$ Suppose we decide to use $7$ classes. Determine the class limits.
The class width is calculated as follows:
$$\text{class width} = \frac{Largest - Smallest}{Number\, of \, classes}=\frac{70 - 27}{7} \approx 6.14 \quad \Rightarrow \quad 7$$
Therefore, the class width is $7$.
Solution
Example 3
Consider the following data set: $$27, 31, 35, 46, 46, 49, $$ $$ 49, 53, 55, 58, 60, 77 $$ Suppose we decide to use $5$ classes. Determine the class limits.
The class width is calculated as follows:
$$\text{class width} = \frac{Largest - Smallest}{Number\, of \, classes}=\frac{77 - 27}{5} = 10 \quad \Rightarrow \quad 11$$
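Because the quotient is already a whole number, it is increased to the next whole number so that the largest value, $77$, falls inside the last class. Therefore, the class width is $11$.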
Solution
Example 4
Consider the following data set: $$27, 30, 33, 45, 45, 46, 49, $$ $$ 55, 56, 58, 64, 67, 70 $$ Suppose we decide to use $6$ classes.
Example 5
Consider the following data set: $$12, 23, 33, 43, 44, 47, 49, 52, 56, 58, 60, 65, 75 $$ Suppose we decide to use $7$ classes.
Frequency Distributions
Definition:
Frequency Distribution
- The midpoint is the middle number of each class. It is obtained by adding the upper and lower limits together and then dividing by two. The midpoint is used to estimate the average or mean for grouped data.
- The relative frequency displays the proportion of data that falls into each category. It can be represented in decimal format or in percentages.
Formula:
Relative Frequencies
$$\text{relative frequency} = \frac{f}{n}$$
where
$f=$ the frequency of the class
$n=$ the total number of observations.
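As a rough illustration of the two quantities described above, the sketch below computes the midpoint of each class and its relative frequency (the class frequency divided by the total number of observations). The three classes and their frequencies are hypothetical values chosen only for the example, and the function names are illustrative.

```python
def midpoint(lower, upper):
    """Middle of a class: add the limits together and divide by two."""
    return (lower + upper) / 2


def relative_frequencies(frequencies):
    """Each class frequency f divided by the total number of observations n."""
    n = sum(frequencies)
    return [f / n for f in frequencies]


# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [(10, 19, 4), (20, 29, 10), (30, 39, 6)]

rel = relative_frequencies([f for _, _, f in classes])
for (low, high, f), r in zip(classes, rel):
    # Show the relative frequency both as a decimal and as a percentage.
    print(f"{low}-{high}: midpoint {midpoint(low, high)}, "
          f"relative frequency {r:.2f} ({r:.0%})")
```

The relative frequencies of all classes should add up to 1 (or 100%), which is a quick check that no observation was missed.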
Example 6
Wayne Enterprises is a US-based conglomerate with interests in technology, manufacturing, and weapons research. The company's CEO, Bruce Wayne, likes to keep track of how long each of his employees is away on lunch break. Here are the lunch break times of the 60 employees who work for him, rounded to the nearest minute. $$\begin{array}{lllllllllllllll} 35 & 35 & 35 & 35 & 35 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 \\ 36 & 37 & 37 & 38 & 38 & 38 & 39 & 40 & 40 & 40 & 40 & 40 & 40 & 40 & 40 \\ 41 & 41 & 41 & 41 & 41 & 41 & 42 & 42 & 43 & 43 & 43 & 43 & 43 & 43 & 44 \\ 44 & 44 & 44 & 44 & 45 & 45 & 45 & 45 & 47 & 48 & 49 & 50 & 50 & 50 & 55\end{array}$$
Cumulative Frequencies
Definition:
Less-Than Cumulative Frequencies (LTCF)
Example 7
Consider the lunch break data for Wayne Enterprises again. $$\begin{array}{lllllllllllllll} 35 & 35 & 35 & 35 & 35 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 \\ 36 & 37 & 37 & 38 & 38 & 38 & 39 & 40 & 40 & 40 & 40 & 40 & 40 & 40 & 40 \\ 41 & 41 & 41 & 41 & 41 & 41 & 42 & 42 & 43 & 43 & 43 & 43 & 43 & 43 & 44 \\ 44 & 44 & 44 & 44 & 45 & 45 & 45 & 45 & 47 & 48 & 49 & 50 & 50 & 50 & 55\end{array}$$
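A less-than cumulative frequency is commonly computed as a running total of the class frequencies, starting from the first class. The sketch below does this for the lunch break data. The choice of five classes of width 5 is an assumption made for illustration, not something fixed by the example, and the data are stored as value-to-count pairs to keep the listing short.

```python
from itertools import accumulate

# Lunch-break times in minutes, stored as value -> number of employees
# (transcribed from the data set above; the counts add up to 60).
counts = {35: 5, 36: 11, 37: 2, 38: 3, 39: 1, 40: 8, 41: 6, 42: 2,
          43: 6, 44: 5, 45: 4, 47: 1, 48: 1, 49: 1, 50: 3, 55: 1}

# Assumption: five classes of width 5, starting at the smallest value.
limits = [(35, 39), (40, 44), (45, 49), (50, 54), (55, 59)]

# Frequency of each class.
freqs = [sum(c for v, c in counts.items() if low <= v <= high)
         for low, high in limits]

# Less-than cumulative frequency: running total from the first class down.
ltcf = list(accumulate(freqs))

for (low, high), f, cf in zip(limits, freqs, ltcf):
    print(f"{low}-{high}: frequency {f:2d}, at most {high} minutes: {cf}")
```

The last cumulative value equals the total number of observations, since every employee took a lunch break of at most 59 minutes.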
Definition:
More-Than Cumulative Frequencies (MTCF)
Example 8
Here is the lunch break data for Wayne Enterprises again. $$\begin{array}{lllllllllllllll} 35 & 35 & 35 & 35 & 35 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 \\ 36 & 37 & 37 & 38 & 38 & 38 & 39 & 40 & 40 & 40 & 40 & 40 & 40 & 40 & 40 \\ 41 & 41 & 41 & 41 & 41 & 41 & 42 & 42 & 43 & 43 & 43 & 43 & 43 & 43 & 44 \\ 44 & 44 & 44 & 44 & 45 & 45 & 45 & 45 & 47 & 48 & 49 & 50 & 50 & 50 & 55\end{array}$$
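A more-than cumulative frequency is commonly computed as a running total taken in the opposite direction, starting from the last class. The sketch below assumes the 60 lunch break times have been tallied into five classes of width 5 (35-39, 40-44, and so on); both the classes and the variable names are illustrative assumptions.

```python
from itertools import accumulate

# Assumption: the 60 lunch-break times tallied into five classes of width 5.
limits = [(35, 39), (40, 44), (45, 49), (50, 54), (55, 59)]
freqs = [22, 27, 7, 3, 1]  # class frequencies counted from the data above

# More-than cumulative frequency: running total taken from the last class upward.
mtcf = list(accumulate(reversed(freqs)))[::-1]

for (low, high), f, cf in zip(limits, freqs, mtcf):
    print(f"{low}-{high}: frequency {f:2d}, at least {low} minutes: {cf}")
```

Here the first cumulative value equals the total number of observations, because every employee was away for at least 35 minutes.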
Example 9
The Presidential Daily Briefing (PDB) is a top-secret document which details the most pressing issues and developments from around the world. The report is prepared every day for the US President to review, but Donald Trump says that he doesn't need the Daily Briefing because he's ``smart``. Plus, he doesn't like reading, so he has someone read it to him while he listens. He also mentioned that he would like the PDBs to be shorter and contain more ``killer graphics``. The numbers of pages in $75$ PDBs are as follows: $$\begin{array}{ccccccccccccccc} 50 & 51 & 51 & 52 & 52 & 53 & 53 & 54 & 55 & 55 & 56 & 56 & 57 & 57 & 57 \\ 60 & 60 & 60 & 61 & 62 & 63 & 64 & 65 & 66 & 66 & 68 & 69 & 69 & 70 & 70 \\ 70 & 70 & 70 & 71 & 71 & 71 & 72 & 73 & 73 & 74 & 75 & 75 & 75 & 78 & 79 \\ 80 & 80 & 82 & 83 & 87 & 88 & 94 & 95 & 95 & 96 & 99 & 99 & 99 & 99 & 100 \\ 100 & 101 & 101 & 112 & 112 & 113 & 114 & 115 & 117 & 117 & 120 & 121 & 124 & 124 & 124\end{array}$$
Graphical Representation of Data
Histograms and Bar Charts
In a histogram, the continuous data is grouped into intervals, and the height of each bar represents the frequency of data points in that interval. In a bar chart, the discrete or categorical data is grouped into categories, and the height of each bar represents the frequency of data points in that category.
Remark
Bar Charts
Example 10
The Codfather is a restaurant located in South Carolina that serves fish and chips. According to Trip Advisor, it is the #1 eatery in town, and the data below shows the number of cod and fries lunch combos served up in 2021. $$\ \begin{array}{|c|c|} \hline \text{Month} & \text{Cod and Fries Lunches Served} \\ \hline \text{January} & 186 \\ \text{February} & 131 \\ \text{March} & 123 \\ \text{April} & 98 \\ \text{May} & 87 \\ \text{June} & 56 \\ \text{July} & 45 \\ \text{August} & 23 \\ \text{September} & 12 \\ \text{October} & 63 \\ \text{November} & 73 \\ \text{December} & 81 \\ \hline\end{array} $$
Number of Cod and Fries Lunches Served
Example 11
Life of Pie is a pie shop whose name is a pun on the novel and film ``Life of Pi.`` The data below show the types of pies sold this past week. $$\begin{array}{|c|c|} \hline \text{Pie Filling} & \text{Pies Sold} \\ \hline \text{Apple} & 186 \\ \text{Blueberry} & 131 \\ \text{Custard} & 123 \\ \text{Key Lime} & 98 \\ \text{Pumpkin Spice} & 87 \\ \text{Coconut Cream} & 56 \\ \text{Strawberry} & 45 \\ \hline \end{array} $$
Pie Sales
Example 12
A die is rolled 100 times, and the number of dots that appears is recorded in the table below $$\begin{array}{|c|c|} \hline \text{Dots} & \text{Frequency} \\ \hline 1 & 18 \\ 2 & 13 \\ 3 & 13 \\ 4 & 15 \\ 5 & 12 \\ 6 & 16 \\ \hline \end{array} $$
Number of Dots on a Die
Histograms
Remark
Example 13
Planet of the Grapes is a wine store with a name inspired by the movie ``Planet of the Apes.`` The data below shows the prices for the bottles of wine sold last week. $$\begin{array}{|c|c|} \hline \text{Price of Bottle} & \text{Frequency} \\ \hline 0-50 & 3 \\ 51-100 & 7 \\ 101-150 & 12 \\ 151-200 & 8 \\ 201-250 & 5 \\ 251-300 & 3 \\ \hline \end{array} $$
Solution
Since prices are continuous, the class limits are extended by $0.5$ in each direction to create class boundaries. The table and histogram below illustrate the frequency of wine bottles sold at Planet of the Grapes. $$\begin{array}{|c|c|c|} \hline \text{Price of Bottle} & \text{Class Boundaries} & \text{Frequency} \\ \hline 0-50 & -0.5-50.5 & 3 \\ 51-100 & 50.5-100.5 & 7 \\ 101-150 & 100.5-150.5 & 12 \\ 151-200 & 150.5-200.5 & 8 \\ 201-250 & 200.5-250.5 & 5 \\ 251-300 & 250.5-300.5 & 3 \\ \hline \end{array} $$
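The conversion from class limits to class boundaries can be sketched as follows. This is a minimal sketch: the function name `class_boundaries` and its `gap` parameter (the distance between one class's upper limit and the next class's lower limit, here 1) are illustrative assumptions.

```python
def class_boundaries(limits, gap=1):
    """Widen each class by half the gap between consecutive classes."""
    half = gap / 2
    return [(low - half, high + half) for low, high in limits]


# Class limits for the wine prices, taken from the table above.
limits = [(0, 50), (51, 100), (101, 150), (151, 200), (201, 250), (251, 300)]
boundaries = class_boundaries(limits)

for (low, high), (b_low, b_high) in zip(limits, boundaries):
    print(f"{low}-{high}: boundaries {b_low} to {b_high}")
```

After the conversion, the upper boundary of each class coincides with the lower boundary of the next, which is why the bars of a histogram touch.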
Price of Wine Bottles
Shapes and Attributes
Left (Negatively) Skewed Distribution
Right (Positively) Skewed Distribution
Symmetric Distribution
Uniform Distribution
Bimodal Distribution
Example 14
When it comes to choosing between losing a limb or living without the internet, some Americans who responded to a survey said they would rather cut off a finger than cut the internet. Americans spend more time consuming media online than they do sleeping, averaging about 10.5 hours per day connected to the internet. A random sample of 56 people in the U.S. was asked how many minutes they spend online per day. Here are the results: $$\begin{array}{llllllllllllll} 120 & 121 & 125 & 127 & 127 & 128 & 130 & 134 & 134 & 134 & 130 & 134 & 134 & 135 \\ 141 & 141 & 142 & 142 & 143 & 144 & 145 & 146 & 147 & 147 & 148 & 148 & 148 & 148 \\ 150 & 150 & 150 & 150 & 150 & 151 & 152 & 153 & 154 & 154 & 155 & 155 & 155 & 155 \\ 155 & 155 & 157 & 157 & 157 & 157 & 157 & 158 & 158 & 158 & 158 & 158 & 160 & 160 \end{array}$$
Number of Minutes Spent Online
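One way to get a feel for the shape of this data without plotting software is to print a rough text histogram. The sketch below groups the 56 values into bins of width 10 starting at 120; the bin width and starting point are assumptions chosen for readability, not part of the example.

```python
# Minutes spent online per day, copied from the data set above.
minutes = [
    120, 121, 125, 127, 127, 128, 130, 134, 134, 134, 130, 134, 134, 135,
    141, 141, 142, 142, 143, 144, 145, 146, 147, 147, 148, 148, 148, 148,
    150, 150, 150, 150, 150, 151, 152, 153, 154, 154, 155, 155, 155, 155,
    155, 155, 157, 157, 157, 157, 157, 158, 158, 158, 158, 158, 160, 160,
]

bin_width = 10  # assumption: bins 120-129, 130-139, ...
start = 120

# Map each value to the lower limit of its bin and count it.
counts = {}
for m in minutes:
    low = start + ((m - start) // bin_width) * bin_width
    counts[low] = counts.get(low, 0) + 1

# One row of '#' characters per bin, like a histogram turned on its side.
for low in sorted(counts):
    print(f"{low}-{low + bin_width - 1}: {'#' * counts[low]} ({counts[low]})")
```

With these bins, most of the values pile up in the 150-159 range while a thinner tail stretches toward the smaller values, which is the pattern usually described as left (negatively) skewed.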
Exercises
Question 1
What is the primary difference between the types of data visualized by a bar chart and a histogram?
Bar charts display frequencies for discrete data, with gaps between bars, while histograms show frequencies for continuous data, with bars touching to represent intervals.
Solution
Question 2
You are given data on the number of students who prefer different types of pizza toppings (e.g., pepperoni, cheese, veggie). Would you use a bar chart or a histogram to visualize this data?
Bar chart, because pizza toppings are categorical and there is no need to group the data into intervals.
Solution
Question 3
A dataset represents the heights of 100 people measured in centimeters. Would a bar chart or a histogram be more appropriate for visualizing this data? Why?
Histogram, because height is a continuous variable and is best visualized by grouping the data into intervals.
Solution
Question 4
Describe a scenario where using a bar chart might lead to confusion, and explain why a histogram would be more suitable.
If visualizing continuous data like temperature with a bar chart, the gaps between bars might mislead viewers into thinking the data is discrete. A histogram would clarify the distribution over intervals.
Solution
Question 5
For the random variable ``Number of cars owned by households in a city,`` should you use a bar chart or a histogram?
Bar chart because the ``number of cars`` is a discrete variable with specific values (e.g., 0, 1, 2).
Solution
Question 6
The sales of a store are tracked daily over a week, with each day categorized (Monday, Tuesday, etc.). Which type of chart should be used, and why?
Bar chart because the days of the week are categorical data.
Solution
Question 7
A researcher collects data on the time (in seconds) it takes for participants to complete a task. If they want to display the frequency distribution of completion times, which visualization should they use and why?
Histogram because completion times are continuous data that can be grouped into intervals.
Solution
Question 8
A teacher records the letter grades (A, B, C, D, F) of students in a class. Should the teacher use a bar chart or a histogram to represent this data? Explain.
Bar chart because letter grades are categorical data with distinct categories.
Solution
Question 9
A survey collects data on the ages of participants in years. Should the data be visualized with a bar chart or a histogram?
Histogram because age is a continuous variable that can be grouped into intervals.
Solution
Question 10
Discuss how grouping data into intervals (bins) in a histogram could affect the interpretation of a continuous random variable. Would this grouping be appropriate for discrete data?
Grouping data into bins for a histogram simplifies the visualization of a continuous random variable but may obscure details if the bin width is too large. For discrete data, grouping is unnecessary as each value is already distinct.
Solution
Question 11
In 1753, a London magazine listed the following popular wig styles: the cauliflower, the pigeon's wing, the rhinoceros, the staircase, and the wild boar's back. Suppose a survey conducted at a wig shop that year recorded the preferences of 100 customers for these wig styles, resulting in the following data:
The data below show the weights (in grams) of $40$ cauliflower wigs from 1753.
$$\begin{array}{rrrrrrrrrr} \hline 570 & 572 & 573 & 575 & 576 & 578 & 580 & 583 & 591 & 595 \\ \hline 602 & 607 & 620 & 625 & 628 & 628 & 629 & 630 & 632 & 632 \\ \hline 633 & 633 & 634 & 638 & 639 & 640 & 644 & 645 & 646 & 646 \\ \hline 653 & 657 & 658 & 663 & 666 & 667 & 667 & 670 & 675 & 675 \\ \hline \end{array}$$
Question 12
Deimos, one of the moons of Mars, has a crater called Swift, and the other one, Phobos, has an area called Laputa. That's because in Jonathan Swift's Gulliver's Travels astronomers discover that Mars has 2 moons. The novel came out in 1726 - 151 years before the actual discovery.
Forty students were asked to read Gulliver's Travels last week in one sitting. The time taken by each student to read the book is recorded in the table below.
$$\begin{array}{rrrrrrrrrr} \hline 100 & 100 & 101 & 101 & 102 & 103 & 103 & 105 & 105 & 106 \\ \hline 109 & 109 & 109 & 111 & 112 & 113 & 114 & 115 & 115 & 118 \\ \hline 118 & 119 & 122 & 124 & 125 & 126 & 127 & 127 & 127 & 128 \\ \hline 129 & 131 & 133 & 133 & 133 & 133 & 135 & 140 & 140 & 142 \\ \hline \end{array}$$
Question 13
A 1970s cartoon called Rascal the Raccoon sparked a craze in Japan for keeping raccoons as pets. Their feral offspring have since caused so much damage that Tokyo now has a hotline you can call to report sightings.
The weights of 30 pet raccoons, rounded to the nearest pound, are presented below. $$ \begin{array}{rrrrrrrrrr} \hline 9 & 9 & 10 & 10 & 10 & 10 & 10 & 11 & 11 & 11 \\ \hline 11 & 12 & 12 & 12 & 13 & 13 & 13 & 14 & 14 & 15 \\ \hline 15 & 16 & 16 & 17 & 17 & 17 & 18 & 19 & 19 & 20 \\ \hline \end{array} $$
Question 14
A recent poll in the UK found that $12\%$ of Britons love their pets more than they love their partner, and $9 \%$ love their pets more than they love their children. Forty Brits were asked how much time they spent on average with their pets each day. The results are shown in the table below.
$$\begin{array}{rrrrrrrrrr} \hline 120 & 121 & 121 & 123 & 124 & 127 & 128 & 130 & 130 & 132 \\ \hline 133 & 135 & 136 & 138 & 139 & 139 & 140 & 141 & 148 & 148 \\ \hline 150 & 150 & 151 & 154 & 155 & 162 & 162 & 163 & 163 & 163 \\ \hline 164 & 168 & 168 & 170 & 176 & 177 & 177 & 178 & 179 & 180 \\ \hline \end{array}$$
Question 15
In the United States, you can mail live scorpions - if they are inside a box labeled ``live scorpions``, inside another box also labeled ``live scorpions``.
The life spans of 40 American scorpions, in months, are presented below. $$\begin{array}{rrrrrrrrrr} \hline 37 & 38 & 38 & 38 & 38 & 40 & 41 & 41 & 41 & 42 \\ \hline 43 & 44 & 45 & 47 & 47 & 47 & 48 & 49 & 51 & 51 \\ \hline 51 & 52 & 55 & 56 & 56 & 56 & 57 & 58 & 58 & 59 \\ \hline 60 & 60 & 62 & 63 & 65 & 65 & 66 & 66 & 68 & 68 \\ \hline \end{array}$$
Question 16
A Paris company has set up a ``hospital`` to help bring people's dying office pot plants back to life.
The number of days that 40 office plants spent recuperating at the hospital is shown below. $$\begin{array}{cccccccccc} \hline 6 & 9 & 9 & 9 & 10 & 10 & 11 & 12 & 12 & 13 \\ \hline 13 & 14 & 15 & 15 & 16 & 17 & 18 & 19 & 19 & 19 \\ \hline 21 & 21 & 21 & 22 & 22 & 25 & 25 & 25 & 25 & 26 \\ \hline 27 & 27 & 27 & 28 & 28 & 28 & 29 & 29 & 30 & 30 \\ \hline \end{array} $$