Presentation of Data
Once data has been collected, it needs to be organized and presented in a meaningful way so that trends and other pertinent information can be extracted from it. This chapter covers how to construct a frequency table, calculate cumulative frequencies, and create two types of graphical representations for data: the histogram and the bar chart.
Frequency Table
A frequency table is a table that shows how many times a particular value or range of values occurs in a data set. It is a useful way to organize data and identify patterns.
Definition:
Frequency Table
A frequency table organizes data into classes or intervals, and shows how many data points fall into each category.
How To Make A Frequency Table
<ul> <li> Sort the data into ascending order. </li> <li> Decide on the number of classes or intervals to be used (typically between $5$ and $10$ classes). </li> <li> Calculate the class width: $$\begin{align*} \text{class width} &=\frac{\text{largest value} - \text{smallest value}}{\text{number of classes}} \\ \quad & \Rightarrow \quad \text{and round up to the next whole number} \end{align*} $$ </li> <li>Work out the class limits. <br/> <br/> To generate the lower limit of each class, start with the smallest value in the data set. Move down the table, adding the class width to each preceding lower limit until every class has its lower limit. <br/> <br/> To determine the upper limit for the first class, subtract $1$ from the lower limit of the second class. Move down the table and close off each class by adding the class width to the preceding upper limit. </li> <li> Tally the data by counting how many values fall into each class. </li> </ul>
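The steps above can be sketched in Python. This is a minimal sketch that assumes whole-number data; the function names `class_width`, `class_limits`, and `tally` are illustrative and not part of the original text. The width is moved up to the next whole number even when the quotient is already whole, which matches how Example 3 turns $10$ into $11$.

```python
def class_width(data, num_classes):
    """Range divided by the number of classes, moved up to the next whole number."""
    # Integer division followed by + 1 rounds a fractional quotient up and
    # also bumps an exact quotient (assumption: the data are whole numbers).
    return (max(data) - min(data)) // num_classes + 1


def class_limits(data, num_classes):
    """Return a (lower limit, upper limit) pair for every class."""
    width = class_width(data, num_classes)
    smallest = min(data)
    limits = []
    for i in range(num_classes):
        lower = smallest + i * width   # add the width to the preceding lower limit
        upper = lower + width - 1      # one less than the next lower limit
        limits.append((lower, upper))
    return limits


def tally(data, limits):
    """Count how many data values fall inside each class."""
    return [sum(lower <= x <= upper for x in data) for lower, upper in limits]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
limits = class_limits(data, 5)
print(limits)               # [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
print(tally(data, limits))  # [2, 2, 2, 2, 2]
```

Because each upper limit is one less than the next lower limit, every whole-number value lands in exactly one class.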
Rule of Thumb
The number of classes is usually between $5$ and $10$. <br/> <br/> <ul> <li> Using fewer than five classes could result in the loss of too much information. </li> <li> Using more than ten classes could result in poor data synthesis. </li> </ul>
Remark
The above technique for creating a frequency table ensures that: <br/> <br/> <ul> <li> Each data point is included in one and only one class. </li> <li> The classes are mutually exclusive (i.e., none of the classes overlap each other). </li> <li> The classes are exhaustive (i.e., all of the data is captured and accounted for). </li> </ul>
Consider the following data set: $$1, 2, 3, 4, 5, 6, 7, 8, 9, 10 $$ If we decide to use $5$ classes, determine the class width.
Example 1
The class width is calculated as follows: <br/> <br/> $$\text{class width}=\frac{Largest - Smallest}{Number\, of \, classes}=\frac{10 - 1}{5} = 1.8 \quad \Rightarrow \quad 2$$ <br/> <br/> Therefore, the class width is $2$.
Consider the following data set: $$27, 30, 33, 45, 45, 46, 49, $$ $$55, 56, 58, 64, 67, 70 $$ If we decide to use $7$ classes, determine the class limits.
Example 2
The class width is calculated as follows: <br/> <br/> $$\text{class width} = \frac{Largest - Smallest}{Number\, of \, classes}=\frac{70 - 27}{7} = 6.14 \quad \Rightarrow \quad 7$$ <br/> <br/> Therefore, the class width is $7$.
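The question asks for the class limits, so the procedure can be carried one step further. As a sketch, starting from the smallest value, $27$, and repeatedly adding the class width of $7$ gives the lower limits; each upper limit is one less than the next lower limit: $$\begin{array}{|c|c|c|} \hline \text{Class} & \text{Lower Limit} & \text{Upper Limit} \\ \hline 1 & 27 & 33 \\ 2 & 34 & 40 \\ 3 & 41 & 47 \\ 4 & 48 & 54 \\ 5 & 55 & 61 \\ 6 & 62 & 68 \\ 7 & 69 & 75 \\ \hline \end{array} $$ The largest value, $70$, falls in the seventh class, so every data point is covered.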
Consider the following data set: $$27, 31, 35, 46, 46, 49, $$ $$ 49, 53, 55, 58, 60, 77 $$ If we decide to use $5$ classes, determine the class limits.
Example 3
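The class width is calculated as follows: <br/> <br/> $$\text{class width} = \frac{Largest - Smallest}{Number\, of \, classes}=\frac{77 - 27}{5} = 10 \quad \Rightarrow \quad 11$$ <br/> <br/> Because the quotient is already a whole number, it is moved up to the next whole number. With a width of $10$, the fifth class would end at $76$ and the largest value, $77$, would be left out. <br/> <br/> Therefore, the class width is $11$.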
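A quick check shows why the width moves from $10$ to $11$ in this example. The helper name `last_upper_limit` is hypothetical; it simply applies the class-limit procedure described earlier to find where the final class ends.

```python
def last_upper_limit(smallest, width, num_classes):
    """Upper limit of the final class when lower limits step by `width`."""
    last_lower = smallest + (num_classes - 1) * width
    return last_lower + width - 1  # one less than the next lower limit


largest = 77
for width in (10, 11):
    upper = last_upper_limit(27, width, 5)
    # The last column reports whether the largest value is still covered.
    print(width, upper, upper >= largest)
# 10 76 False
# 11 81 True
```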
Example 4
Consider the following data set: $$27, 30, 33, 45, 45, 46, 49, $$ $$ 55, 56, 58, 64, 67, 70 $$ Suppose we decide to use $6$ classes.
Example 5
Consider the following data set: $$12, 23, 33, 43, 44, 47, 49, 52, 56, 58, 60, 65, 75 $$ Suppose we decide to use $7$ classes.
Frequency Distributions
A table that just displays the frequency counts may not be the best way to present data. More meaningful information and trends can be revealed if the counts in each class are converted into proportions or percentages.
Definition:
Frequency Distribution
A <strong> frequency distribution</strong> is an extension of a frequency table. In addition to exhibiting the frequency counts in each class, a frequency distribution also includes midpoints for each class, class boundaries, relative frequencies, and cumulative frequencies.
<ul><li> The <strong> midpoint </strong> is the middle number of each class. It is obtained by adding the upper and lower limits together and then dividing by two. The midpoint is used to estimate the average or mean for grouped data. </li> <li> The <strong> relative frequency</strong> displays the proportion of data that falls into each category. It can be represented in decimal format or in percentages.</li></ul>
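As a small illustration of the midpoint and its use in estimating a grouped mean, consider the sketch below. The class limits and frequencies are hypothetical values chosen only for illustration, and the function names are not from the original text.

```python
def midpoints(limits):
    """Midpoint of each class: (lower limit + upper limit) / 2."""
    return [(lower + upper) / 2 for lower, upper in limits]


def grouped_mean(limits, freqs):
    """Estimate the mean by treating every value as if it sat at its class midpoint."""
    mids = midpoints(limits)
    return sum(m * f for m, f in zip(mids, freqs)) / sum(freqs)


# Hypothetical frequency table, for illustration only.
limits = [(10, 19), (20, 29), (30, 39)]
freqs = [2, 5, 3]

print(midpoints(limits))            # [14.5, 24.5, 34.5]
print(grouped_mean(limits, freqs))  # 25.5
```

The result is only an estimate, because the individual values inside each class are replaced by the class midpoint.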
Formula:
Relative Frequencies
For each class, the relative frequency is calculated by dividing the frequency of that class by the total number of observations. $$\begin{align} \text{Relative Frequency} &=\frac{f}{n} \end{align}$$ where<br/> <br/> $f=$ the frequency of the class <br/> $n=$ the total number of observations.
Example 6
Wayne Enterprises is a US-based conglomerate with interests in technology, manufacturing, and weapons research. The company's CEO, Bruce Wayne, likes to keep track of how long each of his employees is away on lunch break. Here are the results for the 60 employees who work for him, rounded to the nearest minute. $$\begin{array}{lllllllllllllll} 35 & 35 & 35 & 35 & 35 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 \\ 36 & 37 & 37 & 38 & 38 & 38 & 39 & 40 & 40 & 40 & 40 & 40 & 40 & 40 & 40 \\ 41 & 41 & 41 & 41 & 41 & 41 & 42 & 42 & 43 & 43 & 43 & 43 & 43 & 43 & 44 \\ 44 & 44 & 44 & 44 & 45 & 45 & 45 & 45 & 47 & 48 & 49 & 50 & 50 & 50 & 55\end{array}$$
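One way to turn this data into relative frequencies is sketched below. The choice of five classes is an assumption made for illustration (the text does not fix the number of classes here); with five classes the class-width rule gives $(55 - 35)/5 = 4$, moved up to $5$. The data are stored as counts per value to keep the listing short.

```python
# Lunch break times from the example, stored as {minutes: number of employees}.
counts = {35: 5, 36: 11, 37: 2, 38: 3, 39: 1, 40: 8, 41: 6, 42: 2,
          43: 6, 44: 5, 45: 4, 47: 1, 48: 1, 49: 1, 50: 3, 55: 1}

# Assumed classes: five classes of width 5, starting at the smallest value.
limits = [(35, 39), (40, 44), (45, 49), (50, 54), (55, 59)]

n = sum(counts.values())  # total number of observations
freqs = [sum(c for value, c in counts.items() if lower <= value <= upper)
         for lower, upper in limits]
rel_freqs = [f / n for f in freqs]  # relative frequency = f / n

for (lower, upper), f, r in zip(limits, freqs, rel_freqs):
    print(f"{lower}-{upper}: f = {f:2d}, relative frequency = {r:.3f} ({r:.1%})")
```

The relative frequencies should add up to $1$ (or $100\%$), which is a handy check on the tally.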
Cumulative Frequencies
The <strong> cumulative frequency</strong> is the sum of the frequencies of all classes up to and including the current class. It is used to determine the number of observations that fall below a certain value. The cumulative frequency for the last class should equal the total number of observations.
Definition:
Less-Than Cumulative Frequencies (LTCF)
The <strong>less-than cumulative frequencies</strong> report, for each class, the number of data values that are less than or equal to the upper limit of that class (equivalently, less than its upper class boundary).
Example 7
Consider the lunch break data for Wayne Enterprises again. $$\begin{array}{lllllllllllllll} 35 & 35 & 35 & 35 & 35 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 \\ 36 & 37 & 37 & 38 & 38 & 38 & 39 & 40 & 40 & 40 & 40 & 40 & 40 & 40 & 40 \\ 41 & 41 & 41 & 41 & 41 & 41 & 42 & 42 & 43 & 43 & 43 & 43 & 43 & 43 & 44 \\ 44 & 44 & 44 & 44 & 45 & 45 & 45 & 45 & 47 & 48 & 49 & 50 & 50 & 50 & 55\end{array}$$
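A possible way to tabulate the less-than cumulative frequencies for this data is sketched below. It assumes five classes (35-39, 40-44, 45-49, 50-54, 55-59), which is an illustrative choice rather than one stated in the text; the frequencies were tallied from the data above under that assumption.

```python
from itertools import accumulate

# Assumed classes 35-39, 40-44, 45-49, 50-54, 55-59 and their tallied frequencies.
upper_limits = [39, 44, 49, 54, 59]
freqs = [22, 27, 7, 3, 1]

# Running total of the frequencies, from the first class down to the last.
ltcf = list(accumulate(freqs))

for upper, total in zip(upper_limits, ltcf):
    print(f"{upper} minutes or less: {total}")

print(ltcf)  # [22, 49, 56, 59, 60]
```

The final entry equals the total number of observations, $60$, as expected for the last class.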
Definition:
More-Than Cumulative Frequencies (MTCF)
The <strong>more-than cumulative frequencies</strong> report, for each class, the number of data values that are greater than or equal to the lower limit of that class.
Example 8
Here is the lunch break data for Wayne Enterprises again. $$\begin{array}{lllllllllllllll} 35 & 35 & 35 & 35 & 35 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 & 36 \\ 36 & 37 & 37 & 38 & 38 & 38 & 39 & 40 & 40 & 40 & 40 & 40 & 40 & 40 & 40 \\ 41 & 41 & 41 & 41 & 41 & 41 & 42 & 42 & 43 & 43 & 43 & 43 & 43 & 43 & 44 \\ 44 & 44 & 44 & 44 & 45 & 45 & 45 & 45 & 47 & 48 & 49 & 50 & 50 & 50 & 55\end{array}$$
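The more-than cumulative frequencies can be counted directly from the data, as in the sketch below. The lower limits 35, 40, 45, 50, and 55 assume five classes of width 5, which is an illustrative choice and not one fixed by the text.

```python
# Lunch break times stored as {minutes: number of employees}.
counts = {35: 5, 36: 11, 37: 2, 38: 3, 39: 1, 40: 8, 41: 6, 42: 2,
          43: 6, 44: 5, 45: 4, 47: 1, 48: 1, 49: 1, 50: 3, 55: 1}

# Assumed lower class limits (five classes of width 5).
lower_limits = [35, 40, 45, 50, 55]

# For each class, count the values at or above its lower limit.
mtcf = [sum(c for value, c in counts.items() if value >= lower)
        for lower in lower_limits]

for lower, total in zip(lower_limits, mtcf):
    print(f"{lower} minutes or more: {total}")

print(mtcf)  # [60, 38, 11, 4, 1]
```

Here the first entry, not the last, equals the total number of observations, because every value is at or above the smallest lower limit.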
Example 9
The Presidential Daily Briefing (PDB) is a top-secret document which details the most pressing issues and developments from around the world. The report is prepared every day for the US President to review, but Donald Trump says that he doesn't need the Daily Briefing because he's "smart". Plus, he doesn't like reading, so he has someone read it to him while he listens. He also mentioned that he would like the PDBs to be shorter and contain more "killer graphics". The number of pages for $75$ PDBs are as follows: $$\begin{array}{ccccccccccccccc} 50 & 51 & 51 & 52 & 52 & 53 & 53 & 54 & 55 & 55 & 56 & 56 & 57 & 57 & 57 \\ 60 & 60 & 60 & 61 & 62 & 63 & 64 & 65 & 66 & 66 & 68 & 69 & 69 & 70 & 70 \\ 70 & 70 & 70 & 71 & 71 & 71 & 72 & 73 & 73 & 74 & 75 & 75 & 75 & 78 & 79 \\ 80 & 80 & 82 & 83 & 87 & 88 & 94 & 95 & 95 & 96 & 99 & 99 & 99 & 99 & 100 \\ 100 & 101 & 101 & 112 & 112 & 113 & 114 & 115 & 117 & 117 & 120 & 121 & 124 & 124 & 124\end{array}$$
Graphical Representation of Data
While tables structure information so that the data is more useful to us, they are not the most efficient way of interpreting data. To analyse data effectively, a visual representation is needed. Charts, histograms, and graphs serve as a more powerful means to uncover patterns and trends hidden within the data, ones which are not evident by simply looking at the values in a table.
Histograms and Bar Charts
Histograms and bar charts are two of the most common ways to visually represent data. They are used to display the frequency of data points in a data set.<br/> <br/> In a histogram, the continuous data is grouped into intervals, and the height of each bar represents the frequency of data points in that interval. In a bar chart, the discrete or categorical data is grouped into categories, and the height of each bar represents the frequency of data points in that category.
Prior to selecting the type of graph to use, it is important to understand the data set and the type of data being represented. <ul> <li> <strong> Histograms </strong> are used to illustrate data that is continuous. </li> <li> <strong> Bar charts </strong> are used to illustrate data that is categorical or discrete. </li> </ul>
Remark
For both types of graphs, the class limits, class boundaries, or categories are placed on the x-axis, while the frequency (in counts, decimals, or percentages) is placed on the y-axis.
Bar Charts
Example 10
The Codfather is a restaurant located in South Carolina that serves fish and chips. According to Trip Advisor, it is the #1 eatery in town, and the data below shows the number of cod and fries lunch combos served up in 2021. $$\begin{array}{|c|c|} \hline \text{Month} & \text{Cod and Fries Lunches Served} \\ \hline \text{January} & 186 \\ \text{February} & 131 \\ \text{March} & 123 \\ \text{April} & 98 \\ \text{May} & 87 \\ \text{June} & 56 \\ \text{July} & 45 \\ \text{August} & 23 \\ \text{September} & 12 \\ \text{October} & 63 \\ \text{November} & 73 \\ \text{December} & 81 \\ \hline\end{array} $$
Number of Cod and Fries Lunches Served
Example 11
Life of Pie is a pie shop whose name is a pun on the novel and film "Life of Pi." The data below show the types of pies that were sold this past week. $$\begin{array}{|c|c|} \hline \text{Pie Filling} & \text{Pies Sold} \\ \hline \text{Apple} & 186 \\ \text{Blueberry} & 131 \\ \text{Custard} & 123 \\ \text{Key Lime} & 98 \\ \text{Pumpkin Spice} & 87 \\ \text{Coconut Cream} & 56 \\ \text{Strawberry} & 45 \\ \hline \end{array} $$
Pie Sales
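For readers who want to reproduce a chart like this one without a plotting library, a rough text-only sketch follows. The function name `text_bar_chart` and the scale of one mark per 10 pies are illustrative assumptions, not part of the original example.

```python
# Pie sales from the table above.
pies_sold = {"Apple": 186, "Blueberry": 131, "Custard": 123, "Key Lime": 98,
             "Pumpkin Spice": 87, "Coconut Cream": 56, "Strawberry": 45}


def text_bar_chart(freqs, per_mark=10):
    """Return one line per category, with one '#' per `per_mark` items (rounded down)."""
    width = max(len(label) for label in freqs)
    lines = []
    for label, count in freqs.items():
        bar = "#" * (count // per_mark)
        lines.append(f"{label.ljust(width)} | {bar} {count}")
    return lines


for line in text_bar_chart(pies_sold):
    print(line)
```

Each category gets its own separate bar, which reflects the fact that pie fillings are categorical rather than continuous.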
Example 12
A die is rolled 100 times, and the number of dots that appears is recorded in the table below. $$\begin{array}{|c|c|} \hline \text{Dots} & \text{Frequency} \\ \hline 1 & 18 \\ 2 & 13 \\ 3 & 13 \\ 4 & 15 \\ 5 & 12 \\ 6 & 16 \\ \hline \end{array} $$
Number of Dots on a Die
Histograms
Remark
Prior to sketching a histogram, the gaps that occur between the classes need to be eliminated. To achieve this, each class limit is extended outwards by half a unit to create <strong> class boundaries </strong>. In other words, each lower limit is decreased by $0.5$ and each upper limit is increased by $0.5$.
Example 13
Planet of the Grapes is a wine store with a name inspired by the movie "Planet of the Apes." The data below shows the prices for the bottles of wine sold last week. $$\begin{array}{|c|c|} \hline \text{Price of Bottle} & \text{Frequency} \\ \hline 0-50 & 3 \\ 51-100 & 7 \\ 101-150 & 12 \\ 151-200 & 8 \\ 201-250 & 5 \\ 251-300 & 3 \\ \hline \end{array} $$
Solution
Since prices are continuous, the class limits are extended by $0.5$ to create class boundaries. The table and histogram below illustrate the frequency of wine bottles sold at Planet of the Grapes. $$\begin{array}{|c|c|c|} \hline \text{Price of Bottle} & \text{Class Boundaries} & \text{Frequency} \\ \hline 0-50 & -0.5-50.5 & 3 \\ 51-100 & 50.5-100.5 & 7 \\ 101-150 & 100.5-150.5 & 12 \\ 151-200 & 150.5-200.5 & 8 \\ 201-250 & 200.5-250.5 & 5 \\ 251-300 & 250.5-300.5 & 3 \\ \hline \end{array} $$
Price of Wine Bottles
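The class boundaries in the table can be reproduced with a few lines of Python. This is a minimal sketch; the function name `class_boundaries` is illustrative, and the half-unit adjustment assumes the class limits are whole numbers one unit apart, as they are here.

```python
def class_boundaries(limits):
    """Extend each class outwards by half a unit to close the gaps between classes."""
    return [(lower - 0.5, upper + 0.5) for lower, upper in limits]


# Class limits from the wine price table.
limits = [(0, 50), (51, 100), (101, 150), (151, 200), (201, 250), (251, 300)]
boundaries = class_boundaries(limits)

for (lower, upper), (lb, ub) in zip(limits, boundaries):
    print(f"{lower}-{upper}: {lb} to {ub}")

# Each upper boundary should coincide with the next lower boundary,
# so the histogram bars touch with no gaps.
gap_free = all(boundaries[i][1] == boundaries[i + 1][0]
               for i in range(len(boundaries) - 1))
print(gap_free)  # True
```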
Shapes and Attributes
A picture is worth a thousand words and a graph is no exception. The shape and attributes of a histogram/bar graph can reveal: if the data is symmetrically distributed or loaded to one side, if there are outliers in the data set, which intervals contain the most data, and how the data is spread out.
When data is <strong> skewed to the left </strong> in a histogram, the tail of the graph is on the left side. In other words, there are a small number of very low values pulling the mean toward the left.
Left (Negatively) Skewed Distribution
When data is <strong> skewed to the right </strong> in a histogram, the tail of the graph is on the right side. In other words, there are a small number of very high values pulling the mean toward the right.
Right (Positively) Skewed Distribution
A distribution is <strong>symmetric</strong> when its left and right sides are mirror images of each other around a central point.
Symmetric Distribution
A distribution is <strong>uniform</strong> when all the classes have equal frequencies.
Uniform Distribution
A distribution is <strong>bimodal</strong> when it has two peaks. This means that there are two values that occur most frequently in the data set.
Bimodal Distribution
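Since a tail pulls the mean toward it, comparing the mean with the median gives a rough numerical hint about the direction of skew. The sketch below is a rule of thumb only, not a method from the original text; the function name `skew_direction` and the sample data are hypothetical, and equal mean and median do not by themselves prove symmetry.

```python
from statistics import mean, median


def skew_direction(data):
    """Rough heuristic: the mean is pulled toward the longer tail."""
    m, med = mean(data), median(data)
    if m < med:
        return "left"
    if m > med:
        return "right"
    return "no clear skew"


# Hypothetical data sets, for illustration only.
print(skew_direction([1, 8, 9, 9, 10, 10]))  # left
print(skew_direction([1, 1, 2, 2, 3, 10]))   # right
print(skew_direction([1, 2, 3, 4, 5]))       # no clear skew
```

A histogram remains the more reliable way to judge shape, since features such as two peaks cannot be seen from the mean and median alone.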
Example 14
When it comes to choosing between losing a limb or living without internet, some Americans who responded to a survey would rather cut off a finger than cut the internet. Americans spend more time consuming media online than they do sleeping, averaging about 10.5 hours per day connected to the internet. A random sample of 56 people in the U.S. were asked how many minutes they spend online per day. Here are the results: $$\begin{array}{llllllllllllll} 120 & 121 & 125 & 127 & 127 & 128 & 130 & 134 & 134 & 134 & 130 & 134 & 134 & 135 \\ 141 & 141 & 142 & 142 & 143 & 144 & 145 & 146 & 147 & 147 & 148 & 148 & 148 & 148 \\ 150 & 150 & 150 & 150 & 150 & 151 & 152 & 153 & 154 & 154 & 155 & 155 & 155 & 155 \\ 155 & 155 & 157 & 157 & 157 & 157 & 157 & 158 & 158 & 158 & 158 & 158 & 160 & 160 \end{array}$$
Below is the histogram that represents the data collected from the survey.
Number of Minutes Spent Online