Describing Data

Section 4.2 Describing Data

Objectives

Students will be able to:

Define and identify categorical and quantitative data
Read and construct frequency tables and relative frequency tables
Make bar charts and pie charts for categorical variables by hand and/or using technology
Identify elements of misleading graphs: 3-dimensional graphs, perceptual distortion, misleading scales, stacked bar graphs
Make histograms for quantitative variables by hand and/or using technology
Identify the number of modes in a distribution and whether it is symmetric, skewed to the left, or skewed to the right

Once we have collected data from an observational study or an experiment, we need to summarize and present them in a way that will be meaningful to our audience. Raw data are not very useful by themselves. In this section we will begin with graphical presentations of data and in the rest of the chapter we will learn about numerical summaries of data.

Subsection 4.2.1 Types of Data

There are two types of data, categorical data and quantitative data. The word data is plural because it represents many pieces of information.

Categorical (qualitative) data are pieces of information that allow us to classify the subjects into various categories.

Example 4.2.1.

We might conduct a survey to determine the name of the favorite movie that people saw in a movie theater. When we conduct such a survey, the responses would look like: Finding Nemo, Black Panther, Titanic, etc.

We can count the number of people who give each answer, but the answers themselves do not have any numerical values: we cannot perform computations with an answer like “Black Panther” because it is categorical data.

Quantitative data are responses that are numerical in nature and with which we can perform meaningful calculations.

Example 4.2.2.

A survey could ask the number of movies you have seen in a movie theater in the past 12 months (0, 1, 2, 3, 4, ...). This would be quantitative data.

Other examples of quantitative data would be the running time of the movie you saw most recently (104 minutes, 137 minutes, 110 minutes, etc.) or the amount of money you paid for a movie ticket the last time you went to a movie theater ($5.50, $9.75, $10.50, etc.).

We cannot assume that all numbers are quantitative data, and sometimes it is not so clear-cut. Here are some examples to illustrate this.

Example 4.2.3.

Suppose we gather respondents’ ZIP codes in a survey to track their geographical location. ZIP codes are numbers, but we can’t do any meaningful calculations with them. It doesn’t make sense to say that 98036 is "twice" 49018; that would be like saying that Lynnwood, WA is "twice" Battle Creek, MI. So ZIP codes are really categorical data.

A survey about the movie you most recently saw includes the question, "How would you rate the movie?" with these possible answers:
1. It was awful.
2. It was just okay.
3. I liked it.
4. It was great.
5. Best movie ever!

Again, there are numbers associated with the responses, but these are really categories. A movie that rates a 4 is not necessarily twice as good as a movie that rates a 2, whatever that means. However, we often see that a movie got an average of 3.7 stars, which is an average of categorical ratings, and it can still give us important information.

Overall, it is important to look at the purpose of the study for any variables that could be classified as either categorical or quantitative. Another consideration is what you plan to do with the data. Next, we will talk about how to display each type of data.

Subsection 4.2.2 Presenting Categorical Data

Since we can’t do calculations with categorical data, we begin by summarizing the data in a frequency table or a relative frequency table.

Subsection 4.2.3 Frequency Tables

A frequency table has one column for the categories, and another for the frequency, or number of times that category occurred.

Example 4.2.4.

An insurance company determines vehicle insurance premiums based on known risk factors. If a person is considered a higher risk, their premiums will be higher. One potential factor is the color of your car. The insurance company believes that people with cars of certain colors are more likely to get in accidents. To research this, they examine police reports for recent total-loss collisions. The data are summarized in this table.

Car Color	Frequency of Total-Loss Collisions
Blue	25
Green	52
Red	41
White	36
Black	39
Grey	23
Total	216

Subsection 4.2.4 Relative Frequency Tables

Counts are usually not as easy to interpret as percentages, so we will add a column for the relative frequencies. A relative frequency is the percentage for the category, found by dividing each frequency by the total and converting to a percentage. You’ll notice the percentages may not add up to exactly 100% due to rounding.

Example 4.2.5.

Car Color	Frequency of Total-Loss Collisions	Relative Frequency of Total-Loss Collisions
Blue	25	$25/216 = 0.116$ or 11.6%
Green	52	$52/216 = 0.241$ or 24.1%
Red	41	$41/216 = 0.190$ or 19.0%
White	36	$36/216 = 0.167$ or 16.7%
Black	39	$39/216 = 0.181$ or 18.1%
Grey	23	$23/216 = 0.106$ or 10.6%
Total	216	$216/216 = 1.0$ or 100%
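
The relative frequency column can also be computed with a few lines of code. The following is a minimal Python sketch, not part of the original example; the names `frequencies` and `relative_frequencies` are chosen here for illustration only.

```python
# Frequencies copied from the car color table above.
frequencies = {
    "Blue": 25, "Green": 52, "Red": 41,
    "White": 36, "Black": 39, "Grey": 23,
}

def relative_frequencies(freqs):
    """Return each category's share of the total, as a percentage."""
    total = sum(freqs.values())
    return {category: 100 * count / total for category, count in freqs.items()}

rel = relative_frequencies(frequencies)

# Print one row per category, with the percentage rounded to one decimal place.
for color, pct in rel.items():
    print(f"{color:<6}{frequencies[color]:>4}{pct:>7.1f}%")
```

Because each percentage is rounded only when it is printed, the unrounded values still add up to exactly 100%.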

It would be even more useful to have a visual to see what is going on, and this is where charts and graphs come in. For categorical data we can display our data using bar graphs and pie charts.

Subsection 4.2.5 Bar graphs

A bar graph is a graph that displays a bar for each category with the height of the bar indicating the frequency of that category. To construct a bar graph with vertical bars, we label the horizontal axis with the categories. The vertical axis will have a scale for the frequency or relative frequency.

The highest frequency in our car data is 52 collisions, so we will set our vertical axis to go from 0 to 55, with a scale of 5 units. To draw bar graphs by hand, graph paper is useful, or you can use technology. It is also very helpful to label each bar with the frequency or relative frequency.

This is a bar graph. Along the x-axis it lists: blue, green, red, white, black and grey. The x-axis is labeled “Vehicle color involved in total-loss collision.” The y-axis is labeled “frequency” and goes from 0 to 55 with a scale of 5. Above each color there is a bar corresponding to the frequency. Blue 25; Green 52; Red 41; White 36; Black 39; Grey 23.

Subsection 4.2.6 Pie Charts

A natural way to visualize relative frequencies is with a pie chart. A pie chart is a circle cut into wedges of varying sizes, like slices of pizza or pie. The size of each wedge corresponds to the relative frequency of the category. The slices add up to 100%, just like relative frequencies. Pie charts can often benefit from including frequencies or relative frequencies in the pie slices.

Pie charts look nice but are harder to draw by hand than bar charts since to draw them accurately we would need to compute the angle each wedge cuts out of the circle, then measure the angle with a protractor. A spreadsheet is much better suited to drawing pie charts.
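
To see what drawing a pie chart by hand would involve, here is a short Python sketch of the angle calculation for the car color data. It assumes each wedge angle is the category's share of the total multiplied by 360 degrees; the variable names are illustrative.

```python
# Frequencies copied from the car color table.
frequencies = {
    "Blue": 25, "Green": 52, "Red": 41,
    "White": 36, "Black": 39, "Grey": 23,
}

total = sum(frequencies.values())

# A full circle is 360 degrees, so each wedge gets its share of 360.
angles = {color: 360 * count / total for color, count in frequencies.items()}

for color, angle in angles.items():
    print(f"{color}: {angle:.1f} degrees")
```

Each of these angles would then have to be measured out with a protractor, which is why technology is usually the better choice.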

A pie chart for the same data as the previous bar chart. The relative frequencies are given in each pie piece: Green 24%; Red 19%; Black 18%; White 17%; Blue 11%; Grey 11%.

Subsection 4.2.7 Using a Spreadsheet to Make Bar Charts and Pie Charts

To make a graph using a spreadsheet, place the data from the frequency table into the cells. Then select the data, go to the Insert tab, and choose the bar graph or pie chart that you would like. For this example, we will choose a pie graph.

This is a screenshot of Excel that includes both the data and the Insert tab with various circle graph options.

After the spreadsheet has created your pie graph you can choose which design you prefer by clicking on the Chart Design tab. Since these pie pieces represent car colors, we matched the color of each wedge to the color of the car in our pie chart above.

To give your graph a meaningful title, click on Chart Title. There are many other settings that you can experiment with.

This is a screenshot of Excel that shows the frequency table and pie chart and shows the Chart Design tab.

Subsection 4.2.8 Misleading Graphs

Graphs can be misleading intentionally or unintentionally. It’s better to keep them simple, clear and well-labeled. People sometimes add features to graphs that don’t help convey their information.

Example 4.2.6.

A 3-dimensional bar chart like the one shown is usually not as effective as a 2-dimensional graph. The extra dimension does not add any useful information.

A 3-D bar graph showing the car color data.

Here is another way that fanciness can sometimes lead to trouble. Instead of plain bars, it is tempting to substitute images. This type of graph is called a pictogram.

Subsection 4.2.9 Perceptual Distortion

A pictogram is a statistical graphic in which the size of the picture is intended to represent the frequency or size of the values being represented. We need to be careful with these, because our brains perceive the relationship between the areas, not the heights.

Example 4.2.7.

A labor union might produce this graph to show the difference between the average manager salary and the average worker salary.

The average manager salary is twice the average worker salary, so the manager image is twice as tall, just as a bar in a bar graph would be. But the image is also twice as wide, which gives it 4 times the area and makes it look like the manager salary is 4 times as large as the worker salary. The area needs to accurately portray the relationship; otherwise we will have a perceptual distortion.

There are two bags of money shown. The larger one is labeled Manager Salaries. The other is half as tall and half as wide and is labeled workers salaries. There are no numbers or scale shown.
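
The arithmetic behind this distortion can be sketched in Python. The functions `area_ratio` and `fair_scale` are hypothetical helpers written for this illustration. They assume the picture is stretched by the same factor in both directions, and that a fair pictogram is one whose area, not its height, grows in proportion to the value.

```python
import math

def area_ratio(height_ratio, width_ratio):
    """How many times larger the area becomes when both dimensions are scaled."""
    return height_ratio * width_ratio

def fair_scale(value_ratio):
    """Factor for BOTH height and width so that the area grows by value_ratio."""
    return math.sqrt(value_ratio)

# Doubling height and width makes the picture look 4 times as large.
print(area_ratio(2, 2))

# To show a value that is twice as large, scale each dimension by about 1.414.
print(round(fair_scale(2), 3))
```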

Subsection 4.2.10 Misleading Scale

Another type of distortion in bar charts results from setting the baseline to a value other than zero. The baseline is the bottom of the vertical axis, representing the least number of cases that could have occurred in a category. Normally, this number should be zero.

Example 4.2.8.

Compare the two graphs below showing support for same-sex marriage rights from a poll taken in December 2008 (CNN/Opinion Research Corporation Poll, Dec 19-21, 2008, from pollingreport.com/civil.htm). At a glance, the two graphs suggest very different stories. The second graph makes it look like more than three times as many people oppose marriage rights as support them. But when we look at the scale we can see that the difference is about 12 percentage points. By not starting at zero, the graph makes the difference look enlarged.

A bar graph with a scale from 0-100%; There is a bar for support at about 44% and a bar for oppose at about 56%.

A bar graph of the same data but the scale goes from 40-60%; This magnifies the difference between the support and oppose groups.
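
One way to quantify the effect of the baseline is to compare the drawn bar heights. The Python sketch below uses the approximate values 44% and 56% read from the graph descriptions; `drawn_height_ratio` is a hypothetical helper, and it assumes each bar is drawn from the baseline up to its value.

```python
def drawn_height_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of bars for values a and b
    when the vertical axis starts at the given baseline."""
    return (a - baseline) / (b - baseline)

support, oppose = 44, 56  # approximate percentages read from the graphs

# Axis starting at 0: the bars are in the same proportion as the data.
print(drawn_height_ratio(oppose, support))

# Axis starting at 40: the "oppose" bar is drawn 4 times as tall.
print(drawn_height_ratio(oppose, support, baseline=40))
```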

Subsection 4.2.11 Stacked Bar Graphs

Another type of graph that can be hard to read and sometimes misleading is a stacked bar graph. In a stacked bar graph, the values we are comparing are stacked on top of each other vertically.

Example 4.2.9.

The table lists college expenses for two different students and we want to compare them. A stacked bar graph shows the expenses stacked vertically, but we are interested in the differences, not the totals.

Expense	Student 1	Student 2
Rent	$500	$650
Food	$125	$125
Tuition	$1750	$1450
Books	$325	$275
Misc	$100	$175

A stacked bar graph with the bars for the expenses for student 2 placed on top of the bars for student 1.

A side-by-side bar graph with the bars for each student right next to each other.

It is much easier to interpret the differences in a side-by-side bar chart.
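
The differences that a side-by-side chart makes visible can also be computed directly. This is a small Python sketch using the values from the table; the names are illustrative.

```python
# Expenses copied from the table above: (Student 1, Student 2), in dollars.
expenses = {
    "Rent": (500, 650),
    "Food": (125, 125),
    "Tuition": (1750, 1450),
    "Books": (325, 275),
    "Misc": (100, 175),
}

# Difference per category: positive means Student 2 spends more.
differences = {name: s2 - s1 for name, (s1, s2) in expenses.items()}

# Totals for each student across all categories.
total_1 = sum(s1 for s1, _ in expenses.values())
total_2 = sum(s2 for _, s2 in expenses.values())

for name, diff in differences.items():
    print(f"{name:<8}{diff:+5d}")
print(f"Totals: {total_1} vs {total_2}")
```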

Subsection 4.2.12 Presenting Quantitative Data

With categorical data, the horizontal axis is the category, but with quantitative, or numerical, data we have numbers. If we have repeated values we can also make a frequency table.

Example 4.2.10.

A teacher records scores on a 20-point quiz for the 30 students in their class. The scores in points are:

19	20	18	18	17	18	19	17	20	18	20	16	20	15	17
12	18	19	18	19	17	20	18	16	15	18	20	5	0	0

Here is a frequency table with the scores grouped and put in order.

Quiz Score	Frequency of Students
0	2
5	1
12	1
15	2
16	2
17	4
18	8
19	4
20	6
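
Tallying a frequency table by hand is error-prone, so it can help to check the counts with code. Here is a minimal Python sketch using the standard library's `collections.Counter`; the variable names are chosen for illustration.

```python
from collections import Counter

# The 30 quiz scores from the example, in the order given.
scores = [19, 20, 18, 18, 17, 18, 19, 17, 20, 18, 20, 16, 20, 15, 17,
          12, 18, 19, 18, 19, 17, 20, 18, 16, 15, 18, 20, 5, 0, 0]

# Counter maps each distinct score to the number of times it occurs.
frequency = Counter(scores)

# Print the table with the scores in increasing order.
for score in sorted(frequency):
    print(f"{score:>2}  {frequency[score]}")
```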

Using this table, it would be possible to create a standard bar chart from this summary, like we did for categorical data. However, since the scores are numerical values, this chart wouldn’t make sense; the first and second bars would be five values apart, while the later bars would only be one value apart. Instead, we will treat the horizontal axis as a number line. This type of graph is called a histogram.

Subsection 4.2.13 Histograms

A histogram is like a bar graph, but the horizontal axis is a number line. Unlike a bar graph, the data labels are placed on the horizontal axis between the bars. The histogram below shows the quiz scores from the previous example in graphical form.

To read the histogram, we look at the horizontal axis to find the score, then look at the height of the bar to find the frequency. For example, the bar above 18 has a height of 8, which means that 8 students scored an 18 on the quiz.

If we have a large number of different data values, a frequency table or histogram including every possible value would not be practical. There would be too many bars on the histogram to reveal any patterns. For this reason, it is common with quantitative data to group data into class intervals. The histograms below show the prices of home sales in Kitsap County, Washington, for the same two-week period in 2026. They are created from the same dataset; the only difference is the size of the class intervals.

Histogram of home sale prices in Kitsap County, Washington, with $250,000 class intervals. The pattern of sales is slightly less easy to discern in this case, but the numbers are easy to read. — Figure 4.2.11. Home sales histogram with $250,000 class widths

Histogram of home sale prices in Kitsap County, Washington, with $100,000 class intervals. There is insufficient detail to discern the pattern of sales. — Figure 4.2.12. Home sales histogram with $100,000 class widths

The histogram with wider bars is easier to read and reveals the broad pattern of sales, while the histogram with narrower bars reveals more detail but may be harder to read. Based on the histogram with wider bars, we see that the most frequent home price is between $500,000 and $750,000. Looking at the histogram with narrower bars, we can home in on the bar for homes selling for between $500,000 and $600,000 as the most frequent.

Class intervals are the intervals into which we group our data, and class widths are the sizes of those intervals. The two histograms above show the same data with different class widths. The histogram on the left has class widths of 250,000, and the histogram on the right has class widths of 100,000.

Note: spreadsheet programs may call the class intervals "bins" or "buckets", and they may call the class widths "bin width" or "bucket size".

If a home sells for exactly $500,000, is that home counted in the class interval to the left of $500,000, the class interval to the right of $500,000, or both? Each home will only be counted once, but it is not possible to tell by inspection whether values that are at the dividing lines between class intervals are counted in the bar on the left or the bar on the right. In these particular histograms, they are counted in the bar on the right. So the interval between $500,000 and $600,000 includes homes that sold for $500,000 up to $599,999.99.
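
The "counted in the bar on the right" rule can be written as a short Python function. This is a sketch under the assumption that each interval includes its lower limit and excludes its upper limit, and that the intervals start at 0; `class_interval` is a hypothetical helper, not a function from any spreadsheet program.

```python
import math

def class_interval(value, width, start=0):
    """Return (lower, upper) for the class containing value.
    The lower limit is included; the upper limit is not."""
    index = math.floor((value - start) / width)
    lower = start + index * width
    return lower, lower + width

# A home sold for exactly $500,000 lands in the interval that starts at $500,000.
print(class_interval(500_000, 100_000))

# A home sold for one cent less lands in the interval below.
print(class_interval(499_999.99, 100_000))
```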

Let’s work through another example. If we are studying the grade point averages (GPAs) of students, we might use class intervals of 0.0 to 0.49, 0.5 to 0.99, 1.0 to 1.49, and so on up to 4.0.

A frequency table for GPAs might look like this:

Class Interval	Frequency
0.0-0.49	2
0.5-0.99	4
1.0-1.49	8
1.5-1.99	12
2.0-2.49	20
2.5-2.99	30
3.0-3.49	25
3.5-3.99	10
4.0-4.49	15

The resulting histogram is below.

In this example, we could have eliminated the last class interval of 4.0-4.49 and adjusted the previous class interval to 3.5-4.0, since 4.0 is the maximum possible GPA. Both approaches are correct, and this is just one of the many subjective decisions that are made when creating graphs of data.

The size of each interval is called the class width. In the GPA example, the class width is 0.5 because each interval spans 0.5 units. For clarity and consistency, we generally define class intervals so that:

Each class width is the same.
There are between 5 and 20 classes.

In the next example, we’ll show the steps for creating a histogram by hand.

Example 4.2.13.

Suppose we have collected weights from 100 subjects who identify as male, as part of a nutrition study. For our weight data, we have values ranging from a low of 121 pounds to a high of 263 pounds, giving a total span of $263 - 121 = 142$. This means that the range of the data is 142.

We decide to use 10 class intervals. To find the class width, we take the range of 142 and divide it by 10.

$142/10 = 14.2$

Using a class width of 14.2 would result in awkward values for the bounds of our intervals, so we round up to 15. Since the minimum data value is 121, we will choose 120 for the lower limit of the first class because it is a nice round number that is less than 121. The first class interval will be 120-134.99, the second will be 135-149.99, and so on up to 255-269.99. Tabulating the number of data values that fall into each class interval and then graphing gives us the following frequency table and histogram:

Interval	Frequency
120-134.99	4
135-149.99	14
150-164.99	16
165-179.99	28
180-194.99	12
195-209.99	8
210-224.99	7
225-239.99	6
240-254.99	2
255-269.99	3

A histogram of the data in the table; the x-axis is labeled weights(pounds) and goes from 120 to 270, with a scale of 15; The y-axis is labeled frequency and goes from 0 to 30 with a scale of 5. There are no spaces between the bars.
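
The steps for choosing the class intervals in this example can be sketched in Python. The rounding rules here are assumptions made for illustration: the class width is rounded up to the next multiple of 5, and the first lower limit is rounded down to a multiple of 10. Other "nice" choices would be equally valid.

```python
import math

low, high = 121, 263        # smallest and largest weights in the data
num_classes = 10            # number of class intervals we decided to use

data_range = high - low                 # 142
raw_width = data_range / num_classes    # 14.2, an awkward width

# Assumption: round the width UP to the next multiple of 5.
class_width = math.ceil(raw_width / 5) * 5

# Assumption: start at the nearest multiple of 10 at or below the minimum.
first_lower = (low // 10) * 10

lower_limits = [first_lower + i * class_width for i in range(num_classes)]

for lower in lower_limits:
    print(f"{lower}-{lower + class_width - 0.01:.2f}")
```

Rounding the width up rather than down matters: with a width of 14, ten intervals starting at 120 would end at 260 and miss the largest weight of 263.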

Subsection 4.2.14 Create Histograms with Technology

While Excel can be used to create histograms, it is not as user-friendly as Google Sheets for this purpose.

We will walk through the steps of creating a histogram in Google Sheets using the following data on the number of hours spent on homework per week for 30 students in a class:

2	3	5	1	4	6	2	3	7	8	17	11	9	2	3
10	14	5	21	3	6	7	8	4	5	6	2	3	15	23

To create a histogram:

Enter the data as a single column in a Google Sheets spreadsheet.
Select the data, then click on the Insert menu and select Chart.
In the Chart Editor, click on the Chart Type dropdown menu and select Histogram.
In the Chart Editor, click on the Customize tab to adjust the settings. Most importantly, the class intervals are called "buckets" in Google Sheets, and the class widths are called "bucket size". Change the bucket size in the Histogram area of the Customize tab as seen in the second image below.

The resulting histogram will be displayed in the spreadsheet.

Histogram of hours with class widths of 2.

You can further customize the appearance of the histogram by double-clicking on the histogram to access the Chart Editor.
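
To check the bar heights a spreadsheet produces, the counts for each bucket can be tallied in Python. This sketch assumes a bucket size of 2, buckets that start at 0, and values on a boundary counted in the bucket to the right; the boundaries a spreadsheet chooses may differ.

```python
from collections import Counter

# Hours of homework per week for the 30 students.
hours = [2, 3, 5, 1, 4, 6, 2, 3, 7, 8, 17, 11, 9, 2, 3,
         10, 14, 5, 21, 3, 6, 7, 8, 4, 5, 6, 2, 3, 15, 23]
bucket_size = 2

# Map each value to the lower limit of its bucket: 0, 2, 4, ...
counts = Counter((h // bucket_size) * bucket_size for h in hours)

# Print a simple text histogram, including empty buckets.
for lower in range(0, max(hours) + 1, bucket_size):
    print(f"{lower:>2} to under {lower + bucket_size:<2}: {'#' * counts[lower]}")
```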

Subsection 4.2.15 The Shape of a Distribution

Once we have our histogram, we can use it to determine the shape of the data or distribution. When describing distributions, we are going to look at four characteristics: shape, center, spread and outliers. Center and spread (variation) will be covered in the next two sections.

Subsection 4.2.16 Modality

The modality of a distribution indicates the number of peaks or hills in its histogram.

It is unimodal if it has one peak.
It is bimodal if it has two peaks.
It is multimodal if it has multiple peaks.

Example 4.2.14.

The first graph is unimodal, the second is bimodal and the third is multimodal.

Three histograms are shown. The first histogram has one peak. The second histogram has two peaks. The third histogram has three peaks.

A bimodal distribution can result when two different populations have been grouped together and they are overlapping. It would be better to separate them into two separate graphs. An example is the grams of sugar per serving in a mix of sugar and non-sugar cereals.

Subsection 4.2.17 Symmetry

A distribution is symmetric if the left side of the graph mirrors the right side.

Example 4.2.15.

The graph on the left is symmetric and unimodal while the graph on the right is roughly symmetric and bimodal.

Two histograms are shown. The first histogram has one peak in the middle and the bars taper down symmetrically on each side. The second histogram has two peaks. The peaks are similar in size and the graph is roughly symmetric.

Subsection 4.2.18 Skewness

If a distribution is not symmetric then we say it is skewed. A graph can be skewed to the left or skewed to the right. We say it is skewed in the direction of the longer tail.

Subsection 4.2.19 Skewed to the Left

A left skewed graph is also called a negatively skewed graph. The longer tail will be on the left or negative side.

This is a histogram with 9 bars. The highest bar is to the right of center and the bars go down to the left gradually.

Subsection 4.2.20 Skewed to the Right

A right skewed graph is also called a positively skewed graph. The longer tail will be on the right or positive side.

This is a histogram with 9 bars. The highest bar is to the left of center and the bars go down to the right gradually.

Subsection 4.2.21 The Normal Distribution

The normal distribution has a very specific shape. It is unimodal and symmetric with a bell-shaped graph.

Subsection 4.2.22 Outlier

Outliers are data values that are unusually far away from the rest of the data. There is often a gap between the outlier and the rest of the graph. This visual determination of outliers is often subjective and depends on the situation.

Example 4.2.16.

In the graph to the right we have a unimodal distribution that is skewed to the right. There appears to be an outlier near 20.

This distribution has the majority of its data on the left. There is a gap in the graph, with one bar farther away.

Exercises 4.2.23 Exercises

1.

True or False: The bars of a histogram should always touch.

2.

True or False: The bars of a bar graph should always touch.

3.

Is the data described categorical or quantitative?

In a study, you ask the subjects their age in years.
In a study, you ask the subjects their gender.
In a study, you ask the subjects their ethnicity.
The daily high temperature of a city over several weeks.
A person’s annual income.

4.

Is the data described categorical or quantitative?

In a study you ask the subjects how many siblings they have.
In a study you ask the subjects what their favorite movie genre is.
In a study you measure the subjects’ blood pressure.
The daily rainfall in a city over several weeks.
In a study you ask the subjects the amount they spend on housing each month.

5.

What types of graphs are used for categorical data?

6.

What types of graphs are used for quantitative data?

7.

A group of adults were asked how many children they have in their family. The bar graph to the right shows the number of adults who indicated each number of children.

A bar graph showing the frequency for number of children: 0 kids 5; 1 kid 3; 2 kids 4; 3 kids 2; 4 kids 0; 5 kids 1.

How many adults had 3 children?
How many adults were questioned?
What percentage of the adults questioned had 0 children?

8.

Jasmine was interested in how many days it would take a DVD order from Netflix to arrive at her door. The graph shows the data she collected.

The x-axis is labeled shipping times (days) with 1, 2, 3, 4, and 5. The y-axis is labeled frequency. The bar for 1 goes up to 4, 2 goes up to 8, 3 goes up to 6, there’s no bar at 4 and 5 goes up to 1.

How many movies took 2 days to arrive?
How many movies did she order in total?
What percentage of the movies arrived in one day?

9.

This relative frequency bar graph shows the percentage of students who received each letter grade on their last English paper. The class contains 20 students. What number of students earned an A on their paper?

There are 4 bars labeled A, B, C, and D. The x-axis is labeled grade on English paper and the y-axis is labeled frequency (%). A goes up to 25%, B goes up to 35%, C goes up to 25% and D goes up to 15%.

10.

This relative frequency bar graph shows the percentage of each drink type served over the weekend at a local coffee shop. There were 120 drinks served in total. How many served drinks were lattes?

A relative frequency bar graph shows the percentage of each type of coffee drink served over a weekend. For coffee, the relative frequency is 45%. For tea, the relative frequency is 15%. For latte, the relative frequency is 20%. For mocha, the relative frequency is 10%. For cappuccino, the relative frequency is 5%. For other, the relative frequency is 5%.

11.

Corey categorized his spending for this month into four categories: Rent, Food, Fun, and Other. The percentages he spent in each category are pictured here. If he spent a total of $2,600 this month, how much did he spend on rent?

This is a pie chart with four regions. Fun is labeled 16%, Food is labeled 24%, Rent is labeled 26% and Other is labeled 34%.

12.

Habiba categorized the amount of time spent each week into 5 categories: Work, Travel, Housework, Leisure, and Sleep. If there are a total of 168 hours each week, how many hours does Habiba spend travelling each week?

This is a pie chart with five regions. Work is labeled 29%, Travel is labeled 7%, Housework is labeled 12%, Leisure is labeled 23% and Sleep is labeled 29%.

13.

In a survey (Gallup Poll, March 5-8, 2009, pollingreport.com/enviro.htm), 1012 adults were asked whether they personally worried about a variety of environmental concerns. The number of people who indicated that they worried “a great deal” about some selected concerns is listed below.

Is this categorical or quantitative data?
Make a bar chart for these data.
Why can’t we make a pie chart for these data?

Environmental Issue	Frequency
Pollution of drinking water	597
Contamination of soil and water by toxic waste	526
Air pollution	455
Global warming	354

14.

In a survey, 2056 adults were asked about their views on immigration. The percent of people who responded that immigrants to the United States are making each of the following situations in the country better are listed below.

Is this categorical or quantitative data?
Make a relative frequency bar chart for these data.
Can we make a pie chart for these data?

Situation	Relative Frequency (%)
Food, music and the arts	57
The economy in general	43
Social and moral values	31
Job opportunities for you and your family	19
Taxes	20
Crime	7

15.

The following table is from a sample of five hundred homes in Oregon that were asked the primary source of heating in their home.

How many of the households heat their home with firewood?
What percent of households heat their home with natural gas?

Type of Heat	Relative Frequency (%)
Electricity	33
Heating Oil	4
Natural Gas	50
Firewood	8
Other	5

16.

The following table is from a sample of 50 undergraduate students at Portland State University.

What percent of the sampled students are below senior class?
How many of the sampled students are freshmen?

Class	Relative Frequency (%)
Freshman	18
Sophomore	13
Junior	23
Senior	46

17.

A group of adults were asked how many cars they had in their household.

Is this categorical or quantitative data?
Make a relative frequency table for the data.
Make a bar chart for the data.
Make a pie chart for the data.

1 4 2 2 1 2 3 3 1 4 2 2 1 2 1 3 2 2 1 2 1 1 1 2

18.

The table below shows scores on a math test.

Is this categorical or quantitative data?
Make a relative frequency table for the data using a class width of 10.
Construct a histogram of the data.

82 55 51 97 73 79 100 60 71 85 78 59

90 100 88 72 46 82 89 70 100 68 61 52

19.

This graph shows the number of adults and kids who prefer each type of soda. There were 130 adults and kids surveyed. Discuss some ways in which the graph could be improved.

This is a 3 dimensional double bar graph. There is a bar for kids and a bar for adults. The x-axis is labeled coke, diet coke, sprite and cherry soda. The y-axis starts at 20 and goes to 45 with a scale of 5.

20.

A poll was taken asking people if they agreed with the positions of the 4 candidates for a county office. Does this pie chart present a good representation of these data? Explain.

A pie chart with 4 regions: Nguyen is labeled 42%, McKee is labeled 35%, Jones is labeled 64% and Brown is labeled 52%.

21.

Why is this a misleading or poor graph?

This is a bar graph and is titled "Favorite Drinks". The x-axis is labeled Drink Type and includes colas, lemon flavored, root beer, teas, coffee, and other. The y-axis is labeled frequency and has no scale.

22.

Why is this a misleading or poor graph?

This is a bar graph and is titled "Profit During First Half of Year". The x-axis is labeled Year and includes January, February, March, April, May, and June. The y-axis is labeled Profits in $. The scale goes from 0 to 4500 by 500. The bar for each month is labeled with the following values: January 4230, February 3760, March 2670, April -1320, May 750, and June 1560. The bar for April is colored red while the remaining bars are colored blue.

23.

Match each description to one of the graphs.

Normal distribution
Positive or right skewed
Negative or left skewed
Bimodal

This histogram has a tall bar on the left side, then the bars decrease significantly and then increase for a peak on the right side. — Figure 4.2.17. The frequency of times between eruptions of the Old Faithful geyser.

This histogram has shorter bars to the left, which gradually get taller. The tallest bar is on the right side of the graph. — Figure 4.2.18. Scores on a 20-point statistics quiz.

This histogram increases sharply and then decreases gradually. The peak is just to the left of the center. — Figure 4.2.19. The distribution of scores on a psychology test.

This histogram is symmetric. The highest bar is in the middle and the bars decrease in height as you move away from the middle. — Figure 4.2.20. The number of heads in 24 sets of 100 coin flips.

24.

Write a sentence or two to describe each distribution in terms of modality, symmetry, skewness and outliers.

This histogram has the tallest bars on the left side and then decreases to the right.

The bars in this histogram increase gradually to the center and then decrease. The bars drop off slightly faster to the right.

The bars in this histogram increase sharply to the tallest bars on the left side, then decrease and increase to the second tallest bar, then decrease again to the right.

The bars on this histogram start out at a middle height and then go down and up in a U shape to the tallest bar to the right of center. Then the bars decrease and increase again for another tall bar on the right.

The bars on this histogram gradually increase to the right and drop off quickly after the peak toward the right side of the graph.

25.

Studies are often done by pharmaceutical companies to determine the effectiveness of a treatment. Suppose that a new cancer drug is currently under study. Of interest is the average length of time in months patients live once starting the treatment. Two researchers each follow a different set of 40 cancer patients throughout their treatment. The following data (in months) are collected.

Create a histogram for each dataset, using the same class intervals and scales so you can compare them.
Compare and contrast the two distributions.

Researcher 1 Patients (in months):

3 4 11 15 16 17 22 44 37 16 14 24 25 15 26 27 33 29 35 44

13 21 22 10 12 8 40 32 26 27 31 34 29 17 8 24 18 47 33 34

Researcher 2 Patients (in months):

3 14 11 5 16 17 28 41 31 18 14 14 26 25 21 22 31 2 35 44

23 21 21 16 12 18 41 22 16 25 33 34 29 13 18 24 23 42 33 29
