A population is the whole set of items from which a data sample can
be drawn; so a sample is only a portion of the population
data. Features of a sample are described by statistics.
Statistical methods enable us to arrange, analyse and interpret the
sample data obtained from a population. We gather sample data when
it is impractical to analyse the entire population. A smaller sample often
allows us to gain a reasonable understanding of the population without
excessive work or time.
For example, television program ratings indicate the number of TV
viewers per 100 watching various programs. Ratings are determined
from a small sample of viewers that represents the entire audience (or
population), because it is too expensive to monitor every household that watches
television. The small sample gives a reasonable estimate for the
entire population.
Statistics are used for many purposes, including election polling,
sample marketing and understanding website visitors.
In this chapter, we will consider summary
statistics, tables, boxplots, scatterplots, lines of best fit and time series data.
Summary Statistics
A set of data values collected by a survey or experiment is known as a sample. Summary statistics provide useful information about sample data. The
most commonly used types of summary statistics are measures of location
(or central tendency) and measures of
spread.
Measures of Location (or Central Tendency)
We can describe a data sample's location by using statistics to analyse
the sample's average, middle or most common value. The statistics used are
known as the mean, median and mode and are collectively known as measures
of location (or central tendency).
Mean
The mean of a set of values is defined as the sum of all the
values divided by the number of values. That is:
$$\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}$$
Example 1
The marks of five candidates in a mathematics test with a maximum
possible mark of 20 are given below.
15  13  19  18  14
Find the mean value.
Solution:
$$\text{Mean} = \frac{15 + 13 + 19 + 18 + 14}{5} = \frac{79}{5} = 15.8$$
So, the mean mark is 15.8.
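The calculation in Example 1 can be checked with a few lines of Python. This is a minimal sketch: the function name `mean` and the variable name `marks` are chosen here for illustration and do not come from the original text.

```python
def mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Marks of the five candidates in Example 1
marks = [15, 13, 19, 18, 14]
print(mean(marks))  # 79 / 5 = 15.8
```

The empty-list check is a design choice for this sketch; dividing by zero would otherwise raise a less descriptive error.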
Median
The median is the middle value of a data set arranged in
ascending order of magnitude.
E.g. the median of 4, 6 and 10 is 6 as it is the middle value.
Example 2
The marks of five candidates in a geography test for which the maximum
possible mark was 20 are given below:
19  18  16  15  20
Find the median mark.
Solution:
Arrange the marks in ascending order of magnitude:
15  16  18  19  20
The third score, 18, is the middle one in this arrangement.
So, the median mark is 18.
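Note:
In general, when $n$ values are arranged in ascending order of magnitude:
$$\text{Median} = \left(\frac{n + 1}{2}\right)\text{th value}$$
If the number of values in the data set is even, then $\frac{n + 1}{2}$ is not a
whole number, and the median is the average of the two middle values.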
Example 3
Find the median of the following scores:
11  17  15  20  9  12
Solution:
Arrange the score values in ascending order of magnitude:
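9  11  12  15  17  20
There are 6 scores in the data set, so there is no single middle value.
The third and fourth scores, 12 and 15, are in the middle, so the
median is their average:
$$\text{Median} = \frac{12 + 15}{2} = 13.5$$
So, the median score is 13.5.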
Note:
Half of the values in the data set lie below the median and half lie
above the median.
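The procedure used in Examples 2 and 3 (sort the values, then take the middle value or the average of the two middle values) can be sketched in Python as follows. The function name `median` is an illustrative choice, not something defined in the text.

```python
def median(values):
    """Return the middle value of the sorted data, or the average of the
    two middle values when the number of values is even."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)          # ascending order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: one middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([19, 18, 16, 15, 20]))     # Example 2: 18
print(median([11, 17, 15, 20, 9, 12]))  # Example 3: 13.5
```

Sorting a copy with `sorted` leaves the original list of marks unchanged, which is convenient if the data is reused for other statistics.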
Mode
The mode is the value (or values) that occurs most often.
E.g. the mode of the data set {4, 6, 7, 8, 8, 8, 8, 8, 9, 9, 10} is 8
as it occurs most often.
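Because a data set can have more than one mode, a small Python sketch that counts occurrences and returns every value tied for the highest count may help. The function name `modes` and the second, two-mode data set are hypothetical and are included only for illustration.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the value(s) that occur most often."""
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())        # the largest count
    return sorted(v for v, c in counts.items() if c == highest)


data = [4, 6, 7, 8, 8, 8, 8, 8, 9, 9, 10]
print(modes(data))              # [8]
print(modes([1, 1, 2, 2, 3]))   # [1, 2] (hypothetical data with two modes)
```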
Example 4
The marks awarded to seven pupils for an assignment were as follows:
19  15  19  16  13  20  19
a. Find the median mark.
b. State the mode.
Solution:
a. Arrange the marks in ascending order of magnitude:
13  15  16  19  19  19  20
The fourth score, 19, is the middle data value in this arrangement.
So, the median mark is 19.
b. 19 is the score that occurs most often. So, the mode is 19.
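As a cross-check of Example 4, Python's standard `statistics` module provides `median` and `mode` functions. The variable names below are illustrative assumptions; only the marks come from the example.

```python
import statistics

# Marks awarded to the seven pupils in Example 4
marks = [19, 15, 19, 16, 13, 20, 19]

median_mark = statistics.median(marks)  # middle value of the sorted marks
mode_mark = statistics.mode(marks)      # most frequently occurring mark

print(median_mark)  # 19
print(mode_mark)    # 19
```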
Note the following:
- The mode has applications in manufacturing. For example, it is
important to manufacture more of the most popular cars, because
manufacturing different cars in equal numbers would cause a shortage
of some cars and an oversupply of others.
- The mean (or average) of a set of data values is often used to predict
future results.
- The median is easy to comprehend. Unlike the mean, it is not
affected by extreme values in the data set.
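The last point can be illustrated with a short Python sketch. It uses the marks from Example 2 and then replaces the highest mark with a hypothetical extreme value of 200, chosen purely for illustration (it is not a valid mark out of 20).

```python
import statistics

marks = [15, 16, 18, 19, 20]          # marks from Example 2, sorted
with_extreme = [15, 16, 18, 19, 200]  # hypothetical: 20 replaced by 200

# The mean shifts substantially when one value becomes extreme...
print(statistics.mean(marks))         # 17.6
print(statistics.mean(with_extreme))  # 53.6

# ...but the median, the middle value, stays the same.
print(statistics.median(marks))         # 18
print(statistics.median(with_extreme))  # 18
```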
Key Terms
statistics, population, sample, summary statistics, measures of location, measures
of central tendency, mean, median, mode