Understanding and describing distributions of data
Categorical (qualitative): Data that can be sorted into categories (e.g., eye color, gender)
Quantitative (numerical): Data that can be measured numerically
Levels of Measurement:
Mean: The arithmetic average of the data
Median: The middle value when data are arranged in order
Mode: The most frequently occurring value(s)
When to use each measure:
Range: The difference between the maximum and minimum values
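Interquartile Range (IQR): The difference between the third and first quartiles (Q3 - Q1), which is the width of the middle 50% of the data
Standard Deviation: The typical distance of data points from the mean, computed as the square root of the variance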
Variance: The square of the standard deviation
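The measures of spread above can be sketched in Python as a rough illustration. The helper names (quartiles, spread_summary) and the sample values are hypothetical, and the quartiles assume the median-of-halves convention, which may differ from what other software reports.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves convention.

    Assumption: when the number of values is odd, the median itself
    is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return statistics.median(lower), statistics.median(upper)

def spread_summary(values):
    """Collect the four measures of spread for a sample."""
    q1, q3 = quartiles(values)
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sample_sd": statistics.stdev(values),          # divides by n - 1
        "sample_variance": statistics.variance(values),  # square of sample_sd
    }

# Hypothetical sample, not taken from the guide.
sample = [1, 3, 5, 7, 9]
summary = spread_summary(sample)
print(summary)
```

The range uses only the two extreme values, while the IQR ignores them, which is why the IQR is usually the more resistant of the two.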
Dotplots: Each data point is represented by a dot above its value on a number line
Histograms: Data are grouped into bins, with bar heights representing frequencies
Boxplots: Visual representation of the five-number summary
Stemplots: Data are split into stems (leading digits) and leaves (final digits)
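As one illustration of how a display is built, a stemplot can be sketched as plain text. The function name stemplot and the scores are hypothetical, and the sketch assumes non-negative whole numbers with the last digit as the leaf.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a text stemplot for non-negative integers.

    Assumption: the stem is every digit except the last, and the leaf
    is the last digit. The input must not be empty.
    """
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)

    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines

# Hypothetical scores, not taken from the guide.
scores = [12, 15, 21, 24, 24, 40]
for line in stemplot(scores):
    print(line)
```

Keeping the empty stem (3 in this sample) preserves gaps, which matter when describing shape.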
Shape:
Center: Typically described using mean or median
Spread: Typically described using range, IQR, or standard deviation
Outliers: Values that fall more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3
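The 1.5 × IQR rule can be sketched as follows. The function names and the sample are hypothetical, and the quartiles again assume the median-of-halves convention.

```python
import statistics

def outlier_fences(values, k=1.5):
    """Return the (lower, upper) boundaries Q1 - k*IQR and Q3 + k*IQR."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[len(ordered) - half:])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the values that lie outside the fences."""
    low, high = outlier_fences(values, k)
    return [v for v in values if v < low or v > high]

# Hypothetical sample with one unusually large value.
sample = [10, 12, 13, 14, 15, 16, 18, 40]
print(outlier_fences(sample))
print(find_outliers(sample))
```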
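Adding/subtracting a constant (X + c): Measures of center shift by c; measures of spread (range, IQR, standard deviation) do not change
Multiplying/dividing by a constant (X × c): Measures of center are multiplied by c; measures of spread are multiplied by |c|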
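A short derivation may help show why these rules hold. It assumes a general linear transformation y = a·x + c, where the multiplier a and the new variable y are symbols introduced here for illustration (the guide uses c for both kinds of constant).

```latex
\begin{align*}
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n}(a x_i + c) = a\bar{x} + c, \\
s_y &= \sqrt{\frac{\sum_{i=1}^{n}\bigl(a x_i + c - (a\bar{x} + c)\bigr)^2}{n-1}}
     = \sqrt{\frac{a^2\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}
     = |a|\, s_x .
\end{align*}
```

Setting a = 1 gives the case of adding a constant (the mean shifts, the standard deviation does not), and setting c = 0 gives the case of multiplying by a constant (both are rescaled).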
Remember to use the correct notation:
The mean is sensitive to outliers and can be misleading for skewed distributions. The median is often more appropriate for skewed data.
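A small numerical sketch can make this concrete. The income figures below are hypothetical and chosen only to show how a single extreme value moves the mean far more than the median.

```python
import statistics

# Hypothetical incomes in thousands of dollars.
incomes = [30, 35, 40, 45, 50]
with_outlier = incomes + [500]  # add one extreme value

for label, data in (("original", incomes), ("with outlier", with_outlier)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean = {mean:.1f}, median = {median:.1f}")
```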
Remember that boxplots show only the five-number summary. They can suggest skewness, but they hide details of shape such as gaps, clusters, and the number of peaks. The box represents the middle 50% of the data.
Outliers are typically defined as values that fall more than 1.5 × IQR beyond the quartiles: below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
The following data represent the number of hours 10 students spent studying for an exam:
2, 3, 5, 7, 8, 10, 12, 4, 6, 9
a) Calculate the mean, median, and mode of this data set.
b) Calculate the range, IQR, and standard deviation of this data set.
c) Create a boxplot of the data.
d) Describe the distribution in terms of shape, center, and spread.
a) Mean = (2 + 3 + 5 + 7 + 8 + 10 + 12 + 4 + 6 + 9) / 10 = 66 / 10 = 6.6
Median: Arranging the data in order: 2, 3, 4, 5, 6, 7, 8, 9, 10, 12
Since n = 10 (even), median = (6 + 7) / 2 = 6.5
Mode: No value appears more than once, so there is no mode.
b) Range = max - min = 12 - 2 = 10
Q1 = median of lower half (2, 3, 4, 5, 6) = 4
Q3 = median of upper half (7, 8, 9, 10, 12) = 9
IQR = Q3 - Q1 = 9 - 4 = 5
Standard deviation:
s = √[Σ(x - x̄)²/(n-1)]
= √[((2-6.6)² + (3-6.6)² + ... + (12-6.6)²)/9]
= √[((−4.6)² + (−3.6)² + ... + (5.4)²)/9]
= √[92.4/9]
= √10.27
= 3.2 (approximately)
c) Boxplot would show: minimum = 2, Q1 = 4, median = 6.5, Q3 = 9, maximum = 12, with no outliers.
d) The distribution appears to be approximately symmetric, with a center around 6.5 hours. The spread is moderate, with an IQR of 5 hours and a standard deviation of about 3.2 hours. There are no outliers in the data set, since every value lies between Q1 - 1.5 × IQR = -3.5 and Q3 + 1.5 × IQR = 16.5.
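The hand calculations for this problem can be cross-checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the quartiles assume the median-of-halves convention used in the solution.

```python
import statistics

hours = [2, 3, 5, 7, 8, 10, 12, 4, 6, 9]
ordered = sorted(hours)
half = len(ordered) // 2

mean = statistics.mean(hours)
median = statistics.median(hours)
# multimode returns every value that ties for most frequent, so a
# result as long as the data means no value repeats (no mode).
modes = statistics.multimode(hours)

q1 = statistics.median(ordered[:half])                  # lower half
q3 = statistics.median(ordered[len(ordered) - half:])   # upper half
iqr = q3 - q1
s = statistics.stdev(hours)                             # divides by n - 1

print(f"mean = {mean}, median = {median}, no mode: {len(modes) == len(hours)}")
print(f"range = {max(hours) - min(hours)}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"s = {s:.2f}")
print(f"fences: {q1 - 1.5 * iqr} to {q3 + 1.5 * iqr}")
```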
The following data represent the test scores of two different classes:
Class A: 65, 70, 75, 80, 85, 90, 95
Class B: 60, 65, 70, 80, 90, 95, 100
a) Calculate the mean and standard deviation for each class.
b) Compare the two distributions in terms of center and spread.
c) If 5 points are added to each score in Class A, what will be the new mean and standard deviation?
d) If each score in Class B is multiplied by 0.9, what will be the new mean and standard deviation?
a) Class A:
Mean = (65 + 70 + 75 + 80 + 85 + 90 + 95) / 7 = 560 / 7 = 80
Standard deviation:
s = √[Σ(x - x̄)²/(n-1)]
= √[((65-80)² + (70-80)² + ... + (95-80)²)/6]
= √[((−15)² + (−10)² + ... + (15)²)/6]
= √[700/6]
= √116.67
= 10.8 (approximately)
Class B:
Mean = (60 + 65 + 70 + 80 + 90 + 95 + 100) / 7 = 560 / 7 = 80
Standard deviation:
s = √[Σ(x - x̄)²/(n-1)]
= √[((60-80)² + (65-80)² + ... + (100-80)²)/6]
= √[((−20)² + (−15)² + ... + (20)²)/6]
= √[1450/6]
= √241.67
= 15.5 (approximately)
b) Both classes have the same mean (80), but Class B has a larger standard deviation (15.5 vs. 10.8), indicating that the scores in Class B are more spread out than those in Class A.
c) If 5 points are added to each score in Class A:
New mean = 80 + 5 = 85
New standard deviation = 10.8 (unchanged, because adding a constant does not change the distances between scores)
d) If each score in Class B is multiplied by 0.9:
New mean = 80 × 0.9 = 72
New standard deviation = 15.5 × 0.9 = 14.0 (approximately)
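As a check on this problem, the statistics and both transformations can be recomputed directly from the scores. The sketch below uses illustrative variable names and Python's sample standard deviation (dividing by n - 1), matching the formula in the solution.

```python
import statistics

class_a = [65, 70, 75, 80, 85, 90, 95]
class_b = [60, 65, 70, 80, 90, 95, 100]

for name, scores in (("A", class_a), ("B", class_b)):
    print(f"Class {name}: mean = {statistics.mean(scores)}, "
          f"s = {statistics.stdev(scores):.1f}")

# Part c: add 5 points to every score in Class A.
shifted_a = [x + 5 for x in class_a]
# Part d: multiply every score in Class B by 0.9.
scaled_b = [x * 0.9 for x in class_b]

print(f"A + 5:   mean = {statistics.mean(shifted_a)}, "
      f"s = {statistics.stdev(shifted_a):.1f}")
print(f"B x 0.9: mean = {statistics.mean(scaled_b):.1f}, "
      f"s = {statistics.stdev(scaled_b):.1f}")
```

Recomputing from the transformed scores, rather than applying the rules, confirms that the shift leaves the standard deviation alone while the rescaling multiplies it by 0.9.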
Objective: Practice calculating and interpreting summary statistics using a graphing calculator.
Instructions:
15, 18, 22, 25, 28, 30, 32, 35, 40, 45, 50, 55, 60
TI-84 Steps:
Objective: Create different graphical displays and compare their effectiveness.
Instructions:
TI-84 Steps:
Mean: x̄ = (Σx)/n
Standard Deviation: s = √[Σ(x - x̄)²/(n-1)]
z-score: z = (x - x̄)/s
IQR: Q3 - Q1
Outlier Boundaries:
Lower boundary: Q1 - 1.5 × IQR
Upper boundary: Q3 + 1.5 × IQR
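The z-score formula above can be applied to a whole data set at once. The sketch below uses a hypothetical helper name, z_scores, with the Class A test scores from the second example problem.

```python
import statistics

def z_scores(values):
    """Standardize each value: z = (x - mean) / s, with s the sample SD."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

class_a = [65, 70, 75, 80, 85, 90, 95]
for score, z in zip(class_a, z_scores(class_a)):
    print(f"{score}: z = {z:+.2f}")
```

A z-score of 0 marks a value equal to the mean, and the sign shows whether a value lies above or below it.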
When describing distributions on the AP exam, always include all three components: shape, center, and spread. Also mention any outliers if present.
Example: "The distribution is approximately symmetric with a mean of 75. The standard deviation is 10, indicating moderate spread. There are no outliers."