Day 1: Exploring One-Variable Data

Understanding and describing distributions of data

Exam Weight: 15-23% of the AP Statistics Exam

Learning Objectives

  • Identify different types of data and levels of measurement
  • Calculate and interpret measures of center (mean, median, mode)
  • Calculate and interpret measures of spread (range, IQR, standard deviation)
  • Create and interpret graphical displays of data
  • Describe distributions in terms of shape, center, and spread
  • Identify and handle outliers appropriately
  • Understand the effects of transformations on data

Key Concepts

Types of Data

Categorical (qualitative): Data that can be sorted into categories (e.g., eye color, gender)

Quantitative (numerical): Data that take numerical values for which arithmetic, such as averaging, makes sense

  • Discrete: Countable values (e.g., number of students)
  • Continuous: Measurable on a continuum (e.g., height, weight)

Levels of Measurement:

  • Nominal: Categories with no natural ordering (e.g., eye color)
  • Ordinal: Categories with a natural ordering (e.g., satisfaction ratings)
  • Interval: Numerical with equal intervals but no true zero (e.g., temperature in °F)
  • Ratio: Numerical with equal intervals and a true zero (e.g., height, weight)

Measures of Center

Mean: The arithmetic average of the data

x̄ = (Σx)/n

Median: The middle value when data are arranged in order

Mode: The most frequently occurring value(s)

When to use each measure:

  • Mean: Best for symmetric distributions, uses all data points
  • Median: Best for skewed distributions, resistant to outliers
  • Mode: Useful for categorical data or finding peaks in a distribution
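
As a rough illustration of the guidance above, the following Python sketch compares the three measures on a small data set with one extreme value. The data and variable names are made up for this example; only the standard-library statistics module is used.

```python
from statistics import mean, median, multimode

# Hypothetical data: weekly hours of screen time, with one unusually large value
hours = [4, 5, 5, 6, 7, 8, 35]

print(mean(hours))        # 10
print(median(hours))      # 6
print(multimode(hours))   # [5]

# Drop the extreme value and recompute
trimmed = hours[:-1]
print(round(mean(trimmed), 2))   # 5.83
print(median(trimmed))           # 5.5
```

Removing the single extreme value changes the mean from 10 to about 5.8 but the median only from 6 to 5.5, which is what "resistant to outliers" means in practice.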

Measures of Spread

Range: The difference between the maximum and minimum values

Range = max - min

Interquartile Range (IQR): The third quartile minus the first quartile; the width of the middle 50% of the data

IQR = Q3 - Q1

Standard Deviation: The typical distance of data points from the mean (the square root of the average squared deviation, with n - 1 in the denominator for a sample)

s = √[Σ(x - x̄)²/(n-1)]

Variance: The square of the standard deviation

s² = Σ(x - x̄)²/(n-1)
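
The four formulas above can be sketched in Python as follows. The data set is hypothetical, and the quartile rule (median of the lower and upper halves, leaving out the overall median when n is odd) is an assumption chosen to match the hand method used in this guide; software packages often use slightly different quartile rules.

```python
from statistics import median, stdev, variance

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values; the overall median is excluded when n is odd.
    """
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[-half:])

# Hypothetical data: commute times in minutes
times = [10, 12, 15, 18, 20, 25, 40]

data_range = max(times) - min(times)   # 40 - 10 = 30
q1, q3 = quartiles(times)              # 12 and 25
iqr = q3 - q1                          # 13
s = stdev(times)                       # sample standard deviation (divides by n - 1)
s2 = variance(times)                   # sample variance, the square of s

print(data_range, iqr, round(s, 2), s2)
```

Note that stdev and variance divide by n - 1, matching the sample formulas above; pstdev and pvariance from the same module divide by n instead.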

Graphical Displays

Dotplots: Each data point is represented by a dot above its value on a number line

Histograms: Data are grouped into bins, with bar heights representing frequencies

Boxplots: Visual representation of the five-number summary

Stemplots: Data are split into stems (leading digits) and leaves (final digits)
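
To make the stem-and-leaf idea concrete, here is a minimal Python sketch that prints a text stemplot. It assumes non-negative whole numbers, uses the tens as the stem and the ones digit as the leaf, and the scores are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return a text stemplot: stem = tens, leaf = ones digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Include stems with no leaves so that gaps in the data stay visible
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {row}")
    return "\n".join(lines)

# Hypothetical quiz scores
scores = [12, 15, 21, 24, 24, 28, 30, 33, 47]
print(stemplot(scores))
# 1 | 25
# 2 | 1448
# 3 | 03
# 4 | 7
```

Turned on its side, the rows of leaves form the same picture a histogram with bins of width 10 would give, while still showing every individual value.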

Describing Distributions

Shape:

  • Symmetric: Data are evenly distributed around the center
  • Skewed right (positive): Tail extends to the right
  • Skewed left (negative): Tail extends to the left
  • Bimodal: Two distinct peaks
  • Uniform: Approximately equal frequencies throughout

Center: Typically described using mean or median

Spread: Typically described using range, IQR, or standard deviation

Outliers: Values more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3

Effects of Transformations

Adding/subtracting a constant (X + c):

  • Shifts the distribution by c units
  • Mean and median increase by c
  • Measures of spread remain unchanged

Multiplying/dividing by a constant (X × c):

  • Stretches/compresses the distribution by a factor of c
  • Mean and median are multiplied by c
  • Measures of spread are multiplied by |c|
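
Both rules can be checked numerically. The sketch below uses a hypothetical data set and a negative multiplier to show why the spread is multiplied by |c| rather than c.

```python
from statistics import mean, median, stdev

# Hypothetical data set
data = [10, 15, 20, 25, 30]

shifted = [x + 5 for x in data]    # add a constant c = 5
scaled = [-2 * x for x in data]    # multiply by a constant c = -2

for label, values in [("original", data), ("x + 5", shifted), ("-2x", scaled)]:
    print(f"{label:>8}: mean={mean(values)}, median={median(values)}, "
          f"s={stdev(values):.3f}")
```

The shifted data keep the original standard deviation while the mean and median rise by 5. The scaled data have mean and median multiplied by -2, but the standard deviation doubles, because a spread can never be negative.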

Common Mistakes to Avoid

Confusing Population and Sample Statistics

Remember to use the correct notation:

  • Population mean: μ, Sample mean: x̄
  • Population standard deviation: σ, Sample standard deviation: s

Using Mean with Skewed Distributions

The mean is sensitive to outliers and can be misleading for skewed distributions. The median is often more appropriate for skewed data.

Misinterpreting Boxplots

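Remember that boxplots show the five-number summary. They can suggest skewness, but they hide other features of shape such as the number of peaks, so a bimodal distribution can produce the same boxplot as a unimodal one. The box represents the middle 50% of the data.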

Incorrectly Identifying Outliers

A value is typically flagged as an outlier if it falls below the lower fence or above the upper fence:

  • Lower fence: Q1 - 1.5 × IQR
  • Upper fence: Q3 + 1.5 × IQR
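
The fence rule can be sketched in a few lines of Python. The data are hypothetical, and the quartiles are computed as medians of the lower and upper halves, which is an assumption about the convention in use.

```python
from statistics import median

def outlier_fences(data):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    xs = sorted(data)
    half = len(xs) // 2                    # overall median excluded when n is odd
    q1, q3 = median(xs[:half]), median(xs[-half:])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data: commute times in minutes, one suspiciously long
times = [10, 12, 15, 18, 20, 25, 60]

lower, upper = outlier_fences(times)
outliers = [x for x in times if x < lower or x > upper]
print(lower, upper, outliers)   # -7.5 44.5 [60]
```

Here Q1 = 12 and Q3 = 25, so IQR = 13 and the fences sit 19.5 units beyond each quartile. Only the value 60 lies outside them.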

Practice Problems

Problem 1

The following data represent the number of hours 10 students spent studying for an exam:

2, 3, 5, 7, 8, 10, 12, 4, 6, 9

a) Calculate the mean, median, and mode of this data set.

b) Calculate the range, IQR, and standard deviation of this data set.

c) Create a boxplot of the data.

d) Describe the distribution in terms of shape, center, and spread.

Solution to Problem 1

a) Mean = (2 + 3 + 5 + 7 + 8 + 10 + 12 + 4 + 6 + 9) / 10 = 66 / 10 = 6.6

Median: Arranging the data in order: 2, 3, 4, 5, 6, 7, 8, 9, 10, 12

Since n = 10 (even), median = (6 + 7) / 2 = 6.5

Mode: No value appears more than once, so there is no mode.

b) Range = max - min = 12 - 2 = 10

Q1 = median of lower half (2, 3, 4, 5, 6) = 4

Q3 = median of upper half (7, 8, 9, 10, 12) = 9

IQR = Q3 - Q1 = 9 - 4 = 5

Standard deviation:

s = √[Σ(x - x̄)²/(n-1)]

= √[((2-6.6)² + (3-6.6)² + ... + (12-6.6)²)/9]

= √[((−4.6)² + (−3.6)² + ... + (5.4)²)/9]

= √[92.4/9]

= √10.27

= 3.2 (approximately)

c) Boxplot would show:

  • Minimum: 2
  • Q1: 4
  • Median: 6.5
  • Q3: 9
  • Maximum: 12

d) The distribution appears to be approximately symmetric, with a center around 6.5 hours. The spread is moderate, with an IQR of 5 hours and a standard deviation of about 3.2 hours. There are no outliers in the data set, since every value lies between the fences Q1 - 1.5 × IQR = -3.5 and Q3 + 1.5 × IQR = 16.5.

Problem 2

The following data represent the test scores of two different classes:

Class A: 65, 70, 75, 80, 85, 90, 95

Class B: 60, 65, 70, 80, 90, 95, 100

a) Calculate the mean and standard deviation for each class.

b) Compare the two distributions in terms of center and spread.

c) If 5 points are added to each score in Class A, what will be the new mean and standard deviation?

d) If each score in Class B is multiplied by 0.9, what will be the new mean and standard deviation?

Solution to Problem 2

a) Class A:

Mean = (65 + 70 + 75 + 80 + 85 + 90 + 95) / 7 = 560 / 7 = 80

Standard deviation:

s = √[Σ(x - x̄)²/(n-1)]

= √[((65-80)² + (70-80)² + ... + (95-80)²)/6]

= √[((−15)² + (−10)² + ... + (15)²)/6]

= √[700/6]

= √116.67

= 10.8 (approximately)

Class B:

Mean = (60 + 65 + 70 + 80 + 90 + 95 + 100) / 7 = 560 / 7 = 80

Standard deviation:

s = √[Σ(x - x̄)²/(n-1)]

= √[((60-80)² + (65-80)² + ... + (100-80)²)/6]

= √[((−20)² + (−15)² + ... + (20)²)/6]

= √[1450/6]

= √241.67

= 15.5 (approximately)

b) Both classes have the same mean (80), but Class B has a larger standard deviation (15.5 vs. 10.8), indicating that the scores in Class B are more spread out than those in Class A.

c) If 5 points are added to each score in Class A:

New mean = 80 + 5 = 85

New standard deviation = 10.8 (unchanged)

d) If each score in Class B is multiplied by 0.9:

New mean = 80 × 0.9 = 72

New standard deviation = 15.5 × 0.9 ≈ 14.0

Calculator Activities

Activity 1: Calculating Summary Statistics

Objective: Practice calculating and interpreting summary statistics using a graphing calculator.

Instructions:

  1. Enter the following data set into L1 on your calculator:
    15, 18, 22, 25, 28, 30, 32, 35, 40, 45, 50, 55, 60
  2. Calculate the mean, median, standard deviation, and five-number summary using 1-Var Stats.
  3. Interpret each statistic in context.

TI-84 Steps:

  1. Press STAT → EDIT to enter data into L1
  2. Press STAT → CALC → 1-Var Stats
  3. Press ENTER to calculate statistics for L1
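
For readers without a calculator at hand, the same summary can be approximated in Python. This is only a sketch: the function name one_var_stats and the dictionary labels are made up for this example, and the quartile rule (median of each half, excluding the overall median when n is odd) is assumed to match what the calculator reports.

```python
from statistics import mean, median, pstdev, stdev

def one_var_stats(data):
    """Summary statistics in the spirit of the calculator's 1-Var Stats screen."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    return {
        "mean": mean(xs),
        "Sx": stdev(xs),           # sample standard deviation (divides by n - 1)
        "sigma_x": pstdev(xs),     # population standard deviation (divides by n)
        "n": n,
        "min": xs[0],
        "Q1": median(xs[:half]),
        "median": median(xs),
        "Q3": median(xs[-half:]),
        "max": xs[-1],
    }

# Data set from Activity 1
L1 = [15, 18, 22, 25, 28, 30, 32, 35, 40, 45, 50, 55, 60]

for name, value in one_var_stats(L1).items():
    print(f"{name:>8}: {round(value, 3)}")
```

For this data set the mean is 35, the median is 32, Q1 is 23.5, and Q3 is 47.5. Use Sx rather than the population value when the list is a sample, which is the usual case on the AP exam.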

Activity 2: Creating and Interpreting Graphical Displays

Objective: Create different graphical displays and compare their effectiveness.

Instructions:

  1. Using the same data set from Activity 1, create:
    • A histogram (try different bin widths)
    • A boxplot
    • A dotplot
  2. For each display, describe what it reveals about the distribution.
  3. Determine which graphical display is most appropriate for this data set and explain why.

TI-84 Steps:

  1. Press 2nd → Y= (STAT PLOT) → 1 to set up Plot 1
  2. Turn the plot On, select the type of plot (histogram, boxplot, etc.), and set Xlist to L1
  3. Press ZOOM → 9 (ZoomStat) to view the graph

Additional Resources

Quick Reference

Formulas to Remember

Mean: x̄ = (Σx)/n

Standard Deviation: s = √[Σ(x - x̄)²/(n-1)]

z-score: z = (x - x̄)/s

IQR: Q3 - Q1

Outlier Boundaries:

  • Lower: Q1 - 1.5 × IQR
  • Upper: Q3 + 1.5 × IQR
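
The z-score formula in the list above measures how many standard deviations a value lies from the mean. A minimal Python sketch, using hypothetical test scores and the sample standard deviation:

```python
from statistics import mean, stdev

def z_scores(data):
    """Return the z-score of every value: (x - mean) / s."""
    m, s = mean(data), stdev(data)
    return [(x - m) / s for x in data]

# Hypothetical test scores
scores = [70, 75, 80, 85, 90]

print([round(z, 2) for z in z_scores(scores)])
# [-1.26, -0.63, 0.0, 0.63, 1.26]
```

A score equal to the mean has z = 0, and the sign of z shows whether a value is above or below the mean.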

Exam Tip

When describing distributions on the AP exam, always include all three components: shape, center, and spread. Also mention any outliers if present.

Example: "The distribution is approximately symmetric with a mean of 75. The standard deviation is 10, indicating moderate spread. There are no outliers."