Fast statistical analysis • 2026 edition
Mean: \(\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\)
Median: Middle value when data is sorted
Mode: Most frequently occurring value
These measures represent the central or typical value in a dataset. The mean is the arithmetic average, the median is the middle value when arranged in order, and the mode is the value that appears most frequently.
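Example: For dataset [2, 3, 3, 4, 5, 5, 5, 6], the mean is 4.125 (33 ÷ 8), the median is 4.5 (the average of the two middle values, 4 and 5), and the mode is 5 (it appears three times).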
These measures help summarize data and identify patterns in distributions.
| Statistic | Value |
|---|---|
| Mean | 4.125 |
| Median | 4.5 |
| Mode | 5 |
| Count | 8 |
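As a quick check of the example dataset [2, 3, 3, 4, 5, 5, 5, 6], here is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative and not part of the original page.

```python
import statistics

# Example dataset from the text
data = [2, 3, 3, 4, 5, 5, 5, 6]

mean = statistics.mean(data)      # sum of the values divided by the count
median = statistics.median(data)  # even count: average of the two middle values
mode = statistics.mode(data)      # most frequently occurring value
count = len(data)

print(f"Mean: {mean}, Median: {median}, Mode: {mode}, Count: {count}")
# Mean: 4.125, Median: 4.5, Mode: 5, Count: 8
```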
Central tendency refers to a measure that represents the center of a dataset. It provides a single value that describes the entire dataset by identifying the typical or central value around which other data points cluster. The three main measures of central tendency are mean, median, and mode.
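The mean (or arithmetic mean) is calculated by summing all values and dividing by the count:
\(\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\)
Where \(\bar{x}\) is the mean, \(x_i\) is the \(i\)-th value, and \(n\) is the number of values.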
The median is the middle value when the data is arranged in ascending order. For an odd number of values, it is the single middle value. For an even number of values, it is the average of the two middle values.
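The odd and even cases can be sketched in a few lines of Python. This is an illustrative implementation, not the calculator's own code; the function name `median` and the sample inputs are assumptions made for the example.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)  # the median is defined on sorted data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))     # 3
print(median([7, 1, 3, 5]))  # 4.0
```

Sorting inside the function means callers cannot forget that step, which is the most common source of wrong medians.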
The mode is the value that appears most frequently in the dataset. A dataset can have one mode, several modes, or no mode at all (when every value appears equally often).
Mean: Arithmetic average of all values in a dataset.
\(\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\)
Sum all values, then divide by the count.
Standard deviation (population form): Measure of spread around the mean.
\(\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}\)
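The standard deviation formula, which divides by \(n\), translates directly into code. The sketch below is a hedged illustration: the function name `population_std` and the sample dataset are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import math
import statistics

def population_std(values):
    """Population standard deviation: sqrt of the mean squared deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n  # divide by n, not n - 1
    return math.sqrt(variance)

# Hypothetical dataset: mean is 5, squared deviations sum to 32, variance is 4
data = [2, 4, 4, 4, 5, 5, 7, 9]
sigma = population_std(data)
print(sigma)  # 2.0

# The standard library's pstdev uses the same divide-by-n definition
print(math.isclose(sigma, statistics.pstdev(data)))  # True
```

Note that `statistics.stdev` divides by \(n - 1\) (the sample form), so it returns a slightly larger value for the same data.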
Which measure of central tendency is most affected by extreme outliers in a dataset?
The answer is A) Mean. The mean is most affected by outliers because it takes every value in the dataset into account, so extreme values pull it toward them. For example, in the dataset [1, 2, 3, 4, 100], the mean is 22, far above four of the five values because of the outlier 100. The median is 3, exactly what it would be if the largest value were 5 instead of 100. The dataset has no mode, since every value appears once, so the mode is not influenced by the outlier either.
Understanding the sensitivity of different measures to outliers is crucial in statistics. The mean uses every data point in its calculation, making it vulnerable to extreme values. The median only depends on the middle value(s) when sorted, so outliers don't affect it as much. The mode depends on frequency, not magnitude, so it's also resistant to outliers.
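To make the difference in sensitivity concrete, the sketch below compares the mean and median of the quiz dataset with and without its outlier. It is an illustration only; the variable names are assumptions.

```python
import statistics

clean = [1, 2, 3, 4]
with_outlier = clean + [100]  # the quiz dataset [1, 2, 3, 4, 100]

# How far each measure moves when the outlier is added
mean_shift = statistics.mean(with_outlier) - statistics.mean(clean)
median_shift = statistics.median(with_outlier) - statistics.median(clean)

print(f"Mean shift: {mean_shift}")      # Mean shift: 19.5
print(f"Median shift: {median_shift}")  # Median shift: 0.5
```

The mean moves from 2.5 to 22, while the median moves only from 2.5 to 3.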
Outlier: Data point that is significantly different from other observations
Robust statistic: Measure that is not greatly affected by outliers
Sensitivity: How much a measure changes when outliers are present
• Mean is sensitive to all values in the dataset
• Median is robust to outliers
• Mode is not affected by the magnitude of values
• Use median when outliers are present
• Use mean when data is normally distributed
• Always examine data for outliers before choosing a measure
• Using mean without checking for outliers
• Assuming all measures are equally affected by outliers
• Ignoring the shape of the distribution when selecting measures
Given the dataset: [12, 15, 18, 15, 20, 22, 15, 25, 30, 15], calculate the mean, median, and mode. Show your work step by step.
Step 1: Organize the data
Original: [12, 15, 18, 15, 20, 22, 15, 25, 30, 15]
Sorted: [12, 15, 15, 15, 15, 18, 20, 22, 25, 30]
Step 2: Calculate the Mean
Mean = Sum of all values ÷ Number of values
Sum = 12 + 15 + 18 + 15 + 20 + 22 + 15 + 25 + 30 + 15 = 187
Count = 10
Mean = 187 ÷ 10 = 18.7
Step 3: Calculate the Median
Since there are 10 values (even number), the median is the average of the 5th and 6th values:
5th value = 15, 6th value = 18
Median = (15 + 18) ÷ 2 = 16.5
Step 4: Calculate the Mode
Count frequency of each value:
12: 1, 15: 4, 18: 1, 20: 1, 22: 1, 25: 1, 30: 1
Mode = 15 (appears 4 times, more than any other value)
Final Answer:
Mean = 18.7
Median = 16.5
Mode = 15
This example demonstrates how the three measures of central tendency can differ. The mean (18.7) is pulled higher by the larger values (25, 30), while the median (16.5) represents the middle of the sorted data. The mode (15) is the most common value, appearing four times. Here mode < median < mean, an ordering that is typical of data with a longer tail toward high values.
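The four steps of the worked solution can be reproduced with a short script. This is a sketch that mirrors the manual procedure (sort, sum, pick the middle values, build a frequency table) using `collections.Counter`; it is one possible way to check the work, not the page's own method.

```python
from collections import Counter

data = [12, 15, 18, 15, 20, 22, 15, 25, 30, 15]

# Step 1: organize the data
ordered = sorted(data)

# Step 2: mean = sum of all values / number of values
mean = sum(data) / len(data)

# Step 3: median for an even count = average of the two middle values
mid = len(ordered) // 2
median = (ordered[mid - 1] + ordered[mid]) / 2

# Step 4: frequency table, then take the most frequent value
frequencies = Counter(data)
mode, mode_count = frequencies.most_common(1)[0]

print(ordered)             # [12, 15, 15, 15, 15, 18, 20, 22, 25, 30]
print(mean, median, mode)  # 18.7 16.5 15
print(mode_count)          # 4
```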
Frequency: How many times a value appears in the dataset
Sorted data: Values arranged in ascending or descending order
Central tendency: Measures that describe the center of a dataset
• Always sort data before finding median
• For even count, median is average of middle two values
• Mode is the most frequent value
• Create a frequency table to find mode quickly
• Use a calculator for large datasets
• Double-check calculations with different methods
• Forgetting to sort data before finding median
• Miscounting frequencies for mode
• Arithmetic errors in mean calculation
Q: When should I use mean versus median in my analysis?
A: The choice between mean and median depends on your data distribution:
Use the Mean when: Your data is normally distributed (bell-shaped curve) with no significant outliers. The mean provides the best representation of the center because it incorporates all data points. For example, heights of adult men in a population.
Use the Median when: Your data contains outliers or is skewed. The median is more robust and less affected by extreme values. For example, household incomes in a region where a few very wealthy individuals could skew the average.
Consider this example: In a neighborhood, 9 houses are worth $300,000 each, but one house is worth $3,000,000. The mean would be $570,000, but the median would be $300,000. The median better represents the typical house value in this case.
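The arithmetic behind these figures, written out with the mean formula:
\(\bar{x} = \frac{9 \times 300{,}000 + 3{,}000{,}000}{10} = \frac{5{,}700{,}000}{10} = 570{,}000\)
With the ten values sorted, the 5th and 6th are both $300,000, so the median is (300,000 + 300,000) ÷ 2 = $300,000.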
Q: Can a dataset have more than one mode?
A: Yes, a dataset can have multiple modes:
Unimodal: One mode (most common). Example: [1, 2, 2, 3, 4] - mode is 2
Bimodal: Two modes. Example: [1, 1, 2, 3, 3, 4] - modes are 1 and 3
Trimodal: Three modes
Multimodal: Two or more modes in general (bimodal and trimodal are special cases); often used for more than three modes
No mode: All values appear with equal frequency. Example: [1, 2, 3, 4, 5] - no mode
Bimodal and multimodal distributions often indicate that the data comes from two or more different populations or processes. For instance, exam scores might show bimodal distribution if there are distinct groups of students (those who studied vs. those who didn't).
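A small sketch of how the mode categories above could be detected in code. The helper name `find_modes` is hypothetical, and it follows this page's convention that a dataset where every value is equally frequent has no mode.

```python
from collections import Counter

def find_modes(values):
    """Return the list of modes of a non-empty sequence.

    Returns [] when all distinct values tie (the "no mode" convention).
    """
    counts = Counter(values)
    highest = max(counts.values())
    modes = [value for value, count in counts.items() if count == highest]
    # If every distinct value ties for the top frequency, report no mode
    if len(modes) == len(counts) and len(counts) > 1:
        return []
    return modes

print(find_modes([1, 2, 2, 3, 4]))     # [2]     (unimodal)
print(find_modes([1, 1, 2, 3, 3, 4]))  # [1, 3]  (bimodal)
print(find_modes([1, 2, 3, 4, 5]))     # []      (no mode)
```

Python's built-in `statistics.multimode` (3.8+) uses a different convention: it returns every tied value, so for [1, 2, 3, 4, 5] it returns all five values rather than an empty list.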