Summary Statistics
Definitions
Parameters & Statistics
- $\bar{x}$: Sample mean
- $\mu$: Population mean
- $s$: Sample standard deviation
- $\sigma$: Population standard deviation
- $X^2$: Sample chi-square value
- $\chi^2$: Population chi-square value
- $\hat{p}$: Sample proportion
- $p$: Population proportion
Quartiles
- Q1: One quarter (25%) of the data is less than or equal to Q1
- Q3: Three quarters (75%) of the data is less than or equal to Q3
Interquartile Range (IQR)
- IQR = Q3 − Q1: the spread of the middle 50% of the data
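As a minimal sketch of the quartile definitions above, the following Python snippet computes Q1, Q3, and the IQR (Q3 − Q1) with the standard library. The data set is hypothetical, and the `inclusive` interpolation method is an assumption: textbooks and calculators use several quartile conventions that can give slightly different values.

```python
import statistics

# Hypothetical data set, used only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# With n=4, statistics.quantiles returns the three cut points Q1, Q2 (median), Q3.
# method="inclusive" is an assumed convention; other conventions may differ.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1  # spread of the middle 50% of the data

print(q1, q3, iqr)  # 3.0 7.0 4.0
```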
Standard Deviation & Variance
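- The standard deviation for a population: $\sigma_x = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$
- The standard deviation for a sample: $s_x = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
- Variance is the square of the standard deviation, i.e., $\sigma^2$ and $s^2$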
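The two standard deviation formulas differ only in the divisor (n for a population, n − 1 for a sample). A short sketch on a hypothetical data set, cross-checked against Python's `statistics` module, might look like this:

```python
import math
import statistics

# Hypothetical data set, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean, shared by both formulas.
squared_devs = sum((x - mean) ** 2 for x in data)

pop_sd = math.sqrt(squared_devs / n)           # divide by n
sample_sd = math.sqrt(squared_devs / (n - 1))  # divide by n - 1

# Variance is the square of the standard deviation.
pop_var = pop_sd ** 2
sample_var = sample_sd ** 2

# Cross-check against the standard library implementations.
assert math.isclose(pop_sd, statistics.pstdev(data))
assert math.isclose(sample_sd, statistics.stdev(data))

print(pop_sd, round(sample_sd, 3))  # 2.0 2.138
```

Treating the same numbers as a sample always gives a slightly larger standard deviation than treating them as a population, because the divisor is smaller.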
Frequency
- Frequency: number of times that a certain value has appeared, denoted by f
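- Mean from a frequency table: $\bar{x} = \dfrac{1}{n} \sum_{i=1}^{k} f_i x_i$, where $k$ is the number of distinct values and $n = \sum_{i=1}^{k} f_i$ is the total number of observations
- Relative Frequency: $\dfrac{f}{n}$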
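To illustrate frequency, the frequency-weighted mean, and relative frequency, here is a small sketch using `collections.Counter`. The data set is hypothetical, and n is taken to be the total number of observations (the sum of all frequencies).

```python
from collections import Counter

# Hypothetical data set, used only for illustration.
data = [1, 2, 2, 3, 3, 3]

# Frequency f of each distinct value x.
freq = Counter(data)

# Total number of observations: the sum of all frequencies.
n = sum(freq.values())

# Mean from the frequency table: sum of f * x, divided by n.
mean = sum(f * x for x, f in freq.items()) / n

# Relative frequency of each value: f / n.
rel_freq = {x: f / n for x, f in freq.items()}

print(dict(freq))  # {1: 1, 2: 2, 3: 3}
print(rel_freq[3])  # 0.5
```

The relative frequencies always add up to 1, which is a quick sanity check on a hand-built frequency table.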
Outliers
- x is an outlier if x<Q1−1.5⋅IQR or x>Q3+1.5⋅IQR
- Alternative rule: x is an outlier if it lies two or more standard deviations from the mean, i.e., x≥μ+2σ or x≤μ−2σ
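Both outlier rules can be sketched as small Python functions. The function names and the data set are hypothetical. The quartile convention (`inclusive`) and the choice to treat the data as a whole population when computing μ and σ are assumptions.

```python
import statistics

def iqr_outliers(data):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

def two_sigma_outliers(data):
    """Values at least two standard deviations away from the mean."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # assumption: data is the whole population
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [x for x in data if x >= mu + 2 * sigma or x <= mu - 2 * sigma]

# Hypothetical data set with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

print(iqr_outliers(data))        # [50]
print(two_sigma_outliers(data))  # [50]
```

The two rules agree here, but they need not: the mean and standard deviation are themselves pulled by extreme values, so the 2σ rule can miss outliers that the IQR rule flags.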
Five-Number Summary
- min value
- first quartile
- median
- third quartile
- max value
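A minimal sketch of the five-number summary, assuming the `inclusive` quartile convention and a hypothetical helper name `five_number_summary`:

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a non-empty list of numbers."""
    # Assumed quartile convention; other conventions may give other values.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median, q3, max(data))

# Hypothetical data set, used only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(five_number_summary(data))  # (1, 3.0, 5.0, 7.0, 9)
```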
Boxplots
- A box is used to represent the middle 50% of the data
- The width of the box represents the IQR
- Two whiskers (horizontal lines) extend from the box to the min and max values that are not outliers
- Outliers are represented with a cross and are outside of the whiskers
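The numbers needed to draw a boxplot can be computed without any plotting library. The sketch below uses a hypothetical helper `boxplot_stats` and assumes the `inclusive` quartile convention, the 1.5 · IQR outlier rule, and whiskers that stop at the most extreme values that are not outliers.

```python
import statistics

def boxplot_stats(data):
    """Return the box edges, median, whisker ends, and outliers."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Values inside the fences determine where the whiskers end.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    # Values outside the fences are drawn individually (e.g. as crosses).
    outliers = [x for x in data if x < lower_fence or x > upper_fence]

    return {
        "box": (q1, q3),  # the box spans the middle 50% of the data
        "median": median,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

# Hypothetical data set with one outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
stats = boxplot_stats(data)

print(stats["box"])       # (3.25, 7.75)
print(stats["whiskers"])  # (1, 9)
print(stats["outliers"])  # [50]
```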
Skewness
- Skewness describes the direction in which a non-symmetrical distribution of data is leaning
- A distribution that has its tail on the right side has positive skew
- A distribution that has its tail on the left side has negative skew
- Find the skewness from a boxplot:
  - If the median is roughly in the middle of the first and third quartiles, then the distribution is approximately symmetrical
  - If the median is closer to Q1, then the distribution has positive skew
  - If the median is closer to Q3, then the distribution has negative skew
- Find the skewness from the median and the mean:
  - Positively skewed: median < mean
  - Negatively skewed: median > mean
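Both ways of reading skewness can be sketched in Python. The function names, the data sets, and the tolerance parameter `rel_tol` are hypothetical. Reporting "approximately symmetric" when the mean equals the median is an assumption added for the sketch, and the quartiles use the `inclusive` convention.

```python
import math
import statistics

def skew_from_mean_median(data, rel_tol=1e-9):
    """Classify skew by comparing the median with the mean."""
    mean = statistics.fmean(data)
    median = statistics.median(data)
    if math.isclose(mean, median, rel_tol=rel_tol):
        return "approximately symmetric"  # assumption: equal means no clear skew
    return "positive" if median < mean else "negative"

def skew_from_quartiles(data, rel_tol=1e-9):
    """Classify skew by where the median sits between Q1 and Q3."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    left = median - q1   # distance from Q1 to the median
    right = q3 - median  # distance from the median to Q3
    if math.isclose(left, right, rel_tol=rel_tol):
        return "approximately symmetric"
    # Median closer to Q1 means positive skew; closer to Q3 means negative skew.
    return "positive" if left < right else "negative"

# Hypothetical data sets: one with a right tail, one with a left tail.
right_tailed = [1, 2, 3, 4, 5, 10, 20]
left_tailed = [1, 11, 16, 17, 18, 19, 20]

print(skew_from_mean_median(right_tailed), skew_from_quartiles(right_tailed))
print(skew_from_mean_median(left_tailed), skew_from_quartiles(left_tailed))
```

The two checks are rules of thumb and can disagree on small or irregular data sets, so it helps to look at both alongside a plot.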
Data Comparison
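- Compare numerical values, e.g., measures of center (mean, median) and spread (IQR, standard deviation)
- Describe what the comparison means in its real-life context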
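As one possible illustration of a data comparison, the sketch below compares the median and IQR of two hypothetical sets of test scores and then interprets the result. The class names, the scores, and the helper `center_and_spread` are invented for the example. The quartiles use the `inclusive` convention.

```python
import statistics

def center_and_spread(data):
    """Return (median, IQR) as a simple summary of center and spread."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (median, q3 - q1)

# Hypothetical test scores for two classes.
class_a = [55, 60, 65, 70, 75, 80, 85]
class_b = [40, 50, 60, 70, 80, 90, 100]

summary_a = center_and_spread(class_a)
summary_b = center_and_spread(class_b)

# Step 1: compare the numerical values.
print(summary_a)  # (70.0, 15.0)
print(summary_b)  # (70.0, 30.0)

# Step 2: describe the meaning in context.
# Both classes have the same typical score (median 70), but class B's scores
# are spread twice as widely (IQR 30 vs 15), so its results are less consistent.
```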