Summary Statistics

Definitions

Parameters & Statistics

  • $\overline{x}$: Sample mean
  • $\mu$: Population mean
  • $s$: Sample standard deviation
  • $\sigma$: Population standard deviation
  • $X^2$: Sample chi-square value
  • $\chi^2$: Population chi-square value
  • $\hat{p}$: Sample proportion
  • $p$: Population proportion

Quartiles

  • Q1: One quarter (25%) of the data is less than or equal to Q1
  • Q3: Three quarters (75%) of the data is less than or equal to Q3

Interquartile Range (IQR)

  • IQR = Q3 - Q1
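
As a rough illustration of the quartile and IQR definitions above, the sketch below uses Python's standard `statistics.quantiles`. The data values are made up, and `method="inclusive"` is an assumption: textbooks and software use several slightly different quartile conventions, so other tools may report different Q1 and Q3 for the same data.

```python
import statistics

# Made-up sample, used only for illustration.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]

# n=4 splits the data into four equal parts and returns the three cut
# points: Q1, the median, and Q3. The "inclusive" method is one of
# several quartile conventions (an assumption, not the only choice).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1  # IQR = Q3 - Q1

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
```

For this data set the inclusive method gives Q1 = 4, Q3 = 10, and IQR = 6.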

Standard Deviation & Variance

  • The standard deviation for a population of size $N$: $\sigma_x = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$
  • The standard deviation for a sample of size $n$: $s_x = \sqrt{\dfrac{\sum (x_i - \overline{x})^2}{n-1}}$
  • Variance is the square of the standard deviation, i.e., $\sigma^2$ and $s^2$
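
The only difference between the two formulas is the mean used and the divisor. A minimal sketch, with made-up data and the hypothetical helper names `population_sd` and `sample_sd`, compares hand-written versions against the standard library's `statistics.pstdev` and `statistics.stdev`:

```python
import math
import statistics

def population_sd(values):
    """Standard deviation treating `values` as the whole population (divide by N)."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_sd(values):
    """Standard deviation treating `values` as a sample (divide by n - 1)."""
    x_bar = sum(values) / len(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))

# Made-up data, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_sd(data), statistics.pstdev(data))  # both divide by N
print(sample_sd(data), statistics.stdev(data))       # both divide by n - 1

# Variance is the square of the standard deviation.
print(population_sd(data) ** 2, sample_sd(data) ** 2)
```

The sample version is always slightly larger than the population version for the same data, because it divides by the smaller number n - 1.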

Frequency

  • Frequency: the number of times a given value appears, denoted by $f$
  • Mean from a frequency table with $k$ distinct values: $\overline{x} = \dfrac{\sum\limits_{i=1}^k f_i x_i}{n}$, where $n = \sum f_i$ is the total number of observations
  • Relative frequency: $\dfrac{f}{n}$
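
The frequency-based mean gives the same result as averaging the raw data directly. The sketch below, on made-up data, builds a frequency table with `collections.Counter` and computes the mean and relative frequencies from it:

```python
from collections import Counter

# Made-up raw data, used only for illustration.
data = [2, 3, 3, 5, 5, 5, 7]

freq = Counter(data)      # maps each distinct value x to its frequency f
n = sum(freq.values())    # total number of observations

# Mean from the frequency table: sum of f * x over distinct values, divided by n.
mean_from_freq = sum(f * x for x, f in freq.items()) / n

# Relative frequency of each distinct value: f / n.
rel_freq = {x: f / n for x, f in freq.items()}

print(freq)
print(mean_from_freq, sum(data) / len(data))  # the two means agree
print(rel_freq)
```

The relative frequencies always sum to 1, which is a quick check on a hand-built frequency table.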

Outliers

  • IQR rule: $x$ is an outlier if $x < Q1 - 1.5 \cdot IQR$ or $x > Q3 + 1.5 \cdot IQR$
  • Two-standard-deviation rule: $x$ is an outlier if $x \geq \mu + 2\sigma$ or $x \leq \mu - 2\sigma$
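
Both outlier criteria can be sketched as small filters. The function names `iqr_outliers` and `two_sigma_outliers` are hypothetical, the data is made up, and the quartile convention (`method="inclusive"`) is an assumption, so other software may place the fences slightly differently.

```python
import statistics

def iqr_outliers(values):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < lower or x > upper]

def two_sigma_outliers(values):
    """Values at least two standard deviations away from the mean.

    Treats `values` as the whole population (uses pstdev), which is an
    assumption made for this sketch.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [x for x in values if x >= mu + 2 * sigma or x <= mu - 2 * sigma]

# Made-up data with one clearly extreme value.
data = [1, 3, 4, 6, 7, 8, 10, 12, 40]

print(iqr_outliers(data))
print(two_sigma_outliers(data))
```

On this data both rules flag only the value 40, but in general the two rules can disagree, since an extreme value also inflates the mean and standard deviation it is being compared against.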

Five-Number Summary

  • min value
  • first quartile
  • median
  • third quartile
  • max value
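
The five values listed above can be collected in one helper. This is a sketch with a hypothetical function name, `five_number_summary`, made-up data, and an assumed quartile convention (`method="inclusive"`):

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Made-up data, used only for illustration.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]

print(five_number_summary(data))
```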

Boxplots

  • A box is used to represent the middle 50% of the data
  • The width of the box represents the IQR
  • Two whiskers (horizontal lines) extend from the box to the min and max values that are not outliers
  • Outliers are drawn as crosses outside the whiskers

Skewness

  • Skewness describes the direction in which a non-symmetrical distribution of data is leaning
    • A distribution that has its tail on the right side has positive skew
    • A distribution that has its tail on the left side has negative skew
  • Find the skewness from a boxplot:
    • If the median is roughly in the middle of the first and third quartiles, then the distribution is approximately symmetrical
    • If the median is closer to Q1, then the distribution has positive skew
    • If the median is closer to Q3, then the distribution has negative skew
  • Find the skewness from the median and the mean:
    • Positively skewed: median < mean
    • Negatively skewed: median > mean
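
The mean-versus-median comparison can be sketched as a small function. The name `skew_direction` and the data sets are made up for illustration, and the comparison is a rule of thumb rather than a formal skewness statistic.

```python
import math
import statistics

def skew_direction(values):
    """Classify skew by comparing the mean with the median (rule of thumb)."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    if math.isclose(mean, median):
        return "approximately symmetrical"
    return "positive" if mean > median else "negative"

# Made-up data sets, used only for illustration.
print(skew_direction([1, 2, 3, 4, 20]))     # long tail on the right
print(skew_direction([1, 17, 18, 19, 20]))  # long tail on the left
print(skew_direction([1, 2, 3, 4, 5]))      # mean equals median
```

The mean is pulled toward the tail more strongly than the median, which is why a long right tail leaves the mean above the median.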

Data Comparison

  • Compare numerical values, such as the center and spread of each data set
  • Describe what the comparison means in real life