
Inference for Regression Slopes

Sampling Distributions for Sample Slopes

Population Least-Squares Regression Line

Theoretical best-fitting straight line that describes the true linear relationship between two variables for an entire population

\(\hat{y} = \alpha + \beta x\)

  • \(\hat{y}\): predicted response
  • \(\alpha\): population \(y\)-intercept
  • \(\beta\): population slope

"Least-squares" means that the sum of squared residuals is minimized

Sample Least-Squares Regression Line

The line of best fit for a set of data points: the line that minimizes the sum of squared residuals

\(\hat{y} = a+bx\)

  • Different samples produce different sample least-squares regression lines, each with its own sample slope \(b\). This means \(b\) has a sampling distribution.
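
The idea that \(b\) varies from sample to sample can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population values `alpha=2.0`, `beta=0.5`, `sigma=3.0`, the fixed \(x\)-values 1 to 20, and the helper names `sample_slope` and `simulate_slopes` are all hypothetical choices made for illustration, and the residuals are assumed to be normal.

```python
import random
import statistics


def sample_slope(xs, ys):
    """Least-squares slope: b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def simulate_slopes(alpha, beta, sigma, xs, reps, seed=0):
    """Draw `reps` samples from y = alpha + beta*x + normal noise; return each slope."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
        slopes.append(sample_slope(xs, ys))
    return slopes


# Hypothetical population and design, chosen only for illustration.
xs = list(range(1, 21))
slopes = simulate_slopes(alpha=2.0, beta=0.5, sigma=3.0, xs=xs, reps=5000)

print("mean of sample slopes:", statistics.fmean(slopes))  # should land near beta
print("spread of sample slopes:", statistics.pstdev(slopes))
```

Each simulated sample gives a slightly different \(b\), and the collection of slopes is an approximation of the sampling distribution: centered near the population slope, with a spread that depends on the noise and on the \(x\)-values.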

Mean and Standard Deviation of Sampling Distribution for Sample Slopes

For the sample slopes \(b\):

  • \(\mu_b = \beta\)
  • \(\sigma_b = \dfrac{\sigma}{\sigma_x \sqrt{n}}\)
    • \(n\): sample size
    • \(\sigma\): standard deviation of the population residuals
    • \(\sigma_x = \sqrt{\dfrac{\sum (x_i - \overline{x})^2}{n}}\): standard deviation of the \(x\)-values only
  • \(\sigma_b\) is unknown in practice because \(\sigma\) is unknown, so it must be estimated from the sample (the standard error)
    • \(s_b = \dfrac{s}{s_x \sqrt{n-1}}\): the standard error of the sample slope
    • \(s = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n-2}}\): standard deviation of the sample residuals
      • Divide by \(n-2\) because two parameters, \(\alpha\) and \(\beta\), are estimated
    • \(s_x = \sqrt{\dfrac{\sum (x_i - \overline{x})^2}{n-1}}\)
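
The formulas for \(s\), \(s_x\), and \(s_b\) above can be traced step by step in code. The following is a minimal sketch, not a reference implementation; the function name `slope_standard_error` and the five data points are made up for illustration.

```python
import math


def slope_standard_error(xs, ys):
    """Fit y-hat = a + b*x by least squares and return a, b, s, s_x, s_b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = sxy / sxx              # sample slope
    a = y_bar - b * x_bar      # sample intercept

    # s: residual standard deviation, dividing by n - 2
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    # s_x: sample standard deviation of the x-values, dividing by n - 1
    s_x = math.sqrt(sxx / (n - 1))

    # s_b: standard error of the slope
    s_b = s / (s_x * math.sqrt(n - 1))
    return {"a": a, "b": b, "s": s, "s_x": s_x, "s_b": s_b}


# Hypothetical data, for illustration only.
result = slope_standard_error([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

Since \(s_x \sqrt{n-1} = \sqrt{\sum (x_i - \overline{x})^2}\), the last step is equivalent to dividing \(s\) by the square root of the sum of squared \(x\)-deviations.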

Sampling Distributions for Standardized Sample Slopes

  • \(t = \dfrac{b - \beta}{s_b}\): standardized sample slope
  • \(t\) follows a \(t\)-distribution with \(n-2\) degrees of freedom
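
As a rough illustration of how the standardized slope is used, the sketch below computes \(t\) and a two-sided tail probability from the \(t\)-distribution with \(n-2\) degrees of freedom. The values `b = 0.8`, `s_b = 0.3`, `n = 12`, and the hypothesized slope of 0 are hypothetical. The helpers `t_pdf` and `two_sided_p` are illustrative names; the tail area is approximated by numerically integrating the density with Simpson's rule rather than by calling a statistics library.

```python
import math


def standardized_slope(b, beta_0, s_b):
    """t = (b - beta_0) / s_b, where beta_0 is the hypothesized population slope."""
    return (b - beta_0) / s_b


def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=10_000):
    """Approximate P(|T| >= |t_stat|) using Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total  # area between -|t| and |t|, by symmetry
    return max(0.0, 1 - central)


# Hypothetical values, for illustration only.
b, s_b, n = 0.8, 0.3, 12
t_stat = standardized_slope(b, beta_0=0.0, s_b=s_b)
df = n - 2
print("t =", round(t_stat, 3), "with", df, "degrees of freedom")
print("two-sided p-value ~", round(two_sided_p(t_stat, df), 4))
```

In practice a statistics package or a \(t\)-table would supply the tail probability; the numerical integration here is only meant to keep the sketch self-contained.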

Hypothesis Tests for Slopes of Regression Lines

Conditions for a t-Test for a Slope

  • Relationship between \(x\) and \(y\) is linear
  • The standard deviation of \(y\), \(\sigma_y\), does not vary with \(x\)
  • Residuals are independent
    • Data is collected by random sampling/assignment
    • If sampling without replacement, \(n < 0.1N\)
  • For a given value of \(x\), \(y\)-values follow an approximate normal distribution
  • If \(n<30\), the distribution of \(y\)-values shows no strong skew and no outliers

Computer Output Table

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | \(a\) | | | |
| \(x\)-variable | \(b\) | \(s_b\) | | |