Inference for Regression Slopes

Sampling Distributions for Sample Slopes

Population Least-Squares Regression Line

Theoretical best-fitting straight line that describes the true linear relationship between two variables for an entire population

\(\hat{y} = \alpha + \beta x\)

\(\hat{y}\): predicted response

\(\alpha\): population \(y\)-intercept

\(\beta\): population slope

"Least-squares" indicates that the line minimizes the sum of squared residuals

Sample Least-Squares Regression Line

The line of best fit for a set of data points: the line that minimizes the sum of squared residuals

\(\hat{y} = a+bx\)
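As a minimal sketch using made-up data, the slope and intercept that minimize the sum of squared residuals follow from the standard closed-form formulas \(b = \dfrac{\sum (x_i - \overline{x})(y_i - \overline{y})}{\sum (x_i - \overline{x})^2}\) and \(a = \overline{y} - b\overline{x}\). The function name `least_squares_line` is hypothetical.

```python
from statistics import fmean

def least_squares_line(xs, ys):
    """Return (a, b) for the sample least-squares line y-hat = a + b*x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

xs = [1, 2, 3, 4, 5]                 # made-up data for illustration
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

For this toy data the fitted line is approximately \(\hat{y} = 0.05 + 1.99x\).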

Different samples produce different sample least-squares regression lines, and these lines have different sample slopes \(b\). This means \(b\) has a sampling distribution.
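The sampling variability of \(b\) can be illustrated by simulation. This sketch assumes a hypothetical population line \(y = 2 + 0.5x\) with normally distributed noise (\(\sigma = 1\)) and the same ten \(x\)-values in every sample; all of these values are made up for illustration.

```python
import random
from statistics import fmean, stdev

# Hypothetical population: y = ALPHA + BETA*x + noise, noise ~ Normal(0, SIGMA)
ALPHA, BETA, SIGMA = 2.0, 0.5, 1.0
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # same x-values in every sample

def sample_slope(rng):
    """Draw one sample of y-values and return its least-squares slope b."""
    ys = [ALPHA + BETA * x + rng.gauss(0, SIGMA) for x in xs]
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)                 # seeded so the run is reproducible
slopes = [sample_slope(rng) for _ in range(10_000)]
print(f"mean of b: {fmean(slopes):.3f}")
print(f"sd of b:   {stdev(slopes):.4f}")
```

The simulated mean of \(b\) should land close to \(\beta = 0.5\), and its standard deviation close to \(1/\sqrt{82.5} \approx 0.110\) (here \(\sum (x_i - \overline{x})^2 = 82.5\)), which can be compared with the mean and standard deviation formulas for the sampling distribution of \(b\).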

Mean and Standard Deviation of Sampling Distribution for Sample Slopes

For the sample slopes \(b\):

- Mean: \(\mu_b = \beta\) (so \(b\) is an unbiased estimator of \(\beta\))
- Standard deviation: \(\sigma_b = \dfrac{\sigma}{\sigma_x \sqrt{n}}\)
  - \(n\): sample size
  - \(\sigma\): standard deviation of all the population residuals (the spread of \(y\)-values about the population line)
  - \(\sigma_x = \sqrt{\dfrac{\sum (x_i - \overline{x})^2}{n}}\): standard deviation of the \(x\)-values only
- Because \(\sigma\) is unknown in practice, \(\sigma_b\) must be estimated from the sample; this estimate is the standard error of the slope:
  - \(s_b = \dfrac{s}{s_x \sqrt{n-1}}\): the standard error of the sample slopes
  - \(s = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n-2}}\): standard deviation of the residuals
    - Divide by \(n-2\) because two parameters, \(\alpha\) and \(\beta\), are estimated
  - \(s_x = \sqrt{\dfrac{\sum (x_i - \overline{x})^2}{n-1}}\): sample standard deviation of the \(x\)-values
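A short sketch of these estimates on made-up data; the helper name `slope_standard_error` is hypothetical. It relies on `statistics.stdev`, which divides by \(n-1\), so it matches \(s_x\) above.

```python
from math import sqrt
from statistics import fmean, stdev

def slope_standard_error(xs, ys):
    """Return (b, s, s_x, s_b) for the sample least-squares line."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    # s: residual standard deviation, dividing by n - 2 (a and b were estimated)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))
    # s_x: sample standard deviation of x (statistics.stdev divides by n - 1)
    s_x = stdev(xs)
    s_b = s / (s_x * sqrt(n - 1))
    return b, s, s_x, s_b

xs = [1, 2, 3, 4, 5]                 # made-up data for illustration
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b, s, s_x, s_b = slope_standard_error(xs, ys)
print(f"b = {b:.4f}, s = {s:.4f}, s_x = {s_x:.4f}, s_b = {s_b:.4f}")
```

Since \(s_x\sqrt{n-1} = \sqrt{\sum (x_i - \overline{x})^2}\), the standard error can equivalently be written \(s_b = s / \sqrt{\sum (x_i - \overline{x})^2}\); for this data \(s_b \approx 0.0597\).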

Sampling Distributions for Standardized Sample Slopes

\(t = \dfrac{b - \beta}{s_b}\): standardized sample slope
The standardized sample slope follows a \(t\)-distribution with \(\text{df} = n-2\) degrees of freedom
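To test \(H_0: \beta = \beta_0\) (most often \(\beta_0 = 0\)) against \(H_a: \beta \neq \beta_0\), substitute \(\beta_0\) for \(\beta\) in the standardized slope and compare the result with the \(t\)-distribution with \(n-2\) degrees of freedom. This stdlib-only sketch approximates the two-sided P-value by integrating the \(t\) density with Simpson's rule; the helper names and the summary values of \(b\), \(s_b\), and \(n\) are hypothetical.

```python
from math import exp, lgamma, log, pi

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + t * t / df))

def two_sided_p_value(t_stat, df, steps=10_000):
    """P(|T| >= |t_stat|) for T ~ t(df), integrating the density over
    [0, |t_stat|] with Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3          # P(0 <= T <= |t_stat|)
    return max(0.0, 1.0 - 2.0 * central)

# Hypothetical summary values for a test of H0: beta = 0
b, s_b, n = 0.8, 0.35, 12
beta_0 = 0.0
t_stat = (b - beta_0) / s_b          # standardized sample slope under H0
df = n - 2
p_value = two_sided_p_value(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, two-sided P-value = {p_value:.4f}")
```

If SciPy is available, `2 * scipy.stats.t.sf(abs(t_stat), df)` should give essentially the same P-value; the hand-rolled integration only keeps the example dependency-free.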

Hypothesis Tests for Slopes of Regression Lines

Conditions for a \(t\)-Test for a Slope

- The relationship between \(x\) and \(y\) is linear
- The standard deviation of \(y\), \(\sigma_y\), does not vary with \(x\)
- Residuals are independent
  - Data are collected by random sampling or random assignment
  - If sampling without replacement, \(n < 0.1N\)
- For a given value of \(x\), the \(y\)-values follow an approximately normal distribution
  - If \(n < 30\), the distribution of \(y\)-values should show no strong skew and no outliers
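These conditions are usually checked with a scatterplot and a residual plot. As a rough numeric stand-in, this sketch computes the residuals, checks the 10% condition, and compares the residual spread for the lower and upper halves of the \(x\)-values. The half-split spread ratio is an informal heuristic assumed here for illustration, not a formal test, and the data and helper names are made up.

```python
from statistics import fmean, stdev

def residuals(xs, ys):
    """Residuals y - y_hat from the sample least-squares line."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def ten_percent_condition(n, population_size):
    """Independence check when sampling without replacement: n < 0.1 N."""
    return n < 0.1 * population_size

def residual_spread_ratio(xs, res):
    """Informal heuristic, not a formal test: SD of residuals for the upper
    half of x divided by SD for the lower half. Values far from 1 hint that
    the spread of y changes with x."""
    pairs = sorted(zip(xs, res))
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[half:]]
    return stdev(upper) / stdev(lower)

xs = [1, 2, 3, 4, 5, 6, 7, 8]                      # made-up data
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2]
res = residuals(xs, ys)
print("residuals:", [round(r, 3) for r in res])
print("10% condition (n=8, N=1000):", ten_percent_condition(len(xs), 1000))
print("spread ratio:", round(residual_spread_ratio(xs, res), 2))
```

Least-squares residuals always sum to (numerically) zero, so the useful information is in their pattern and spread, which a residual plot shows more completely than any single number.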

Computer Output Table

| Predictor | Coef | SE Coef |
|---|---|---|
| Constant | \(a\) | |
| \(x\)-variable | \(b\) | \(s_b\) |