Hypothesis Tests and Confidence Intervals
Procedures for a Hypothesis Test¶
- Define context
- \(H_0: parameter =\ ...\)
- \(H_a: parameter\ condition\ ...\)
- \(parameter\) is the ...
- \(\alpha =\ ...\)
- Verify inference conditions
- Find p-value
- Conclusion
- \(p\ condition\ \alpha\)
- \(H_0\) should/should not be rejected
- The data provides sufficient/insufficient evidence that ...
Procedures for a Confidence Interval¶
- Verify confidence interval conditions
- Find confidence interval
- \(CI = statistic \pm (critical\ value) \times (standard\ error\ of\ statistic)\)
- We can be \(C\%\) confident that the interval from \(lower\ limit\) to \(upper\ limit\) captures the actual value of the \(parameter\)
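The generic formula above can be sketched in Python (scipy-based; the sample summary numbers below are made up for illustration):

```python
import math
from scipy import stats

# Hypothetical sample summary: size, sample mean, sample standard deviation
n, xbar, s = 25, 14.2, 3.1
C = 0.95                                    # confidence level

# critical value t* with dof = n - 1 (two-tailed)
t_star = stats.t.ppf(1 - (1 - C) / 2, df=n - 1)
se = s / math.sqrt(n)                       # standard error of the statistic
lower, upper = xbar - t_star * se, xbar + t_star * se
print(f"{C:.0%} CI: ({lower:.3f}, {upper:.3f})")
```

The same statistic ± (critical value) × (standard error) pattern applies to every interval in these notes; only the critical value (z* vs t*) and the standard-error formula change.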
Population Proportions¶
One-sample z-test.
Conditions:
- Independence condition
- Random sampling/assignment
- If sampling without replacement, \(n < 0.1 N\)
- Approximately normally distributed
- \(\left\{\begin{aligned} np_0 &\geq 10 \\ n(1-p_0) &\geq 10 \end{aligned}\right.\)
Test statistic:
- \(z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 (1-p_0)}{n}}}\)
- standard error = \(\sqrt{\dfrac{p_0 (1-p_0)}{n}}\)
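A minimal sketch of the one-sample z-test in Python (scipy; the counts and the null value are hypothetical):

```python
import math
from scipy import stats

# Hypothetical data: 58 successes in n = 100; H0: p = 0.5 vs Ha: p > 0.5
n, x, p0 = 100, 58, 0.5
p_hat = x / n

# large counts condition: n*p0 >= 10 and n*(1 - p0) >= 10
assert n * p0 >= 10 and n * (1 - p0) >= 10

se = math.sqrt(p0 * (1 - p0) / n)           # standard error under H0
z = (p_hat - p0) / se
p_value = stats.norm.sf(z)                  # one-sided (right-tail) p-value
```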
Differences in Population Proportions¶
Two-sample z-test.
Conditions:
- Independence condition
- Random sampling/assignment
- If sampling without replacement, \(n < 0.1 N\)
- Approximately normally distributed
- Combined proportion/pooled proportion \(\hat{p}_c = \dfrac{X_1 + X_2}{n_1 + n_2}\), \(X = n \hat{p}\)
- \(\left\{\begin{aligned} n_1\hat{p}_c &\geq 10 \\ n_1(1-\hat{p}_c) &\geq 10 \\ n_2\hat{p}_c &\geq 10 \\ n_2(1-\hat{p}_c) &\geq 10 \end{aligned}\right.\)
Test statistic:
- \(z = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}\)
- standard error = \(\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\)
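A sketch of the two-sample z-test with hypothetical counts; it follows the notes' convention of checking conditions with the pooled proportion but using the unpooled standard error in the statistic:

```python
import math
from scipy import stats

# Hypothetical counts: X1 successes of n1, X2 successes of n2
x1, n1 = 45, 100
x2, n2 = 30, 90

p1, p2 = x1 / n1, x2 / n2
p_c = (x1 + x2) / (n1 + n2)                 # pooled proportion for the condition check
assert min(n1 * p_c, n1 * (1 - p_c), n2 * p_c, n2 * (1 - p_c)) >= 10

# unpooled standard error, matching the test-statistic formula above
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = (p1 - p2) / se
p_value = 2 * stats.norm.sf(abs(z))         # two-sided p-value
```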
Population Means¶
One-sample t-test.
Conditions:
- Independence condition
- Random sampling/assignment
- If sampling without replacement, \(n < 0.1 N\)
- Approximately normally distributed
- Approximately symmetric
- No outliers
- If very skewed, \(n \geq 30\)
Test statistic:
- \(t = \dfrac{\overline{x} - \mu}{\frac{s}{\sqrt{n}}}\)
- standard error = \(\dfrac{s}{\sqrt{n}}\)
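scipy implements this test directly; a sketch with a hypothetical sample:

```python
from scipy import stats

# Hypothetical sample; H0: mu = 10 vs Ha: mu != 10 (two-sided)
data = [9.8, 10.4, 11.1, 9.5, 10.9, 10.2, 9.7, 10.6]
t_stat, p_value = stats.ttest_1samp(data, popmean=10.0)
# t_stat equals (xbar - mu) / (s / sqrt(n)); dof = n - 1
```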
Differences in Population Means¶
Two-sample t-test.
Conditions:
- Independence condition
- Random sampling/assignment
- If sampling without replacement, \(n < 0.1 N\)
- Approximately normally distributed
- Approximately symmetric
- No outliers
- If very skewed, \(n \geq 30\)
Test statistic:
- \(t = \dfrac{\overline{x}_1 - \overline{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\)
- standard error = \(\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}\)
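The statistic above is the Welch (unpooled) form, which scipy gives with `equal_var=False`; a sketch with hypothetical samples:

```python
from scipy import stats

# Hypothetical independent samples; H0: mu1 = mu2 (two-sided)
group1 = [12.1, 13.4, 11.8, 14.0, 12.7, 13.1]
group2 = [10.9, 11.5, 12.2, 10.4, 11.8, 11.1]
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
```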
Differences in Matched Pairs¶
One-sample t-test.
Conditions:
- Two measures come from the same items within the population
- Independence condition
- Random sampling/assignment
- If sampling without replacement, \(n < 0.1 N\)
- Approximately normally distributed
- Approximately symmetric
- No outliers
- If very skewed, \(n \geq 30\)
Test statistic:
- \(t = \dfrac{\overline{x}_d - \mu_d}{\frac{s_d}{\sqrt{n}}}\)
- standard error = \(\dfrac{s_d}{\sqrt{n}}\)
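A paired-data sketch with hypothetical before/after measurements; it also shows that the paired test is just a one-sample t-test on the differences:

```python
from scipy import stats

# Hypothetical before/after measurements on the same items; H0: mu_d = 0
before = [200, 195, 210, 188, 205, 198]
after  = [194, 190, 207, 185, 199, 195]
t_stat, p_value = stats.ttest_rel(before, after)

# equivalent: one-sample t-test on the differences against mu_d = 0
diffs = [b - a for b, a in zip(before, after)]
t_alt, p_alt = stats.ttest_1samp(diffs, popmean=0.0)
```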
Goodness of Fit¶
\(\chi^2\)-test.
Conditions:
- Independence condition
- Random sampling
- If sampling without replacement, \(n < 0.1 N\)
- Large counts condition
- Each expected value ≥ 5
Test statistic:
- \(\chi^2 = \sum \dfrac{(observed - expected)^2}{expected}\)
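A sketch using hypothetical die-roll counts against a fair-die null (equal expected counts):

```python
from scipy import stats

# Hypothetical counts for the six faces of a die; H0: all faces equally likely
observed = [18, 22, 16, 25, 20, 19]
expected = [sum(observed) / 6] * 6
assert all(e >= 5 for e in expected)        # large counts condition

# chi2 = sum((observed - expected)^2 / expected), dof = categories - 1
chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
```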
Independence¶
One-sample \(\chi^2\)-test.
Conditions:
- Independence condition
- Random sampling
- If sampling without replacement, \(n < 0.1 N\)
- Large counts condition
- Each expected value ≥ 5
- or at least 80% of expected values are ≥ 5 and all are ≥ 1
Test statistic:
- \(\chi^2 = \sum \dfrac{(observed - expected)^2}{expected}\)
- \(Expected\ value\) = \(\dfrac{Row\ total \times Column\ total}{Total\ number\ in\ sample}\)
- \(dof = (n_{row}-1)(n_{col}-1)\)
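scipy computes the expected counts, statistic, and dof from the table in one call; a sketch with a hypothetical 2×3 contingency table:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x3 table (rows: groups, columns: response categories)
table = np.array([[30, 45, 25],
                  [20, 35, 45]])
chi2, p_value, dof, expected = stats.chi2_contingency(table)

assert (expected >= 5).all()                # large counts condition
assert dof == (table.shape[0] - 1) * (table.shape[1] - 1)
```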
Homogeneity¶
Multi-sample \(\chi^2\)-test.
Conditions:
- Independence condition
- Random sampling
- If sampling without replacement, \(n < 0.1 N\)
- Large counts condition
- Each expected value ≥ 5
Test statistic:
- \(\chi^2 = \sum \dfrac{(observed - expected)^2}{expected}\)
- \(dof = (n_{row}-1)(n_{col}-1)\)
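The homogeneity test uses the same mechanics as the independence test; only the sampling design differs (each row is a separate sample). A sketch with hypothetical counts:

```python
import numpy as np
from scipy import stats

# Hypothetical: two independently drawn samples (rows), three categories (columns)
# H0: the category distribution is the same in both populations
sample_counts = np.array([[40, 35, 25],     # sample 1
                          [55, 30, 15]])    # sample 2
chi2, p_value, dof, expected = stats.chi2_contingency(sample_counts)
```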
Regression Line¶
One-sample t-test.
Conditions:
- Relationship between \(x\) and \(y\) is linear
- \(\sigma_y\) does not vary with \(x\)
- Independence condition
- Random sampling
- If sampling without replacement, \(n < 0.1 N\)
- For a given value of \(x\), \(y\)-values follow an approximate normal distribution
- If \(n < 30\), the distribution of \(y\)-values has no strong skew and no outliers
Test statistic:
- \(t = \dfrac{b - \beta}{s_b}\)
- standard error \(s_b = \dfrac{s}{s_x \sqrt{n-1}}\)
- \(s = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n-2}}\)
- \(s_x = \sqrt{\dfrac{\sum (x_i - \overline{x})^2}{n-1}}\)
- \(t\)-distribution with \(dof=n-2\)
| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | \(a\) | | | |
| \(x\)-variable | \(b\) | \(s_b\) | \(t = b / s_b\) | \(p\)-value |
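scipy's `linregress` reports the quantities in the output table above; a sketch with hypothetical paired data, testing \(H_0: \beta = 0\):

```python
from scipy import stats

# Hypothetical paired data with a roughly linear relationship
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 9.9, 12.3]

res = stats.linregress(x, y)
b, a = res.slope, res.intercept             # Coef column: b and a
s_b = res.stderr                            # SE Coef column for the slope
t_stat = b / s_b                            # T column: t = (b - 0) / s_b
p_value = res.pvalue                        # P column (two-sided, dof = n - 2)
```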