Lab 4

Bootstrap Confidence Intervals and a Single-Proportion Test

Author

YOUR NAME HERE

Getting the Lab Files

Open the Lab 4 project in the course space on Posit Cloud to get started.

Overview

Use a simulation-based workflow (bootstrap CI + hypothesis test) to analyze a governance reform example. Fill in each ??? with the correct code.

You are encouraged to have Module 3.2 and Module 4.1 open while completing this lab.

Getting Started

library(tidyverse)
library(tidymodels)
run <- isTRUE(params$completed)

Note

If tidymodels is not installed locally, run install.packages("tidymodels").

The Data

A municipal permit office introduces a transparency reform (posted fee schedules + anonymous reporting) to reduce unofficial payment requests.

In a sample of 50 permit applicants after the reform, 6 reported being asked for an unofficial payment and 44 did not.

permit_office <- tibble(
  outcome = c(rep("asked_payment", 6), rep("not_asked", 44))
)

Part 1: Explore the Sample (25 points)

Sample summary (10 pts)

Write code to:

view the first 6 rows
count outcomes
calculate the observed sample payment-request rate (proportion asked_payment)

# Write your code here

Question: What is the observed sample payment-request rate (proportion and percent)?

YOUR ANSWER HERE

Bar chart + short interpretation (15 pts)

Create a bar chart of outcome.

include a title and axis labels
use theme_minimal()

# Write your plot code here

Question: Based only on this sample, does the reform look promising? Why is this not enough to make a strong causal claim?

YOUR ANSWER HERE

Part 2: Bootstrap Confidence Interval (45 points)

Estimate a 95% bootstrap confidence interval for the payment-request rate.

Bootstrap distribution (20 pts)

Fill in the blanks:

set.seed(66)

boot_df <- permit_office |>
  specify(response = ???, success = ???) |>
  generate(reps = 5000, type = ???) |>
  calculate(stat = ???)

boot_df

CI estimate and plot (15 pts)

Use get_ci() and shade_ci() to estimate and visualize the 95% CI.

ci_95 <- boot_df |>
  get_ci(level = ???)

boot_df |>
  visualize() +
  ??? +
  labs(
    title = "Bootstrap Distribution of the Payment-Request Rate",
    x = "Sample proportion asked for unofficial payment",
    y = "Count"
  ) +
  theme_minimal()

CI interpretation (10 pts)

Write 2-3 sentences interpreting the 95% confidence interval in context.

YOUR INTERPRETATION HERE

Confidence level comparison (10 pts)

Change the confidence level in get_ci() to 0.90 and re-run the visualization.

ci_90 <- boot_df |>
  get_ci(level = ???)

boot_df |>
  visualize() +
  shade_ci(???, color = "red", fill = NULL) +
  labs(
    title = "Bootstrap Distribution with 90% CI",
    x = "Sample proportion asked for unofficial payment",
    y = "Count"
  ) +
  theme_minimal()

Question: Is the 90% CI wider or narrower than the 95% CI? Explain in 1-2 sentences why that makes sense.

Part 3: Single-Proportion Hypothesis Test (50 points)

Suppose the office’s historical payment-request rate is 20%, and the null hypothesis says the reform had no effect.

Hypotheses (10 pts)

Question: Is the payment-request rate after the reform lower than 20%?

Null hypothesis (\(H_0\)):
Alternative hypothesis (\(H_A\)):

YOUR ANSWERS HERE

Simulate the null distribution (20 pts)

Fill in the blanks for a simulation under \(H_0: p = 0.20\).

set.seed(71)

null_dist <- permit_office |>
  specify(response = ???, success = ???) |>
  hypothesize(null = "point", p = c("asked_payment" = ???, "not_asked" = ???)) |>
  generate(reps = 2000, type = ???) |>
  calculate(stat = ???)

null_dist

Null distribution plot (5 pts)

Visualize the null distribution and shade the p-value region.

null_dist |>
  visualize() +
  shade_p_value(obs_stat = ???, direction = ???)

Calculate the p-value (10 pts)

Use get_p_value() to calculate the p-value.

null_dist |>
  get_p_value(obs_stat = ???, direction = ???)

Conclusion (5 pts)

Using \(\alpha = 0.05\), state whether you reject or do not reject \(H_0\), then explain your conclusion in 2-3 sentences.

YOUR CONCLUSION HERE

Submission (20 points)

Replace “YOUR NAME HERE” with your name
Make sure all code chunks run without errors
Render to PDF
Submit the PDF to Blackboard

Hints (Short Version)

Observed proportion

permit_office |>
  summarize(p_hat = mean(outcome == "asked_payment"))

Bootstrap workflow

permit_office |>
  specify(response = outcome, success = "asked_payment") |>
  generate(reps = 5000, type = "bootstrap") |>
  calculate(stat = "prop")

Null distribution workflow

permit_office |>
  specify(response = outcome, success = "asked_payment") |>
  hypothesize(null = "point", p = c("asked_payment" = 0.20, "not_asked" = 0.80)) |>
  generate(reps = 2000, type = "draw") |>
  calculate(stat = "prop")

Null distribution visualization

null_dist |>
  visualize() +
  shade_p_value(obs_stat = ???, direction = "less")

Simulation p-value

null_dist |>
  get_p_value(obs_stat = ???, direction = "less")