library(tidyverse)
library(tidymodels)
run <- isTRUE(params$completed)Lab 4
Bootstrap Confidence Intervals and a Single-Proportion Test
Fill in each ??? with the correct code. Once all placeholders are filled in, change completed: false to completed: true in the YAML header above and render to HTML. For your final submission, change format: html to format: pdf.
Overview
Use a simulation-based workflow (bootstrap CI + hypothesis test) to analyze a governance reform example.
You are encouraged to have Module 3.2 and Module 4.1 open while completing this lab.
Getting Started
If tidymodels is not installed locally, run install.packages("tidymodels").
The Data
A municipal permit office introduces a transparency reform (posted fee schedules + anonymous reporting) to reduce unofficial payment requests.
In a sample of 50 permit applicants after the reform, 6 reported being asked for an unofficial payment and 44 did not.
permit_office <- tibble(
outcome = c(rep("asked_payment", 6), rep("not_asked", 44))
)Part 1: Explore the Sample (25 points)
Sample summary (10 pts)
Write code to:
- view the first 6 rows
- count outcomes
- calculate the observed sample payment-request rate (proportion
asked_payment)
# Write your code hereQuestion: What is the observed sample payment-request rate (proportion and percent)?
YOUR ANSWER HERE
Bar chart + short interpretation (15 pts)
Create a bar chart of outcome.
- include a title and axis labels
- use
theme_minimal()
# Write your plot code hereQuestion: Based only on this sample, does the reform look promising? Why is this not enough to make a strong causal claim?
YOUR ANSWER HERE
Part 2: Bootstrap Confidence Interval (45 points)
Estimate a 95% bootstrap confidence interval for the payment-request rate.
Bootstrap distribution (20 pts)
Fill in the blanks:
set.seed(66)
boot_df <- permit_office |>
specify(response = ???, success = ???) |>
generate(reps = 5000, type = ???) |>
calculate(stat = ???)
boot_dfCI estimate and plot (15 pts)
Use get_ci() and shade_ci() to estimate and visualize the 95% CI.
ci_95 <- boot_df |>
get_ci(level = 0.95)
boot_df |>
visualize() +
??? +
labs(
title = "Bootstrap Distribution of the Payment-Request Rate",
x = "Sample proportion asked for unofficial payment",
y = "Count"
) +
theme_minimal()CI interpretation (10 pts)
Write 2-3 sentences interpreting the 95% confidence interval in context.
YOUR INTERPRETATION HERE
Part 3: Single-Proportion Hypothesis Test (50 points)
Suppose the office’s historical payment-request rate is 20%, and the null hypothesis says the reform had no effect.
Hypotheses (10 pts)
Question: Is the payment-request rate after the reform lower than 20%?
- Null hypothesis (\(H_0\)):
- Alternative hypothesis (\(H_A\)):
YOUR ANSWERS HERE
Simulate the null distribution (20 pts)
Fill in the blanks for a simulation under \(H_0: p = 0.20\).
set.seed(71)
null_dist <- permit_office |>
specify(response = ???, success = ???) |>
hypothesize(null = "point", p = c("asked_payment" = ???, "not_asked" = ???)) |>
generate(reps = 2000, type = ???) |>
calculate(stat = ???)
null_distNull distribution plot (5 pts)
Create a histogram of null_dist and add a vertical line at the observed sample proportion.
# Write your plot code hereOne-sided p-value (10 pts)
Because \(H_A\) is “lower than 0.20,” calculate the proportion of simulated values less than or equal to the observed sample proportion.
null_dist |>
filter(stat <= ???) |>
summarize(p_value = ???)Conclusion (5 pts)
Using \(\alpha = 0.05\), state whether you reject or do not reject \(H_0\), then explain your conclusion in 2-3 sentences.
YOUR CONCLUSION HERE
Submission (20 points)
- Replace “YOUR NAME HERE” with your name
- Make sure all code chunks run without errors
- Render to PDF
- Submit the PDF to Blackboard
Hints (Short Version)
Observed proportion
permit_office |>
summarize(p_hat = mean(outcome == "asked_payment"))
Bootstrap workflow
permit_office |>
specify(response = outcome, success = "asked_payment") |>
generate(reps = 5000, type = "bootstrap") |>
calculate(stat = "prop")
Null distribution workflow
permit_office |>
specify(response = outcome, success = "asked_payment") |>
hypothesize(null = "point", p = c("asked_payment" = 0.20, "not_asked" = 0.80)) |>
generate(reps = 2000, type = "draw") |>
calculate(stat = "prop")
Simulation p-value
null_dist |>
filter(stat <= 0.12) |>
summarize(p_value = n() / nrow(null_dist))