Lab 2

Wrangling and Exploring Democracy Data

Author

YOUR NAME HERE

Getting the Lab Files

Open the Lab 2 project in the course space on Posit Cloud to get started.

Overview

In this lab, you will practice the main skills from Modules 2.1 and 2.2. You will:

Load and wrangle V-Dem data
Summarize democracy indicators by region
Create a column chart from summarized data
Explore categorical regime data with geom_bar()
Compare regime proportions across regions
Write brief interpretations of your results

Fill in each ??? with the correct code. You are encouraged to have the Week 2 lecture materials and the V-Dem codebook open while completing the lab.

Getting Started

Load the required packages.

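library(vdemdata)   # V-Dem data (provides the vdem data frame)
library(tidyverse)  # dplyr, ggplot2, and related packages
run <- isTRUE(params$completed)  # TRUE once completed: true is set in the YAML header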

Installing vdemdata locally

If you are working on your own computer and do not have vdemdata installed, install pak and then install vdemdata from GitHub:

install.packages("pak")
pak::pkg_install("vdeminstitute/vdemdata")

Part 1: Wrangle Democracy Data (30 points)

In this part, you will build a clean dataset from V-Dem using filter(), select(), and mutate().

Step 1: Create a Wrangled Dataset (20 pts)

Filter the data to one year, select the variables you need, and recode the region variable into readable labels.
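Before you fill in the blanks, it may help to see the three verbs together on a small made-up data frame. The sketch below is illustrative only: the column names (score, region_code) and values are hypothetical stand-ins, not V-Dem variables, and case_match() assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical toy data standing in for vdem (not real V-Dem values)
toy <- tibble(
  country_name = c("A", "A", "B", "B"),
  year = c(2015, 2016, 2015, 2016),
  score = c(0.2, 0.3, 0.7, 0.8),
  region_code = c(1, 1, 5, 5)
)

toy_wrangled <- toy |>
  filter(year == 2016) |>                 # keep a single year
  select(country = country_name, year,    # select() can rename: new = old
         score, region = region_code) |>
  mutate(region = case_match(region,      # numeric codes -> readable labels
    1 ~ "Eastern Europe",
    5 ~ "The West"))

toy_wrangled
```

Note that select() both picks and renames columns, so no separate rename() step is needed.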

democracy <- vdem |>
  ???(year == ???) |>
  ???(
    country = ???,
    year,
    polyarchy = ???,
    libdem = ???,
    gdp_pc = ???,
    region = ???
  ) |>
  ???(
    region = case_match(region,
      1 ~ "Eastern Europe",
      2 ~ "Latin America",
      3 ~ "Middle East",
      4 ~ "Africa",
      5 ~ "The West",
      6 ~ "Asia")
  )

Step 2: Examine the Data (5 pts)

Run glimpse() on your new data frame.

glimpse(???)
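glimpse() reports the number of rows and columns at the top of its output. If you want those numbers on their own, dim(), nrow(), and ncol() return them directly. A minimal sketch on a hypothetical two-column data frame:

```r
library(dplyr)

# Hypothetical toy data frame (illustration only)
toy <- tibble(country = c("A", "B", "C"), score = c(0.2, 0.5, 0.9))

glimpse(toy)  # the first lines of the output give the row and column counts
dim(toy)      # c(rows, columns)
nrow(toy)     # rows only
ncol(toy)     # columns only
```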

Step 3: Brief Response (5 pts)

Write 2-3 sentences answering these questions:

What year did you choose?
How many rows and columns does your wrangled dataset have?
Why is it useful to recode region before plotting?

YOUR RESPONSE HERE

Part 2: Summarize and Visualize by Region (35 points)

In this part, you will practice group_by(), summarize(), arrange(), and geom_col().

Step 1: Summarize the Data (15 pts)

Create a regional summary dataset. Use mean() for polyarchy and gdp_pc, and sort from highest to lowest polyarchy.
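The same group_by(), summarize(), arrange() pipeline can be tried on a small made-up data frame that contains one missing value, which shows why na.rm = TRUE matters. The names region, score, and income are hypothetical, not V-Dem variables.

```r
library(dplyr)

# Hypothetical toy data; one score is missing on purpose
toy <- tibble(
  region = c("North", "North", "South", "South", "South"),
  score  = c(0.8, 0.6, 0.3, NA, 0.1),
  income = c(30, 20, 5, 8, 2)
)

toy_summary <- toy |>
  group_by(region) |>
  summarize(
    score  = mean(score, na.rm = TRUE),   # without na.rm, South's mean would be NA
    income = mean(income, na.rm = TRUE)
  ) |>
  arrange(desc(score))                    # highest average first

toy_summary
```

summarize() returns one row per group, so the result has one row per region.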

dem_summary <- democracy |>
  ???(region) |>
  ???(
    polyarchy = ???(polyarchy, na.rm = TRUE),
    gdp_pc = ???(gdp_pc, na.rm = TRUE)
  ) |>
  ???(desc(polyarchy))

Step 2: Print the Summary (5 pts)

Print your summarized data frame.

???

Step 3: Create a Column Chart (10 pts)

Create a column chart of average polyarchy by region. Reorder the bars from highest to lowest.
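geom_col() takes bar heights from a y variable you have already computed, which is why it pairs with a summary table. In the sketch below (hypothetical values, not V-Dem results), reorder(region, -score) orders the categories by descending score; the minus sign reverses reorder()'s default ascending order.

```r
library(ggplot2)

# Hypothetical summary table (illustration only)
toy_summary <- data.frame(
  region = c("North", "South", "East"),
  score  = c(0.7, 0.2, 0.5)
)

# reorder() sets the x-axis order by descending score;
# geom_col() draws each bar at the height of its y value
p <- ggplot(toy_summary, aes(x = reorder(region, -score), y = score)) +
  geom_col() +
  labs(x = "Region", y = "Average score")

p
```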

# Write your code here

Step 4: Brief Interpretation (5 pts)

Write 2-3 sentences describing what you see.

Which region has the highest average polyarchy score?
Which region has the lowest?
Does the ranking make sense to you?

YOUR RESPONSE HERE

Part 3: Explore Categorical Regime Data (35 points)

In this part, you will work with V-Dem’s v2x_regime variable and use count() and geom_bar().

Step 1: Build a Regime Dataset (15 pts)

Filter the data to 2022, select the country, regime, and region variables, and recode both region and regime into readable labels.
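case_match() compares each value against the codes on the left of ~ and returns the matching label. Any value that is not listed, including NA, becomes NA unless you supply a .default. A quick check on a hypothetical vector of regime codes (assumes dplyr 1.1.0 or later):

```r
library(dplyr)

# Hypothetical vector of regime codes (0-3), including a missing value
codes <- c(3, 0, 2, NA, 1)

labels <- case_match(codes,
  0 ~ "Closed Autocracy",
  1 ~ "Electoral Autocracy",
  2 ~ "Electoral Democracy",
  3 ~ "Liberal Democracy")

labels  # the NA code stays NA
```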

vdem2022_regime <- vdem |>
  ???(year == ???) |>
  ???(
    country = ???,
    regime = ???,
    region = ???
  ) |>
  ???(
    region = case_match(region,
      1 ~ "Eastern Europe",
      2 ~ "Latin America",
      3 ~ "Middle East",
      4 ~ "Africa",
      5 ~ "The West",
      6 ~ "Asia"),
    regime = case_match(regime,
      0 ~ "Closed Autocracy",
      1 ~ "Electoral Autocracy",
      2 ~ "Electoral Democracy",
      3 ~ "Liberal Democracy")
  )

Step 2: Count Regime Types (5 pts)

Create a frequency table of regime types.
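count() returns one row per category, with the number of rows stored in a column named n. The sketch below uses hypothetical data; sort = TRUE and the prop column are optional extras shown for illustration.

```r
library(dplyr)

# Hypothetical toy data (illustration only)
toy <- tibble(regime = c("Electoral Autocracy", "Liberal Democracy",
                         "Electoral Autocracy", "Closed Autocracy"))

toy_counts <- toy |>
  count(regime, sort = TRUE) |>   # one row per category, most frequent first
  mutate(prop = n / sum(n))       # optional: share of all rows

toy_counts
```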

vdem2022_regime |>
  ???(regime)

Step 3: Create a Bar Chart of Regime Types (5 pts)

Use geom_bar() to plot the distribution of regime types.
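Unlike geom_col(), geom_bar() does the counting for you: its default stat = "count" tallies the rows in each x category, so you map only x. A minimal sketch on hypothetical data; ggplot2's layer_data() is used here only to show the counts behind the bars.

```r
library(ggplot2)

# Hypothetical toy data: one row per country (illustration only)
toy <- data.frame(regime = c("Electoral Autocracy", "Liberal Democracy",
                             "Electoral Autocracy", "Closed Autocracy"))

# No y aesthetic: geom_bar() computes the counts itself
p <- ggplot(toy, aes(x = regime)) +
  geom_bar()

# Inspect the computed counts that determine the bar heights
layer_data(p)[, c("x", "count")]
```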

# Write your code here

Step 4: Compare Regions with Proportions (5 pts)

Now create a second bar chart comparing regime types across regions using position = "fill".
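With position = "fill", geom_bar() stacks the segments and rescales every bar to a total height of 1, so each segment shows a within-region share rather than a raw count. The sketch below draws such a chart on hypothetical data and then computes the same shares by hand with count() and group_by().

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data (illustration only)
toy <- tibble(
  region = c("North", "North", "North", "North", "South", "South"),
  regime = c("Liberal Democracy", "Liberal Democracy", "Electoral Democracy",
             "Electoral Autocracy", "Electoral Autocracy", "Closed Autocracy")
)

# Each region's bar is rescaled to a height of 1
p <- ggplot(toy, aes(x = region, fill = regime)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")

# The same within-region shares computed by hand
shares <- toy |>
  count(region, regime) |>
  group_by(region) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

shares
```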

# Write your code here

Step 5: Brief Interpretation (5 pts)

Write 2-3 sentences describing what you see.

Which regime type is most common overall?
Which region appears to have the largest share of liberal democracies?
Why is position = "fill" useful here?

YOUR RESPONSE HERE

Submission

Replace “YOUR NAME HERE” at the top with your actual name
Make sure all code chunks run without errors
Change completed: false to completed: true in the YAML header
Render to HTML to check your work
Change format: html to format: pdf in the YAML header
Submit the PDF to Blackboard

Hints

Only look at these if you are stuck.

Hint 1 - Wrangling structure

new_data <- vdem |>
  filter(year == 2015) |>
  select(
    country = country_name,
    year,
    polyarchy = v2x_polyarchy
  ) |>
  mutate(...)

Hint 2 - Summarizing structure

summary_data <- democracy |>
  group_by(region) |>
  summarize(
    polyarchy = mean(polyarchy, na.rm = TRUE)
  ) |>
  arrange(desc(polyarchy))

Hint 3 - geom_col()

ggplot(dem_summary, aes(x = reorder(region, -polyarchy), y = polyarchy)) +
  geom_col()

Hint 4 - geom_bar()

ggplot(vdem2022_regime, aes(x = regime)) +
  geom_bar()

Hint 5 - Proportional bar chart

ggplot(vdem2022_regime, aes(x = region, fill = regime)) +
  geom_bar(position = "fill")