Module 3.1

Working with APIs

Prework
  • Install the devtools package. Type install.packages("devtools") in your console. You will need this to install the vdemdata package because it is not on the CRAN Network.
  • Install the vdemdata package from GitHub. Type devtools::install_github("vdeminstitute/vdemdata") in your console.
  • Generate a quarto document named “module-1.2.qmd” in your modules project folder so that you can code along with me

Overview

In this module working, we are going to be working with data from an API. We are going to be working with the vdemdata in this lesson and some others later in the course.

Along the way, we are going to continue to extend our data wrangling skills. We will learn some new functions in dplyr that will help us get our data into a usable form for analysis. We are also going to cover in depth some common data science workflows, including filtering observations, selecting variables, summarizing data for different groups, and sorting data based on column values.

Downloading data from the V-Dem API

The first thing we want to talk about is how to filter observations and to select variables for analysis. We will also delve into the topic of how to create new variables. To illustrate these concepts, we are going to be working with the V-Dem Dataset. The V-Dem offers an R package for downloading its data called vdemdata.

vdemdata is perfect for illustrating the filter() and select() verbs because its main function for downloading the data (vdem) does not take any arguments (it simply downloads the whole dataset). So you have to use R functions to narrow down the variables and years you want to work with.

While V-Dem has wealth of indicators related to democracy, we are going to focus on the most famous one called the “polyarchy” score. We are also going to download data on per capita GDP and create some indicator variables for region that we will use later on when we summarize the data. Along with those variables, we also want to retain country_name, year and country_id for the purposes of merging these data with our World Bank data.

Note

While V-Dem has a variable look-up tool (find_var), it does not provide very much information on the variables that the search function returns. Therefore, if you want to use this package for your own research, I highly recommend just going to the V-Dem codebook and manually grabbing the codes for the indicators that you want to use in your analysis.

In addition to filtering out years and selecting variables, let’s also create a region coding to facilitate our analysis later on. We will do this by piping in a mutate() call where we use the case_match() function to change the region from a numeric variable to a string. This will come in handy when we go to visualize the data in future lessons.

We will store our new data as an object called democracy.

# Load packages
library(dplyr)
library(vdemdata) # to download V-Dem data

# Download the data
democracy <- vdem |> # download the V-Dem dataset
  filter(year >= 1990)  |> # filter out years less than 1990
  select(                  # select (and rename) these variables
    country = country_name,     # the name before the = sign is the new name  
    vdem_ctry_id = country_id,  # the name after the = sign is the old name
    year, 
    polyarchy = v2x_polyarchy, 
    gdp_pc = e_gdppc, 
    region = e_regionpol_6C
    ) |>
  mutate(
    region = case_match(region, # replace the values in region with names
                     1 ~ "Eastern Europe", 
                     2 ~ "Latin America",  
                     3 ~ "Middle East",   
                     4 ~ "Africa", 
                     5 ~ "The West", 
                     6 ~ "Asia")
                    # number on the left of the ~ is the V-Dem region code
                    # we are changing the number to the country name on the right
                    # of the equals sign
  )

# View the data
glimpse(democracy)
Rows: 5,846
Columns: 6
$ country      <chr> "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico…
$ vdem_ctry_id <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
$ year         <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 199…
$ polyarchy    <dbl> 0.396, 0.418, 0.441, 0.451, 0.472, 0.489, 0.511, 0.560, 0…
$ gdp_pc       <dbl> 11.389, 11.635, 11.883, 11.983, 12.043, 11.742, 12.059, 1…
$ region       <chr> "Latin America", "Latin America", "Latin America", "Latin…