Module 2.1

Bar Charts and Histograms

Prework
  • Have a look at the documentation for ggplot2
  • Familiarize yourself with the ggplot2 cheatseet
  • Generate a quarto document named “module-2.qmd” in youe modules project folder so that you can code along with me

If you have installed the Tidyverse, then you should already have the packages for this model, including ggplot2. You can go ahead and load ggplot2 along with readr and dplyr.

Note that you could also load these three packages by running library(tidyverse). However, it is good to be intentional about which packages we are loading as we are learning them.

Overview

Last week we learned how to gather and wrangle data. This week we are going to start visualizing it with the ggplot2. In this module we will learn to make bar charts and histograms.

Along the way we are going to be talking about the “grammar of graphics” that ggplot2 is based on. The “gg” in ggplot stands for “grammar of graphics.” The grammar of graphics is a layered approach to constructing graphs based on a book by Leland Wilkinson.

The idea is that each visualization you make is going to contain cerain elements. You will start with some data. Then you will incorporate some “aesthetics” which you can think of as the dimensions of the visualization (x-axis, y-axis and color, size or shapes for additional dimensions). Next you identify a geometric obejct that you want to use such as a bar, a line or a point. From there you can customize various elements of the plot like the title and axis scales and labels.

Bar charts

Let’s get started with our first visualization–a basic bar chart. Bar charts are good for comparing data across cases. Our aim here is going to be to summarize levels of democracy across different regions like we did in the last lesson, but this time we will illustrate the differences with a chart.

We will start by loading in the dem_summary.csv file that you can find here. Next we will do our first ggplot() call. The ggplot() function takes two arguments: data and mapping. data refers to the data frame that includes the variables we want to visualize and mapping refers to the aesthetics mappings for the visualization. The aesthetics mappings are themselves presented in a quoting function aes() that defines the x and y values of the plot along with other aesthetic values like fill, color and linetype. We will focus on x and y values here and return to these additional aesthetic values later.

After our ggplot() call, we can add a series of additional functions to define our visualization following a + sign. The most important group are the geoms which will define the basic type of plot we want to make. In this case, we are calling geom_col() for our histogram and specifying that the fill color should be “steelblue.”

From there we will further customize our visualization with the labs() function to provide a title, axis labels and a caption.

dem_summary <- read_csv("data/dem_summary.csv")

ggplot(dem_summary, aes(x = region, y = polyarchy)) + # ggplot call
  geom_col(fill = "steelblue") + # we use geom_col() for a a bar chart
  labs(
    x = "Region", 
    y = "Avg. Polyarchy Score", 
    title = "Democracy by region, 1990 - present", 
    caption = "Source: V-Dem Institute"
    )

This looks pretty good but frequently we would want the bars of our bar chart to be sorted in order of the values being displayed. Let’s go ahead and add the fct_reorder() from the forcats package to our aes() call so that we are reordering the bars based on descending values of the average polyarchy score.

library(forcats)

ggplot(dem_summary, aes(x = fct_reorder(region, -polyarchy), y = polyarchy)) +
  geom_col(fill = "steelblue") + 
  labs(
    x = "Region", 
    y = "Avg. Polyarchy Score", 
    title = "Democracy by region, 1990 - present", 
    caption = "Source: V-Dem Institute"
    )

Histograms

Now let’s do another ggplot() call to make a histogram. We use histograms when we want to show how our data are distributed.

We’ll start by reading in the dem_women.csv file that you can download here. From there, we call ggplot(), specifying the polyarchy score on x-axis. But this time we change the geom to geom_histogram(). We also change the title and axis labels to reflect the fact that we are plotting the number of cases falling in each bin.

Note

Note that we leave the y-axis blank for the histogram because ggplot will automatically know to plot the number of units in each bin on the y-axis.

dem_women_2015 <- read_csv("data/dem_women.csv") |> 
  filter(year == 2015) 

ggplot(dem_women_2015, aes(x = polyarchy)) + # only specify x for histogram
  geom_histogram(fill = "steelblue") + # geom is a histogram
  labs(
    x = "Polyarchy Score, 2015", 
    y = "Count",
    title = "Distribution of democracy, 2015", 
    caption = "Source: V-Dem Institute"
    )