Numerical Data

April 23, 2024

Electoral Democracy Measure

To what extent is the ideal of electoral democracy in its fullest sense achieved?
Measure runs from 0 (lowest) to 1 (highest)
0.5 is a cutoff for distinguishing electoral democracy from electoral autocracy

The electoral principle of democracy seeks to embody the core value of making rulers responsive to citizens, achieved through electoral competition for the electorate’s approval under circumstances when suffrage is extensive; political and civil society organizations can operate freely; elections are clean and not marred by fraud or systematic irregularities; and elections affect the composition of the chief executive of the country. In between elections, there is freedom of expression and an independent media capable of presenting alternative views on matters of political relevance. – V-Dem Codebook

Other High-Level V-Dem Measures

Liberal Democracy
Egalitarian Democracy
Participatory Democracy
Deliberative Democracy

All are continuous measures ranging from 0 to 1. Let’s take a look at how to summarize data like this!

Data Setup

# Load packages 
library(vdemdata)
library(tidyverse)

# Create dataset for year 2022, with country name, year, electoral democracy score, and region
vdem2022 <- vdem |>
  filter(year == 2022)  |>
  select(
    country = country_name, 
    year, 
    polyarchy = v2x_polyarchy, 
    region = e_regionpol_6C 
    ) |>
  mutate(region = case_match(region, 
                        1 ~ "Eastern Europe", 
                        2 ~ "Latin America",  
                        3 ~ "Middle East",   
                        4 ~ "Africa", 
                        5 ~ "The West", 
                        6 ~ "Asia"))

Examine the Data

glimpse(vdem2022)

Rows: 179
Columns: 4
$ country   <chr> "Mexico", "Suriname", "Sweden", "Switzerland", "Ghana", "Sou…
$ year      <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, …
$ polyarchy <dbl> 0.598, 0.770, 0.899, 0.898, 0.633, 0.692, 0.833, 0.093, 0.20…
$ region    <chr> "Latin America", "Latin America", "The West", "The West", "A…

How can we summarize measures of democracy? 🤔

We could calculate the mean.

vdem2022 |>
  summarize(mean_democracy = mean(polyarchy))

  mean_democracy
1          0.497

The mean is the average of the values. It is a common measure of central tendency, but it is sensitive to outliers.
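To see what “sensitive to outliers” means, here is a small sketch with made-up numbers (the vectors `scores` and `scores_outlier` are hypothetical, not V-Dem data). Replacing a single value with an extreme one moves the mean a lot but leaves the median where it was.

```r
# Hypothetical values (not V-Dem data)
scores <- c(2, 3, 4, 5, 6)

# Same values, but the last one replaced by an extreme value
scores_outlier <- c(2, 3, 4, 5, 60)

mean(scores)            # 4
mean(scores_outlier)    # 14.8, pulled up by the single outlier
median(scores)          # 4
median(scores_outlier)  # 4, unchanged
```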

How can we summarize measures of democracy? 🤔

We could calculate the median.

vdem2022 |>
  summarize(median_democracy = median(polyarchy))

  median_democracy
1            0.501

The median is the value that separates the higher half from the lower half of the data.
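A minimal sketch of how the median is found: sort the values and take the middle one, or average the two middle values when the number of observations is even. The helper `my_median()` is a hypothetical name used only for illustration; in practice you would call `median()`.

```r
# Hypothetical helper: median by sorting and picking the middle
my_median <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                  # odd n: the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2   # even n: mean of the two middle values
  }
}

my_median(c(7, 1, 3))      # 3
my_median(c(7, 1, 3, 10))  # 5, the average of 3 and 7
```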

We can also describe the shape of the distribution…

symmetric (e.g. normal)
right-skewed
left-skewed
unimodal (one peak)
bimodal (two peaks)

Histograms

Used to represent the distribution of a continuous variable
The x-axis represents the range of values
The y-axis represents the frequency (count of observations)
The bars represent the number of observations in each range or “bin”
The shape of the histogram can tell us a lot about the distribution of the data
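Under the hood, a histogram counts how many observations fall into each bin. A sketch with base R's `cut()` and `table()` on hypothetical values, using bins of width 0.25 (ggplot2's `binwidth` argument sets the bin width, though its default placement of bin edges may differ from the breaks chosen here):

```r
# Hypothetical scores on a 0-1 scale (not V-Dem data)
x <- c(0.05, 0.10, 0.30, 0.35, 0.40, 0.60, 0.90, 0.95, 0.97)

# Bins of width 0.25: [0,0.25], (0.25,0.5], (0.5,0.75], (0.75,1]
bins <- cut(x, breaks = seq(0, 1, by = 0.25), include.lowest = TRUE)

# These counts are the heights of the histogram's bars
counts <- table(bins)
counts
```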

Symmetric Distributions

Skewed Distributions

Bimodal Distribution

When is the Mean Useful?

The mean works well as a summary statistic when the distribution is relatively symmetric
It does not work as well when the distribution is skewed or bimodal (or multimodal)
With skewed distributions, the mean is pulled toward the extreme values in the long tail
The median is more robust to extreme values
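A quick check of this claim with two small hypothetical vectors: in a right-skewed set of values the long upper tail pulls the mean above the median, and in a left-skewed set the mean falls below it.

```r
# Hypothetical right-skewed values: most are small, a few are large
right_skewed <- c(1, 1, 2, 2, 3, 4, 10, 20)

# Hypothetical left-skewed values: the mirror image of the above
left_skewed <- 21 - right_skewed

mean(right_skewed)    # 5.375
median(right_skewed)  # 2.5
mean(left_skewed)     # 15.625
median(left_skewed)   # 18.5
```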

Lesson

Always look at your data!!
When reading a report or watching a presentation, ask yourself:
- Does the mean make sense given the distribution of the measure?
- Could extreme values in a skewed distribution make the mean misleading?
- Have the analysts shown you the distribution? If not, ask about it!

Visualize Our Measure

# Store the mean and median for later use
mn <- mean(vdem2022$polyarchy)
med <- median(vdem2022$polyarchy)

# Histogram with a vertical line at the mean
ggplot(vdem2022, aes(x = polyarchy)) +
  geom_histogram(binwidth = .05, fill = "steelblue") +
  labs(
    x = "Electoral Democracy",
    y = "Frequency",
    title = "Distribution of Electoral Democracy in 2022",
    caption = "Source: V-Dem Institute"
  ) +
  geom_vline(xintercept = mn, linewidth = 1, color = "darkorange") +
  theme_minimal()

Your Turn!

Look at the V-Dem codebook
Select a different high-level measure of democracy
Preprocess your data to include that measure in your data frame
Calculate the mean and median and store each as a variable
Visualize the distribution of the measure
Include a vertical line for the mean
Now try the median

05:00

Recap

We can use statistics like the mean or median to describe the center of a variable
We can visualize the entire distribution to characterize its shape
We should also say something about the spread of the distribution

Why Measure and Visualize Spread?

Measures of Spread: Range

Range (min and max values)
Not ideal because it does not tell us where most of the values are located

vdem2022 |>
  summarize(min = min(polyarchy),
            max = max(polyarchy))

    min   max
1 0.016 0.916
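To see why the range alone can mislead, here is a sketch with two hypothetical vectors that share the same minimum and maximum but spread their values very differently. The interquartile range, defined below, picks up the difference.

```r
# Two hypothetical variables with the same min and max
clustered <- c(0, 5, 5, 5, 5, 10)    # most values in the middle
polarized <- c(0, 0, 0, 10, 10, 10)  # values piled at the two extremes

range(clustered)   # 0 10
range(polarized)   # 0 10

# The width of the range is identical...
diff(range(clustered))  # 10
diff(range(polarized))  # 10

# ...but the middle 50% of the values differs sharply
IQR(clustered)  # 0
IQR(polarized)  # 10
```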

Measure of Spread: Interquartile Range

IQR: the distance between the 25th percentile (Q1) and the 75th percentile (Q3)

Interquartile Range

The middle 50 percent of the countries in the data lie between 0.262 and 0.747
The IQR (0.485) is the difference between the Q3 and Q1 values

vdem2022 |>
  summarize(IQRlow = quantile(polyarchy, .25),
            IQRhigh = quantile(polyarchy, .75),
            IQRlength = IQR(polyarchy))

  IQRlow IQRhigh IQRlength
1  0.262   0.747     0.485
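By default (type 7), R's `quantile()` places the \(p\)th quantile at position \(1 + (n-1)p\) in the sorted data and interpolates linearly between neighboring values when that position is not a whole number. A minimal hand-rolled version, under the hypothetical name `q7()`, reproduces `quantile()` and `IQR()` on a small example.

```r
# Hypothetical helper mimicking quantile(type = 7)
q7 <- function(x, p) {
  x <- sort(x)
  h <- 1 + (length(x) - 1) * p         # position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])   # linear interpolation
}

x <- c(15, 1, 9, 4, 12, 3, 7)  # sorted: 1 3 4 7 9 12 15

q1 <- q7(x, 0.25)  # position 2.5, halfway between 3 and 4: 3.5
q3 <- q7(x, 0.75)  # position 5.5, halfway between 9 and 12: 10.5
q3 - q1            # 7, the IQR

# Built-in equivalents for comparison
quantile(x, c(0.25, 0.75))
IQR(x)
```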

Box Plot

A box plot is a graphical representation of the distribution based on the median and quartiles
It is a standardized way of displaying the distribution of data based on a five-number summary: minimum, first quartile, median, third quartile, and maximum

Box Plot

ggplot(vdem2022, aes(x = "", y = polyarchy)) +
  geom_boxplot(fill = "steelblue") +
  labs(
    x = "",
    y = "Electoral Democracy",
    title = "Distribution of Electoral Democracy in 2022",
    caption = "Source: V-Dem Institute"
  ) +
  theme_minimal()
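One detail the five-number summary glosses over: by default, ggplot2's `geom_boxplot()` does not draw its whiskers all the way to the minimum and maximum. Each whisker stops at the most extreme value that lies within 1.5 times the IQR of the box, and anything beyond that is drawn as an individual outlier point. A sketch of that rule on hypothetical data:

```r
# Hypothetical data with one unusually large value
x <- c(1, 3, 4, 7, 9, 12, 15, 40)

q <- quantile(x, c(0.25, 0.75))      # Q1 = 3.75, Q3 = 12.75
fence_low  <- q[[1]] - 1.5 * IQR(x)  # 3.75 - 13.5 = -9.75
fence_high <- q[[2]] + 1.5 * IQR(x)  # 12.75 + 13.5 = 26.25

# Whiskers end at the most extreme observations inside the fences
whisker_low  <- min(x[x >= fence_low])   # 1
whisker_high <- max(x[x <= fence_high])  # 15

# Points outside the fences are plotted individually as outliers
outliers <- x[x < fence_low | x > fence_high]  # 40
```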

Measure of Spread: Standard Deviation

You can think of it roughly as the “typical distance” of each data point from the mean

vdem2022 |>
  summarize(mean = mean(polyarchy),
            stdDev = sd(polyarchy))

   mean   stdDev
1 0.497 0.259951

Standard Deviation

A low standard deviation indicates that the values tend to be close to the mean
A high standard deviation indicates that the values are spread out over a wider range
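Two hypothetical vectors with the same mean but different spread make the contrast concrete:

```r
# Hypothetical values, both with a mean of 5
tight  <- c(4, 5, 6)   # values close to the mean
spread <- c(0, 5, 10)  # values far from the mean

mean(tight)   # 5
mean(spread)  # 5
sd(tight)     # 1, low standard deviation
sd(spread)    # 5, high standard deviation
```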

Starting with Variance

Variance is a step towards calculating the standard deviation.
It quantifies the average squared deviation of each number from the mean of the data set.

Calculating Deviation from the Mean

First, calculate the mean (\(\bar{X}\)) of the dataset.
For each data point (\(X_i\)), calculate its deviation from the mean: \[e_i = X_i - \bar{X}\]
- Example with a mean of 5:
  - For a data point where \(X_i = 0\): \(0 - 5 = -5\)
  - For a data point where \(X_i = 10\): \(10 - 5 = 5\)

Squaring the Deviations

Square each deviation (\(e_i\)) so that negative and positive deviations do not cancel out: \[e_i^2 = (X_i - \bar{X})^2\]
Sum all the squared deviations: \[\sum_{i=1}^{n} (X_i - \bar{X})^2\]
This sum is the total squared deviation from the mean.
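Why square at all? Raw deviations from the mean always sum to zero, so their plain average would be zero for every data set. A quick check with hypothetical values:

```r
x <- c(2, 3, 7, 8, 15)  # hypothetical values; the mean is 7
e <- x - mean(x)        # deviations: -5 -4 0 1 8

sum(e)    # 0: positive and negative deviations cancel
sum(e^2)  # 106: squared deviations do not cancel
```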

Calculating the Variance

Divide the total squared deviation by \(n-1\) to obtain the sample variance: \[s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2\]
Using \((n-1)\) ensures an unbiased estimate of the population variance when calculating from a sample.
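The difference between dividing by \(n\) and by \(n-1\) is easy to see on a small hypothetical vector. R's `var()` and `sd()` both use \(n-1\).

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # hypothetical values; the mean is 5
ss <- sum((x - mean(x))^2)      # total squared deviation: 32
n <- length(x)

ss / n        # 4, dividing by n
ss / (n - 1)  # about 4.571, dividing by n - 1
var(x)        # matches ss / (n - 1)
```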

Deriving the Standard Deviation

The standard deviation is the square root of the variance: \[s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}\]
Taking the square root converts the variance back to the units of the original data.

Standard Deviation Simple Example

x <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
e <- x - mean(x)
e

 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5

Standard Deviation Simple Example

x <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
e_squared <- e^2
e_squared

 [1] 25 16  9  4  1  0  1  4  9 16 25

Standard Deviation Simple Example

x <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
sum_e_squared <- sum(e_squared)
sum_e_squared

[1] 110

Standard Deviation Simple Example

variance <- sum_e_squared/(length(x)-1)
variance

[1] 11

Standard Deviation Simple Example

standard_dev <- sqrt(variance)
standard_dev

[1] 3.316625

sd(x)

[1] 3.316625

Your Turn!

Calculate measures of center and spread for the polyarchy variable in the V-Dem data (mean, median, IQR, standard deviation)
How would you interpret these measures?
Try a box plot for the polyarchy variable
Try another variable in the V-Dem data
How does it compare to polyarchy?

05:00

Calculating Statistics by Groups

What if we want to describe electoral democracy and see how it differs across another variable, for example by world region or by year?
In this case, we want to combine numerical summaries with categorical variables
This brings us back to bar charts

Calculating Statistics by Groups

Let’s calculate the mean and median of electoral democracy in each world region
For this, we add group_by() to our previous code

vdem2022 |>
  group_by(region) |>
  summarize(mean_dem = mean(polyarchy),
            median_dem = median(polyarchy))

# A tibble: 6 × 3
  region         mean_dem median_dem
  <chr>             <dbl>      <dbl>
1 Africa            0.403      0.371
2 Asia              0.424      0.428
3 Eastern Europe    0.533      0.558
4 Latin America     0.605      0.678
5 Middle East       0.235      0.213
6 The West          0.854      0.857
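`group_by()` followed by `summarize()` follows a split-apply-combine pattern: split the rows by group, apply the summary function to each piece, and combine the results. A sketch of the same idea with base R's `tapply()` on a small hypothetical data frame (the object `toy`, its region labels, and its scores are all made up):

```r
# Hypothetical data: two regions, a few scores each
toy <- data.frame(
  region = c("A", "A", "B", "B", "B"),
  score  = c(0.2, 0.4, 0.6, 0.8, 0.7)
)

# Split score by region, apply mean() to each group, combine into a named vector
group_means <- tapply(toy$score, toy$region, mean)
group_means
```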

Calculating Statistics by Groups

Let’s store our statistics as a new data object, democracy_region

democracy_region <- vdem2022 |> 
  group_by(region) |>
  summarize(mean_dem = mean(polyarchy),
            median_dem = median(polyarchy))

democracy_region

# A tibble: 6 × 3
  region         mean_dem median_dem
  <chr>             <dbl>      <dbl>
1 Africa            0.403      0.371
2 Asia              0.424      0.428
3 Eastern Europe    0.533      0.558
4 Latin America     0.605      0.678
5 Middle East       0.235      0.213
6 The West          0.854      0.857

Visualize using our Bar Chart Skills

ggplot(democracy_region, aes(x = reorder(region, -mean_dem), y = mean_dem)) +
  geom_col(fill = "steelblue") + 
  labs(
    x = "Region", 
    y = "Mean Polyarchy Score", 
    title = "Mean Electoral Democracy by Region, 2022", 
    caption = "Source: V-Dem Institute"
    ) + 
  theme_minimal()

Numerical Variable by Group

How should we interpret this plot?

library(ggridges)

# One smoothed density curve per region
ggplot(vdem2022, aes(x = polyarchy, y = region, fill = region)) +
  geom_density_ridges() +
  labs(
    x = "Electoral Democracy",
    y = "Region",
    title = "A Ridge Plot",
    caption = "Source: V-Dem Institute"
  ) +
  scale_fill_viridis_d() +
  theme_minimal()
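Each ridge in this plot is a smoothed estimate of how polyarchy scores are distributed within one region. A minimal sketch of one common way to build such a curve, a Gaussian kernel density estimate written by hand: place a small normal “bump” on every observation and average them. The function name `kde()`, the data, and the bandwidth `h = 0.1` are hypothetical choices for illustration; base R's `density()` does the same job with a data-driven default bandwidth (`bw.nrd0`).

```r
# Hypothetical scores on a 0-1 scale (not V-Dem data)
x <- c(0.10, 0.20, 0.25, 0.70, 0.80)
h <- 0.1  # bandwidth, chosen by hand for this sketch

# Hypothetical helper: Gaussian kernel density estimate at each grid point,
# the average of normal curves centered on each observation
kde <- function(grid, x, h) {
  sapply(grid, function(g) mean(dnorm((g - x) / h)) / h)
}

grid <- seq(-0.5, 1.5, by = 0.001)
dens <- kde(grid, x, h)

# The estimate is a density, so the area under the curve is (approximately) 1
area <- sum(dens) * 0.001
area

# The highest peak sits in the denser cluster of low scores
grid[which.max(dens)]
```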

Your Turn!

Make a bar chart summarizing polyarchy or some other V-Dem variable
Now try your hand at a ridge plot

05:00
