Compute an average or some other summary statistic for each group
Rank the groups from high to low by that summary value
Setup
```r
# Load packages
library(vdemdata) # to download V-Dem data
library(dplyr)

# Download the data
democracy <- vdem |> # download the V-Dem dataset
  filter(year == 2015) |> # filter year, keep 2015
  select( # select (and rename) these variables
    country = country_name, # the name before the = sign is the new name
    vdem_ctry_id = country_id, # the name after the = sign is the old name
    year,
    polyarchy = v2x_polyarchy,
    libdem = v2x_libdem,
    corruption = v2x_corr,
    gdp_pc = e_gdppc,
    region = e_regionpol_6C
  ) |>
  mutate(
    region = case_match(region, # replace the numeric codes in region with region names
      1 ~ "Eastern Europe",
      2 ~ "Latin America",
      3 ~ "Middle East",
      4 ~ "Africa",
      5 ~ "The West",
      6 ~ "Asia"
    )
  )

# View the data
glimpse(democracy)
```
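To see what case_match() does on its own, here is a minimal sketch on a made-up vector of region codes (the values in `codes` are hypothetical, not V-Dem rows). Each `code ~ label` pair maps one value to a label; any value with no matching pair becomes NA, because no `.default` is supplied. case_match() requires dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical region codes, made up for illustration (not real V-Dem data)
codes <- c(5, 1, 4, 4, 7)

# Map each numeric code to a label; 7 has no matching pair, so it becomes NA
labels <- case_match(codes,
  1 ~ "Eastern Europe",
  4 ~ "Africa",
  5 ~ "The West"
)

labels
```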
Summarize by Region
```r
# group_by(), summarize() and arrange()
dem_summary <- democracy |> # save result as new object
  group_by(region) |> # group data by region
  summarize( # summarize following vars (by region)
    polyarchy = mean(polyarchy, na.rm = TRUE), # calculate mean, remove NAs
    libdem = median(libdem, na.rm = TRUE), # median
    corruption = sd(corruption, na.rm = TRUE), # standard deviation
    gdp_pc = max(gdp_pc, na.rm = TRUE) # maximum
  ) |>
  arrange(desc(polyarchy)) # arrange in descending order by polyarchy score

# Print the data
dem_summary
```
```
# A tibble: 6 × 5
  region         polyarchy libdem corruption gdp_pc
  <chr>              <dbl>  <dbl>      <dbl>  <dbl>
1 The West           0.876  0.824     0.0647   81.7
2 Latin America      0.648  0.476     0.281    30.8
3 Eastern Europe     0.548  0.419     0.292    31.7
4 Asia               0.443  0.312     0.263    64.8
5 Africa             0.435  0.261     0.231    30.6
6 Middle East        0.271  0.171     0.250    91.2
```
Use group_by() to group countries by region.
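group_by() on its own does not change any rows; it only records which column later verbs should split on. A minimal sketch with made-up toy data (the countries, regions, and scores are hypothetical):

```r
library(dplyr)

# Hypothetical toy data: made-up scores for made-up countries
toy <- tibble(
  country = c("A", "B", "C", "D", "E"),
  region  = c("North", "North", "South", "South", "South"),
  score   = c(0.8, 0.6, 0.3, 0.5, 0.4)
)

grouped <- toy |> group_by(region)

# The rows are unchanged; only the grouping is recorded
nrow(grouped)       # 5
group_vars(grouped) # "region"
n_groups(grouped)   # 2
```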
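Use summarize() to compute one statistic per region for each variable: the mean of polyarchy, the median of libdem, the standard deviation of corruption, and the maximum of gdp_pc.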
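The `na.rm = TRUE` argument matters because a single missing value otherwise turns the whole group's result into NA. A minimal sketch on made-up data (the regions and scores are hypothetical):

```r
library(dplyr)

# Hypothetical toy data with one missing value (made up for illustration)
toy <- tibble(
  region = c("North", "North", "South", "South", "South"),
  score  = c(0.8, 0.6, 0.3, NA, 0.4)
)

# Without na.rm = TRUE, the NA makes South's mean NA
without_na_rm <- toy |>
  group_by(region) |>
  summarize(avg = mean(score))

# With na.rm = TRUE, the NA is dropped before averaging;
# n() still counts every row in the group, including the one with NA
with_na_rm <- toy |>
  group_by(region) |>
  summarize(avg = mean(score, na.rm = TRUE), n = n())

with_na_rm # North: avg 0.7, n 2; South: avg 0.35, n 3
```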
Then use arrange() with desc() to sort the regions in descending order by polyarchy score.
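arrange() sorts in ascending order by default; wrapping the column in desc() reverses that. A minimal sketch on a made-up summary table (the numbers are hypothetical, not V-Dem results):

```r
library(dplyr)

# Hypothetical regional summary (made-up numbers)
summary_toy <- tibble(
  region = c("North", "South", "East"),
  avg    = c(0.7, 0.35, 0.5)
)

# Default: lowest value first
ascending <- summary_toy |> arrange(avg)

# desc(): highest value first
descending <- summary_toy |> arrange(desc(avg))

ascending$region  # "South" "East" "North"
descending$region # "North" "East" "South"
```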
Here we print the data frame instead of using glimpse(), because the six-row summary is small enough to view in full.
Some Common Arithmetic and Summary Functions
sqrt() square root
log() natural logarithm
mean() mean
median() median
sd() standard deviation
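A quick way to check what each function returns is to try it on a small made-up vector. The sketch below also includes IQR() (base R's interquartile range, not in the list above), since one of the exercises asks for an interquartile range.

```r
# A small made-up vector to try the functions on
x <- c(1, 4, 9, 16)

sqrt(x)      # 1 2 3 4
log(exp(2))  # 2: log() is the natural logarithm, the inverse of exp()
mean(x)      # 7.5
median(x)    # 6.5: the average of the two middle values, 4 and 9
sd(x)        # sqrt(43), about 6.557 (sample sd, divides by n - 1)
IQR(x)       # 7.5: 75th minus 25th percentile (R's default quantile type 7)
```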
Try it Yourself
Try running group_by(), summarize() and arrange() in your Quarto document
Try changing the parameters to answer these questions:
Try summarizing the data with a different function for one or more of the variables.
What is the median value of polyarchy for The West?
What is the max value of libdem for Eastern Europe?
What is the standard deviation of corruption for Africa?
What is the mean of gdp_pc for the Middle East?
Now try grouping by country instead of region.
What is the median value of polyarchy for Sweden?
What is the max value of libdem for New Zealand?
What is the standard deviation of corruption for Spain?
What is the interquartile range of gdp_pc for Germany?
Sort countries in descending order based on the mean value of gdp_pc (instead of the median value of polyarchy). Which country ranks first based on this sorting?
Now try sorting countries in ascending order based on the median value of libdem (hint: delete “desc” from the arrange() call). Which country ranks at the “top” of the list?
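When you group by country, keep in mind that the data are filtered to a single year, so each group should contain just one row (assuming each country appears once in 2015). Summaries of a one-row group behave differently: mean, median, and max all equal the single value, sd() returns NA because it needs at least two observations, and IQR() returns 0. A minimal sketch on made-up data (the countries and values are hypothetical):

```r
library(dplyr)

# Hypothetical one-year data: each made-up country appears exactly once
one_year <- tibble(
  country = c("A", "B", "C"),
  gdp_pc  = c(10, 20, 30)
)

by_country <- one_year |>
  group_by(country) |>
  summarize(
    avg    = mean(gdp_pc), # equals the single value
    spread = sd(gdp_pc),   # NA: sd() needs at least two observations
    iqr    = IQR(gdp_pc)   # 0: a single value has no spread
  )

by_country
```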
Visualize It!
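```r
library(ggplot2)

# reorder() sorts the regions from highest to lowest polyarchy score
ggplot(dem_summary, aes(x = reorder(region, -polyarchy), y = polyarchy)) +
  geom_col(fill = "steelblue") + # one bar per region, height = polyarchy
  labs(
    x = "Region",
    y = "Avg. Polyarchy Score",
    title = "Democracy by region, 2015",
    caption = "Source: V-Dem Institute"
  ) +
  theme_minimal()
```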
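The bar order comes from reorder(), which turns region into a factor whose levels are sorted by the second argument; negating polyarchy puts the highest value first. A minimal sketch on made-up numbers (hypothetical, not the V-Dem summary):

```r
# Hypothetical regional means (made-up numbers)
region    <- c("North", "South", "East")
polyarchy <- c(0.7, 0.35, 0.5)

# Levels are sorted by -polyarchy, i.e. from highest to lowest polyarchy
ordered_regions <- reorder(region, -polyarchy)

levels(ordered_regions) # "North" "East" "South"
```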
Try it Yourself
Run the code to make a bar chart with the dem_summary data you wrangled, again grouping by region (instead of country)
Try visualizing different variables, e.g. libdem, corruption, gdp_pc
Try different summary statistics, e.g. mean, median, standard deviation, etc.
Try grouping by country instead of region and visualizing that
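To try different variables without copying the plotting code each time, one option is to wrap it in a small function. This is a sketch: `plot_region_summary()` is a hypothetical helper, not part of ggplot2, and it assumes the data frame has a `region` column. It uses ggplot2's `.data[[var]]` pronoun so the column can be passed as a string.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): bar chart of any summary column,
# with regions sorted from highest to lowest value of that column
plot_region_summary <- function(data, var, y_label = var) {
  ggplot(data, aes(x = reorder(region, -.data[[var]]), y = .data[[var]])) +
    geom_col(fill = "steelblue") +
    labs(x = "Region", y = y_label) +
    theme_minimal()
}

# Made-up summary data for illustration (not V-Dem results)
toy_summary <- data.frame(
  region     = c("North", "South", "East"),
  polyarchy  = c(0.7, 0.35, 0.5),
  corruption = c(0.2, 0.6, 0.4)
)

p <- plot_region_summary(toy_summary, "corruption", "Corruption")
p # print the object to draw the chart
```

With the real data you would call it as, for example, `plot_region_summary(dem_summary, "libdem")`.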