April 23, 2024
# Load the packages used throughout (vdemdata supplies the vdem data frame)
library(tidyverse)
library(tidymodels)
library(vdemdata)

# Keep 2006 only, rename the variables of interest, and label the regions
modelData <- vdem |>
  filter(year == 2006) |>
  select(country_name,
         libdem = v2x_libdem,
         wealth = e_gdppc,
         oil_rents = e_total_oil_income_pc,
         polarization = v2cacamps,
         corruption = v2x_corr,
         judicial_review = v2jureview_ord,
         region = e_regionpol_6C,
         regime = v2x_regime) |>
  mutate(log_wealth = log(wealth),
         region = factor(
           region,
           labels = c("Eastern Europe",
                      "Latin America",
                      "MENA",
                      "SSAfrica",
                      "Western Europe and North America",
                      "Asia and Pacific"))
  )
glimpse(modelData)
Rows: 177
Columns: 10
$ country_name <chr> "Mexico", "Suriname", "Sweden", "Switzerland", "Ghana"…
$ libdem <dbl> 0.482, 0.672, 0.888, 0.845, 0.640, 0.659, 0.768, 0.018…
$ wealth <dbl> 14.584, 10.008, 42.378, 48.410, 3.284, 10.580, 37.068,…
$ oil_rents <dbl> 694.847, 639.506, 0.000, 0.000, 6.380, 10.060, 2.635, …
$ polarization <dbl> -0.346, -1.748, -2.346, -1.711, -0.398, -0.029, -2.273…
$ corruption <dbl> 0.611, 0.226, 0.004, 0.023, 0.640, 0.400, 0.107, 0.892…
$ judicial_review <dbl> 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, …
$ region <fct> Latin America, Latin America, Western Europe and North…
$ regime <dbl> 2, 2, 3, 3, 3, 3, 3, 0, 1, 2, 1, 1, 2, 3, 2, 3, 3, 2, …
$ log_wealth <dbl> 2.6799250, 2.3033848, 3.7466294, 3.8797064, 1.1890622,…
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.383 0.0187 20.5 2.68e-48
2 polarization -0.0882 0.0140 -6.29 2.45e- 9
# Regress liberal democracy on polarization and logged GDP per capita
linear_reg() |>
  set_engine("lm") |>
  fit(libdem ~ polarization + log_wealth, data = modelData) |>
  tidy()
# A tibble: 3 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.200 0.0344 5.80 0.0000000318
2 polarization -0.0550 0.0139 -3.96 0.000111
3 log_wealth 0.0952 0.0152 6.24 0.00000000334
\[\hat{Y_i} = a + b_1*Polarization + b_2*\log(GDPpc)\]
\[\hat{Y_i} = 0.200 - 0.055*Polarization + 0.095*\log(GDPpc)\]
\(a\) is the predicted level of Y when BOTH polarization and logged GDP per capita are equal to 0 (a log of 0 corresponds to a GDP per capita of 1 in the units of the wealth variable)
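To make the fitted equation concrete, the sketch below plugs values into it by hand. It uses the rounded coefficients from the `tidy()` output above and Mexico's polarization and log wealth from the `glimpse()` output. The helper `predict_libdem` is a hypothetical name introduced only for this illustration.

```r
# Coefficients copied (rounded) from the tidy() output above
a  <- 0.200     # intercept
b1 <- -0.0550   # polarization
b2 <- 0.0952    # log_wealth

# Hypothetical helper: predicted liberal democracy score
predict_libdem <- function(polarization, log_wealth) {
  a + b1 * polarization + b2 * log_wealth
}

# When both predictors are 0, the prediction is just the intercept
at_zero <- predict_libdem(0, 0)

# Mexico in 2006 (values taken from the glimpse() output)
mexico <- predict_libdem(polarization = -0.346, log_wealth = 2.6799250)
round(mexico, 3)
```

Mexico's observed `libdem` score is 0.482, so a prediction of roughly 0.474 leaves only a small residual for this case.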
\[\hat{Y_i} = a + b_1*Polarization + b_2*\log(GDPpc) + b_3*OilRents\]
# Add oil rents per capita as a third predictor
linear_reg() |>
  set_engine("lm") |>
  fit(libdem ~ polarization + log_wealth + oil_rents, data = modelData) |>
  tidy()
# A tibble: 4 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.157 0.0322 4.86 2.78e- 6
2 polarization -0.0538 0.0130 -4.13 5.87e- 5
3 log_wealth 0.132 0.0152 8.67 4.76e-15
4 oil_rents -0.0000412 0.00000615 -6.70 3.39e-10
\[\hat{Y_i} = 0.157 - 0.054*Polarization + 0.132*\log(GDPpc) - 0.00004*OilRents\]
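The oil rents coefficient looks tiny only because the variable is measured in small units relative to its range (the `glimpse()` output shows values in the hundreds). A quick sketch, using the rounded estimate from the `tidy()` output above, rescales it to a more readable change. The choice of a 1,000-unit step is an assumption made purely for illustration.

```r
# Oil rents coefficient copied (rounded) from the tidy() output above
b_oil <- -0.0000412

# Predicted change in libdem for a 1,000-unit increase in oil rents per capita,
# holding polarization and log wealth constant
change_per_1000 <- b_oil * 1000
change_per_1000

# Contribution of oil rents to Mexico's prediction (oil_rents = 694.847)
mexico_oil <- b_oil * 694.847
round(mexico_oil, 4)
```

Read this way, the estimate says that, other predictors held fixed, 1,000 more units of oil income per capita go with a liberal democracy score about 0.04 lower.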
# A tibble: 3 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.846 0.0268 31.5 7.42e-74
2 libdem -0.760 0.0588 -12.9 3.42e-27
3 polarization 0.0495 0.0121 4.10 6.33e- 5
# A tibble: 4 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.954 0.0283 33.8 5.21e-77
2 libdem -0.602 0.0576 -10.5 4.59e-20
3 polarization 0.0332 0.0109 3.04 2.71e- 3
4 log_wealth -0.0846 0.0127 -6.67 3.49e-10
Judicial Review:
\[\widehat{Democracy_{i}} = 0.17 + 0.28*JudicialReview(yes)\]
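With a single yes/no predictor, the equation produces only two predicted values, one per group. The sketch below works them out from the coefficients in the equation above; the variable names are chosen here for illustration.

```r
# Coefficients from the judicial review equation above
a <- 0.17   # intercept: predicted democracy without judicial review
b <- 0.28   # difference for countries with judicial review

# The dummy takes the value 0 (no) or 1 (yes)
judicial_review <- c(no = 0, yes = 1)

predicted <- a + b * judicial_review
predicted
```

The intercept is the predicted score for the "no" group, and the slope is the gap between the two groups.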
How should we interpret the intercept? How about the coefficient on Latin America?
# A tibble: 6 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.436 0.0362 12.0 1.54e-24
2 regionLatin America 0.0685 0.0537 1.28 2.04e- 1
3 regionMENA -0.235 0.0573 -4.11 6.16e- 5
4 regionSSAfrica -0.138 0.0458 -3.00 3.07e- 3
5 regionWestern Europe and North America 0.373 0.0543 6.87 1.15e-10
6 regionAsia and Pacific -0.134 0.0521 -2.57 1.10e- 2
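One way to read this output: the intercept is the predicted score for the omitted baseline (Eastern Europe), and each coefficient is that region's difference from the baseline. The sketch below, using the rounded estimates from the table above, converts the coefficients into a predicted score for each region.

```r
# Estimates copied (rounded) from the tidy() output above
intercept <- 0.436   # Eastern Europe, the baseline category

# Each region's difference from the baseline (0 for the baseline itself)
offsets <- c(
  "Eastern Europe"                   = 0,
  "Latin America"                    = 0.0685,
  "MENA"                             = -0.235,
  "SSAfrica"                         = -0.138,
  "Western Europe and North America" = 0.373,
  "Asia and Pacific"                 = -0.134
)

# Predicted liberal democracy score per region
region_means <- intercept + offsets
round(region_means, 3)
```

So the Latin America coefficient of 0.0685 means the predicted score there is about 0.07 higher than in Eastern Europe, not that Latin America's score is 0.07.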
What if you want a different baseline category? How do we interpret the coefficients now?
# Make Sub-Saharan Africa the reference category (referenced by label, not position)
modelData <- modelData |>
  mutate(newReg = relevel(region, ref = "SSAfrica"))

linear_reg() |>
  set_engine("lm") |>
  fit(libdem ~ newReg, data = modelData) |>
  tidy()
# A tibble: 6 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.298 0.0281 10.6 1.44e-20
2 newRegEastern Europe 0.138 0.0458 3.00 3.07e- 3
3 newRegLatin America 0.206 0.0486 4.24 3.63e- 5
4 newRegMENA -0.0977 0.0525 -1.86 6.44e- 2
5 newRegWestern Europe and North America 0.511 0.0493 10.4 7.57e-20
6 newRegAsia and Pacific 0.00351 0.0468 0.0750 9.40e- 1
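Changing the baseline changes what each coefficient is compared against, but not the model's predictions. The following sketch shows this with base R's `lm()` on a small toy data set; the data are invented purely for illustration and are not V-Dem values.

```r
# Toy data, invented for illustration only (group means are 2, 6, and 10)
toy <- data.frame(
  y = c(1, 2, 3, 5, 6, 7, 9, 10, 11),
  g = factor(rep(c("A", "B", "C"), each = 3))
)

# Baseline "A": intercept is the mean of A, other terms are differences from A
fit_a <- lm(y ~ g, data = toy)

# Switch the baseline to "C"
toy$g2 <- relevel(toy$g, ref = "C")
fit_c <- lm(y ~ g2, data = toy)

coef(fit_a)
coef(fit_c)

# The fitted values are the same under both codings
same_fit <- all.equal(fitted(fit_a), fitted(fit_c))
same_fit
```

The same logic applies to the region models: the SSAfrica intercept of 0.298 plus the Eastern Europe coefficient of 0.138 recovers the 0.436 intercept from the original model.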
Which types of regime have more corruption?
V-Dem also includes a categorical regime variable: Closed Autocracy (0), Electoral Autocracy (1), Electoral Democracy (2), Liberal Democracy (3).
First, let’s make this an easier factor variable to work with.
# Make a nicer regime factor variable
modelData <- modelData |>
  mutate(regime = factor(regime,
                         labels = c("Closed Autocracy",
                                    "Electoral Autocracy",
                                    "Electoral Democracy",
                                    "Liberal Democracy")))

levels(modelData$regime)
[1] "Closed Autocracy" "Electoral Autocracy" "Electoral Democracy"
[4] "Liberal Democracy"
# A tibble: 4 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.597 0.0395 15.1 1.93e-33
2 regimeElectoral Autocracy 0.148 0.0470 3.14 1.99e- 3
3 regimeElectoral Democracy -0.0580 0.0486 -1.19 2.35e- 1
4 regimeLiberal Democracy -0.459 0.0497 -9.23 9.63e-17
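To answer the question directly, the coefficients can be turned into a predicted corruption score for each regime type. This sketch uses the rounded estimates from the table above, with Closed Autocracy as the baseline; the variable names are chosen here for illustration.

```r
# Estimates copied (rounded) from the tidy() output above
intercept <- 0.597   # Closed Autocracy, the baseline category

# Each regime type's difference from the baseline
offsets <- c(
  "Closed Autocracy"    = 0,
  "Electoral Autocracy" = 0.148,
  "Electoral Democracy" = -0.0580,
  "Liberal Democracy"   = -0.459
)

# Predicted corruption score per regime type
regime_means <- intercept + offsets
round(regime_means, 3)

# Regime types with the highest and lowest predicted corruption
most_corrupt  <- names(which.max(regime_means))
least_corrupt <- names(which.min(regime_means))
```

On these estimates, electoral autocracies have the highest predicted corruption and liberal democracies the lowest, while the gap between electoral democracies and closed autocracies is small and has a large p-value (0.235).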