Modeling

April 23, 2024

Modeling

  • Use models to explain the relationship between variables and to make predictions
  • Explaining relationships [usually interested in causal relationships, but not always]
    • Does oil wealth impact regime type?
  • Predictive modeling
    • Where is violence most likely to happen in [country X] during their next election?
    • Is this email spam?

Example: GDP per capita and Democracy

Pull in the VDEM Data


What is this code doing?

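library(tidyverse)  # filter(), select(), mutate(), glimpse()
library(vdemdata)   # provides the vdem data frame

modelData <- vdem |>
  filter(year == 2019) |>            # keep only the 2019 observations
  select(                            # keep and rename three columns
    country = country_name, 
    lib_dem = v2x_libdem, 
    wealth = e_gdppc) |>
  mutate(log_wealth = log(wealth))   # add the natural log of GDP per capita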

glimpse(modelData)
Rows: 179
Columns: 4
$ country    <chr> "Mexico", "Suriname", "Sweden", "Switzerland", "Ghana", "So…
$ lib_dem    <dbl> 0.433, 0.593, 0.875, 0.870, 0.614, 0.601, 0.754, 0.267, 0.1…
$ wealth     <dbl> 16.814, 11.752, 48.804, 56.110, 5.608, 11.345, 39.061, 5.69…
$ log_wealth <dbl> 2.8222119, 2.4640234, 3.8878123, 4.0273140, 1.7241941, 2.42…

Plot the Relationship


ggplot(modelData, aes(x = wealth, y = lib_dem)) +
  geom_point() +
  geom_smooth(method = "lm", color = "#E48957", se = FALSE) +
  labs(x = "GDP per capita", y = "Liberal Democracy Index") +
  theme_bw()

Using the Scales Package


ggplot(modelData, aes(x = wealth, y = lib_dem)) +
  geom_point() +
  geom_smooth(method = "lm", color = "#E48957", se = FALSE) +
  scale_x_log10(labels = scales::label_dollar(suffix = "k")) +
  labs(
    title = "Wealth and Democracy, 2019",
    x = "GDP per capita", 
    y = "Liberal Democracy Index") +
  theme_bw()

Models as Functions

  • We can represent relationships between variables using functions
  • A function is a mathematical concept: the relationship between an output and one or more inputs
    • Plug in the inputs and receive back the output
  • Example: The formula \(y = 3x + 7\) is a function with input \(x\) and output \(y\).
    • If \(x\) is \(5\), \(y\) is \(22\):
    • \(y = 3 \times 5 + 7 = 22\)
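
As a minimal sketch, the same function can be written in R. The name `f` is chosen here for illustration only; it does not come from the slides.

```r
# The function y = 3x + 7 written as an R function.
# The name `f` is illustrative, not from the slides.
f <- function(x) {
  3 * x + 7
}

f(5)        # 3 * 5 + 7 = 22
f(c(0, 1))  # works on several inputs at once: 7 and 10
```

Plugging in an input and getting back an output is all a model function does; the modeling question is which numbers should take the place of 3 and 7.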

Quant Lingo


  • Response variable: Variable whose behavior or variation you are trying to understand, on the y-axis in the plot
    • Dependent variable
    • Outcome variable
    • Y variable
  • Explanatory variables: Other variables that you want to use to explain the variation in the response, on the x-axis in the plot
    • Independent variables
    • Predictors


Linear model with one explanatory variable…

  • \(Y = a + bX\)
  • \(Y\) is the outcome variable
  • \(X\) is the explanatory variable
  • \(a\) is the intercept: the predicted value of \(Y\) when \(X\) is equal to 0
  • \(b\) is the slope of the line [remember rise over run!]

Quant Lingo


  • Predicted value: Output of the model function
    • The model function gives the typical (expected) value of the response variable given the values of the explanatory variables
    • We often call this \(\hat{Y}\) to differentiate the predicted value from an observed value of Y in the data
  • Residuals: A measure of how far each case is from its predicted value (based on a particular model)
    • Residual = Observed value (\(Y\)) - Predicted value (\(\hat{Y}\))
    • How far above/below the expected value each case is

Residuals

Linear Model

\(\hat{Y} = a + b \times X\)

\(\hat{Y} = 0.13 + 0.12 \times X\)
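
To connect these estimates to predicted values and residuals, here is a small sketch in R. The intercept and slope are the ones shown above; the three observations are made up for illustration and are not real V-Dem values.

```r
# Estimates from the slides
a <- 0.13
b <- 0.12

# Hypothetical observations (NOT real V-Dem values)
x <- c(1, 2, 4)            # explanatory variable
y <- c(0.30, 0.35, 0.55)   # observed response

y_hat    <- a + b * x      # predicted values
residual <- y - y_hat      # residual = observed - predicted

data.frame(x, y, y_hat, residual)
```

A positive residual means the case sits above the line (more democratic than the model expects); a negative residual means it sits below.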

Linear Model: Interpretation


\(\hat{Y} = a + b \times X\)
\(\hat{Y} = 0.13 + 0.12 \times X\)

What is the interpretation of our estimate of \(a\)?


\(\hat{Y} = 0.13 + 0.12 \times 0\)
\(\hat{Y} = 0.13\)

\(a\) is our predicted level of democracy when GDP per capita is 0.

Linear Model: Interpretation


\(\hat{Y} = a + b \times X\)
\(\hat{Y} = 0.13 + 0.12 \times X\)

What is the interpretation of our estimate of \(b\)?


\(\hat{Y} = a + \frac{\text{Rise}}{\text{Run}} \times X\)
\(\hat{Y} = a + \frac{\text{Change in } Y}{\text{Change in } X} \times X\)

Linear Model: Interpretation


\(b = \frac{\text{Change in } Y}{\text{Change in } X}\)
\(0.12 = \frac{\text{Change in } Y}{\text{Change in } X}\)
\(\text{Change in } Y = 0.12 \times \text{Change in } X\)


When \(\text{Change in } X = 1\):
\(\text{Change in } Y = 0.12\)


\(b\) is the predicted change in \(Y\) associated with a ONE-unit change in \(X\).
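
Both interpretations can be checked numerically. In this sketch, `predict_y` is a hypothetical helper written for illustration, using the estimates from the slides.

```r
# Estimates from the slides
a <- 0.13
b <- 0.12

# Hypothetical helper: the model function Y-hat = a + b * X
predict_y <- function(x) a + b * x

predict_y(0)                 # the intercept: prediction when X is 0
predict_y(3) - predict_y(2)  # a one-unit change in X changes Y-hat by b
```

The difference is the same (about 0.12) for any pair of inputs one unit apart, because the slope of a line is constant.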

Linear Model: Interpretation


Is this the causal effect of GDP per capita on liberal democracy?


No! It is only the association…


To identify causality we need other methods (beyond the scope of this course).

Your Task


An economist is interested in the relationship between years of education and hourly wages. They estimate a linear model with estimates of \(a\) and \(b\) as follows:


\(\hat{Y} = 9 + 1.60 \times \text{YrsEdu}\)


1. Interpret \(a\) and \(b\)
2. What is the predicted hourly wage for those with 10 years of education?

Next step


  • Linear model with one predictor: \(Y = a + bX\)
  • For any given data…
  • How do we figure out what the best values are for \(a\) and \(b\)?
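
As a preview, one common answer is least squares, which is what `geom_smooth(method = "lm")` used in the plots above. The sketch below uses R's built-in `lm()` on simulated data, not the V-Dem data; the sample size, noise level, and "true" values of 0.13 and 0.12 are assumptions chosen for illustration.

```r
set.seed(123)                    # make the simulation reproducible

# Simulated data with a known intercept (0.13) and slope (0.12)
n <- 200
x <- runif(n, min = 0, max = 4)
y <- 0.13 + 0.12 * x + rnorm(n, sd = 0.05)

fit <- lm(y ~ x)                 # least-squares estimates of a and b
coef(fit)                        # should land close to 0.13 and 0.12
```

Because the data were generated from a known line, the estimates can be compared against the values that produced them, which is not possible with real data.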