Answers to regression and inference
Materials for class on Thursday, November 29, 2018
These are the interpreted regression coefficients for the in-class activity for week 12 (since I was out of town). Don’t look at these until you’ve tried it on your own.
Load libraries and data
library(tidyverse)
library(moderndive)
happiness <- read_csv("https://statsf18.classes.andrewheiss.com/data/world_happiness.csv")
results_brexit <- read_csv("https://statsf18.classes.andrewheiss.com/data/brexit_results.csv")
World happiness
# The base case for region is "East Asia & Pacific"
# The base case for income is "High income"
model_happiness <- lm(happiness_score ~ life_expectancy +
                        access_to_electricity + region + income,
                      data = happiness)
model_happiness %>% get_regression_table()
term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
---|---|---|---|---|---|---|
intercept | 3.764 | 1.242 | 3.03 | 0.003 | 1.309 | 6.22 |
life_expectancy | 0.034 | 0.016 | 2.121 | 0.036 | 0.002 | 0.066 |
access_to_electricity | 0 | 0.005 | -0.041 | 0.967 | -0.01 | 0.01 |
regionEurope & Central Asia | -0.079 | 0.195 | -0.404 | 0.687 | -0.465 | 0.307 |
regionLatin America & Caribbean | 0.736 | 0.223 | 3.301 | 0.001 | 0.295 | 1.177 |
regionMiddle East & North Africa | -0.201 | 0.221 | -0.91 | 0.364 | -0.637 | 0.235 |
regionNorth America | 0.768 | 0.5 | 1.535 | 0.127 | -0.221 | 1.757 |
regionSouth Asia | -0.149 | 0.316 | -0.47 | 0.639 | -0.773 | 0.476 |
regionSub-Saharan Africa | -0.162 | 0.298 | -0.544 | 0.587 | -0.752 | 0.427 |
incomeLow income | -1.718 | 0.322 | -5.328 | 0 | -2.355 | -1.08 |
incomeLower middle income | -1.267 | 0.213 | -5.953 | 0 | -1.688 | -0.846 |
incomeUpper middle income | -0.88 | 0.173 | -5.088 | 0 | -1.222 | -0.538 |
model_happiness %>% get_regression_summaries()
r_squared | adj_r_squared | mse | rmse | sigma | statistic | p_value | df |
---|---|---|---|---|---|---|---|
0.699 | 0.676 | 0.3971 | 0.6302 | 0.656 | 30.23 | 0 | 12 |
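The base cases noted in the model code come from how R orders factor levels: when `lm()` receives a character column, it treats it as a factor whose levels are sorted alphabetically, and the first level becomes the reference category. The sketch below uses a small made-up vector of income labels (not the real data) to show this and how `relevel()` could pick a different base case if you wanted one.

```r
# Toy vector of income categories (made up for illustration, not the course data)
income <- c("Low income", "High income", "Upper middle income",
            "Lower middle income", "High income")

# Converting to a factor sorts the levels alphabetically;
# the first level is what lm() uses as the base case
income_factor <- factor(income)
levels(income_factor)[1]

# relevel() moves a chosen level to the front, making it the new base case
income_releveled <- relevel(income_factor, ref = "Low income")
levels(income_releveled)[1]
```

With a different base case, the coefficients for the other categories would be interpreted relative to that new category instead of "High income".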
I’m not going to condense these results into paragraph form. I’ve just listed each important piece of information here. You have to use your writing skills to craft this into something readable.
- This model explains about 68% of the variation in world happiness (adjusted R² = 0.676).
- Life expectancy has a statistically significant association with happiness. Controlling for access to electricity, region, and income, a one-year increase in life expectancy is associated with a 0.03 point increase in happiness (p = 0.036).
- Access to electricity does not have a significant effect on national happiness when taking life expectancy, region, and income into account (p = 0.97).
- Regional differences generally do not have a significant effect on national happiness when controlling for life expectancy, access to electricity, and income. The exception is Latin America and the Caribbean, which scores 0.74 points higher than East Asia and the Pacific (the base case) on average (p = 0.001).
- On the other hand, differences in income do have a significant effect on happiness after controlling for life expectancy, access to electricity, and region. Compared to high income countries, upper middle income countries score 0.88 points lower on average, lower middle income countries score 1.27 points lower, and low income countries score 1.72 points lower. Each of these differences is statistically significant (p < 0.001).
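To see how these coefficients combine, here is a minimal sketch that plugs the rounded estimates from the regression table into the model equation by hand. The two countries, the life expectancy of 75 years, and the 100% electricity access are hypothetical values chosen only for illustration, and the names `coefs`, `predicted_a`, and `predicted_b` are not part of the original activity.

```r
# Estimates copied from the regression table (rounded to 3 decimals)
coefs <- c(
  intercept = 3.764,
  life_expectancy = 0.034,
  access_to_electricity = 0,
  region_latin_america = 0.736,
  income_upper_middle = -0.88
)

# Hypothetical values shared by both countries (made up for illustration)
life_exp <- 75
electricity <- 100

# Country A: East Asia & Pacific, high income (both base cases),
# so no region or income coefficient is added
predicted_a <- coefs[["intercept"]] +
  coefs[["life_expectancy"]] * life_exp +
  coefs[["access_to_electricity"]] * electricity

# Country B: Latin America & Caribbean, upper middle income
predicted_b <- predicted_a +
  coefs[["region_latin_america"]] +
  coefs[["income_upper_middle"]]

predicted_a
predicted_b
predicted_b - predicted_a
```

Because the two countries share the same life expectancy and electricity access, the difference between them is just the sum of the region and income coefficients. With the real fitted model you would normally use `predict()` instead of doing this by hand.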
Brexit
model_brexit <- lm(leave_share ~ con_2015 + lab_2015 + ukip_2015 +
                     degree + age_18to24 + born_in_uk + unemployed + male,
                   data = results_brexit)
model_brexit %>% get_regression_table()
term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
---|---|---|---|---|---|---|
intercept | 18.95 | 11.05 | 1.715 | 0.087 | -2.748 | 40.66 |
con_2015 | 0.163 | 0.017 | 9.677 | 0 | 0.13 | 0.196 |
lab_2015 | 0.04 | 0.017 | 2.331 | 0.02 | 0.006 | 0.073 |
ukip_2015 | 0.691 | 0.038 | 18.05 | 0 | 0.616 | 0.766 |
degree | -0.834 | 0.031 | -26.7 | 0 | -0.896 | -0.773 |
age_18to24 | -0.257 | 0.048 | -5.297 | 0 | -0.352 | -0.162 |
born_in_uk | -0.012 | 0.021 | -0.573 | 0.567 | -0.054 | 0.029 |
unemployed | 0.491 | 0.192 | 2.55 | 0.011 | 0.113 | 0.869 |
male | 0.657 | 0.207 | 3.173 | 0.002 | 0.25 | 1.063 |
model_brexit %>% get_regression_summaries()
r_squared | adj_r_squared | mse | rmse | sigma | statistic | p_value | df |
---|---|---|---|---|---|---|---|
0.917 | 0.916 | 9.688 | 3.113 | 3.137 | 778.6 | 0 | 9 |
Again, I’m not going to make this a pretty paragraph. You’re in charge of that.
- This model explains an astounding 92% of the variation in Leave votes.
- Conservative vote share in 2015 is significantly associated with Leave vote share. Controlling for all other variables in the model, a 1 percentage point increase in the Conservative vote share is associated with a 0.163 percentage point increase in the Leave vote share (p < 0.001).
- Labour votes are also significantly associated with Leave votes, but the size of the effect is likely not very substantial. A 1 percentage point increase in the Labour vote share is associated with a 0.04 percentage point increase in the Leave vote share (p = 0.02).
- UKIP votes, on the other hand, have a substantial and significant effect on Leave votes. A 1 percentage point increase in the UKIP vote share in a constituency is associated with a 0.69 percentage point increase in the Leave vote share (p < 0.001).
- Education has a sizable and significant effect too. A 1 percentage point increase in the proportion of people with a university degree is associated with a 0.83 percentage point drop in the Leave vote share (p < 0.001), showing that the propensity to vote to Leave decreases with more education.
- The Leave vote share also decreases as constituencies become younger. A 1 percentage point increase in the proportion of people aged 18 to 24 in a constituency is associated with a 0.26 percentage point drop in the Leave vote share, and this effect is significant (p < 0.001).
- After controlling for all other variables in the model, the proportion of residents born in the UK (a rough proxy for immigration) actually has no statistically significant effect on the Leave vote share (p = 0.57).
- Unemployment is associated with more Leave votes. A 1 percentage point increase in the proportion of unemployed people is associated with a 0.49 percentage point increase in the Leave vote share (p = 0.01).
- Finally, gender played a significant role too: a 1 percentage point increase in the proportion of a constituency that is male is associated with a 0.66 percentage point increase in the Leave vote share (p = 0.002).
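Phew. Based on all this, constituencies that were older, had more unemployment, more men, more UKIP voters, and fewer university graduates had the highest Leave vote shares. Since the data is at the constituency level, this suggests (but can't prove) that older, unemployed, male UKIP voters with no university education were the most likely Brexit voters.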
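To get a feel for what the Brexit coefficients imply in combination, the sketch below adds up the predicted change in Leave share for a set of changes in the explanatory variables, holding everything else constant. The estimates are copied (rounded) from the regression table; the helper `predicted_change()` and the example scenarios are hypothetical and only meant for illustration.

```r
# Slope estimates copied from the Brexit regression table (rounded)
brexit_coefs <- c(
  con_2015 = 0.163, lab_2015 = 0.04, ukip_2015 = 0.691,
  degree = -0.834, age_18to24 = -0.257, born_in_uk = -0.012,
  unemployed = 0.491, male = 0.657
)

# Hypothetical helper: given a named vector of changes (in percentage points),
# return the predicted change in Leave share, all else held constant
predicted_change <- function(changes, coefs = brexit_coefs) {
  stopifnot(all(names(changes) %in% names(coefs)))
  sum(coefs[names(changes)] * changes)
}

# 10 percentage points more degree holders
change_degree <- predicted_change(c(degree = 10))

# 10 points more degree holders and 5 points more UKIP votes in 2015
change_both <- predicted_change(c(degree = 10, ukip_2015 = 5))

change_degree
change_both
```

This works because the model is linear with no interaction terms, so the predicted effect of several changes is just the sum of each coefficient times its change.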