Answers to regression and inference

Materials for class on Thursday, November 29, 2018

These are the interpreted regression coefficients for the in-class activity for week 12 (since I was out of town). Don’t look at these until you’ve tried it on your own.

Load libraries and data

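# tidyverse provides read_csv() and the %>% pipe
library(tidyverse)
# moderndive provides get_regression_table() and get_regression_summaries()
library(moderndive)
# Download the two datasets used in the activity
happiness <- read_csv("https://statsf18.classes.andrewheiss.com/data/world_happiness.csv")
results_brexit <- read_csv("https://statsf18.classes.andrewheiss.com/data/brexit_results.csv")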

World happiness

# The base case for region is "East Asia & Pacific"
# The base case for income is "High income"
model_happiness <- lm(happiness_score ~ life_expectancy + 
                        access_to_electricity + region + income, 
                      data = happiness)
model_happiness %>% get_regression_table()
term                              estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept                         3.764     1.242      3.03       0.003    1.309     6.22
life_expectancy                   0.034     0.016      2.121      0.036    0.002     0.066
access_to_electricity             0         0.005      -0.041     0.967    -0.01     0.01
regionEurope & Central Asia       -0.079    0.195      -0.404     0.687    -0.465    0.307
regionLatin America & Caribbean   0.736     0.223      3.301      0.001    0.295     1.177
regionMiddle East & North Africa  -0.201    0.221      -0.91      0.364    -0.637    0.235
regionNorth America               0.768     0.5        1.535      0.127    -0.221    1.757
regionSouth Asia                  -0.149    0.316      -0.47      0.639    -0.773    0.476
regionSub-Saharan Africa          -0.162    0.298      -0.544     0.587    -0.752    0.427
incomeLow income                  -1.718    0.322      -5.328     0        -2.355    -1.08
incomeLower middle income         -1.267    0.213      -5.953     0        -1.688    -0.846
incomeUpper middle income         -0.88     0.173      -5.088     0        -1.222    -0.538
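
One way to read the coefficient table above is to check which confidence intervals exclude zero, and then plug the estimates into the regression equation for a made-up country. The sketch below retypes a few rows of the table by hand so that it runs without the fitted model. The `coefs` data frame, the `predicted_score` name, and the country profile are hypothetical and only there for illustration.

```r
# Estimates and 95% confidence bounds retyped from the table above
# (a subset of rows, chosen for illustration)
coefs <- data.frame(
  term = c("intercept", "life_expectancy", "access_to_electricity",
           "regionLatin America & Caribbean", "incomeUpper middle income"),
  estimate = c(3.764, 0.034, 0, 0.736, -0.88),
  lower_ci = c(1.309, 0.002, -0.01, 0.295, -1.222),
  upper_ci = c(6.22, 0.066, 0.01, 1.177, -0.538)
)

# An interval excludes zero when both bounds have the same sign
coefs$excludes_zero <- coefs$lower_ci * coefs$upper_ci > 0
print(coefs)

# Hypothetical country: life expectancy of 75, electricity access of 100,
# located in Latin America & Caribbean, upper middle income.
# The rounded electricity estimate is 0, so that term adds nothing here.
b <- setNames(coefs$estimate, coefs$term)
predicted_score <- b[["intercept"]] +
  b[["life_expectancy"]] * 75 +
  b[["access_to_electricity"]] * 100 +
  b[["regionLatin America & Caribbean"]] +
  b[["incomeUpper middle income"]]
predicted_score
```

With these rounded estimates the predicted score works out to about 6.17. A country in the base cases (East Asia & Pacific, high income) would get no region or income adjustment beyond the intercept.
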
model_happiness %>% get_regression_summaries()
r_squared  adj_r_squared  mse     rmse    sigma  statistic  p_value  df
0.699      0.676          0.3971  0.6302  0.656  30.23      0        12
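
The `mse` and `rmse` columns in the summary above are linked: the RMSE is the square root of the MSE, which puts the typical prediction error back in the units of the outcome (here, happiness score points). A quick check, using values retyped from the summary row (the variable names are illustrative):

```r
# Values retyped from the happiness model summary above
mse <- 0.3971
rmse_reported <- 0.6302

# RMSE is the square root of the mean squared error
rmse_from_mse <- sqrt(mse)
round(rmse_from_mse, 4)

# The two agree up to the rounding used in the printed table
abs(rmse_from_mse - rmse_reported) < 1e-4
```
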

I’m not going to condense this down into paragraph form. I just put each important piece of information in a list here. You have to use your writing skills to craft this into something readable.

Brexit

model_brexit <- lm(leave_share ~ con_2015 + lab_2015 + ukip_2015 +
                     degree + age_18to24 + born_in_uk + unemployed + male, 
                   data = results_brexit)
model_brexit %>% get_regression_table()
term        estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept   18.95     11.05      1.715      0.087    -2.748    40.66
con_2015    0.163     0.017      9.677      0        0.13      0.196
lab_2015    0.04      0.017      2.331      0.02     0.006     0.073
ukip_2015   0.691     0.038      18.05      0        0.616     0.766
degree      -0.834    0.031      -26.7      0        -0.896    -0.773
age_18to24  -0.257    0.048      -5.297     0        -0.352    -0.162
born_in_uk  -0.012    0.021      -0.573     0.567    -0.054    0.029
unemployed  0.491     0.192      2.55       0.011    0.113     0.869
male        0.657     0.207      3.173      0.002    0.25      1.063
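
Because every predictor in this model is continuous, each estimate is the change in predicted `leave_share` for a one-unit increase in that predictor, holding the others constant. The sketch below scales each estimate to a hypothetical 10-unit increase so the directions and sizes are easier to compare. It assumes the predictors are all measured on a comparable scale (for example, percentage points), which the table itself does not state, and a 10-unit change will not be equally plausible for every variable. The names `brexit_estimates`, `change`, and `effect` are illustrative.

```r
# Slope estimates retyped from the Brexit regression table above
brexit_estimates <- c(
  con_2015 = 0.163, lab_2015 = 0.04, ukip_2015 = 0.691,
  degree = -0.834, age_18to24 = -0.257, born_in_uk = -0.012,
  unemployed = 0.491, male = 0.657
)

# Hypothetical change in each predictor, holding the others constant
change <- 10

# Change in predicted leave_share associated with that increase
effect <- brexit_estimates * change

# Sort from most negative to most positive
sort(effect)
```

Under that assumption, `degree` has the largest negative association and `ukip_2015` the largest positive one.
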
model_brexit %>% get_regression_summaries()
r_squared  adj_r_squared  mse    rmse   sigma  statistic  p_value  df
0.917      0.916          9.688  3.113  3.137  778.6      0        9

Again, I’m not going to make this a pretty paragraph. You’re in charge of that.

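Phew. Based on all this, older, unemployed, male UKIP voters with no university education were the most likely Brexit voters: ukip_2015, unemployed, and male all have positive coefficients, while degree and age_18to24 have negative ones.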