Answers to regression and inference

Materials for class on Thursday, November 29, 2018

These are the interpreted regression coefficients for the in-class activity for week 12 (since I was out of town). Don’t look at these until you’ve tried it on your own.

Load libraries and data

library(tidyverse)
library(moderndive)

happiness <- read_csv("https://statsf18.classes.andrewheiss.com/data/world_happiness.csv")
results_brexit <- read_csv("https://statsf18.classes.andrewheiss.com/data/brexit_results.csv")

World happiness

# The base case for region is "East Asia & Pacific"
# The base case for income is "High income"
model_happiness <- lm(happiness_score ~ life_expectancy + 
                        access_to_electricity + region + income, 
                      data = happiness)

model_happiness %>% get_regression_table()

term	estimate	std_error	statistic	p_value	lower_ci	upper_ci
intercept	3.764	1.242	3.03	0.003	1.309	6.22
life_expectancy	0.034	0.016	2.121	0.036	0.002	0.066
access_to_electricity	0	0.005	-0.041	0.967	-0.01	0.01
regionEurope & Central Asia	-0.079	0.195	-0.404	0.687	-0.465	0.307
regionLatin America & Caribbean	0.736	0.223	3.301	0.001	0.295	1.177
regionMiddle East & North Africa	-0.201	0.221	-0.91	0.364	-0.637	0.235
regionNorth America	0.768	0.5	1.535	0.127	-0.221	1.757
regionSouth Asia	-0.149	0.316	-0.47	0.639	-0.773	0.476
regionSub-Saharan Africa	-0.162	0.298	-0.544	0.587	-0.752	0.427
incomeLow income	-1.718	0.322	-5.328	0	-2.355	-1.08
incomeLower middle income	-1.267	0.213	-5.953	0	-1.688	-0.846
incomeUpper middle income	-0.88	0.173	-5.088	0	-1.222	-0.538

model_happiness %>% get_regression_summaries()

r_squared	adj_r_squared	mse	rmse	sigma	statistic	p_value	df
0.699	0.676	0.3971	0.6302	0.656	30.23	0	12

I’m not going to condense this down into paragraph form. I just put each important piece of information in a list here. You have to use your writing skills to craft this into something readable.

This model explains about 68% of the variation in national happiness scores (adjusted R² = 0.676).
Life expectancy has a statistically significant association with happiness. Controlling for access to electricity, region, and income, a one-year increase in life expectancy is associated with a 0.03 point increase in happiness (p = 0.036).
Access to electricity does not have a significant effect on national happiness when taking life expectancy, region, and income into account (p = 0.97).
Regional differences generally do not have a significant effect on national happiness when controlling for life expectancy, access to electricity, and income. The one exception is Latin America and the Caribbean, which scores 0.74 points higher than East Asia and the Pacific (the base case) on average (p = 0.001).
On the other hand, differences in income do have a significant effect on happiness after controlling for life expectancy, access to electricity, and region. Compared to high income countries (the base case), upper middle income countries score 0.88 points lower on average, lower middle income countries score 1.27 points lower, and low income countries score 1.72 points lower. Each of these differences is statistically significant (p < 0.001).
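
If it helps to see how these coefficients fit together, here is a rough sketch that plugs the rounded estimates from the table into the regression equation by hand. The two countries are hypothetical (I made up the life expectancy of 75 years), and because the estimates are rounded, the results will differ slightly from what predict() on the real model would give.

```r
# Rounded estimates copied from the happiness regression table
b_intercept  <- 3.764
b_life_exp   <- 0.034
b_low_income <- -1.718
# access_to_electricity is reported as 0 after rounding, so it drops out here

# Hypothetical country in East Asia & Pacific (the base region, so no
# region coefficient is added) with a made-up life expectancy of 75 years
life_exp <- 75

# High income is the base case, so no income coefficient is added
pred_high_income <- b_intercept + b_life_exp * life_exp

# Same hypothetical country, but low income instead of high income
pred_low_income <- pred_high_income + b_low_income

pred_high_income
pred_low_income
```

The gap between the two predictions is exactly the low income coefficient (1.72 points), which is what "controlling for everything else" means in practice: the other terms are held fixed and cancel out.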

Brexit

model_brexit <- lm(leave_share ~ con_2015 + lab_2015 + ukip_2015 +
                     degree + age_18to24 + born_in_uk + unemployed + male, 
                   data = results_brexit)

model_brexit %>% get_regression_table()

term	estimate	std_error	statistic	p_value	lower_ci	upper_ci
intercept	18.95	11.05	1.715	0.087	-2.748	40.66
con_2015	0.163	0.017	9.677	0	0.13	0.196
lab_2015	0.04	0.017	2.331	0.02	0.006	0.073
ukip_2015	0.691	0.038	18.05	0	0.616	0.766
degree	-0.834	0.031	-26.7	0	-0.896	-0.773
age_18to24	-0.257	0.048	-5.297	0	-0.352	-0.162
born_in_uk	-0.012	0.021	-0.573	0.567	-0.054	0.029
unemployed	0.491	0.192	2.55	0.011	0.113	0.869
male	0.657	0.207	3.173	0.002	0.25	1.063

model_brexit %>% get_regression_summaries()

r_squared	adj_r_squared	mse	rmse	sigma	statistic	p_value	df
0.917	0.916	9.688	3.113	3.137	778.6	0	9

Again, I’m not going to make this a pretty paragraph. You’re in charge of that.

This model explains an astounding 92% of the variation in Leave votes.
Conservative vote share in 2015 is significantly associated with Leave vote share. Controlling for all other variables in the model, a 1 percentage point increase in the Conservative vote share is associated with a 0.16 percentage point increase in the Leave vote share (p < 0.001).
Labour votes are also significantly associated with Leave votes, but the size of the effect is likely not very substantial. A 1 percentage point increase in the Labour vote share is associated with a 0.04 percentage point increase in the Leave vote share (p = 0.02).
UKIP votes, on the other hand, have a substantial and significant effect on Leave votes. A 1 percentage point increase in the UKIP vote share in a constituency is associated with a 0.69 percentage point increase in the Leave vote share (p < 0.001).
Education has a sizable and significant effect too. A 1 percentage point increase in the proportion of people with a university degree is associated with a 0.83 percentage point drop in the Leave vote share (p < 0.001), so constituencies with more university graduates voted Leave at lower rates.
The Leave vote share also decreases as constituencies become younger. A 1 percentage point increase in the proportion of people aged 18 to 24 in a constituency is associated with a 0.26 percentage point drop in the Leave vote share, and this effect is significant (p < 0.001).
After controlling for all other variables in the model, immigration status (measured here as the proportion of residents born in the UK) actually has no statistically significant effect on the Leave vote share (p = 0.57).
Unemployment is associated with more Leave votes. A 1 percentage point increase in the proportion of unemployed people is associated with a 0.49 percentage point increase in the Leave vote share (p = 0.01).
Finally, gender played a significant role too. A 1 percentage point increase in the proportion of a constituency that is male is associated with a 0.66 percentage point increase in the Leave vote share (p = 0.002).
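
Since all of these slopes are per-unit changes, you can scale them to compare constituencies that differ by more than one unit. Here is a minimal sketch using the rounded estimates from the table. The size of the differences between the two constituencies is made up for illustration, and I am assuming the variables are measured on a 0 to 100 scale.

```r
# Rounded estimates copied from the Brexit regression table
b_degree <- -0.834
b_ukip   <- 0.691

# Hypothetical differences between constituency B and constituency A
# (made-up values; everything else in the model is held constant)
diff_degree <- 10  # B has 10 more units of degree holders than A
diff_ukip   <- 5   # B has 5 more units of 2015 UKIP vote share than A

# Each slope is a per-unit change, so multiply by the size of the difference
change_from_degree <- b_degree * diff_degree
change_from_ukip   <- b_ukip * diff_ukip

# Expected difference in Leave share (B minus A) from both changes together
net_change <- change_from_degree + change_from_ukip

change_from_degree
change_from_ukip
net_change
```

In this made-up comparison, the education difference outweighs the UKIP difference, so the model would expect constituency B to have a lower Leave share than constituency A.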

Phew. Based on all this, older, unemployed, male UKIP voters with no university education were the most likely Brexit voters.