Data wrangling with the Tidyverse II

Materials for class on Thursday, October 4, 2018

Slides

Download the slides from today’s lecture:

2016 elections, food security, and mortality

Setting up your project

(Practice makes perfect!)

Do the following:

  1. Create a new RStudio project named “elections-food-death” (or something) and put it somewhere on your computer.

  2. Navigate to that new project folder on your computer with Windows File Explorer or macOS Finder (i.e. however you look at files on your computer). Create a new folder in your project called “data”.

  3. Download this CSV file: clean_combined_data.csv. You’ll probably need to right click on the link and select “Save link as…” or something similar, since browsers will often load the CSV file like a web page, which isn’t helpful.

    The R code I used to clean and merge these three datasets can be seen here. The raw data can be downloaded here.

  4. Using Windows File Explorer or macOS Finder, move the newly downloaded CSV file into the “data” folder you created.

  5. Download this R Markdown file: elections-food-death-questions.Rmd, and place it in your newly created project (but not in your data folder; put it in the main directory). Again, you’ll probably need to right click on the link and select “Save link as…”.

In the end, your project folder should be structured like this:

elections-food-death\
  elections-food-death-questions.Rmd
  elections-food-death.Rproj
  data\
    clean_combined_data.csv
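
If you want to double check that everything landed in the right place, here is a minimal sketch you can run in the console from inside the project. It assumes the folder layout above; the object name `combined` is just a placeholder, and the chunk already provided in the R Markdown file may load the data differently.

```r
library(tidyverse)

# Path is relative to the project root (the folder with the .Rproj file)
data_path <- "data/clean_combined_data.csv"

if (file.exists(data_path)) {
  # `combined` is a placeholder name, not necessarily the one used in the .Rmd
  combined <- read_csv(data_path)
  glimpse(combined)  # list every column with its type and first few values
} else {
  message("Can't find ", data_path, "; is the CSV inside the data folder?")
}
```

If you get the "Can't find" message, the CSV is probably still sitting in your Downloads folder, or RStudio isn't open in the project you just created.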

Questions

I provided you with one chunk in the R Markdown file to load the pre-cleaned data. Choose a few of these questions and answer them with a table or a plot (or both). You should be able to answer most with filter(BLAH == "BLOOP") or group_by(BLAH) %>% summarize(SOMETHING = BLAH) and with ggplot().
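
As a reminder of what those patterns look like in practice, here is a rough sketch on a tiny made-up dataset. The column names (`state`, `county`, `winner`, `pct_food_insecure`) and all the values are hypothetical and are not taken from clean_combined_data.csv; swap in the real column names once you've looked at the data.

```r
library(tidyverse)

# Made-up stand-in data; column names and values are hypothetical
toy <- tribble(
  ~state, ~county, ~winner,   ~pct_food_insecure,
  "AA",   "One",   "Party A", 12,
  "AA",   "Two",   "Party B", 18,
  "BB",   "Three", "Party A", 10,
  "BB",   "Four",  "Party A", 14
)

# filter(): keep only the rows that match a condition
toy %>%
  filter(state == "AA")

# group_by() + summarize(): collapse to one summary row per group
by_state <- toy %>%
  group_by(state) %>%
  summarize(avg_food_insecure = mean(pct_food_insecure))

# ggplot(): plot the summarized table
ggplot(by_state, aes(x = state, y = avg_food_insecure)) +
  geom_col()
```

Saving the summarized table to its own object (here `by_state`) lets you use it both as a table and as the data for a plot.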

(Note: I have no idea what you will find here. This is a fishing expedition; see what interesting stories you can find!)

Filtering, grouping, and summarizing

Relationships and correlations

What is the relationship between the following variables at a county (or state) level? Which counties (or states) have the strongest or weakest relationships? What could that relationship possibly mean?
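
One possible way to approach this is to calculate a correlation inside `summarize()`, first for the whole dataset and then separately for each state. The sketch below uses made-up data with hypothetical column names (`pct_food_insecure`, `mortality_rate`), so treat it as a pattern to adapt, not as a description of the real CSV.

```r
library(tidyverse)

# Made-up county-level data; column names and values are hypothetical
toy <- tribble(
  ~state, ~pct_food_insecure, ~mortality_rate,
  "AA",   10,                 700,
  "AA",   15,                 750,
  "AA",   20,                 800,
  "BB",   10,                 900,
  "BB",   15,                 850,
  "BB",   20,                 800
)

# Correlation across all counties at once
toy %>%
  summarize(correlation = cor(pct_food_insecure, mortality_rate))

# Correlation within each state, sorted from strongest positive to strongest negative
cor_by_state <- toy %>%
  group_by(state) %>%
  summarize(correlation = cor(pct_food_insecure, mortality_rate)) %>%
  arrange(desc(correlation))

# Scatterplot with one straight trend line per state
ggplot(toy, aes(x = pct_food_insecure, y = mortality_rate, color = state)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```

In this toy example the two states move in opposite directions, so the overall correlation washes out even though each state has a strong relationship on its own. That is one reason to look at groups separately. If the real data has missing values, `cor()` will return `NA` unless you add something like `use = "complete.obs"`.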

Clearest and muddiest things

Go to this form and answer these three questions:

  1. What was the muddiest thing from class today? What are you still wondering about?
  2. What was the clearest thing from class today?
  3. What was the most exciting thing you learned?

I’ll compile the questions and send out answers after class.