Due by 11:59 PM on Wednesday, December 19, 2018
For your final project, you will take a dataset from the wild, explore it, wrangle and clean it, tell a story with it, and make inferences about it using regression analysis and other statistical tools.
You will complete this project as a team. Your team will produce a single R Markdown report, and every member is expected to contribute to it.
Here’s a fake version of what a final project might look like. (This text was all generated by a bot that took the text of famous books and tried to generate paragraphs that could hypothetically fit in those books.)
Your project shouldn’t be a 1-to-1 transformation of this example (i.e. you don’t need to have exactly five paragraphs in the introduction; you don’t need to include a figure after the second paragraph in the data and methods section; etc.). This just shows how you’ll mix longer text with code, tables, and figures.
I want this project to be as useful for you and your career as possible. Accordingly, you have a lot of freedom in what data you can use for this project. Choose a dataset from this list:
Your own data
Use a dataset from one of your teammates’ places of employment. Doing this would be the most practical, hands-on experience you can have.
Data from the internet
Go to Google Dataset Search or Utah’s Open Data Catalog (or anywhere else online), find an interesting dataset and ask questions about it. Here are some different high-quality datasets that students have worked with before:
- U.S. Charities and Non-profits: All of the charities and nonprofits registered with the IRS. Source: IRS. This is actually split into six separate files. You can combine them all into one massive national database with bind_rows(), or filter the data to include specific states (or a single state). It all depends on the story you’re telling.
- Nonprofit Grants 2010 to 2016: Nonprofit grants made in the US as listed in Schedule I of the IRS 990 tax form between 2010 and 2016. Source: IRS
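Combining or filtering split files like the charities data might look roughly like the sketch below. The two small data frames, their column names (`name`, `state`), and the object names are hypothetical stand-ins; the real IRS files would be read from disk and have many more columns.

```r
library(dplyr)

# Hypothetical stand-ins for two of the six files (assumed column names)
region_1 <- tibble(name = c("Charity A", "Charity B"), state = c("UT", "NV"))
region_2 <- tibble(name = c("Charity C", "Charity D"), state = c("NY", "UT"))

# Stack the files into one data frame; .id records which file each row came from
national <- bind_rows(region_1 = region_1, region_2 = region_2, .id = "source_file")

# Or keep only the state(s) relevant to the story you're telling
utah_only <- national %>% filter(state == "UT")
```

Whether you stack everything or filter first depends on your research questions; filtering early keeps the data smaller and faster to work with.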
Federal, state, and local government management
- Deadly traffic accidents in the UK (2015): List of all traffic-related deaths in the UK in 2015. Source: data.gov.uk
- Firefighter Fatalities in the United States: Name, rank, and cause of death for all firefighters killed since 2000. Source: FEMA
- Federal Emergencies and Disasters, 1953–Present: Every federal emergency or disaster declared by the President of the United States since 1953. Source: FEMA
- Global Terrorism Database (1970–2016): 170,000 terrorist attacks worldwide, 1970–2016. Source: National Consortium for the Study of Terrorism and Responses to Terrorism (START), University of Maryland
- City of Austin 311 Unified Data: All 311 calls to the City of Austin since 2014. Source: City of Austin
You will turn in a report where you define your research questions, explore and describe your data, build models and make inferences, discuss the implications of those findings, and make recommendations.
In your report, you need to include at least one of each of the following elements (i.e. at least one plot, but more is fine; at least one regression model, but more is fine):
- A plot of a single variable (like a histogram; see ModernDive 3)
- A plot of multiple variables (like a scatterplot; see ModernDive 3)
- 2-3 hypotheses that you will test
- A comparison of proportions or means (see ModernDive 9 and 10)
- A multiple regression model (see ModernDive 6, 7, and 11)
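As a rough illustration of the plot and regression elements above, here is a minimal sketch. It uses the built-in `mtcars` data purely as a stand-in for your own dataset, and the variable choices are assumptions made for the example, not a recommended model.

```r
library(ggplot2)

# mtcars ships with R and is only a stand-in for your own dataset
cars <- mtcars
cars$transmission <- factor(cars$am, labels = c("automatic", "manual"))

# Plot of a single variable: histogram of fuel efficiency
plot_single <- ggplot(cars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, color = "white")

# Plot of multiple variables: weight vs. fuel efficiency, colored by transmission
plot_multiple <- ggplot(cars, aes(x = wt, y = mpg, color = transmission)) +
  geom_point()

# Multiple regression: fuel efficiency explained by weight and transmission type
model <- lm(mpg ~ wt + transmission, data = cars)
summary(model)
```

In a real report, each plot and model should tie back to one of your hypotheses and be followed by a sentence or two of interpretation.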
Here is a suggested outline for your final report:
- Executive summary: one-page summary of your questions, methods, findings, and recommendations
- Introduction and description of research questions: describe the motivation for this study, outline and define what questions you are exploring and why
- Data and methods: explain how the data was collected, provide basic summary statistics (tables and figures) of the main variables you’re interested in, and describe what statistical tools you will use to answer your questions (i.e. regression, bootstrapped comparisons of means, etc.)
- Results: answer each of your questions using statistical tools and interpret the results of the different statistical tests you use
- Limitations of the study: provide caveats for your analysis and explain how confident you are in your results
- Recommendations and conclusion: discuss the implications of these findings and make recommendations based on the results
- Appendices: if you want to include tables of summary statistics or tables showing alternative models, you can include them in an appendix instead of in the body of the report itself.
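The "bootstrapped comparisons of means" mentioned under data and methods can be sketched in base R as below. This is a simplified illustration using the built-in `mtcars` data as a stand-in, not the exact workflow from ModernDive, which relies on a dedicated package for this.

```r
set.seed(1234)  # make the resampling reproducible

# mtcars is a stand-in dataset; compare mean mpg for manual vs. automatic cars
manual <- mtcars$mpg[mtcars$am == 1]
automatic <- mtcars$mpg[mtcars$am == 0]
observed_diff <- mean(manual) - mean(automatic)

# Resample each group with replacement and recompute the difference in means
boot_diffs <- replicate(2000, {
  mean(sample(manual, replace = TRUE)) - mean(sample(automatic, replace = TRUE))
})

# Percentile-based 95% confidence interval for the difference in means
ci <- quantile(boot_diffs, probs = c(0.025, 0.975))
observed_diff
ci
```

If the interval excludes zero, that is evidence of a real difference between the groups, which you would then interpret in the results section.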
Here’s what you’ll need to do:
- Choose a dataset.
- Look at the different columns it has available and think of 2-3 questions you want to answer with it. (If you want, you can combine multiple datasets too.)
- Identify your main outcome (y) variable (or multiple if you want). Identify which variables explain variation in the outcome (x, or explanatory variables).
- Consult with me. Talk to me in person, via e-mail, chat, carrier pigeon (extra credit if you do this, I guess?), or whatever, and describe your research questions and analysis strategy. I will help make sure you’re on the right track. Do not skip this step. (You can/should repeat this step as often as you want.)
- Make a new RStudio project. Put this R Markdown template file in it. Put your data in a subfolder named
- Explore the data, either inside R Markdown chunks in the final report itself, or in a separate R Markdown file called exploratory.Rmd (or whatever you want to call it). That way you’re not running all your intermediate plots and models and tests in the final document.
- Answer your questions with the data and interpret the results.
- Put the text of your final report and interpretation and analysis, etc. in the R Markdown file for the final report. Note: You don’t have to type the actual text inside RStudio (which doesn’t have an automatic spell checker). You can use a shared Google Doc or a Word file or something similar, write the text of your report there, and then copy/paste the text into your final R Markdown file.
- Knit the report as a Word file or PDF.
- Submit the following items on Learning Suite. Only one person from each team needs to submit these:
  - Knitted Word file or PDF with no code included
  - Knitted Word file or PDF with code included
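One possible way to produce both knitted versions without editing the report by hand is sketched below. It assumes things that are not part of the provided template: a report file named `final-report.Rmd`, a `show_code` parameter declared under `params:` in its YAML header, and a setup chunk containing `knitr::opts_chunk$set(echo = params$show_code)`. The helper names `output_name()` and `knit_both()` are made up for this example.

```r
# Build an output file name such as "final-report-with-code.docx"
output_name <- function(input, show_code) {
  suffix <- if (show_code) "with-code" else "no-code"
  paste0(tools::file_path_sans_ext(basename(input)), "-", suffix, ".docx")
}

# Knit the same report twice: once showing code chunks, once hiding them.
# Assumes the Rmd declares a show_code parameter and uses it to set echo.
knit_both <- function(input) {
  for (show_code in c(TRUE, FALSE)) {
    rmarkdown::render(
      input,
      output_format = "word_document",
      output_file = output_name(input, show_code),
      params = list(show_code = show_code)
    )
  }
}

# Usage (not run here): knit_both("final-report.Rmd")
```

Changing the chunk option by hand and clicking Knit twice in RStudio works just as well; this only saves a step.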
No late work will be accepted for this project since it’s the last project and it counts as your final.