Final project

Due by 11:59 PM on Wednesday, December 19, 2018


For your final project, you will take a dataset from the wild, explore it, wrangle and clean it, tell a story with it, and make inferences about it using regression analysis and other statistical tools.

You will complete this project as a team. Your team will produce a single R Markdown report, and every member is expected to contribute to it.

Here’s a fake version of what a final project might look like. (The text in the example was all generated by a bot that took the text of famous books and tried to generate paragraphs that could hypothetically fit in those books.)

Your project shouldn’t be a 1-to-1 transformation of this example (e.g. you don’t need to have exactly five paragraphs in the introduction, and you don’t need to include a figure after the second paragraph in the data and methods section). The example just shows how you’ll mix longer text with code, tables, and figures.


I want this project to be as useful for you and your career as possible. Accordingly, you have a lot of freedom in what data you can use for this project. Choose a dataset from this list:

Your own data

Use a dataset from one of your teammates’ places of employment. Doing this would be the most practical, hands-on experience you can have.

Data from the internet

Go to Google Dataset Search or Utah’s Open Data Catalog (or anywhere else online), find an interesting dataset, and ask questions about it. Here are some high-quality datasets that students have worked with before:

Nonprofit management

Federal, state, and local government management

Final report

You will turn in a report where you define your research questions, explore and describe your data, build models and make inferences, discuss the implications of those findings, and make recommendations.


In your report, you need to include at least one of each of the following elements (i.e. at least one plot, but more is fine; at least one regression model, but more is fine):


Here is a suggested outline for your final report:

  1. Executive summary: one-page summary of your questions, methods, findings, and recommendations
  2. Introduction and description of research questions: describe the motivation for this study, outline and define what questions you are exploring and why
  3. Data and methods: explain how the data was collected, provide basic summary statistics (tables and figures) of the main variables you’re interested in, and describe what statistical tools you will use to answer your questions (e.g. regression or bootstrapped comparisons of means)
  4. Results: answer each of your questions using statistical tools and interpret the results of the different statistical tests you use
  5. Limitations of the study: provide caveats for your analysis and explain how confident you are in your results
  6. Recommendations and conclusion: discuss the implications of these findings and make recommendations based on the results
  7. Appendices: if you want to include tables of summary statistics or tables showing alternative models, you can include them in an appendix instead of in the body of the report itself.
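
As a rough illustration of the tools named in the data and methods item above (summary statistics, regression, and a bootstrapped comparison of means), here is a minimal sketch in base R. It uses the built-in mtcars data purely as a stand-in; the variable choices (mpg as the outcome, wt and transmission as explanatory variables) and the 2,000 resamples are assumptions for illustration, not requirements of the project. Packages you have used in class may offer tidier ways to do the same thing.

```r
set.seed(1234)  # make the resampling reproducible

# Stand-in data: mtcars ships with R; swap in your own data frame.
cars <- mtcars
cars$transmission <- ifelse(cars$am == 1, "manual", "automatic")

# Summary statistic: mean of the outcome (mpg) within each group
group_means <- tapply(cars$mpg, cars$transmission, mean)

# Regression: how mpg varies with weight, controlling for transmission type
model <- lm(mpg ~ wt + transmission, data = cars)
coef_table <- summary(model)$coefficients

# Bootstrapped comparison of means: resample each group with replacement
# and recompute the difference in means each time
manual <- cars$mpg[cars$transmission == "manual"]
automatic <- cars$mpg[cars$transmission == "automatic"]
boot_diffs <- replicate(2000, {
  mean(sample(manual, replace = TRUE)) - mean(sample(automatic, replace = TRUE))
})

observed_diff <- mean(manual) - mean(automatic)
ci_95 <- quantile(boot_diffs, c(0.025, 0.975))  # percentile interval

print(group_means)
print(coef_table)
print(ci_95)
```

In a report, the interpretation matters more than the numbers: say what each coefficient means in plain language, and say whether the bootstrap interval for the difference in means excludes zero.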


Here’s what you’ll need to do:

  1. Choose a dataset.
  2. Look at the different columns it has available and think of 2-3 questions you want to answer with it. (If you want, you can combine multiple datasets too.)
  3. Identify your main outcome (y) variable (or multiple if you want). Identify which variables explain variation in the outcome (x, or explanatory variables).
  4. Consult with me. Talk to me in person, via e-mail, chat, carrier pigeon (extra credit if you do this, I guess?), or whatever, and describe your research questions and analysis strategy. I will help make sure you’re on the right track. Do not skip this step. (You can and should repeat this step as often as you want.)
  5. Make a new RStudio project. Put this R Markdown template file in it. Put your data in a subfolder named data.
  6. Explore the data, either inside R Markdown chunks in the final report itself, or in a separate R Markdown file called exploratory.Rmd (or whatever you want to call it)—that way you’re not running all your intermediate plots and models and tests in the final document.
  7. Answer your questions with the data and interpret the results.
  8. Put the text of your final report and interpretation and analysis, etc. in the R Markdown file for the final report. Note: You don’t have to type the actual text inside RStudio (which doesn’t have an automatic spell checker). You can use a shared Google Doc or a Word file or something similar, write the text of your report there, and then copy/paste the text into your final R Markdown file.
  9. Knit the report as a Word file or PDF.
  10. Submit the following items on Learning Suite. Only one person from each team needs to submit these:
    • Knitted Word file or PDF with no code included
    • Knitted Word file or PDF with code included
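
One possible way to produce the two knitted versions listed above (with and without code) from a single R Markdown file is a parameterized report. The sketch below is only an illustration under several assumptions: the file name final-report.Rmd, the parameter name show_code, and the helper functions knit_targets() and knit_both() are all hypothetical, and the course template may already handle this differently. It also assumes the report's YAML header declares `params: show_code: false` and that its setup chunk calls `knitr::opts_chunk$set(echo = params$show_code)`.

```r
# Hypothetical report file name; substitute your own.
report_rmd <- "final-report.Rmd"

# Build the two output file names (no code / with code) for a given report.
knit_targets <- function(rmd, ext = "docx") {
  base <- tools::file_path_sans_ext(basename(rmd))
  data.frame(
    show_code = c(FALSE, TRUE),
    output_file = paste0(base, c("-no-code.", "-with-code."), ext),
    stringsAsFactors = FALSE
  )
}

# Render the report twice, once per row of knit_targets().
# Assumes the Rmd declares a `show_code` parameter (see the lead-in above)
# and that the rmarkdown package is installed.
knit_both <- function(rmd, ext = "docx") {
  targets <- knit_targets(rmd, ext)
  fmt <- if (ext == "pdf") "pdf_document" else "word_document"
  for (i in seq_len(nrow(targets))) {
    rmarkdown::render(
      input = rmd,
      output_format = fmt,
      output_file = targets$output_file[i],
      params = list(show_code = targets$show_code[i])
    )
  }
  invisible(targets$output_file)
}

targets <- knit_targets(report_rmd)
print(targets)

# Run this once the report file exists in your project:
# knit_both(report_rmd)
```

If you would rather not use parameters, changing `echo` in the setup chunk by hand and knitting twice from RStudio gives the same two files.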

No late work will be accepted for this project since it’s the last project and it counts as your final.

Good luck!