You will get the most out of this class if you (1) attend class, (2) complete all the readings, and (3) engage with the readings. Take detailed notes, work through the example code and try to understand it, have vivid dreams about statistics, etc. Also (4) ask for help!

To encourage attendance and preparation, I use an honor-system-based self-reporting system.

At the beginning of every class, I will post a quiz on Learning Suite with the following questions:

  1. Are you here in class today?
    • Yes (1.5 points)
    • No (0 points)
  2. How much of today’s reading did you finish?
    • 100% (4 points)
    • 75–99% (3 points)
    • 50–74% (2 points)
    • 11–49% (1 point)
    • 0–10% (0 points)
  3. How well did you read?
    • I was engaged and read carefully (3 points)
    • I was fairly engaged and read fairly carefully (2 points)
    • I skimmed it (1 point)
    • I didn’t read it at all (0 points)
  4. How closely did you follow along with the example code included in the readings?
    • I worked through all the code and tried to understand what was going on (2 points)
    • I copied and pasted the code but didn’t make a good effort to understand it (1 point)
    • I didn’t do anything with the code in the readings (0 points)

Each day is worth 10.5 points. It is unlikely that you’ll score a 10.5 every day, but it would be amazing if you did!
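To see how the daily total adds up, here is a minimal sketch in Python. It is only an illustration, not something you need to run for the class. The point values are copied from the four quiz questions above; the dictionary keys and the function name `daily_score` are made up for this example.

```python
# Points for each quiz answer, copied from the four questions above.
# The short answer labels used as keys are hypothetical, chosen for this sketch.
ATTENDANCE = {"yes": 1.5, "no": 0.0}
READING_AMOUNT = {"100%": 4, "75-99%": 3, "50-74%": 2, "11-49%": 1, "0-10%": 0}
READING_QUALITY = {"careful": 3, "fairly careful": 2, "skimmed": 1, "none": 0}
CODE_EFFORT = {"worked through": 2, "copied and pasted": 1, "nothing": 0}


def daily_score(attended, amount, quality, code):
    """Add up the points for one day's self-reported answers."""
    return (
        ATTENDANCE[attended]
        + READING_AMOUNT[amount]
        + READING_QUALITY[quality]
        + CODE_EFFORT[code]
    )


# Best possible day: present, read everything carefully, worked through the code.
best = daily_score("yes", "100%", "careful", "worked through")
print(best)  # 10.5

# Another day: present, read 75-99% fairly carefully, copied and pasted the code.
other = daily_score("yes", "75-99%", "fairly careful", "copied and pasted")
print(other)  # 7.5
```

The best possible day is 1.5 + 4 + 3 + 2 = 10.5 points, which is where the daily maximum comes from.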

I will shift the distribution of everyone’s final preparation score up at the end of the course.

Problem sets

To practice writing R code and answering statistical questions with data science tools, you will complete a series of 7 problem sets. You need to show that you made a good-faith effort to work through each question. The problem sets will be graded using a check system:

I will rescale everyone’s problem set grades at the end of the semester.

You may (and should!) work together on the problem sets, but you must turn in your own answers. You cannot work in groups of more than five.

Code-through


The objectives of this class include “Be curious and confident with data,” “Feel comfortable with R,” and “Communicate the results of your analyses in accessible language.” To help you with this, you will write a code-through tutorial of some statistical, data scientific, or data-based principle or example.

One of the reasons R is so popular is that the R community is exceptionally generous, open, and sharing. The same is true of Python and other modern open source languages.

The internet is full of tutorials and code-throughs where people explain how to do something interesting with R.

You will write one code-through or tutorial during the semester on a topic of your choice. Complete details for the assignment (along with a lot of examples to look at) will be given later. You will complete this on your own. You can get help from your team, but you can’t all write about the same topic.

The R-Weekly e-mail newsletter includes dozens of these every week, and Mara Averick (chief tidyverse advocate at RStudio) regularly tweets out links to different posts as well. Here are some other examples to give you the gist of what you’ll be doing: Yours won’t be nearly as complicated as these, by the way, nor does it need to be. You’ll illustrate and explain something simple.

This assignment will be graded using the same check system as the problem sets.

Exams


There will be three exams covering (1) data wrangling and tidying, (2) modeling, and (3) statistical inference.

You will take these exams on paper in class. The exams are designed to take an hour at most. The exams are closed book and closed internet, but you can use a one-page cheat sheet (double-sided) with whatever information you feel would be helpful.

I will post examples of exam questions prior to the actual exams.

Final project

At the end of the course, you will demonstrate your knowledge of modern data science tools and statistical analysis by completing a final project.

Project details will be posted later once I settle on the best form for it. This much is certain so far:

There is no final exam. This project is your final exam.