Week 4 (April 4)

This week builds on our data wrangling skills by focusing on transforming and joining data. Common transformation procedures include filtering observations by their values, reordering rows, selecting variables, creating new variables as functions of existing variables, and collapsing many values down to single summary statistics (e.g., mean, max, variance).
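As a rough illustration, each of those procedures maps onto one dplyr verb: filter(), arrange(), select(), mutate(), and summarise(). The sketch below is a minimal example, assuming dplyr is installed. It uses R's built-in mtcars data, and the 100-horsepower cutoff and the column name hp_per_1000lb are chosen for illustration only. None of this is taken from the homework.

```r
library(dplyr)

# Illustrative data only: R's built-in mtcars data set
powerful_cars <- mtcars %>%
  filter(hp > 100) %>%                # filter: keep observations by their values
  arrange(desc(mpg)) %>%              # arrange: reorder rows, highest mpg first
  select(mpg, cyl, hp, wt) %>%        # select: keep only these variables
  mutate(hp_per_1000lb = hp / wt)     # mutate: new variable (wt is in 1000 lbs)

mpg_summary <- powerful_cars %>%
  group_by(cyl) %>%                   # compute summaries separately per cylinder count
  summarise(
    mean_mpg = mean(mpg),             # summarise: collapse each group to single values
    max_mpg  = max(mpg),
    var_mpg  = var(mpg)
  )

mpg_summary
```

Chaining the verbs with the pipe (%>%) keeps each step on its own line, so the code reads in the same order as the list of procedures above.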

Furthermore, it’s rare that a data analysis involves only a single table of data. Typically you have many tables of data, and you must combine them to answer the questions that you’re interested in. Collectively, multiple tables of data are called relational data because it is the relations, not just the individual datasets, that are important.
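Tables in relational data are linked through a shared key column. The following is a minimal sketch of three dplyr joins on two small hypothetical tables, students and grades, linked by a made-up key called student_id. The names and values are invented for illustration.

```r
library(dplyr)

# Hypothetical tables that share the key column student_id
students <- tibble(
  student_id = c(1, 2, 3),
  name       = c("Ana", "Ben", "Cai")
)
grades <- tibble(
  student_id = c(1, 2, 2, 4),
  homework   = c("HW1", "HW1", "HW2", "HW1"),
  score      = c(90, 85, 88, 70)
)

# left_join: keep every student; student 3 has no grades, so score is NA
roster <- left_join(students, grades, by = "student_id")

# inner_join: keep only rows whose key appears in both tables
graded <- inner_join(students, grades, by = "student_id")

# anti_join: find grades whose student_id has no match in students
unmatched <- anti_join(grades, students, by = "student_id")
```

The anti_join() step is a useful check before analysis. It surfaces keys that would silently drop out of an inner join, here the grade recorded for student 4.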

Data scientists often work with many different data types, and some of them can be difficult to handle. Thankfully, the Tidyverse has powerful (and easy to use!) packages that make wrangling these difficult data types much easier.

This module covers these basic capabilities by teaching you how to use the dplyr package and other Tidyverse packages to perform common data transformation and joining tasks.


Assignments

  • Complete Homework #3 located in this week’s folder.
  • One person from each group will submit the group’s .R script and Word document via Slack.
  • This homework assignment is due by 9AM, April 11, 2020.

Readings


Class

Please download this material for Monday’s class:  

See you in class on Monday!


Mid-term Project Due!

Your mid-term project is due by the end of class this week. Be sure to refer to the grading rubric so you understand what is expected. Create an HTML R Markdown document titled “Project Proposal”, and be sure to include your name in the YAML header.