This week builds on our data wrangling skills by focusing on transforming and joining data. Common transformation procedures include filtering observations by their values, reordering rows, selecting variables, creating new variables as functions of existing variables, and collapsing many values down to a single summary statistic (e.g., mean, maximum, variance).
Furthermore, it’s rare that a data analysis involves only a single table of data. Typically you have many tables of data, and you must combine them to answer the questions that you’re interested in. Collectively, multiple tables of data are called relational data because it is the relations, not just the individual datasets, that are important.
Data scientists often work with many different data types, and some of them can be difficult to handle. Thankfully, the Tidyverse has powerful (and easy-to-use!) packages that make wrangling these difficult data types much easier.
This module covers these basic capabilities by teaching you how to use the dplyr package and other Tidyverse packages to perform common data transformation and joining tasks.
Please download this material for Monday’s class:
See you in class on Monday!
Your mid-term project is due by the end of class this week. Be sure to refer to the grading rubric so you understand what is expected. Create an R Markdown document that renders to HTML, title it “Project Proposal”, and be sure to include your name in the YAML header.