Week 3 (October 22-28)

Last week we discussed general guidelines for first interacting with a new data set. This week we build on that work by learning how to clean and tidy our data. But first, it is important to understand the different structures R uses to hold data.

Specifically, this week you are going to learn:

  1. The most common data structures in R.
  2. How to manage data structures other than data frames.
  3. How to make your data “tidy”.

Together, these topics will give you a strong foundation for managing and cleaning your data. They will also prepare you for the second challenge of your course project: cleaning, tidying, and preparing your data for exploratory data analysis!

Below are the tutorials you need to review and the assignments you need to complete before Saturday’s class. The skills and functions introduced in these tutorials will be necessary for Saturday’s in-class activities.


1. Most Common Data Structures

R’s basic data structures can be organized by their dimensionality (1d, 2d, or nd) and by whether they’re homogeneous (all contents must be of the same type) or heterogeneous (the contents can be of different types). This gives rise to the five data structures most often used in data analysis (atomic vectors, matrices, arrays, lists, and data frames), which nearly all other R objects and outputs are built upon. We’ll focus on four of them: vectors, data frames, matrices, and lists. The two most common are vectors and data frames. To understand these data structures, read and work through:

Now that you’ve gone through some initial tutorials, complete the following DataCamp assignments:

  • Vectors & Factors (factors are vectors that store categorical data; this assignment helps you learn a little more about them).
  • Data frames
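To see the homogeneous/heterogeneous distinction in practice, here is a minimal base R sketch. The variable names and toy values (`nums`, `mixed`, `sizes`, `df`) are hypothetical and chosen only for illustration. It shows that an atomic vector forces all elements to one type, that a factor stores categories as a fixed set of levels, and that a data frame lets each column have its own type.

```r
# Atomic vectors are 1d and homogeneous: every element has the same type.
nums <- c(1.5, 2, 3)
class(nums)            # "numeric"

# Mixing types silently coerces everything to the most flexible type
# (here, character), because an atomic vector cannot hold mixed types.
mixed <- c(1, "a", TRUE)
class(mixed)           # "character"

# A factor stores categorical data as a fixed set of allowed levels.
sizes <- factor(c("small", "large", "small", "medium"),
                levels = c("small", "medium", "large"))
levels(sizes)          # "small" "medium" "large"
table(sizes)           # counts per level

# A data frame is 2d and heterogeneous: each column is a vector of a
# single type, but different columns can have different types.
df <- data.frame(
  id    = 1:3,
  name  = c("Ann", "Bo", "Cy"),
  score = c(90.5, 85, 77.25),
  stringsAsFactors = FALSE  # keep text as character on older R versions too
)
str(df)
sapply(df, class)      # id: "integer", name: "character", score: "numeric"
```

Calling `str()` on any object you are unsure about is a quick way to check which of these structures you are holding.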

2. Less Common Data Structures

You will work with vectors and data frames on a daily basis. However, two additional data structures you might find yourself dealing with are matrices and lists. Read and work through:

Now that you’ve gone through some initial tutorials, complete the following DataCamp assignments:
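As a quick illustration of the two less common structures, the base R sketch below contrasts a matrix (2d, homogeneous) with a list (1d, heterogeneous). The names `m` and `record` and the value `"sensor_a"` are hypothetical examples, not part of any assignment.

```r
# A matrix is 2d and homogeneous: an atomic vector with a dim attribute.
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)        # 2 3
m[2, 3]       # 6, because matrix() fills column by column by default
t(m)          # transpose: a 3 x 2 matrix

# A list is 1d and heterogeneous: each element can be any R object,
# including a vector, a matrix, a data frame, or another list.
record <- list(
  name   = "sensor_a",
  values = c(0.1, 0.4, 0.35),
  grid   = m
)
length(record)          # 3
record$values           # access an element by name
record[["grid"]][1, ]   # first row of the stored matrix: 1 3 5

# Single brackets return a sub-list; double brackets return the element itself.
class(record["values"])    # "list"
class(record[["values"]])  # "numeric"

# A data frame is itself a list of equal-length column vectors.
is.list(data.frame(x = 1:2))  # TRUE
```

The `[` versus `[[` difference is a common source of confusion when pulling results out of lists returned by modeling functions.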


3. Cleaning and Tidying Your Data

Now that you have imported your data and understand the basics of managing data structures, the next thing you probably want to do is jump into exploratory data analysis. Before that, however, make sure your data frame is properly prepared for analysis. This may require some basic cleaning and reshaping your data into a “tidy” format, in which each variable is a column, each observation is a row, and each value is a cell. Read and work through Chapter 12: Tidy Data in R for Data Science to learn how to organize your data the “tidy” way.

Now that you’ve worked through the chapter, complete the Cleaning Data in R DataCamp assignment.
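To make the “tidy” idea concrete, here is a small sketch using hypothetical quiz scores (`wide`, `long`, and the student and quiz labels are invented for illustration). In the wide table, “quiz” is a variable hidden in the column names, so the data are not tidy. The sketch uses base R’s `reshape()` to avoid an extra package dependency; the tidyr package used in R for Data Science provides higher-level helpers such as `pivot_longer()` for the same task.

```r
# Hypothetical wide data: one row per student, one column per quiz.
# The quiz number is a variable, but it is stored in the column names.
wide <- data.frame(
  student = c("A", "B"),
  quiz1   = c(8, 6),
  quiz2   = c(9, 7),
  stringsAsFactors = FALSE
)

# Tidy version: one row per student-quiz observation, with one column
# per variable (student, quiz, score).
long <- reshape(
  wide,
  direction = "long",
  varying   = c("quiz1", "quiz2"),  # wide columns to stack
  v.names   = "score",              # name of the new value column
  timevar   = "quiz",               # name of the new key column
  times     = c("quiz1", "quiz2"),  # key value for each stacked column
  idvar     = "student"             # column identifying each original row
)
rownames(long) <- NULL  # drop the "A.quiz1"-style row names reshape() adds
long
```

Once the data are in this long form, questions such as “what is each student’s average score?” become simple grouped summaries over a single `score` column.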


Class

Please download this material for Saturday’s class:

See you in class on Saturday!