Last week we learned to import data. However, data by themselves are of little use, so we need to start doing some basic care and feeding of the data we’ve imported. This week we investigate good practices for when we get a new data set. Spending a little time up front to understand your data will speed up your analysis later on. We are going to focus on three objectives that we should have when we first open up a new data set: reviewing the codebook, learning about the data, and visualizing the data.
Please work through the following tutorials prior to Saturday’s class. The skills and functions introduced in these tutorials will be necessary to complete your assignment, which is due at the beginning of Saturday’s class, and will also be used in Saturday’s in-class small group work.
1. Review the codebook: Understanding the source data is crucial to any analysis. A codebook is the documentation that explicitly tells you about the data you are working with and should be the first thing you review before starting any kind of analysis. Read “Review the Codebook” to get a taste of what to look for.
2. Learn about the data: When first opening a data set, it is important to get a basic understanding of the data dimensions (rows and columns), what the data look like, how many missing values are in the data, and some basic summary statistics such as the mean, median, and range of each variable. Read and work through “Learn About the Data” to understand some of the first things you should do with a fresh data set.
3. Visualize the data: Although visualizing your data is not always considered a data wrangling activity, it is essential in every step of data analysis. In this class we are going to focus on ggplot2 for visualizing our data, as it is the premier data visualization package in R. Read and work through Chapter 3: Data Visualization of the R for Data Science book.
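To give a feel for the second and third objectives before you start the tutorials, here is a minimal sketch of a first look at a data set. It uses R's built-in `airquality` data as a stand-in, which is an assumption for illustration only and not the data used in this class. The object names (`weather`, `data_dims`, `missing_counts`, `first_look_plot`) are likewise made up for this example, and the tutorials may approach these steps differently.

```r
# A first look at a freshly loaded data set.
# Assumption: the built-in `airquality` data stands in for your own data.
library(ggplot2)

weather <- airquality

# Dimensions: number of rows (observations) and columns (variables)
data_dims <- dim(weather)
print(data_dims)

# What the data look like: variable types and the first few rows
str(weather)
print(head(weather))

# How many missing values each variable has
missing_counts <- colSums(is.na(weather))
print(missing_counts)

# Basic summary statistics: min, quartiles, median, mean, max, and NA count
print(summary(weather))

# A first visualization: ozone against temperature.
# na.rm = TRUE drops rows with missing values without printing a warning.
first_look_plot <- ggplot(weather, aes(x = Temp, y = Ozone)) +
  geom_point(na.rm = TRUE) +
  labs(x = "Temperature (degrees F)", y = "Ozone (ppb)")
print(first_look_plot)
```

Running the inspection steps before plotting is deliberate: knowing which variables have missing values tells you what a plot may be silently leaving out.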
Create an HTML R Markdown document titled “week-3.Rmd”. I want you to scrape the Cincinnati weather data located here and provide the following sections in the R Markdown document:
Knit this R Markdown document to an HTML file, publish it on RPubs, and send me the URL for your published report prior to class (either by email or through Slack messenger).