Last week we discussed general guidelines for first interacting with a new data set. This week we build on those activities by performing early exploratory data analysis: answering questions about your data by transforming and visualizing it. We have two objectives for this week:
Use the dplyr package to perform many common data transformation and manipulation tasks.
Combining the activities of data transformation and visualization in a methodical way is what defines exploratory data analysis. Only by systematically applying these techniques will you be able to answer and refine questions about your data.
Please work through the following tutorials prior to Saturday’s class. The skills and functions introduced in these tutorials will be necessary to complete your assignment, which is due at the beginning of Saturday’s class, and will also be used in Saturday’s in-class small group work.
1. Transform your data: Although many fundamental data manipulation functions exist in R, they have been a bit convoluted to date and have lacked consistent coding and the ability to easily flow together.
dplyr is a package built for the sole purpose of simplifying the process of manipulating, sorting, summarizing, and joining data frames. Read and work through:
dplyr functions for a single dataset tutorials.
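To give a feel for how these functions flow together before you start the tutorials, here is a minimal sketch of a typical dplyr pipeline. The `sales` data frame and its columns are made up purely for illustration; only the dplyr verbs (`filter()`, `mutate()`, `group_by()`, `summarise()`, `arrange()`) and the `%>%` pipe come from the package itself.

```r
library(dplyr)

# Hypothetical toy data, invented for this example only
sales <- data.frame(
  region = c("East", "East", "West", "West", "West"),
  units  = c(10, 20, 5, 15, 40),
  price  = c(2.0, 2.5, 3.0, 3.0, 1.5)
)

region_summary <- sales %>%
  filter(units >= 10) %>%               # keep rows with at least 10 units
  mutate(revenue = units * price) %>%   # add a derived column
  group_by(region) %>%                  # one group per region
  summarise(
    n_sales       = n(),                # rows remaining in each group
    total_revenue = sum(revenue)
  ) %>%
  arrange(desc(total_revenue))          # largest total revenue first

print(region_summary)
```

Each verb takes a data frame and returns a data frame, which is what lets the steps chain together with the pipe instead of nesting function calls.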
2. Advancing your visualizations: Visualizing and transforming your data in a systematic way is a task that statisticians call exploratory data analysis. Combining the functionality of dplyr with the visualization capabilities of ggplot2 can help to answer a lot of initial questions about your data. Read and work through Chapter 7: Exploratory Data Analysis of the R for Data Science book to learn to use ggplot2 interactively to ask questions, answer them with data, and then ask new questions.
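As a rough sketch of that question-and-answer loop, the example below summarizes with dplyr and then plots the summary with ggplot2. It assumes the `diamonds` data set that ships with ggplot2; the question asked here is only an illustration, not part of the assignment.

```r
library(dplyr)
library(ggplot2)

# Question: how does the typical price vary across cut quality?
price_by_cut <- diamonds %>%
  group_by(cut) %>%
  summarise(
    n            = n(),            # number of diamonds per cut
    median_price = median(price)   # typical price per cut
  )

# Answer it with a plot of the summarized data
p <- ggplot(price_by_cut, aes(x = cut, y = median_price)) +
  geom_col() +
  labs(x = "Cut", y = "Median price (USD)")

print(p)
```

A plot like this usually raises a follow-up question (for example, whether another variable such as carat differs across cuts), which you would then explore with another transform-and-plot step.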
Create an R Markdown document with HTML output named “week-4.Rmd”. Title this HTML document “Gapminder Exploratory Data Analysis.” If you have not already done so, install the gapminder package (install.packages("gapminder")). Using the gapminder_unfiltered data, answer the questions below. Be sure that your report includes the following sections:
Knit this R Markdown document to an HTML file, publish it on RPubs, and send me the URL for your published report prior to class (either by email or through Slack messenger).
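As a starting point for the setup described in the assignment above, a first code chunk in your report might look something like the sketch below. The install check and the starter question are illustrative assumptions, not assignment requirements; the column names are those provided by the gapminder package.

```r
# Install gapminder only if it is not already available
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder")
}

library(gapminder)
library(dplyr)

# First look at the unfiltered data: column names, types, and a few values
glimpse(gapminder_unfiltered)

# Illustrative starter question (not one of the assignment questions):
# how many distinct countries are recorded in each year?
countries_per_year <- gapminder_unfiltered %>%
  group_by(year) %>%
  summarise(n_countries = n_distinct(country))

print(countries_per_year)
```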
In today’s lecture we are going to work through several exploratory data analysis exercises. Feel free to download these .R scripts to reference the functionality and capabilities of dplyr and ggplot2: