Week 2 (October 16-22)

“What we have is a data glut.” - Vernor Vinge

Data are being generated by everything around us at all times. Every digital process and social media exchange produces them. Systems, sensors, and mobile devices transmit them. Countless databases collect them. Data are arriving from multiple sources at an alarming rate, and analysts and organizations are seeking ways to leverage these new sources of information. Consequently, analysts need to understand how to get data from these sources. Furthermore, since analysis is often a collaborative effort, analysts also need to know how to share their data.

Welcome to week 2! This week we will cover the process of importing, exporting, and scraping data. First, you will learn the basics of importing tabular and spreadsheet data. You will also cover the equally important process of getting data out of R. Then, since modern-day data wrangling often includes scraping data from the flood of web-based data becoming available to organizations and analysts, you will learn the fundamentals of web scraping with R. This includes importing spreadsheet data files stored online, scraping HTML text and data tables, and leveraging APIs.

By the end of this week you will have a strong foundation in the different ways to get your data into and out of R. This will prepare you for your first challenge in completing your course project: acquiring your data!
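
To make "into and out of R" concrete, here is a minimal sketch of an export/import round trip using only base R. The `scores` data frame and the temporary file path are hypothetical and are not part of the course materials; the tutorials may well use other functions or packages for the same task.

```r
# Hypothetical example data; not part of the course materials.
scores <- data.frame(
  student = c("A", "B", "C"),
  score   = c(90.5, 82, 77.25),
  stringsAsFactors = FALSE
)

# Export: write the data frame to a CSV file in a temporary location.
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Import: read the CSV file back into a new data frame.
scores_in <- read.csv(csv_path, stringsAsFactors = FALSE)

# Compare the original and re-imported data frames.
all.equal(scores, scores_in)
```

Setting `row.names = FALSE` keeps R from writing its row numbers as an extra, unnamed column, which would otherwise show up as a stray column on re-import.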

Tutorials & Resources

Please work through the following tutorials prior to Saturday’s class. The skills and functions introduced in these tutorials will be necessary to complete your assignment, which is due at the beginning of Saturday’s class, and will also be used in Saturday’s in-class small group work.

Importing and exporting spreadsheet data

Scraping text & tables

Assignment

  1. Create a .R script titled “week-2.R” and in this script perform the following exercises:
  2. Create an HTML R Markdown document titled “week-2.Rmd” that contains the following:
    • Synopsis: Include a short paragraph that summarizes the purpose of this R Markdown file
    • Packages Required: Include a code chunk in this section that loads all the packages required for this homework and a short comment that states the purpose of each package
    • Homework Problems: Perform the five data importing exercises listed in #1. For each problem you should import the data and save as a data frame, use head() to display the first few rows of the data frame, use str() to display the structure of each data frame, and be sure that each code chunk fully displays your code.
    • Example: You will find an example here of the basic output I am looking for.
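
The import, head(), str() pattern described under Homework Problems might look roughly like the sketch below. The inline CSV text and the `weather` name are hypothetical stand-ins for one of the assigned data files, and base R's `read.csv()` is an assumption; use whichever import function the exercise calls for.

```r
# Hypothetical inline data standing in for one of the assigned files.
csv_text <- "id,city,temp_f
1,Dayton,61.2
2,Cincinnati,64.8
3,Columbus,59.5"

# Import the data and save it as a data frame.
weather <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Display the first few rows of the data frame.
head(weather)

# Display the structure: column names, types, and sample values.
str(weather)
```

In the R Markdown document, each of these steps would sit inside a code chunk with the code left visible, so that both the code and its output appear in the knitted report.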

Submission: Knit this R Markdown document to an HTML file, publish it on RPubs, and send me the URL for your published report prior to class (either by email or through Slack messenger). Also, knit to a PDF document and bring this document to class on Saturday for submission.

Class

In today’s lecture we are going to work through several data importing exercises. Please download this material: