Week 2 (March 4-10)

“What we have is a data glut.” - Vernon Vinge

Data are being generated by everything around us at all times. Every digital process and social media exchange produces them. Systems, sensors and mobile devices transmit them. Countless databases collect them. Data are arriving from multiple sources at an alarming rate, and analysts and organizations are looking for ways to leverage these new sources of information. As a result, analysts need to know how to pull data from these sources.

Welcome to week 2! This week we will focus on:

  1. The basics of importing tabular and spreadsheet data into R.
  2. Getting a basic understanding of the data once it's in R.
  3. Advanced importing capabilities, such as importing data straight from relational databases (e.g., via SQL), web scraping, and importing data files from other statistical software (e.g., SPSS, SAS, STATA).

By the end of this week you will have a strong foundation in the different ways to get data into R and in understanding the basics of a data set. This will prepare you for the first challenge of your course project: acquiring your data!

Below are the tutorials you need to review and the assignments you need to complete before Saturday’s class. The skills and functions introduced in these tutorials will be necessary for Saturday’s in-class activities.

1. Basics of Importing & Exporting Data Tutorials

First, learn the basics of importing and exporting tabular and spreadsheet data with R.

  1. Read & work through the Importing Data tutorial
  2. Read & work through the Exporting to Text Files and Exporting to R Objects tutorials
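As a rough preview of what these tutorials cover, here is a minimal base-R sketch of a round trip: export a data frame to a CSV text file and to an RDS object file, then import both back. The temporary file paths and the built-in `mtcars` data set are illustrative assumptions, not part of the tutorials. Spreadsheet files usually need an add-on package (for example, `readxl::read_excel()`), which is left out here so the sketch runs on a plain R install.

```r
# Illustrative file locations (assumption): temporary files, so nothing is left behind
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# Export a data frame to a comma-separated text file, dropping the row names
write.csv(mtcars, csv_path, row.names = FALSE)

# Import the text file back into a data frame
cars_csv <- read.csv(csv_path)

# Export to an R object file and read it back; RDS keeps the object exactly as saved
saveRDS(mtcars, rds_path)
cars_rds <- readRDS(rds_path)

# Inspect the imported data: column names and types
str(cars_csv)
```

The CSV route produces a file that any tool can read but loses R-specific details such as row names, while the RDS route restores the exact R object but is only readable from R.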

2. Get to Know Your Data

Now that you know how to get data into R, learn the process of getting a basic understanding of your data:

  1. Review the codebook: Understanding the source data is crucial to any analysis. A codebook is the documentation that explicitly tells you about the data you are working with, and it should be the first thing you review before starting any kind of analysis. Read the Review the Codebook tutorial to get a taste of what to look for.
  2. Learn about the data: When first opening a data set, it is important to get a basic understanding of its dimensions (rows and columns), what the data look like, how many values are missing, and some basic summary statistics such as the mean, median, and range of each variable. Read and work through the Learn About the Data tutorial to understand some of the first things you should do with a fresh data set.
  3. Quick visualizations: It is also good to get an initial understanding of your data through visual means. Read and work through Getting Started with Charts in R.
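The first checks described in "Learn about the data" can be sketched in a few lines of base R. This is a minimal example, assuming the built-in `airquality` data set as a stand-in for your own data because it ships with R and contains missing values; the tutorial may use different data and functions.

```r
# Stand-in data set (assumption): built-in airquality, which has missing values
df <- airquality

dim(df)             # dimensions: number of rows and columns
head(df)            # first six rows, to see what the data look like
str(df)             # column names and types

colSums(is.na(df))  # missing values per column
sum(is.na(df))      # missing values in the whole data set

summary(df)         # min, quartiles, median, mean, max, and NA count per column

# Range (min and max) of each column, ignoring missing values
sapply(df, range, na.rm = TRUE)
```

`summary()` already reports the median, mean, and extremes, so the `sapply()` call is only a compact way to see every column's range side by side.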

3. DataCamp Assignment

Now that you’ve gone through some initial tutorials, complete the Importing Data into R: Part 1 DataCamp assignment.

4. Advanced

Now that you have the basics down, work through the Importing Data into R: Part 2 DataCamp assignment. This will take a deeper dive into the wide range of data formats out there. More specifically, you’ll learn how to import data from relational databases and how to import and work with data coming from the web. Finally, you’ll get hands-on experience importing data from statistical software packages such as SAS, STATA and SPSS.

You may feel overwhelmed by this material, and that is okay. This advanced material is meant to give you exposure to the many data importing options available in R. If it was not overwhelming and you are still looking for more to learn before Saturday, dig into this web scraping tutorial.


Please download this material for Saturday’s class:  

See you in class on Saturday!