Key Things to Remember

Not all charts need to be pretty!



What to Remember from this Section

Plotting for exploratory data analysis should be quick and simple, and base R excels at this.

Visualization    Function
Strip chart      stripchart()
Histogram        hist()
Density plot     plot(density())
Box plot         boxplot()
Bar chart        barplot()
Dot plot         dotchart()
Scatter plot     plot(), pairs()
Line chart       plot(type = "l")
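Dot plots, scatter plot matrices, and line charts are not demonstrated later in this section, so here is a minimal sketch of each. It assumes only the datasets that ship with R (mtcars and pressure stand in for the course data):

```r
# Dot plot: one labeled point per car (first 10 rows of mtcars)
cars10 <- head(mtcars, 10)
dotchart(cars10$mpg, labels = rownames(cars10), xlab = "Miles per gallon")

# Scatter plot matrix: every pairwise scatter plot of three numeric columns
pairs(mtcars[, c("mpg", "wt", "hp")])

# Line chart: plot() with type = "l" joins the points in row order
plot(pressure$temperature, pressure$pressure, type = "l",
     xlab = "Temperature (deg C)", ylab = "Vapor pressure (mm Hg)")
```

pressure is already sorted by temperature, which matters here: type = "l" connects points in the order they appear, so unsorted x values would produce a zigzag.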

What to Remember from this Section

In R, graphs are typically created interactively:

attach(mtcars)
plot(wt, mpg)
abline(lm(mpg ~ wt))
title("Regression of MPG on Weight")
detach(mtcars)
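The same plot can be drawn without attach(), which keeps mtcars off the search path. A sketch using the formula interface with the data argument, storing the fitted model so its coefficients can also be inspected:

```r
# Fit once and reuse the model object
fit <- lm(mpg ~ wt, data = mtcars)

plot(mpg ~ wt, data = mtcars, pch = 16)   # formula interface: y ~ x
abline(fit, col = "red", lwd = 2)          # abline() accepts an lm object
title("Regression of MPG on Weight")

coef(fit)   # intercept and slope of the fitted line
```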

What to Remember from this Section


You can customize fonts, colors, line styles, axes, reference lines, etc. by setting graphical parameters


This allows a wide degree of customization; however…


I have found ggplot2's syntax easier for customization needs
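A minimal sketch of setting graphical parameters, using mtcars. Some parameters are passed per call (pch, col, lty, lwd, axes), others are set for the whole device with par(); saving the value par() returns and restoring it afterwards keeps those settings from leaking into later plots:

```r
op <- par(mfrow = c(1, 2), family = "serif")  # two panels side by side, serif font

# Panel 1: symbol, color, and a dashed horizontal reference line
plot(mtcars$wt, mtcars$mpg, pch = 17, col = "darkblue",
     xlab = "Weight (1000 lbs)", ylab = "MPG", main = "Per-call parameters")
abline(h = mean(mtcars$mpg), lty = 2, lwd = 2, col = "grey50")

# Panel 2: suppress the default axes and draw custom ones
plot(mtcars$wt, mtcars$mpg, axes = FALSE, xlab = "", ylab = "",
     main = "Custom axes")
axis(1, at = 2:5)   # x axis with chosen tick positions
axis(2, las = 1)    # y axis with horizontal labels
box()               # redraw the frame around the plot region

par(op)             # restore the previous settings
```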

Data Used…

Import the following data sets from the data folder

facebook.tsv
reddit.csv
race-comparison.csv
Supermarket Transactions.xlsx
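facebook.tsv is tab-separated, so read.delim() (whose default separator is a tab) is the natural base R reader, while read.csv() expects commas. The sketch below writes a tiny toy file to a temporary location and reads it back, so it runs without the course data; its column names and values are made up for illustration:

```r
# Toy tab-separated file (hypothetical contents, not the real facebook.tsv)
tmp <- tempfile(fileext = ".tsv")
writeLines(c("userid\ttenure", "1\t200", "2\t450"), tmp)

toy <- read.delim(tmp)   # sep = "\t" and header = TRUE are the defaults
str(toy)
```

With the course files in a data/ folder, the analogous calls would be read.delim("data/facebook.tsv") and read.csv("data/reddit.csv"). Base R has no .xlsx reader, so the Excel file needs an add-on package; readxl::read_excel() is one common choice (assumed to be installed).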

Univariate Visualizations

Continuous Variables: Strip Chart

Useful when the sample size is small, but not when it is large (the points pile up on top of each other)

stripchart(mtcars$mpg, pch = 16)
stripchart(facebook$tenure, pch = 16)
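For larger samples, stripchart()'s method argument can reduce overplotting: "stack" piles up tied values, and "jitter" adds small random vertical noise. A sketch using mtcars and simulated data (the sample size, jitter amount, and transparency level are arbitrary choices):

```r
# Stack tied values so repeated mpg readings stay visible
stripchart(mtcars$mpg, method = "stack", pch = 16)

# Simulated large sample: jitter plus semi-transparent points
set.seed(1)
big <- rnorm(5000, mean = 500, sd = 150)   # hypothetical data
stripchart(big, method = "jitter", jitter = 0.3, pch = 16,
           col = rgb(0, 0, 0, alpha = 0.1))
```

Transparency depends on the graphics device; most modern devices support it, but a few do not.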

Continuous Variables: Histogram

hist(facebook$tenure)

hist(facebook$tenure, breaks = 100, col = "grey", main = "Facebook User Tenure", xlab = "Tenure (Days)")
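A single number passed to breaks is treated as a suggestion: hist() sets the breakpoints to pretty values, so the number of bins actually drawn can differ from the number requested. Setting plot = FALSE returns the computed bins without drawing anything. A sketch on the built-in faithful data:

```r
x <- faithful$eruptions
h <- hist(x, breaks = 100, plot = FALSE)   # compute the bins, draw nothing

length(h$counts)        # bins actually used (need not equal 100)
range(diff(h$breaks))   # bin widths, equal up to floating-point rounding
```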

Continuous Variables: Histogram

A perfect example of why customizing base R plots is not always enjoyable; in ggplot2 this is far simpler

x <- na.omit(facebook$tenure)  # drop missing values

# histogram
h <- hist(x, breaks = 100, col = "grey", main = "Facebook User Tenure", xlab = "Tenure (Days)")

# add a normal curve
xfit <- seq(min(x), max(x), length.out = 40)
yfit <- dnorm(xfit, mean = mean(x), sd = sd(x))
# rescale from density to counts: bin width * number of observations
yfit <- yfit * diff(h$mids[1:2]) * length(x)
lines(xfit, yfit, col = "red", lwd = 2)
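An alternative that avoids the manual rescaling step: with freq = FALSE, hist() plots densities (bar areas sum to 1), so dnorm() can be overlaid directly with curve(). This sketch uses the built-in faithful data as a stand-in for tenure:

```r
x <- faithful$waiting   # built-in data, stand-in for tenure
m <- mean(x)
s <- sd(x)

h <- hist(x, breaks = 30, freq = FALSE, col = "grey",
          main = "Waiting Time", xlab = "Waiting time (min)")
# Inside curve(), x refers to the plotting grid, which is why m and s
# are computed beforehand from the data
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red", lwd = 2)
```

faithful$waiting is bimodal, so the normal curve fits poorly, which is exactly the kind of mismatch this overlay helps reveal.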

Continuous Variables: Density Plot

Enclose density(x) within plot()

# basic density plot
d <- density(facebook$tenure, na.rm = TRUE)

plot(d, main = "Kernel Density of Tenure")

# fill the density plot by adding polygon()
polygon(d, col = "red", border = "blue")
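The smoothness of a kernel density estimate is controlled by its bandwidth. density()'s adjust argument multiplies the default bandwidth (chosen by the "nrd0" rule), so values below 1 reveal more detail at the cost of more noise. A sketch with the built-in faithful data; 0.25 is an arbitrary illustrative choice:

```r
x <- faithful$eruptions
d_default <- density(x)                  # default bandwidth rule
d_narrow  <- density(x, adjust = 0.25)   # bandwidth multiplied by 0.25

plot(d_narrow, main = "Effect of Bandwidth", col = "blue")
lines(d_default, col = "red", lwd = 2)
legend("topright", legend = c("adjust = 0.25", "default"),
       col = c("blue", "red"), lwd = c(1, 2))
```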

Continuous Variables: Box Plot

The previous methods give good insight into the shape of the distribution, but they don't directly show specific summary statistics such as:

summary(facebook$tenure)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   226.0   412.0   537.9   675.0  3139.0       2

Box plots, however, provide a concise way to illustrate these summary statistics along with the shape and outliers of the data:

[Figure: Generic box plot]

Continuous Variables: Box Plot

boxplot(facebook$tenure, horizontal = TRUE)
boxplot(facebook$tenure, horizontal = TRUE, notch = TRUE, col = "grey40")
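boxplot() draws its box from boxplot.stats(), which can be called directly to see the numbers behind the picture. By default, points more than 1.5 times the box length beyond either hinge are returned as outliers. The hinges come from fivenum(), so they can differ slightly from the quartiles that summary() reports. A sketch using mtcars$hp:

```r
x <- mtcars$hp
b <- boxplot.stats(x)   # coef = 1.5 by default

b$stats                       # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out                         # values flagged as outliers (drawn as individual points)
fivenum(x)                    # Tukey's five-number summary, source of the hinges
quantile(x, c(0.25, 0.75))    # quartiles as summary() computes them, for comparison

boxplot(x, horizontal = TRUE, col = "grey80", xlab = "Gross horsepower")
```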

Your Turn


Using the facebook.tsv data…


Visually assess the continuous variables. What do you find?

Categorical Variables: Bar Chart

reddit <- read.csv("data/reddit.csv")

table(reddit$dog.cat)
## 
##    I like cats.    I like dogs. I like turtles. 
##           11156           17151            4442

barplot(table(reddit$dog.cat))
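Two small variations on the same idea: sorting the table puts the most common category first, and prop.table() converts counts to proportions. The sketch uses the built-in mtcars cylinder counts in place of the reddit data:

```r
counts <- table(mtcars$cyl)    # number of cars with 4, 6 and 8 cylinders
props  <- prop.table(counts)   # divide each count by the total

barplot(sort(props, decreasing = TRUE),
        xlab = "Cylinders", ylab = "Proportion of cars")
```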

Categorical Variables: Bar Chart

pets <- table(reddit$dog.cat)

barplot(pets, main = "Reddit User Animal Preferences", col = "cyan")

op <- par(las = 1)  # las = 1: draw axis labels horizontally
barplot(pets, main = "Reddit User Animal Preferences", horiz = TRUE, names.arg = c("Cats", "Dogs", "Turtles"))
par(op)             # restore the previous graphical settings
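barplot() also accepts a two-way table: with beside = TRUE each column becomes a group of side-by-side bars, one per row (the default stacks them instead). A sketch crossing transmission type with cylinders in mtcars, where am is coded 0 = automatic and 1 = manual:

```r
tab <- table(mtcars$am, mtcars$cyl)   # rows: transmission, columns: cylinders

barplot(tab, beside = TRUE,
        legend.text = c("Automatic", "Manual"),
        col = c("grey70", "grey30"),
        xlab = "Cylinders", ylab = "Number of cars")
```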