class: clear, center, middle background-image: url(https://raw.githubusercontent.com/bradleyboehmke/Dayton-Weather-2018/master/Dayton_Weather.png) background-size: cover <br><br><br><br><br><br><br><br><br><br><br><br> .font200.bold[
Visualizing Data
] --- # Data visualization task <br><br> <img src="images/visualize-task.png" width="2560" style="display: block; margin: auto;" /> --- # ggplot <br> .pull-left[ <br> * R has several systems for making graphs * `ggplot2` is the most elegant and versatile * Implements the grammar of graphics, a layered framework for describing data visualizations ] .pull-right[ <img src="images/ggplot2.png" width="50%" height="50%" style="display: block; margin: auto;" /> ] --- # Basics `ggplot2` builds plots with a layer-based approach: .pull-left[ ```r ggplot(data, aes(x, y)) + geom_xxx() + scale_xxx() + facet_xxx() + ggtitle() ``` ] -- .pull-right[ ```r ggplot(data = txhousing, aes(x = volume, y = median)) + geom_point(alpha = .1) + scale_y_continuous(name = "Median Sales Price", labels = scales::dollar) + scale_x_log10(name = "Total Sales Volume", labels = scales::comma) + ggtitle("Texas Housing Sales", subtitle = "Sales data from 2000-2015 provided by the TAMU real estate center") ``` <img src="day-1e-visualization_files/figure-html/example-ggplot-1.png" style="display: block; margin: auto;" /> ] --- # Prerequisites .pull-left[ ### Packages ```r library(ggplot2) # or library(tidyverse) library(dplyr) # for other data wrangling tasks ``` ] .pull-right[ ### Example Data ```r # built-in data set mpg ``` ### Exercise Data ```r transactions <- data.table::fread("data/transactions.csv", data.table = FALSE) %>% sample_frac(0.25) %>% as_tibble() transactions ``` ] --- # Canvas layer .bold[We can create a "canvas" for our plot with...] 
.pull-left[ ```r ggplot(data = mpg) ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r ggplot(data = mpg, aes(x = displ, y = hwy)) ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> ] .center[.content-box-gray[.bold[We use .red[aes()] to map attributes to our plot]]] --- # Plotting our data with .red[geoms] .pull-left[ * We display data with geometric shapes * ~ 30 built-in geoms (with many more offered by other pkgs) - `geom_point()` - `geom_line()` - `geom_histogram()` - `geom_density()` - `geom_freqpoly()` - `geom_boxplot()` - `geom_violin()` - `geom_bar()` - `geom_count()` - `geom_smooth()` .center[.content-box-gray[.bold[See full list with `geom_` + tab]]] ] --- # Plotting our data with .red[geoms] .pull-left[ * We display data with geometric shapes * ~ 30 built-in geoms (with many more offered by other pkgs) - `geom_point()` - `geom_line()` - .blue[.bold[`geom_histogram()`]] - .blue[.bold[`geom_density()`]] - .blue[.bold[`geom_freqpoly()`]] - `geom_boxplot()` - `geom_violin()` - `geom_bar()` - `geom_count()` - `geom_smooth()` .center[.content-box-gray[.bold[See full list with `geom_` + tab]]] ] .pull-right[ <br><br> ```r ggplot(data = mpg, aes(x = hwy)) + geom_histogram() ggplot(data = mpg, aes(x = hwy)) + geom_freqpoly() ggplot(data = mpg, aes(x = hwy)) + geom_density() ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> ] --- # Plotting our data with .red[geoms] .pull-left[ * We display data with geometric shapes * ~ 30 built-in geoms (with many more offered by other pkgs) - `geom_point()` - `geom_line()` - `geom_histogram()` - `geom_density()` - `geom_freqpoly()` - .blue[.bold[`geom_boxplot()`]] - .blue[.bold[`geom_violin()`]] - .blue[.bold[`geom_bar()`]] - `geom_count()` - `geom_smooth()` .center[.content-box-gray[.bold[See full list with 
`geom_` + tab]]] ] .pull-right[ <br><br> ```r ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_point() ggplot(data = mpg, aes(x = class, y = hwy)) + geom_boxplot() ggplot(data = mpg, aes(x = class, y = hwy)) + geom_violin() ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /> ] --- # Plotting our data with .red[geoms] .pull-left[ * We display data with geometric shapes * ~ 30 built-in geoms (with many more offered by other pkgs) - .blue[.bold[`geom_point()`]] - `geom_line()` - .blue[.bold[`geom_histogram()`]] - `geom_density()` - `geom_freqpoly()` - `geom_boxplot()` - `geom_violin()` - `geom_bar()` - `geom_count()` - `geom_smooth()` ] .pull-right[ <br><br> ```r *ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_point() *ggplot(data = mpg, aes(x = hwy)) + geom_histogram() ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" /> .center[.content-box-gray[.bold[Some geoms only require .font130[`x`], others .font130[`x`] & .font130[`y`]]]] ] --- class: yourturn # Your Turn! .pull-left[ ### Challenge Using the __transactions__ data: 1. Create a chart that illustrates the distribution of the spend variable. 2. Create a chart that shows the counts for each store region 3. 
Create a scatter plot of units vs spend ] -- .pull-right[ ### Solutions ```r #1: distribution of spend variable ggplot(data = transactions, aes(x = spend)) + geom_histogram() #2: distribution of store region variable ggplot(data = transactions, aes(x = store_r)) + geom_bar() #3: scatter plot for units vs spend ggplot(data = transactions, aes(x = units, y = spend)) + geom_point() ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" /> ] --- # Non-mapping aesthetics .pull-left[ We can also change other visual aesthetics in our plots: * .blue[c].orange[o].gray[l].purple[o].red[r] * .font70[s].font120[i]z.font110[e] * sh▵pe (0-25 `?pch`) * .opacity[opacity] ] -- .pull-right[ <br> ```r ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_point(color = "blue", size = 2, shape = 17, alpha = .5) ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" /> .center[.content-box-gray[.bold[But why are some points darker? 
🤔]]] ] --- # Non-mapping aesthetics .pull-left[ We can also change other visual aesthetics in our plots: * .blue[c].orange[o].gray[l].purple[o].red[r] * .font70[s].font120[i]z.font110[e] * sh▵pe (0-25 `?pch`) * .opacity[opacity] ] .pull-right[ <br> ```r ggplot(data = mpg, aes(x = displ, y = hwy)) + * geom_jitter(color = "blue", size = 2, shape = 17, alpha = .5, width = .5) ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" /> .center[.content-box-gray[.bold[Ahhhhhh, I see 😎]]] ] --- # Adding a 3<sup>rd</sup> dimension .bold[By moving the color argument to within .font120.gray[`aes()`], we can map a 3rd variable to our plot] .pull-left[ #### Non-mapping color aesthetic ```r ggplot(data = mpg, aes(x = displ, y = hwy)) + * geom_point(color = "blue") ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> ] .pull-right[ #### Mapping color aesthetic to class variable ```r *ggplot(data = mpg, aes(x = displ, y = hwy, color = class)) + geom_point() ``` <img src="day-1e-visualization_files/figure-html/mapping-ggplot-1.png" style="display: block; margin: auto;" /> ] --- class: yourturn # Your Turn! .pull-left[ ### Challenge 1. Create a scatter plot of `units` vs `spend` and color all points blue. 2. Create a scatter plot of `units` vs `spend` and color all points based on store region. ] -- .pull-right[ ### Solution ```r #1 left ggplot(transactions, aes(x = units, y = spend)) + geom_point(color = "blue") #2 right ggplot(transactions, aes(x = units, y = spend, color = store_r)) + geom_point() ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" /> ] --- # Creating small multiples with .red[facets] .bold[The .font120.gray[`facet_xxx()`] functions provide a simple way to create small multiples.] 
-- .scrollable90[ .pull-left[ .bold[`facet_wrap()`]: primarily used to create small multiples based on a single variable ```r ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_point() + * facet_wrap(~ class, nrow = 2) ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" /> ] .pull-right[ .bold[`facet_grid()`]: primarily used to create a grid of small multiples based on two variables ```r ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_point() + * facet_grid(drv ~ cyl) ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" /> ] ] --- class: yourturn # Your Turn! .pull-left[ ### Challenge 1. Compute total spend by store region and week. Plot total spend by week and use faceting to compare store regions. ] -- .pull-right[ ### Solution ```r transactions %>% group_by(store_r, week_num) %>% summarize(spend = sum(spend, na.rm = TRUE)) %>% ggplot(aes(x = week_num, y = spend)) + geom_line() + facet_wrap(~ store_r) ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" /> ] --- # Titles & Axes .bold[We can add titles with .font120.gray[`ggtitle()`] or with .font120.gray[`labs()`]] -- .pull-left[ ```r ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_jitter() + * ggtitle("Displacement vs Highway MPG", * subtitle = "Data from 1999 & 2008") ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_jitter() + * labs( * title = "Displacement vs Highway MPG", * subtitle = "Data from 1999 & 2008", * caption = "http://fueleconomy.gov" * ) ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" /> ] --- # Titles & Axes .bold[We can adjust axes with various .font120.gray[`scale_xxx()`] functions] -- 
.pull-left[ ```r ggplot(data = txhousing, aes(x = volume, y = median)) + geom_point(alpha = .25) + * scale_x_log10() ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r ggplot(data = txhousing, aes(x = volume, y = median)) + geom_point(alpha = .25) + * scale_y_continuous(name = "Median Sales Price", labels = scales::dollar) + * scale_x_log10(name = "Total Sales Volume", labels = scales::comma) ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" /> ] --- # Putting it all together... ```r ggplot(data = txhousing, aes(x = volume, y = median)) + geom_point(alpha = .15) + scale_y_continuous(name = "Median Sales Price", labels = scales::dollar) + scale_x_log10(name = "Total Sales Volume", labels = scales::comma) + labs( title = "Texas Housing Sales", subtitle = "Sales data from 2000-2015 provided by the TAMU real estate center", caption = "http://recenter.tamu.edu/" ) ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" /> --- class: yourturn # Your Turn! .pull-left[ ### Challenge Complete this code to plot the relationship between total basket spend and units. See if you can adjust the x- and y-axis titles and also add a main title. 
```r transactions %>% group_by(basket_num) %>% summarize( spend = sum(spend, na.rm = TRUE), units = sum(units, na.rm = TRUE) ) %>% ggplot(aes(x = units, y = spend)) + geom_point() + scale_x_log10(______) + scale_y_log10(______) + ggtitle(______) ``` ] -- .pull-right[ ```r transactions %>% group_by(basket_num) %>% summarize( spend = sum(spend, na.rm = TRUE), units = sum(units, na.rm = TRUE) ) %>% ggplot(aes(x = units, y = spend)) + geom_point(alpha = .01) + scale_x_log10("Total basket units") + scale_y_log10("Total basket spend", labels = scales::dollar) + ggtitle("Total Spend-to-Units Relationship") ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-30-1.png" style="display: block; margin: auto;" /> ] --- # Overplotting .bold[Because plots are built in layers, we can add multiple geoms to highlight certain patterns.] -- .scrollable90[ .pull-left[ ```r ggplot(data = txhousing, aes(x = volume, y = median)) + geom_point(alpha = .1) + scale_x_log10() + * geom_smooth() ggplot(data = txhousing, aes(x = volume, y = median)) + geom_point(alpha = .1) + scale_x_log10() + * geom_smooth(method = "lm") ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-32-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r ggplot(data = txhousing, aes(x = volume, y = median)) + geom_point(alpha = .1) + scale_x_log10() + * geom_smooth(method = "lm") + * facet_wrap(~ month) ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-33-1.png" style="display: block; margin: auto;" /> ] ] --- # Global vs local .bold[Where we add our mapping aesthetics determines whether that information flows through to subsequent layers.] .pull-left[ ### Global
<i class="fas fa-globe faa-FALSE animated "></i>
```r *ggplot(data = mpg, aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth() ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-34-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ### Local
<i class="fas fa-map-pin faa-FALSE animated "></i>
```r ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_point() + * geom_smooth(mapping = aes(color = drv)) ``` <img src="day-1e-visualization_files/figure-html/unnamed-chunk-35-1.png" style="display: block; margin: auto;" /> ] --- # Is that all? <img src="images/anything-else.gif" width="65%" style="display: block; margin: auto;" /> --- # Coolness is unlimited with ggplot extensions 😎 .pull-left[ .center.font130.bold[`ggmap`] <img src="day-1e-visualization_files/figure-html/unnamed-chunk-37-1.png" style="display: block; margin: auto;" /> ] .pull-right[ .center.font130.bold[`plotly`]
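The interactive `plotly` chart from the original slide is not reproduced here. As a minimal sketch, assuming the `plotly` package is installed, an ordinary ggplot object can be turned into an interactive chart with `plotly::ggplotly()`:

```r
library(ggplot2)
library(plotly)  # assumption: the plotly package is installed

# build a static ggplot with the usual layers
p <- ggplot(data = mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

# convert it into an interactive htmlwidget (hover tooltips, zoom, pan)
ip <- ggplotly(p)
ip  # printing shows the widget in the RStudio Viewer or a browser
```

Because `ggplotly()` accepts a finished ggplot object, the layering workflow from the earlier slides stays the same; only the final conversion step is new.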
] --- # Coolness is unlimited with ggplot extensions 😎 .pull-left[ .center.font130.bold[`gganimate`] <img src="day-1e-visualization_files/figure-html/unnamed-chunk-39-1.gif" style="display: block; margin: auto;" /> ] .pull-right[ .center.font130.bold[And many more!] <br><br><br><br><br> .center[https://exts.ggplot2.tidyverse.org/] ] --- # Key things to remember .pull-left[ * .bold[`ggplot()`]: create canvas * .bold[`aes()`]: map variables to plot aesthetics * .bold[`geom_xxx()`]: display data with different geometric shapes * .bold[`facet_xxx()`]: create small multiples * .bold[`ggtitle()`, `labs()`, `scale_xxx()`]: adjust titles & axes <br> .center[.content-box-gray[Great resource: [ggplot2.tidyverse.org](https://ggplot2.tidyverse.org)]] ] .pull-right[ <img src="images/information-overload2.gif" width="80%" style="display: block; margin: auto;" /> ] --- # Key things to remember .pull-left[ <img src="images/cheatsheet-ggplot2.png" width="1467" style="display: block; margin: auto;" /> ] .pull-right[ <img src="images/information-overload2.gif" width="80%" style="display: block; margin: auto;" /> <br> ] .center[.content-box-gray[.bold[`Help >> Cheatsheets >> Data Visualization with ggplot2`]]] --- # Questions? <br> <img src="images/questions.png" width="450" height="450" style="display: block; margin: auto;" />