class: title-slide

<a href="https://github.com/uc-r/Intro-R/"><img style="position: absolute; top: 0; right: 0; border: 0;" src="https://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png" alt="Fork me on GitHub"></a>

<br><br><br><br>

# Day 2: An .red[Incomplete] Introduction to <i class="fab fa-r-project faa-FALSE animated faa-slow " style=" color:steelblue;"></i>

## .font70[.italic['If you fall, stand tall and come back for more'] - Tupac Shakur]

### Brad Boehmke

### Oct 17-18, 2019

---

# Today's schedule <i class="fas fa-calendar-alt faa-FALSE animated " style=" color:red;"></i>
<br>

| Topic | Time |
|:------|:------:|
| Review | 9:00-10:00 |
| Data types | 10:00-10:45 |
| Break | 10:45-11:00 |
| Tidy data | 11:00-12:00 |
| Lunch | 12:00-1:00 |
| Joining data | 1:00-1:45 |
| Data structures | 1:45-2:30 |
| Break | 2:30-2:45 |
| Case study | 2:45-4:00 |
| Q&A | 4:00-4:30 |

---
class: clear, center, middle
background-image: url(images/review-day2.gif)
background-size: cover

---

# Importing data

<br><br>

* ______: Read in a "normal" sized delimited file (e.g. .csv, .tsv, .txt)

* ______: Read the names of the worksheets in an Excel file

* ______: Read in the data from a specified worksheet of an Excel file

* ______: Read in a large data file

---

# Importing data

<br><br>

* .blue.bold[`readr::read_csv`]: Read in a "normal" sized delimited file (e.g. .csv, .tsv, .txt)

* .blue.bold[`readxl::excel_sheets`]: Read the names of the worksheets in an Excel file

* .blue.bold[`readxl::read_excel`]: Read in the data from a specified worksheet of an Excel file

* .blue.bold[`data.table::fread`]: Read in a large data file

---
class: yourturn

# Your Turn!

<br>

1. Import the households.csv file using .grey[`readr::read_csv()`]

2. Import the transactions.csv file using .grey[`data.table::fread()`]

--

<br><br>

```r
library(tidyverse)

# 1
households <- readr::read_csv("data/households.csv")

# 2
transactions <- data.table::fread("data/transactions.csv", data.table = FALSE) %>% as_tibble()
```

---

# Transforming data

<br>

* ______: pick observations based on certain conditions

* ______: pick variables of interest

* ______: compute statistical summaries

* ______: perform operations at different levels of your data

* ______: reorder data

* ______: create new variables

---

# Transforming data

<br>

* .blue.bold[`filter`]: pick observations based on certain conditions

* .blue.bold[`select`]: pick variables of interest

* .blue.bold[`summarize`]: compute statistical summaries

* .blue.bold[`group_by`]: perform operations at different levels of your data

* .blue.bold[`arrange`]: reorder data

* .blue.bold[`mutate`]: create new variables

---
class: yourturn

# Your Turn!

.pull-left[

### Challenge #1

How much has household 4124 spent in total across the available data?

__Hint:__

- `hshd_num` `\(\rightarrow\)` household variable
- `spend` `\(\rightarrow\)` spend variable

```r
transactions %>%
  filter(_____) %>%
  summarize(_____)
```

]

--

.pull-right[

### Solution

```r
transactions %>%
  filter(hshd_num == 4124) %>%
  summarize(spend = sum(spend, na.rm = TRUE))
## # A tibble: 1 x 1
##   spend
##   <dbl>
## 1  574.
```

]

---
class: yourturn

# Your Turn!

.pull-left[

### Challenge #2

In which week did household 4124 spend the most?
__Hint:__

- `week_num` `\(\rightarrow\)` week variable

```r
transactions %>%
  filter(_____) %>%
  group_by(_____) %>%
  summarize(spend = _____) %>%
* top_n(_____)
```

<br>

.center[.content-box-gray[.bold[Try using .grey[`top_n()`] rather than .grey[`arrange()`]]]]

]

--

.pull-right[

### Solution

```r
transactions %>%
  filter(hshd_num == 4124) %>%
  group_by(week_num) %>%
  summarize(spend = sum(spend, na.rm = TRUE)) %>%
  top_n(1, wt = spend)
## # A tibble: 1 x 2
##   week_num spend
##      <int> <dbl>
## 1      100  35.4
```

]

---
class: yourturn

# Your Turn!

.pull-left[

### Challenge #3

Compute the average spend per basket (`basket_num`) for each region (`store_r`).

__Hint:__

- `basket_num` `\(\rightarrow\)` basket variable
- `store_r` `\(\rightarrow\)` region variable

```r
transactions %>%
  group_by(_____, _____) %>%
  summarize(spend = _____) %>%
  summarize(avg_spend = _____) %>%
  arrange(_____)
```

]

--

.pull-right[

### Solution

```r
transactions %>%
  group_by(store_r, basket_num) %>%
  summarize(spend = sum(spend, na.rm = TRUE)) %>%
  summarize(avg_spend = mean(spend)) %>%
  arrange(desc(avg_spend))
## # A tibble: 4 x 2
##   store_r avg_spend
##   <chr>       <dbl>
## 1 EAST         18.8
## 2 CENTRAL      17.1
## 3 WEST         16.4
## 4 SOUTH        15.6
```

]

---

# Visualizing data

<br>

* ______: create canvas

* ______: map variables to plot aesthetics

* ______: display data with different geometric shapes

* ______: create small multiples

* ______: adjust titles & axes

---

# Visualizing data

<br>

* .blue.bold[`ggplot()`]: create canvas

* .blue.bold[`aes()`]: map variables to plot aesthetics

* .blue.bold[`geom_xxx()`]: display data with different geometric shapes

* .blue.bold[`facet_xxx()`]: create small multiples

* .blue.bold[`ggtitle()`, `labs()`, `scale_xxx()`]: adjust titles & axes

---
class: yourturn

# Your Turn!
.pull-left[

### Challenge #1

Plot the total spend by week.

```r
transactions %>%
  group_by(week_num) %>%
  summarize(spend = sum(spend, na.rm = TRUE)) %>%
  ggplot(aes(x = _____, y = _____)) +
  geom_______
```

]

--

.pull-right[

```r
transactions %>%
  group_by(week_num) %>%
  summarize(spend = sum(spend, na.rm = TRUE)) %>%
  ggplot(aes(x = week_num, y = spend)) +
  geom_line()
```

<img src="day-2a-intro_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

]

---
class: yourturn

# Your Turn!

.pull-left[

### Challenge #2

Plot the total spend versus total units for every household. Facet by store region to see if the pattern differs by region.

```r
transactions %>%
  group_by(store_r, hshd_num) %>%
  summarize(
    spend = sum(spend, na.rm = TRUE),
    units = sum(units, na.rm = TRUE)
  ) %>%
  ggplot(aes(_____, _____)) +
  geom______() +
  facet____(_____)
```

]

--

.pull-right[

```r
transactions %>%
  group_by(store_r, hshd_num) %>%
  summarize(
    spend = sum(spend, na.rm = TRUE),
    units = sum(units, na.rm = TRUE)
  ) %>%
  ggplot(aes(x = units, y = spend)) +
  geom_point() +
  facet_wrap(~ store_r)
```

<img src="day-2a-intro_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

]

---
class: yourturn

# Your Turn!

.pull-left[

### Challenge #3

Plot the total spend versus total units for every household. Facet by store region to see if the pattern differs by region.
```r
transactions %>%
  group_by(store_r, hshd_num) %>%
  summarize(
    spend = sum(spend, na.rm = TRUE),
    units = sum(units, na.rm = TRUE)
  ) %>%
  ggplot(aes(_____, _____)) +
  geom______() +
  facet____(_____)
```

.center[.bold.red[Can you add a title and adjust the axes of this plot?]]

]

.pull-right[

```r
transactions %>%
  group_by(store_r, hshd_num) %>%
  summarize(
    spend = sum(spend, na.rm = TRUE),
    units = sum(units, na.rm = TRUE)
  ) %>%
  ggplot(aes(x = units, y = spend)) +
  geom_point() +
  facet_wrap(~ store_r)
```

<img src="day-2a-intro_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

]

---
class: yourturn

# Your Turn!

.scrollable90[

.pull-left[

### Challenge #3

Plot the total spend versus total units for every household. Facet by store region to see if the pattern differs by region.

```r
transactions %>%
  group_by(store_r, hshd_num) %>%
  summarize(
    spend = sum(spend, na.rm = TRUE),
    units = sum(units, na.rm = TRUE)
  ) %>%
  ggplot(aes(_____, _____)) +
  geom______() +
  facet____(_____)
```

.center[.bold.red[Can you add a title and adjust the axes of this plot?]]

]

.pull-right[

```r
transactions %>%
  group_by(store_r, hshd_num) %>%
  summarize(
    spend = sum(spend, na.rm = TRUE),
    units = sum(units, na.rm = TRUE)
  ) %>%
  ggplot(aes(x = units, y = spend)) +
  geom_point() +
  facet_wrap(~ store_r) +
  ggtitle("Total household spend versus units.") +
  scale_x_continuous("Total units", labels = scales::comma) +
  scale_y_continuous("Total spend", labels = scales::dollar)
```

<img src="day-2a-intro_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

]

]

---

# Questions before moving on?

<br>

<img src="images/questions.png" width="450" height="450" style="display: block; margin: auto;" />
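---

# Review sketch: chaining the verbs

If the review verbs still feel abstract, the chain below may help. It is a minimal sketch on made-up data: the `toy` tibble and its values are hypothetical, and only the column names (`hshd_num`, `week_num`, `spend`) are borrowed from the exercises. It also assumes dplyr 1.0.0 or later, where `slice_max()` is offered as the successor to `top_n()`.

```r
library(dplyr)

# Hypothetical toy data (NOT the course data); columns mirror transactions
toy <- tibble::tribble(
  ~hshd_num, ~week_num, ~spend,
          1,         1,     10,
          1,         1,      5,
          1,         2,     20,
          1,         2,     NA,
          2,         1,      7
)

# filter -> group_by -> summarize -> arrange, as in the challenges
weekly <- toy %>%
  filter(hshd_num == 1) %>%                       # keep one household
  group_by(week_num) %>%                          # one group per week
  summarize(spend = sum(spend, na.rm = TRUE)) %>% # total spend, ignoring NA
  arrange(desc(spend))                            # largest week first

# Assumes dplyr >= 1.0.0: keep only the week with the highest spend
top_week <- weekly %>%
  slice_max(spend, n = 1)
```

Because the data is tiny, each intermediate result can be printed and checked by hand before trying the same chain on `transactions`.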