class: clear, center, middle background-image: url(https://static1.squarespace.com/static/5840371f3e00befa6e8e6739/58e4c75e414fb56455f504a0/58e4c75e414fb56455f5049f/1486220365319/Workflow-logo.png) --- # Workflow considerations <br><br> .font200[ 1. Project-oriented Workflow 2. Efficient & reproducible deliverables 3. Version control ] --- class: clear, center, middle background-image: url(images/working-directories.jpg) background-size: cover <br> .font300.bold.white[Project-oriented Workflow] .white[___“Organization is what you do before you do something, so that when you do it, it is not all mixed up.”___ - A.A. Milne] --- # When first learning R... .pull-left[ - Open R - start a new .R script or open an existing one - set the working directory or, add `setwd("C:\Users\Brad\path\only\I\have")` at the beginning of the script. - many times you will finish one project and then start working on another analysis in the same R session so you add `rm(list = ls())` at the top of your scripts. ] .pull-right[ <br><br> <img src="images/jenny-bryan-tweet.png" width="1517" style="display: block; margin: auto;" /> .font70.center[[_Courtesy Hadley Wickham tweet (Dec 2017)_](https://twitter.com/hadleywickham/status/940021008764846080)] ] --- # What's wrong with these? .pull-left[ ### `setwd()` - Is only specific to your current computer set up - Nearly 0% chance working for anyone else or - For you in 2 years when you have a new computer - Increases chance of project leakage when manually switching between projects in the same R session ] .pull-right[ ### `rm(list = ls())` - Only removes user deleted objects created - Does not create a fresh R session - Does not reset options you may have changed (i.e. `stringsAsFactors = FALSE`) - Does not unload packages you previously loaded ] .center[.content-box-gray[.bold[Reduces organization & reproducibility; there is a better way!]]] --- # Differentiating Workflow vs Product * __Workflow__: things you do because of personal taste and habits - Code/text editor - Directory paths - Operating system - Code you ran prior to the current project * __Product__: the logic and output that is the essence of your project - The R code someone needs to run on your raw data to get your results, including the explicit `library()` calls to load necessary packages. - Raw data leveraged in the project <br> Ideally, you ___don’t hardwire anything about your workflow into your product___. Workflow-related operations should be executed by you interactively, using whatever means is appropriate to your setup, but not built into the scripts themselves. --- # Best practice <br> - Organize each data analysis into its own project: a folder on your computer that holds all the files relevant to that particular piece of work - Any resident R script is written assuming that it will be run from a fresh R environment. .red[Do not save `.RData` when you quit R and don’t load `.RData` when you fire up R.] - The working directory is set to the project directory and all dependent directory paths are subfolders of the main project directory <br><br><br> .center.bold[This convention guarantees that the project can be moved around on your computer or onto other computers and will still “just work”] --- # R Projects (`.RProj`) .pull-left[ .font120.bold.center[Creating] `File >> New Project`: <img src="images/create-r-project.png" width="2120" style="display: block; margin: auto;" /> ] .pull-right[ .font120.bold.center[Benefits] RStudio projects make it straightforward to divide your work into individual projects, each with their own: - working directory - source documents - workspace - history ] --- class: yourturn # Your Turn! <br><br> .font130.center[ Go ahead and create an R Project for this workshop. Create this project in an ___Existing Directory___ and save it in the folder you downloaded for this workshop. ] --- # So what's different? .pull-left[ <br> <img src="images/r-project-window.png" width="1635" style="display: block; margin: auto;" /> ] .pull-right[ When a project is opened: - A new R session is started - The current working directory is set to the project directory - The .Rprofile file in the project's main directory is sourced - <s>The .RData file in the project's main directory is loaded</s> - The .Rhistory file in the project's main director is loaded - Previously edited source docs are restored to the editor tab - Other RStudio settings (active tabs, splitter positions, etc) are restored ] <br> .center.bold[.content-box-gray[We can also work with multiple projects at one time]] --- # Referring to subdirectory locations - When you open an R Project, your working directory is the folder where .RProj resides - What if you want to refer to a subdirectory within your R script? .pull-left[ Example 1: script is in the main directory but data is in a subdirectory <img src="images/r-project-example1.png" width="30%" style="display: block; margin: auto;" /> ] .pull-right[ Example 2: script and data are in two separate subdirectories <img src="images/r-project-example2.png" width="30%" style="display: block; margin: auto;" /> ] --- # Referring to subdirectories Example 1: script is in the main directory but data is in a subdirectory .pull-left[ <img src="images/r-project-example1.png" width="40%" style="display: block; margin: auto;" /> ] .pull-right[ ```r readr::read_csv("data/Month-01.csv") ## # A tibble: 54,535 x 10 ## Account_ID Transaction_Timest… Factor_A Factor_B Factor_C Factor_D ## <int> <dttm> <int> <int> <chr> <int> ## 1 5 2009-01-08 00:16:41 2 6 VI 20 ## 2 16 2009-01-20 22:40:08 2 6 VI 20 ## 3 28 2009-01-19 13:24:55 2 6 VI 21 ## 4 40 2009-01-05 16:10:58 2 6 VI 20 ## 5 62 2009-01-21 19:13:13 2 6 VI 20 ## 6 64 2009-01-01 18:53:02 7 6 MC 20 ## 7 69 2009-01-08 00:15:19 2 6 VI 20 ## 8 69 2009-01-19 09:33:22 2 6 VI 20 ## 9 70 2009-01-05 12:07:47 2 6 VI 20 ## 10 79 2009-01-07 19:41:18 7 6 MC 20 ## # … with 54,525 more rows, and 4 more variables: Factor_E <chr>, ## # Response <int>, Transaction_Status <chr>, Month <chr> ``` ] --- # Referring to subdirectories Example 2: script and data are in two separate subdirectories .pull-left[ <img src="images/r-project-example2.png" width="40%" style="display: block; margin: auto;" /> ] .pull-right[ ```r readr::read_csv("../data/Month-01.csv") ## # A tibble: 54,535 x 10 ## Account_ID Transaction_Timest… Factor_A Factor_B Factor_C Factor_D ## <int> <dttm> <int> <int> <chr> <int> ## 1 5 2009-01-08 00:16:41 2 6 VI 20 ## 2 16 2009-01-20 22:40:08 2 6 VI 20 ## 3 28 2009-01-19 13:24:55 2 6 VI 21 ## 4 40 2009-01-05 16:10:58 2 6 VI 20 ## 5 62 2009-01-21 19:13:13 2 6 VI 20 ## 6 64 2009-01-01 18:53:02 7 6 MC 20 ## 7 69 2009-01-08 00:15:19 2 6 VI 20 ## 8 69 2009-01-19 09:33:22 2 6 VI 20 ## 9 70 2009-01-05 12:07:47 2 6 VI 20 ## 10 79 2009-01-07 19:41:18 7 6 MC 20 ## # … with 54,525 more rows, and 4 more variables: Factor_E <chr>, ## # Response <int>, Transaction_Status <chr>, Month <chr> ``` ] --- # Referring to subdirectories .pull-left[ .center.font120.bold[Explicit] ```r # refer to first level subdirectory "/data/file.csv" # refer to second level subdirectory "/folder/data/file.csv" # refer to another subdirectory "../data/file.csv" # refer to a subdirectory up two levels then down one "../../data/file.csv" ``` Two problems: 1. Windows (`"\\"`) vs Mac & Linux (`"/"`) 2. Can get confusing ] -- .pull-right[ .center.bold[ .font150[`here`] .font120[package]] ```r # automatically provides path to .RProj file here::here() ## [1] "/Users/b294776/Desktop/Training/UC/Intermediate-R" # get path to a "data" subdirectory here::here("data") ## [1] "/Users/b294776/Desktop/Training/UC/Intermediate-R/data" # get path to specific csv file here::here("data", "Month-01.csv") ## [1] "/Users/b294776/Desktop/Training/UC/Intermediate-R/data/Month-01.csv" ``` * Agnostic to OS * But all references must be at the same level or a subdirectory of the `.RProj` file ] --- # Some extra thoughts .pull-left[ .font120.bold.center[When not (required) to use] - One time analysis, for your own use, and you're 100% sure you will never need to reproduce later on - Server setting where workflow is pre-set - Productionalized scripts ] .pull-right[ .font120.bold.center[When to use] - All other times! ] --- class: yourturn # Your Turn! .pull-left[ ### Challenge 1. Use `here::here()` to get the path to the "Month-09.csv" file in the "data" subfolder of your R project. 2. Use #1 within `readr::read_csv()` to import that .csv file into your current R session. ] -- .pull-right[ ### Solution ```r # 1 here::here("data", "Month-09.csv") ## [1] "/Users/b294776/Desktop/Training/UC/Intermediate-R/data/Month-09.csv" # 2 readr::read_csv(here::here("data", "Month-09.csv")) ## # A tibble: 71,855 x 10 ## Account_ID Transaction_Timest… Factor_A Factor_B Factor_C Factor_D Factor_E Response ## <int> <dttm> <int> <int> <chr> <int> <chr> <int> ## 1 38 2009-09-12 09:51:42 10 6 DI 20 NULL 1020 ## 2 44 2009-09-28 17:04:10 8 23 AX 21 NULL 1020 ## 3 46 2009-09-13 14:27:57 2 6 VI 20 C 1020 ## 4 47 2009-09-11 10:01:57 10 6 DI 20 NULL 1020 ## 5 48 2009-09-11 18:22:22 2 6 VI 20 B 1020 ## 6 48 2009-09-23 21:40:03 2 6 VI 20 B 1020 ## 7 54 2009-09-18 04:35:50 2 6 VI 20 A 1020 ## 8 59 2009-09-20 14:35:43 10 6 DI 20 NULL 1020 ## 9 72 2009-09-21 18:53:42 7 15 MC 20 MCW 1020 ## 10 72 2009-09-28 16:53:15 7 15 MC 20 MCW 1020 ## # … with 71,845 more rows, and 2 more variables: Transaction_Status <chr>, Month <chr> ``` ] --- class: clear, center, middle background-image: url(images/rmarkdown-image1.png) <br><br><br> .font300.bold[Deliverables with R] .grey[___Fully reproducible approach to turn your analyses into high quality documents, reports, presentations and dashboards.___] --- # Project deliverables <br> .pull-left[ Typically, our analytic projects result in some type of deliverable... * Research report * Academic paper * Presentation to a decision maker * Dashboard or app for interactive use ] .pull-right[ .center.bold.font120[.red[Traditional] Deliverable Workflow] <img src="images/traditional-deliverable-workflow.png" width="65%" style="display: block; margin: auto;" /> ] --- # Project deliverables <br> .pull-left[ Typically, our analytic projects result in some type of deliverable... * Research report * Academic paper * Presentation to a decision maker * Dashboard or app for interactive use ] .pull-right[ .center.bold.font120[.green[Integrated] Deliverable Workflow] <img src="images/integrated-deliverable-workflow.png" width="65%" style="display: block; margin: auto;" /> ] --- # R Markdown .font120[General idea of R Markdown] .pull-left[ * 1 file that contains - Text - Code * 1 common process to create - PDF - Word - HTML - Slides * Simple typesetting code but if you know `\(\LaTeX\)` or HTML you can highly customize ] .pull-right[ <br><br> <img src="images/idea-of-rmarkdown.png" width="1731" style="display: block; margin: auto;" /> ] --- # Create Your First R Markdown File .center.bold[File » New File » R Markdown] <img src="images/rmarkdown-create.gif" style="display: block; margin: auto;" /> --- # Primary Components of an R Markdown File There are .red[three general components] of an R Markdown file that you will eventually become accustomed to: .pull-left[ * YAML * general markdown (or text) component * code chunks ] .pull-right[ ] --- # Primary Components of an R Markdown File There are .red[three general components] of an R Markdown file that you will eventually become accustomed to: .pull-left[ * .bold.red[YAML] * general markdown (or text) component * code chunks ] .pull-right[ <img src="images/rmarkdown-components-yaml.png" width="955" style="display: block; margin: auto;" /> ] --- # Primary Components of an R Markdown File There are .red[three general components] of an R Markdown file that you will eventually become accustomed to: .pull-left[ * YAML * .bold.red[general markdown (or text) component] * code chunks ] .pull-right[ <img src="images/rmarkdown-components-text.png" width="955" style="display: block; margin: auto;" /> ] --- # Primary Components of an R Markdown File There are .red[three general components] of an R Markdown file that you will eventually become accustomed to: .pull-left[ * YAML * general markdown (or text) component * .bold.red[code chunks] ] .pull-right[ <img src="images/rmarkdown-components-code.png" width="955" style="display: block; margin: auto;" /> ] --- # YAML .pull-left[ * First few lines in the R Markdown file * Starts and ends with `---` * Update this R Markdown template with ```r --- title: "R Markdown Demo" author: "Brad Boehmke" date: "`r Sys.Date()`" output: html_document --- ``` ] -- .pull-right[ <img src="images/updated-yaml.png" width="1108" style="display: block; margin: auto;" /> ] --- # Text Formatting .pull-left[ For the text component, much of your writing is similar to when you type a Word document; however, to perform many of the basic text formatting you use basic markdown code such as as that shown in .center[`Help >> Markdown Quick Reference`] <br> .red[ See if you can: - italicize a word - create a third level header - create an unordered list ] ] .pull-right[ <img src="images/markdown-text-reference.png" width="75%" style="display: block; margin: auto;" /> ] --- # Code Chunks .pull-left[ * code chunks can be used as a means to render R output into documents or to simply display code for illustration. * start with: ` ```{r chunk_name}` and end with <code>```</code> * keyboard shortcut Cmd + Option + I (Windows Ctrl + Alt + I). * You can run an individual code chunk by pressing
<i class="fas fa-play faa-FALSE animated " style=" color:green;"></i>
at the far right hand side of the code chunk. * .red[Go ahead and run each code chunk in your initial R markdown file] ] .pull-right[ <br><br> <img src="images/run-code-chunk.gif" style="display: block; margin: auto;" /> ] --- # Knitting the R Markdown File When you are all done writing your .Rmd document you can click the drop down arrow next to the knit button on the RStudio toolbar, select the document format (HTML, PDF, Word) and your report will be developed. <img src="images/rmarkdown-generate.gif" style="display: block; margin: auto;" /> --- class: yourturn # R Markdown exercise .font120[ Now that you know the basics: * We're going to create an R Markdown report together - Code: Open stock-code.R file - Text: Open stock-text.docx file * Leverage the Markdown Quick Reference guide - `Help >> Markdown Quick Reference guide` * If you don't have a Tex operator (for PDF reports) run ```r install.packages("tinytex") tinytex::install_tinytex() ``` ] --- #
LOTS
of options The following output formats are available to use with R Markdown. .scrollable90[ * Documents - [html_notebook](http://rmarkdown.rstudio.com/r_notebooks.html) - Interactive R Notebooks - [html_document](http://rmarkdown.rstudio.com/html_document_format.html) - HTML document w/ Bootstrap CSS - [pdf_document](http://rmarkdown.rstudio.com/pdf_document_format.html) - PDF document (via LaTeX template) - [word_document](http://rmarkdown.rstudio.com/word_document_format.html) - Microsoft Word document (docx) - [odt_document](http://rmarkdown.rstudio.com/odt_document_format.html) - OpenDocument Text document - [rtf_document](http://rmarkdown.rstudio.com/rtf_document_format.html) - Rich Text Format document - [md_document](http://rmarkdown.rstudio.com/markdown_document_format.html) - Markdown document (various flavors) * Presentations (slides) - [xaringan::moon_reader](https://github.com/yihui/xaringan) - HTML presentation with remark.js - [ioslides_presentation](http://rmarkdown.rstudio.com/ioslides_presentation_format.html) - HTML presentation with ioslides - [revealjs::revealjs_presentation](http://rmarkdown.rstudio.com/revealjs_presentation_format.html) - HTML presentation with reveal.js - [slidy_presentation](http://rmarkdown.rstudio.com/slidy_presentation_format.html) - HTML presentation with W3C Slidy - [beamer_presentation](https://bookdown.org/yihui/rmarkdown/beamer-presentation.html) - PDF presentation with LaTeX Beamer * More - [flexdashboard::flex_dashboard](http://rmarkdown.rstudio.com/flexdashboard/) - Interactive dashboards - [tufte::tufte_handout](http://rmarkdown.rstudio.com/tufte_handout_format.html) - PDF handouts in the style of Edward Tufte - [tufte::tufte_html](http://rmarkdown.rstudio.com/tufte_handout_format.html) - HTML handouts in the style of Edward Tufte - [tufte::tufte_book](http://rmarkdown.rstudio.com/tufte_handout_format.html) - PDF books in the style of Edward Tufte - [html_vignette](http://rmarkdown.rstudio.com/package_vignette_format.html) - R package vignette (HTML) - [github_document](http://rmarkdown.rstudio.com/github_document_format.html) - GitHub Flavored Markdown document - [bookdown](https://bookdown.org/) - Write HTML, PDF, ePub, and Kindle books with R Markdown - [blogdown](https://bookdown.org/yihui/blogdown/) - Create websites using R Markdown and Hugo ] --- # Learn More .pull-left[ <br><br><br><br><br> https://bookdown.org/yihui/rmarkdown/ ] .pull-right[ <img src="images/definitive-guide-rmarkdown.png" width="65%" style="display: block; margin: auto;" /> ] --- class: clear, center, middle background-image: url(images/github-mark.png) .font200.bold[Version Control] --- # It can be painful but learn it! .center[ We do not have time to cover version control but you should definitely start by reading this book: http://happygitwithr.com/ ] <img src="images/happy-git-with-r.png" width="80%" style="display: block; margin: auto;" /> --- # Summary <br> .font130[ * R Projects (`.RProj`) for Project-oriented Workflow * `here::here()` for generalized paths * R Markdown for efficient & reproducible deliverables * Integrated Git for version control ] --- # Questions <br> <img src="images/questions.png" width="450" height="450" style="display: block; margin: auto;" />