class: clear, center, middle background-image: url("images/function-icon.jpg") background-size: contain <br><br><br><br><br><br><br><br><br><br><br><br><br> .center.font300.bold[Writing Functions] .center[_"Writing good functions is a lifetime journey."_ - Hadley Wickham] --- # Overview - Why functions are good - Anatomy of a function - Defining functions - Defining your arguments - Checking arguments and other conditions - Lazy evaluation - Lexical scoping - Distributing your function(s) - Learning more --- # Why functions are good .pull-left[ * A function centralizes a common task to a single, abstract method * This helps to: - increase .green[reusability] which... - reduces .red[verbosity] - reduces .red[repetitiveness] - reduces the .red[chance of making an error] - reduces .red[instances requiring updates] when changes are needed - reduces .red[amount of code to test] ] -- .pull-right[ Consider the following from Hadley Wickham’s book _R for Data Science_. Where's the .red[error]? ```r df <- tibble::tibble( a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10) ) df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE)) df$b <- (df$b - min(df$b, na.rm = TRUE)) / (max(df$b, na.rm = TRUE) - min(df$b, na.rm = TRUE)) df$c <- (df$c - min(df$c, na.rm = TRUE)) / (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE)) df$d <- (df$d - min(df$d, na.rm = TRUE)) / (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE)) ``` ] --- # Why functions are good .pull-left[ * A function centralizes a common task to a single, abstract method * This helps to: - increase .green[reusability] which... - reduces .red[verbosity] - reduces .red[repetitiveness] - reduces the .red[chance of making an error] - reduces .red[instances requiring updates] when changes are needed - reduces .red[amount of code to test] ] .pull-right[ Consider the following from Hadley Wickham’s book _R for Data Science_. Where's the .red[error]? ```r df <- tibble::tibble( a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10) ) df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE)) df$b <- (df$b - min(df$b, na.rm = TRUE)) / * (max(df$b, na.rm = TRUE) - min(df$a, na.rm = TRUE)) df$c <- (df$c - min(df$c, na.rm = TRUE)) / (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE)) df$d <- (df$d - min(df$d, na.rm = TRUE)) / (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE)) ``` ] --- # Prereqs .center.font200.bold[Base R] -- <img src="https://media.giphy.com/media/tLql6mMHC6wvK/giphy.gif" style="display: block; margin: auto;" /> -- <br> .center.bold[I will use some packages throughout but these are simply for examples. You will be notified but sometimes you just need to learn to
<i class="fas fa-heart faa-pulse animated " style=" color:red;"></i>
base R.] --- # Key ingredients of a function <br> .pull-left[ .font140[ 1. Name 2. arguments 3. body 4. environment ] ] .pull-left[ <br> ```r my_fun <- function(arg1, arg2) { << body >> } ``` ] --- # Key ingredients of a function .pull-left[ <br> .font140[ 1. Name 2. arguments 3. body 4. environment ] ] .pull-right[ ```r pv <- function(fv, r, n = 5) { fv / (1 + r)^n } ``` ```r args(pv) ## function (fv, r, n = 5) ## NULL ``` ```r body(pv) ## { ## fv/(1 + r)^n ## } ``` ```r environment(pv) ## <environment: R_GlobalEnv> ``` ] -- <br> .center[.content-box-gray[Note how the environment is our global environment]] --- class: yourturn # Your Turn! .pull-left[ Identify the arguments, body, and environment of: 1. `read.csv` 2. `dplyr::add_count` 3. `sum` ] -- .pull-right[ ```r # #1 read.csv args(read.csv) ## function (file, header = TRUE, sep = ",", quote = "\"", dec = ".", ## fill = TRUE, comment.char = "", ...) ## NULL body(read.csv) ## read.table(file = file, header = header, sep = sep, quote = quote, ## dec = dec, fill = fill, comment.char = comment.char, ...) environment(read.csv) ## <environment: namespace:utils> ``` ] --- class: yourturn # Your Turn! .pull-left[ Identify the arguments, body, and environment of: 1. `read.csv` 2. `dplyr::add_count` 3. `sum` ] .pull-right[ ```r # #2 dplyr::add_count args(dplyr::add_count) ## function (x, ..., wt = NULL, sort = FALSE) ## NULL body(dplyr::add_count) ## { ## g <- group_vars(x) ## grouped <- group_by(x, ..., add = TRUE) ## out <- add_tally(grouped, wt = !!enquo(wt), sort = sort) ## grouped_df(out, g) ## } environment(dplyr::add_count) ## <environment: namespace:dplyr> ``` ] --- class: yourturn # Your Turn! .pull-left[ Identify the arguments, body, and environment of: 1. `read.csv` 2. `dplyr::add_count` 3. `sum` ] .pull-right[ ```r # #3 sum args(sum) ## function (..., na.rm = FALSE) ## NULL *body(sum) # NULL because its a primitive ## NULL *environment(sum) # NULL because its a primitive ## NULL ``` ] <br><br> .center[.content-box-gray[`sum()` is a primitive function]] --- # Defining a function .pull-left[ * We define a function with `<-` just like we define any other R object * Use informative names; strive to use verbs when possible ] .pull-right[ ```r present_value <- function() { } ``` ```r compute_pv <- function() { } ``` ] --- # Defining a function .pull-left[ * We define a function with `<-` just like we define any other R object * Use informative names; strive to use verbs when possible * We can define a function with no arguments; however, this is rarely useful ] .pull-right[ ```r present_value <- function() { } ``` ```r compute_pv <- function() { } ``` ```r compute_pv <- function() { 1000 / (1 + 0.05)^10 } compute_pv() ## [1] 614 ``` ] <br> -- .center[___We want a function that does one thing consistently well, yet provide flexibility in the inputs provided.___] --- # Defining a function .pull-left[ .opacity20[ * We define a function with `<-` just like we define any other R object * Use informative names; strive to use verbs when possible * We can define a function with no arguments; however, this is rarely useful ] .font120[ * Consequently, most functions have all key ingredients ] ] .pull-right[ .opacity20[ ```r present_value <- function() { } ``` ] .opacity20[ ```r compute_pv <- function() { } ``` ] .opacity20[ ```r compute_pv <- function() { 1000 / (1 + 0.05)^10 } ``` ] .code80[ ```r compute_pv <- function(fv, r, n) { fv / (1 + r)^n } compute_pv(fv = 1000, r = .05, n = 10) ## [1] 614 compute_pv(fv = 2000, r = .08, n = 5) ## [1] 1361 ``` ] ] --- class: yourturn # Your Turn! .pull-left[ * Define a function titled ratio that takes arguments `x` and `y` and returns their ratio, `x / y` * Call `ratio()` with arguments 3 and 4 ] -- .pull-right[ ```r ratio <- function(x, y) { x / y } ratio(3, 4) ## [1] 0.75 ``` ] --- # Calling arguments in different ways .font120[Many ways to call arguments:] .pull-left[ .font140[ * Using argument names ] ] .pull-right[ ```r compute_pv(fv = 1000, r = .05, n = 10) ## [1] 613.9133 ``` ] --- # Calling arguments in different ways .font120[Many ways to call arguments:] .pull-left[ .font120[ * Using argument names * Positional matching ] ] .pull-right[ ```r compute_pv(fv = 1000, r = .05, n = 10) ## [1] 613.9133 ``` ```r compute_pv(1000, .05, 10) ## [1] 613.9133 ``` ] --- # Calling arguments in different ways .font120[Many ways to call arguments:] .pull-left[ .font120[ * Using argument names * Positional matching * Must use names if you change order * ...otherwise error or incorrect computation will occur ] ] .pull-right[ ```r compute_pv(fv = 1000, r = .05, n = 10) ## [1] 613.9133 ``` ```r compute_pv(1000, .05, 10) ## [1] 613.9133 ``` ```r compute_pv(r = .05, fv = 1000, n = 10) ## [1] 613.9133 ``` ```r compute_pv(.05, 1000, 10) ## [1] 4.950274e-32 ``` ] --- # Calling arguments in different ways .font120[Many ways to call arguments:] .pull-left[ .font120[ * Using argument names * Positional matching * Must use names if you change order * ...otherwise error or incorrect computation will occur * missing arguments results in error ] ] .pull-right[ ```r compute_pv(fv = 1000, r = .05, n = 10) ## [1] 613.9133 ``` ```r compute_pv(1000, .05, 10) ## [1] 613.9133 ``` ```r compute_pv(r = .05, fv = 1000, n = 10) ## [1] 613.9133 ``` ```r compute_pv(.05, 1000, 10) ## [1] 4.950274e-32 ``` ```r compute_pv(1000, .05) ## Error in compute_pv(1000, 0.05): argument "n" is missing, with no default ``` ] --- # Ordering your arguments Ordering arguments in your functions is important: * pipe (.font120[`%>%`]) operator - tidyverse pkgs consistently have data as first argument - makes for intuitive syntax & easy flow * positional matching - which args are most vs less important - helps dictate where defaults need to go (next slide) ```r # bad --> rounding to digits is least important parameter compute_pv <- function(digits, r, n, fv) { << body >> } # better but could still be improved --> does a user always need to specify digits to round to? compute_pv <- function(fv, r, n, digits) { << body >> } ``` --- # Setting (.green[good]) defaults .pull-left[ * As our functions require more inputs... * many of them can likely do with defaults * Usually, the first 1-3 arguments are not preset but the rest are * Good defaults are largely based on - methodology best practices (i.e. hyperparameters, removing missing values) - industry best practices or current values (i.e. current interest rates) - organizational best practices (i.e. golden rules) ] .pull-right[ ```r compute_pv <- function(fv, r, n, digits = 2) { round(fv / (1 + r)^n, digits = digits) } # by default, will round to 2 digits compute_pv(1000, .05, 10) ## [1] 613.91 # to adjust, specify by position compute_pv(1000, .05, 10, 0) ## [1] 614 # or by name compute_pv(1000, .05, 10, digits = 0) ## [1] 614 ``` ] --- class: yourturn # Your Turn - Part 1 .pull-left[ * Earlier in these slides you saw the following code duplicated: ```r (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE)) ``` * Can you write a function called `rescale()` that takes argument `x` and executes this code? * Test it on the following vector: ```r set.seed(123) test_vector <- runif(20, min = 25, max = 40) ``` ] -- .pull-right[ ```r rescale <- function(x){ rng <- range(x, na.rm = TRUE) (x - rng[1]) / (rng[2] - rng[1]) } rescale(test_vector) ## [1] 0.2684 0.8158 0.4011 0.9193 0.9821 0.0038 0.5313 0.9296 0.5568 0.4532 ## [11] 1.0000 0.4496 0.6947 0.5800 0.0665 0.9377 0.2230 0.0000 0.3125 0.9975 ``` ] --- class: yourturn # Your Turn - Part 2 .pull-left[ * Now add an argument to `rescale()` that allows you to round the output to a specified decimal. * Set the default to 2. ] -- .pull-right[ ```r rescale <- function(x, digits = 2){ rng <- range(x, na.rm = TRUE) scaled <- (x - rng[1]) / (rng[2] - rng[1]) round(scaled, digits = digits) } # default rescale(test_vector) ## [1] 0.27 0.82 0.40 0.92 0.98 0.00 0.53 0.93 0.56 0.45 1.00 0.45 0.69 0.58 ## [15] 0.07 0.94 0.22 0.00 0.31 1.00 # 3 digits rescale(test_vector, digits = 3) ## [1] 0.268 0.816 0.401 0.919 0.982 0.004 0.531 0.930 0.557 0.453 1.000 ## [12] 0.450 0.695 0.580 0.067 0.938 0.223 0.000 0.312 0.997 ``` ] --- # Checking arguments and other conditions * We've created a function that allows users to specify their inputs * But how do we ensure they provide us with the right kind of inputs? * Or what if we want to be able to provide some sort of feedback to the user? * There are several ways to signal __conditions__ to function users: -- - .font110[`stop()`] & .font110[`stopifnot()`]: signal an .red[error] (no way for a function to continue and execution must stop) -- - .font110[`warning()`]: signal a warning (something has gone wrong but the function has been able to at least partially recover.) -- - .font110[`message()`]: signal an informative message (function works fine but user should be informed of something.) .center[___The most common you will use are .font120[`stop()`] & .font120[`message()`]___] --- # `stop()` & `stopifnot()` .scrollable90[ .pull-left[ * The .font120[`stop`] functions are most commonly used to check for proper inputs * but can be used to .red[stop] the function procedures because the user's environment is not properly established * Weigh the benefits of highly custom, informative error messages (via `stop`) versus short to the point (via `stopifnot`) * Execution will stop at the first violation ] .pull-right[ ```r # stop compute_pv <- function(fv, r, n, digits = 2) { if(!is.numeric(fv)) { * stop("`fv` must be numeric", call. = FALSE) } if(!is.numeric(r)) { * stop("`r` must be numeric", call. = FALSE) } if(!is.numeric(n)) { * stop("`n` must be numeric", call. = FALSE) } if(!is.numeric(digits)) { * stop("`digits` must be numeric", call. = FALSE) } round(fv / (1 + r)^n, digits = digits) } compute_pv("1000", ".05", 10) ## Error: `fv` must be numeric ``` ```r # stopifnot compute_pv <- function(fv, r, n, digits = 2) { * stopifnot(is.numeric(fv), is.numeric(r), is.numeric(n), is.numeric(digits)) round(fv / (1 + r)^n, digits = digits) } compute_pv("1000", ".05", 10) ## Error in compute_pv("1000", ".05", 10): is.numeric(fv) is not TRUE ``` ] ] --- # `warning()` .pull-left[ * `warnings()` are rarely used but... * can be useful to signal .blue[non-terminating] concerns - hyperparameter clashes in a grid search - an extremely long compute time and the user may want to re-assess - deprecated functions - NAs, NULLs, Infs exist, which result in NA output ```r lubridate::mdy(c("2-1-2019", "1995")) ## Warning: 1 failed to parse. ## [1] "2019-02-01" NA ``` ] -- .pull-right[ ```r present_value <- function(fv, r, n, digits = 2) { * warning("`present_value()` is deprecated. Use `compute_pv()`", call. = FALSE) stopifnot(is.numeric(fv), is.numeric(r), is.numeric(n), is.numeric(digits)) round(fv / (1 + r)^n, digits = digits) } present_value(1000, .05, 10) ## Warning: `present_value()` is deprecated. Use `compute_pv()` ## [1] 613.91 ``` ] --- # `message()` .pull-left[ * designed to be informational * use them to tell the user that you’ve done something on their behalf * may represent: ] --- # `message()` .pull-left[ * designed to be informational * use them to tell the user that you’ve done something on their behalf * may represent: - default argument that may be non-trivial ] .pull-right[ ```r library(ggplot2) ggplot(iris, aes(Sepal.Length)) + geom_histogram() ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`. ``` <img src="day-2d-functions_files/figure-html/messages-ggplot-example-1.png" style="display: block; margin: auto;" /> ] --- # `message()` .pull-left[ * designed to be informational * use them to tell the user that you’ve done something on their behalf * may represent: - default argument that may be non-trivial - how information was interpreted by the function ] .pull-right[ ```r df <- readr::read_csv("data/Month-01.csv") ## Parsed with column specification: ## cols( ## Account_ID = col_integer(), ## Transaction_Timestamp = col_datetime(format = ""), ## Factor_A = col_integer(), ## Factor_B = col_integer(), ## Factor_C = col_character(), ## Factor_D = col_integer(), ## Factor_E = col_character(), ## Response = col_integer(), ## Transaction_Status = col_character(), ## Month = col_character() ## ) ``` ] --- # `message()` .pull-left[ * designed to be informational * use them to tell the user that you’ve done something on their behalf * may represent: - default argument that may be non-trivial - how information was interpreted by the function - words of encouragement! ] .pull-right[ <br> <img src="images/gitr-example.png" width="663" style="display: block; margin: auto;" /> ] --- # `message()` .pull-left[ * designed to be informational * use them to tell the user that you’ve done something on their behalf * may represent: - default argument that may be non-trivial - how information was interpreted by the function - words of encouragement! - or even industry/business best practices ] .pull-right[ ```r compute_pv <- function(fv, r, n, digits = 2) { stopifnot(is.numeric(fv), is.numeric(r), is.numeric(n), is.numeric(digits)) * if(r < 0.01 || r > 0.25) { * message("Your interest rate exceeds the normal ranges of 1-25%") * } round(fv / (1 + r)^n, digits = digits) } compute_pv(1000, .26, 10) ## Your interest rate exceeds the normal ranges of 1-25% ## [1] 99.15 ``` ] --- # `message()` .pull-left[ * designed to be informational * use them to tell the user that you’ve done something on their behalf * may represent: - default argument that may be non-trivial - how information was interpreted by the function - words of encouragement! - or even industry/business best practices * .blue[Always provide an option to suppress messages] ] .pull-right[ ```r *compute_pv <- function(fv, r, n, digits = 2, quiet = FALSE) { stopifnot(is.numeric(fv), is.numeric(r), is.numeric(n), is.numeric(digits)) * if(!quiet) { if(r < 0.01 || r > 0.25) { message("Your interest rate exceeds the normal ranges of 1-25%") } # other messages here... } round(fv / (1 + r)^n, digits = digits) } # default provides message compute_pv(1000, .26, 10) ## Your interest rate exceeds the normal ranges of 1-25% ## [1] 99.15 # user can silence compute_pv(1000, .26, 10, quiet = TRUE) ## [1] 99.15 ``` ] --- class: yourturn # Your Turn! .pull-left[ Let's go back to our `rescale()` function: ```r rescale <- function(x, digits = 2){ rng <- range(x, na.rm = TRUE) scaled <- (x - rng[1]) / (rng[2] - rng[1]) round(scaled, digits = digits) } ``` ] --- class: yourturn # Your Turn! .scrollable90[ .pull-left[ Let's go back to our `rescale()` function: ```r rescale <- function(x, digits = 2){ * rng <- range(x, na.rm = TRUE) scaled <- (x - rng[1]) / (rng[2] - rng[1]) round(scaled, digits = digits) } ``` * Move the `na.rm = TRUE` option to the function arguments so that the user can control whether or not to remove missing values. * Add a `warning()` that tells the user NAs are present (if they use `na.rm = FALSE`) * Add a `message()` that tells the user how many NAs were removed * add a missing value to your test_vector and test out your revised function ```r test_vector <- c(NA, test_vector) ``` ] .pull-right[ Your results should look something like this: ```r # results in an error rescale(letters) ## Error: `x` must be a numeric atomic vector # results in warning rescale(test_vector, na.rm = FALSE) ## Warning: There are 1 NAs. Remove them with `na.rm = TRUE` ## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA # results in message rescale(test_vector) ## 1 NAs were removed. ## [1] 0.27 0.82 0.40 0.92 0.98 0.00 0.53 0.93 0.56 0.45 1.00 0.45 0.69 0.58 ## [15] 0.07 0.94 0.22 0.00 0.31 1.00 ``` ] ] --- class: yourturn # Your Turn! .scrollable90[ .pull-left[ Let's go back to our `rescale()` function: ```r rescale <- function(x, digits = 2){ * rng <- range(x, na.rm = TRUE) scaled <- (x - rng[1]) / (rng[2] - rng[1]) round(scaled, digits = digits) } ``` * Move the `na.rm = TRUE` option to the function arguments so that the user can control whether or not to remove missing values. * Add a `warning()` that tells the user NAs are present (if they use `na.rm = FALSE`) * Add a `message()` that tells the user how many NAs were removed * add a missing value to your test_vector and test out your revised function ```r test_vector <- c(NA, test_vector) ``` ] .pull-right[ ```r rescale <- function(x, digits = 2, na.rm = TRUE, quiet = FALSE){ # make sure x an numeric atomic vector if(!is.atomic(x) || !is.numeric(x)) { stop("`x` must be a numeric atomic vector", call. = FALSE) } # compute number of missing values missing_n <- sum(is.na(x)) # warning if na.rm = FALSE and NAs are present if(!na.rm) { warning(paste( "There are", missing_n, "NAs. Remove them with `na.rm = TRUE`"), call. = FALSE) } # message to report how many NAs were removed if(na.rm) { x <- x[!is.na(x)] if(!quiet) { message(paste(missing_n, "NAs were removed.")) } } # compute scaling rng <- range(x) scaled <- (x - rng[1]) / (rng[2] - rng[1]) round(scaled, digits = digits) } # results in an error rescale(letters) ## Error: `x` must be a numeric atomic vector # results in warning rescale(test_vector, na.rm = FALSE) ## Warning: There are 1 NAs. Remove them with `na.rm = TRUE` ## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA # results in message rescale(test_vector) ## 1 NAs were removed. ## [1] 0.27 0.82 0.40 0.92 0.98 0.00 0.53 0.93 0.56 0.45 1.00 0.45 0.69 0.58 ## [15] 0.07 0.94 0.22 0.00 0.31 1.00 ``` ] ] --- # Lazy evaluation .pull-left[ .font120[ * In R, function arguments are .blue[__lazily evaluated: *only evaluated if accessed*__] ] ] .pull-right[ ```r h01 <- function(x) { 10 } h01(x = stop("This is an error!")) ## [1] 10 ``` ] --- # Lazy evaluation .pull-left[ * In R, function arguments are .blue[__lazily evaluated: *only evaluated if accessed*__] * This provides you with lots of flexibility in when and how to evaluate inputs ] .pull-right[ ```r h02 <- function(x, y, z) { if(missing(z)) { sum(x, y) } else { sum(x, y, z) } } # z does not exist h02(x = 1, y = 2) ## [1] 3 # z does exist h02(x = 1, y = 2, z = 3) ## [1] 6 ``` ] --- # Lazy evaluation .pull-left[ * In R, function arguments are .blue[__lazily evaluated: *only evaluated if accessed*__] * This provides you with lots of flexibility in when and how to evaluate inputs * You need to be careful with how you use this, as it even allows you to use arguments outside of your function environment ] .pull-right[ ```r h03 <- function(x, y) { if(exists("z")) { x + y + z } else { x + y } } # z does not exist h03(x = 1, y = 2) ## [1] 3 # z exists in global environment z <- 3 h03(x = 1, y = 2) ## [1] 6 ``` ] --- # Lazy evaluation .pull-left[ * In R, function arguments are .blue[__lazily evaluated: *only evaluated if accessed*__] * This provides you with lots of flexibility in when and how to evaluate inputs * You need to be careful with how you use this, as it even allows you to use arguments outside of your function environment * One of the more common applications is to .blue[use `NULL` to indicate that a variable is not required but can be used if supplied] ] .pull-right[ ```r *compute_pv <- function(fv, r, n, multiplier = NULL) { # compute present value pv <- round(fv / (1 + r)^n) * # apply multiplier if its supplied * if(is.null(multiplier)) { * pv * } else { * pv * multiplier * } } # w/o multiplier compute_pv(1000, .05, 10) ## [1] 614 # with multiplier compute_pv(1000, .05, 10, multiplier = 1.25) ## [1] 767.5 ``` ] --- # Lexical scoping .pull-left[ * R uses a nested environment structure * a function has: - its own environment - plus one or more parent environments - always a fresh start * a function will search: 1. itself 2. through hierarchical parental environments * a function will never: - use a parameter from a parent environment when it exists in the current environment - search within another function unless it's its parent ] .pull-right[ <img src="images/nested-environments1.png" width="50%" style="display: block; margin: auto;" /> <img src="images/nested-environments2.png" width="50%" style="display: block; margin: auto;" /> ] --- # Pure functions .pull-left[ * This is why `h03()` in the previous example worked * However, we now have an ___impure___ function * .blue.bold[Pure function]: - returns the same result if given the same arguments - does not cause any observable side effects ] --- # Pure functions .pull-left[ * This is why `h03()` in the previous example worked * However, we now have an impure function * .blue.bold[Pure function]: - ___returns the same result if given the same arguments___ - does not cause any observable side effects <br> .center.bold.red[_Constency is our friend!_] ] .pull-right[ .code70[ ```r # h03() changes depending on z's value # outside of the function call z <- 4 h03(x = 1, y = 2) ## [1] 7 z <- 5 h03(x = 1, y = 2) ## [1] 8 # if your function is dependent on randomness # allow users to set seed get_n_samples <- function(x, n, seed = NULL) { if (!is.null(seed)) { set.seed(seed) } sample(x, size = n) } x <- 1:10 # allows user to always get same results get_n_samples(x, 5, seed = 123) ## [1] 3 8 4 7 6 get_n_samples(x, 5, seed = 123) ## [1] 3 8 4 7 6 ``` ] ] --- # Pure functions .pull-left[ * This is why `h03()` in the previous example worked * However, we now have an impure function * .blue.bold[Pure function]: - returns the same result if given the same arguments - ___does not cause any observable side effects___ <br> .center.bold.red[_Don't change things the user doesn't want or expect!_] ] .pull-right[ ```r get_n_samples <- function(x, n, seed = NULL) { if (!is.null(seed)) { set.seed(seed) } samples <- sample(x, size = n) * # overwrite x * x <<- samples * # change options to print 8 decimals * options(digits = 2) samples } x <- seq(.0001, 1, by = 0.001) get_n_samples(x, 5, seed = 123) ## [1] 0.29 0.79 0.41 0.88 0.94 x ## [1] 0.29 0.79 0.41 0.88 0.94 ``` ] --- # Pure functions .font140[ Take-away: * functions should do one thing * functions should do that one thing consistently * functions should not do anything else ] --- # Distributing your function(s) .pull-left[ Two main ways to distribute your functions: .font120[ 1. using the `source()` function to read an R script - easy to get started - all functions/objects in the sourced .R script will be imported to your global environment - does not scale well ] ] .pull-right[ <br> ```r source("my-functions/custom-stat-functions.R") ls() ## [1] "skewness" "std_dev" "std_error" "variance" purrr::map_dbl(mtcars, skewness) ## mpg cyl disp hp drat wt qsec vs am gear carb ## 0.65 -0.19 0.41 0.77 0.28 0.45 0.39 0.26 0.39 0.56 1.12 ``` ] --- # Distributing your function(s) .pull-left[ Two main ways to distribute your functions: 1. using the `source()` function to read an R script - easy to get started - all functions/objects in the sourced .R script will be imported to your global environment - does not scale well 2. writing a package (package writing will not be covered) - once you have a few functions that serve a common use - hard to get started - scales beautifully ] .pull-right[ <br> ```r library(gitr) lsf.str("package:gitr") ## git_access_token : function () ## git_add : function (files = "all", verbose = TRUE) ## git_clone : function (repo, path = ".", branch = NULL) ## git_commit : function (message, verbose = TRUE) ## git_file_size : function (files = "all", sort = TRUE) ## git_log : function (limit = Inf, detailed = TRUE, author = NULL, stats = FALSE, ## show_diff = FALSE, message = NULL, file = NULL) ## git_pull : function (force = FALSE) ## git_push : function (force = FALSE) ## git_rdone : function (message, files = "all", force = FALSE, verbose = TRUE) ## git_status : function (verbose = FALSE) ## install_internal_pkg : function (package, github_acct = "Analytics-Tools", host = "github.8451.com/api/v3", ## auth_token = NULL, ...) ## launch_github : function (...) ``` ] --- # Bells & whistles .pull-left[ We can do ___many___ other things with functions such as: .font120[ * add progress bars ] ] .pull-right[ ```r import_monthly_data <- function(files, quiet = FALSE) { if(!quiet) { pb <- progress_estimated(length(files)) } special_import <- function(x) { if (quiet) { suppressMessages(readr::read_csv(x)) } else { pb$tick()$print() suppressMessages(readr::read_csv(x)) } } purrr::map_dfr(paste0("data/", files), special_import) } import_monthly_data(list.files("data")) ## |=================================================================|100% ~0 s remaining ## # A tibble: 698,159 x 10 ## Account_ID Transaction_Timest… Factor_A Factor_B Factor_C Factor_D Factor_E Response ## <int> <dttm> <int> <int> <chr> <int> <chr> <int> ## 1 5 2009-01-08 00:16:41 2 6 VI 20 A 1020 ## 2 16 2009-01-20 22:40:08 2 6 VI 20 H 1020 ## 3 28 2009-01-19 13:24:55 2 6 VI 21 NULL 1020 ## 4 40 2009-01-05 16:10:58 2 6 VI 20 H 1020 ## 5 62 2009-01-21 19:13:13 2 6 VI 20 B 1020 ## 6 64 2009-01-01 18:53:02 7 6 MC 20 NULL 1020 ## 7 69 2009-01-08 00:15:19 2 6 VI 20 H 1020 ## 8 69 2009-01-19 09:33:22 2 6 VI 20 H 1020 ## 9 70 2009-01-05 12:07:47 2 6 VI 20 B 1020 ## 10 79 2009-01-07 19:41:18 7 6 MC 20 NULL 1020 ## # … with 698,149 more rows, and 2 more variables: Transaction_Status <chr>, Month <chr> ``` ] --- # Bells & whistles .pull-left[ We can do ___many___ other things with functions such as: * add progress bars * take any number of additional arguments with .font120[`...`] to: - pass arguments on to internal arguments ] .pull-right[ ```r plot_strongest_corr <- function(data, response, ...) { # find strong correlation vars <- setdiff(names(data), response) correlations <- sapply(data[, vars], function(x) cor(data[, response], x)) x <- names(which.max(abs(correlations))) # plot relationship plot(data[, x], data[, response], xlab = x, ylab = response, ...) } plot_strongest_corr(mtcars, "mpg", pch = 19, col = "blue", main = "Some cool title") ``` <img src="day-2d-functions_files/figure-html/unnamed-chunk-27-1.png" width="50%" height="50%" style="display: block; margin: auto;" /> ] --- # Bells & whistles .pull-left[ We can do ___many___ other things with functions such as: * add progress bars * take any number of additional arguments with .font120[`...`] to: - pass arguments on to internal arguments - or accept unknown number of inputs ] .pull-right[ ```r select <- function(.data, ...) { # convert ... to a list and then to a string arguments <- substitute(...()) vars <- paste(arguments) # index for .data[, vars] } select(mtcars, mpg, wt) ## mpg wt ## Mazda RX4 21 2.6 ## Mazda RX4 Wag 21 2.9 ## Datsun 710 23 2.3 ## Hornet 4 Drive 21 3.2 ## Hornet Sportabout 19 3.4 ## Valiant 18 3.5 ## Duster 360 14 3.6 ## Merc 240D 24 3.2 ## Merc 230 23 3.1 ## Merc 280 19 3.4 ## Merc 280C 18 3.4 ## Merc 450SE 16 4.1 ## Merc 450SL 17 3.7 ## Merc 450SLC 15 3.8 ## Cadillac Fleetwood 10 5.2 ## Lincoln Continental 10 5.4 ## Chrysler Imperial 15 5.3 ## Fiat 128 32 2.2 ## Honda Civic 30 1.6 ## Toyota Corolla 34 1.8 ## Toyota Corona 22 2.5 ## Dodge Challenger 16 3.5 ## AMC Javelin 15 3.4 ## Camaro Z28 13 3.8 ## Pontiac Firebird 19 3.8 ## Fiat X1-9 27 1.9 ## Porsche 914-2 26 2.1 ## Lotus Europa 30 1.5 ## Ford Pantera L 16 3.2 ## Ferrari Dino 20 2.8 ## Maserati Bora 15 3.6 ## Volvo 142E 21 2.8 ``` ] --- # Bells & whistles .scrollable90[ .pull-left[ We can do ___many___ other things with functions such as: * add progress bars * take any number of additional arguments with .font120[`...`] * improve efficiency with C++ ] .pull-right[ ```r # create our own sum function sumR <- function(x) { total <- 0 for (i in seq_along(x)) { total <- total + x[i] } total } # use Rcpp to write a sum function in C++ library(Rcpp) cppFunction('double sumC(NumericVector x) { int n = x.size(); double total = 0; for(int i = 0; i < n; ++i) { total += x[i]; } return total; }') x <- runif(1e3) bench::mark( sum(x), sumC(x), sumR(x) )[1:6] ## # A tibble: 3 x 6 ## expression min mean median max `itr/sec` ## <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> ## 1 sum(x) 950ns 1.14µs 968ns 21.39µs 877949. ## 2 sumC(x) 2.12µs 4.42µs 4.54µs 1.09ms 226007. ## 3 sumR(x) 35.82µs 39.34µs 37.27µs 256.51µs 25421. ``` ] ] --- # Bells & whistles .scrollable90[ .pull-left[ We can do ___many___ other things with functions such as: * add progress bars * take any number of additional arguments with .font120[`...`] * improve efficiency with C++ <br> .center.bold[.content-box-gray[_And so much more!_]] ] .pull-right[ ```r # create our own sum function sumR <- function(x) { total <- 0 for (i in seq_along(x)) { total <- total + x[i] } total } # use Rcpp to write a sum function in C++ library(Rcpp) cppFunction('double sumC(NumericVector x) { int n = x.size(); double total = 0; for(int i = 0; i < n; ++i) { total += x[i]; } return total; }') x <- runif(1e3) bench::mark( sum(x), sumC(x), sumR(x) )[1:6] ## # A tibble: 3 x 6 ## expression min mean median max `itr/sec` ## <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> ## 1 sum(x) 950ns 1.27µs 995ns 37.5µs 784899. ## 2 sumC(x) 2.12µs 3.67µs 2.51µs 1.24ms 272609. ## 3 sumR(x) 35.63µs 38.98µs 37.22µs 132.34µs 25655. ``` ] ] --- class: yourturn # Time permitting Practice writing the following functions and test them on the given test vector .pull-left[ ```r # test vector set.seed(123) x <- rlnorm(100) # create a function that computes the variance of x (1 / (length(x) - 1)) * sum((x - mean(x))^2) # create a function that computes the standard deviation of x sqrt((1 / (length(x) - 1)) * sum((x - mean(x))^2)) # create a function that computes the standard error of x var_x <- (1 / (length(x) - 1)) * sum((x - mean(x))^2) sqrt(var_x / length(x)) # create a function that computes the skewness of x n <- length(x) v <- var(x) m <- mean(x) third.moment <- (1 / (n - 2)) * sum((x - m)^3) third.moment / (var(x)^(3 / 2)) ``` ] .pull-right[ ```r # test vector set.seed(123) x <- rlnorm(100) # variance function output variance(x) ## [1] 3 # stardard dev output std_dev(x) ## [1] 1.7 # standard error output std_error(x) ## [1] 0.17 # skewness output skewness(x) ## [1] 2.3 ``` ] --- # Learning more .center.font150.bold[Advanced R: https://adv-r.hadley.nz/] <img src="images/adv-r-cover.png" width="333" style="display: block; margin: auto;" /> --- # Questions <br> <img src="images/questions.png" width="450" height="450" style="display: block; margin: auto;" />