Writing Functions

class: clear, center, middle

background-image: url("images/function-icon.jpg")
background-size: contain

.center.font300.bold[Writing Functions]
.center[_"Writing good functions is a lifetime journey."_ - Hadley Wickham]

---
# Overview

- Why functions are good

- Anatomy of a function

- Defining functions

- Defining your arguments

- Checking arguments and other conditions

- Lazy evaluation

- Lexical scoping

- Distributing your function(s)

- Learning more

---
# Why functions are good

.pull-left[
* A function centralizes a common task to a single, abstract method

* This helps to:

- increase .green[reusability] which...
   - reduces .red[verbosity]
   - reduces .red[repetitiveness]
   - reduces the .red[chance of making an error]
   - reduces .red[instances requiring updates] when changes are needed
   - reduces .red[amount of code to test]
]

.pull-right[

Consider the following from Hadley Wickham’s book _R for Data Science_.  Where's the .red[error]?

```r
df <- tibble::tibble(
 a = rnorm(10),
 b = rnorm(10),
 c = rnorm(10),
 d = rnorm(10)
)

df$a <- (df$a - min(df$a, na.rm = TRUE)) /
 (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) /
 (max(df$b, na.rm = TRUE) - min(df$b, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) /
 (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) /
 (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))
```

]

---
# Why functions are good

.pull-left[
* A function centralizes a common task to a single, abstract method

* This helps to:

.pull-right[

Consider the following from Hadley Wickham’s book _R for Data Science_.  Where's the .red[error]?

```r
df <- tibble::tibble(
 a = rnorm(10),
 b = rnorm(10),
 c = rnorm(10),
 d = rnorm(10)
)

df$a <- (df$a - min(df$a, na.rm = TRUE)) / 
 (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) / 
* (max(df$b, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) / 
 (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) / 
 (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))
```

]

---
# Prereqs

.center.font200.bold[Base R]

.center.bold[I will use some packages throughout but these are simply for examples. You will be notified but sometimes you just need to learn to base R.]

---
# Key ingredients of a function

.pull-left[
.font140[
1. Name

2. arguments

3. body

4. environment
]
]

.pull-left[

```r
my_fun <- function(arg1, arg2) {
 << body >>
 }
```

]

---
# Key ingredients of a function

.pull-left[
 
.font140[

1. Name

2. arguments

3. body

4. environment
]

]

.pull-right[

```r
pv <- function(fv, r, n = 5) {
 fv / (1 + r)^n
}
```

```r
args(pv)
## function (fv, r, n = 5) 
## NULL
```

```r
body(pv)
## {
##     fv/(1 + r)^n
## }
```

```r
environment(pv)
## <environment: R_GlobalEnv>
```

]

.center[.content-box-gray[Note how the environment is our global environment]]

---
class: yourturn
# Your Turn!

.pull-left[

Identify the arguments, body, and environment of:

1. `read.csv`

2. `dplyr::add_count`

3. `sum`

]

.pull-right[

```r
# #1 read.csv
args(read.csv)
## function (file, header = TRUE, sep = ",", quote = "\"", dec = ".", 
##     fill = TRUE, comment.char = "", ...) 
## NULL

body(read.csv)
## read.table(file = file, header = header, sep = sep, quote = quote, 
##     dec = dec, fill = fill, comment.char = comment.char, ...)

environment(read.csv)
## <environment: namespace:utils>
```

]

---
class: yourturn
# Your Turn!

.pull-left[

Identify the arguments, body, and environment of:

1. `read.csv`

2. `dplyr::add_count`

3. `sum`

]

.pull-right[

```r
# #2 dplyr::add_count
args(dplyr::add_count)
## function (x, ..., wt = NULL, sort = FALSE) 
## NULL

body(dplyr::add_count)
## {
## g <- group_vars(x)
## grouped <- group_by(x, ..., add = TRUE)
## out <- add_tally(grouped, wt = !!enquo(wt), sort = sort)
## grouped_df(out, g)
## }

environment(dplyr::add_count)
## <environment: namespace:dplyr>
```

]

---
class: yourturn
# Your Turn!

.pull-left[

Identify the arguments, body, and environment of:

1. `read.csv`

2. `dplyr::add_count`

3. `sum`

]

.pull-right[

```r
# #3 sum
args(sum)
## function (..., na.rm = FALSE) 
## NULL

*body(sum) # NULL because its a primitive
## NULL

*environment(sum) # NULL because its a primitive
## NULL
```

]

.center[.content-box-gray[`sum()` is a primitive function]]

---
# Defining a function

.pull-left[

* We define a function with `<-` just like we define any other R object

* Use informative names; strive to use verbs when possible

]

.pull-right[

```r
present_value <- function() {
 
}
```

```r
compute_pv <- function() {
 
}
```

]

---
# Defining a function

.pull-left[

* We define a function with `<-` just like we define any other R object

* Use informative names; strive to use verbs when possible

* We can define a function with no arguments; however, this is rarely useful

]

.pull-right[

```r
present_value <- function() {
 
}
```

```r
compute_pv <- function() {
 
}
```

```r
compute_pv <- function() {
 1000 / (1 + 0.05)^10
}

compute_pv()
## [1] 614
```

]

.center[___We want a function that does one thing consistently well, yet provide flexibility in the inputs provided.___]

---
# Defining a function

.pull-left[
.opacity20[
* We define a function with `<-` just like we define any other R object

* Use informative names; strive to use verbs when possible

* We can define a function with no arguments; however, this is rarely useful
]
.font120[
* Consequently, most functions have all key ingredients
]
]

.pull-right[

.opacity20[

```r
present_value <- function() {
 
}
```
]
.opacity20[

```r
compute_pv <- function() {
 
}
```
]
.opacity20[

```r
compute_pv <- function() {
 1000 / (1 + 0.05)^10
}
```
]
.code80[

```r
compute_pv <- function(fv, r, n) {
 fv / (1 + r)^n
}

compute_pv(fv = 1000, r = .05, n = 10)
## [1] 614

compute_pv(fv = 2000, r = .08, n = 5)
## [1] 1361
```
]
]

---
class: yourturn
# Your Turn!

.pull-left[
* Define a function titled ratio that takes arguments `x` and `y` and returns their ratio, `x / y`

* Call `ratio()` with arguments 3 and 4
]
--
.pull-right[

```r
ratio <- function(x, y) {
 x / y
}

ratio(3, 4)
## [1] 0.75
```
]

---
# Calling arguments in different ways

.font120[Many ways to call arguments:]

.pull-left[
.font140[
* Using argument names

]
]

.pull-right[

```r
compute_pv(fv = 1000, r = .05, n = 10)
## [1] 613.9133
```

]

---
# Calling arguments in different ways

.font120[Many ways to call arguments:]

.pull-left[
.font120[
* Using argument names

* Positional matching

]
]

.pull-right[

```r
compute_pv(fv = 1000, r = .05, n = 10)
## [1] 613.9133
```

```r
compute_pv(1000, .05, 10)
## [1] 613.9133
```

]

---
# Calling arguments in different ways

.font120[Many ways to call arguments:]

.pull-left[
.font120[
* Using argument names

* Positional matching

* Must use names if you change order

* ...otherwise error or incorrect computation will occur

]
]

.pull-right[

```r
compute_pv(fv = 1000, r = .05, n = 10)
## [1] 613.9133
```

```r
compute_pv(1000, .05, 10)
## [1] 613.9133
```

```r
compute_pv(r = .05, fv = 1000, n = 10)
## [1] 613.9133
```

```r
compute_pv(.05, 1000, 10)
## [1] 4.950274e-32
```

]

---
# Calling arguments in different ways

.font120[Many ways to call arguments:]

.pull-left[
.font120[
* Using argument names

* Positional matching

* Must use names if you change order

* ...otherwise error or incorrect computation will occur

* missing arguments results in error
]
]

.pull-right[

```r
compute_pv(fv = 1000, r = .05, n = 10)
## [1] 613.9133
```

```r
compute_pv(1000, .05, 10)
## [1] 613.9133
```

```r
compute_pv(r = .05, fv = 1000, n = 10)
## [1] 613.9133
```

```r
compute_pv(.05, 1000, 10)
## [1] 4.950274e-32
```

```r
compute_pv(1000, .05)
## Error in compute_pv(1000, 0.05): argument "n" is missing, with no default
```

]

---
# Ordering your arguments

Ordering arguments in your functions is important:

* pipe (.font120[`%>%`]) operator
   - tidyverse pkgs consistently have data as first argument
   - makes for intuitive syntax & easy flow

* positional matching
   - which args are most vs less important
   - helps dictate where defaults need to go (next slide)

```r
# bad --> rounding to digits is least important parameter
compute_pv <- function(digits, r, n, fv) {
 << body >>
}

# better but could still be improved --> does a user always need to specify digits to round to?
compute_pv <- function(fv, r, n, digits) {
 << body >>
}
```

---
# Setting (.green[good]) defaults

.pull-left[

* As our functions require more inputs...

* many of them can likely do with defaults

* Usually, the first 1-3 arguments are not preset but the rest are

* Good defaults are largely based on
   - methodology best practices (i.e. hyperparameters, removing missing values)
   - industry best practices or current values (i.e. current interest rates)
   - organizational best practices (i.e. golden rules)

]

.pull-right[

```r
compute_pv <- function(fv, r, n, digits = 2) {
 round(fv / (1 + r)^n, digits = digits)
}

# by default, will round to 2 digits
compute_pv(1000, .05, 10)
## [1] 613.91

# to adjust, specify by position
compute_pv(1000, .05, 10, 0)
## [1] 614

# or by name
compute_pv(1000, .05, 10, digits = 0)
## [1] 614
```

]

---
class: yourturn
# Your Turn - Part 1

.pull-left[
* Earlier in these slides you saw the following code duplicated:

```r
(df$a - min(df$a, na.rm = TRUE)) /  
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
```

* Can you write a function called `rescale()` that takes argument `x` and executes this code?

* Test it on the following vector:

```r
set.seed(123)
test_vector <- runif(20, min = 25, max = 40)
```

]
--
.pull-right[

```r
rescale <- function(x){
 rng <- range(x, na.rm = TRUE)
 (x - rng[1]) / (rng[2] - rng[1])
}

rescale(test_vector)
##  [1] 0.2684 0.8158 0.4011 0.9193 0.9821 0.0038 0.5313 0.9296 0.5568 0.4532
## [11] 1.0000 0.4496 0.6947 0.5800 0.0665 0.9377 0.2230 0.0000 0.3125 0.9975
```

]

---
class: yourturn
# Your Turn - Part 2

.pull-left[

* Now add an argument to `rescale()` that allows you to round the output to a specified decimal.

* Set the default to 2.
]

.pull-right[

```r
rescale <- function(x, digits = 2){
 rng <- range(x, na.rm = TRUE)
 scaled <- (x - rng[1]) / (rng[2] - rng[1])
 round(scaled, digits = digits)
}

# default
rescale(test_vector)
##  [1] 0.27 0.82 0.40 0.92 0.98 0.00 0.53 0.93 0.56 0.45 1.00 0.45 0.69 0.58
## [15] 0.07 0.94 0.22 0.00 0.31 1.00

# 3 digits
rescale(test_vector, digits = 3)
##  [1] 0.268 0.816 0.401 0.919 0.982 0.004 0.531 0.930 0.557 0.453 1.000
## [12] 0.450 0.695 0.580 0.067 0.938 0.223 0.000 0.312 0.997
```

]

---
# Checking arguments and other conditions

* We've created a function that allows users to specify their inputs

* But how do we ensure they provide us with the right kind of inputs?

* Or what if we want to be able to provide some sort of feedback to the user?

* There are several ways to signal __conditions__ to function users:

--
   - .font110[`stop()`] & .font110[`stopifnot()`]: signal an .red[error] (no way for a function to continue and execution must stop)
   
--
   
   - .font110[`warning()`]: signal a warning (something has gone wrong but the function has been able to at least partially recover.)
   
--
   
   - .font110[`message()`]: signal an informative message (function works fine but user should be informed of something.)
   
.center[___The most common you will use are .font120[`stop()`] & .font120[`message()`]___]

---
# `stop()` & `stopifnot()`

.scrollable90[
.pull-left[

* The .font120[`stop`] functions are most commonly used to check for proper inputs

* but can be used to .red[stop] the function procedures because the user's environment is not properly established

* Weigh the benefits of highly custom, informative error messages (via `stop`) versus short to the point (via `stopifnot`)

* Execution will stop at the first violation

]

.pull-right[

```r
# stop
compute_pv <- function(fv, r, n, digits = 2) {
 
 if(!is.numeric(fv)) {
* stop("`fv` must be numeric", call. = FALSE)
 }
 
 if(!is.numeric(r)) {
* stop("`r` must be numeric", call. = FALSE)
 }
 
 if(!is.numeric(n)) {
* stop("`n` must be numeric", call. = FALSE)
 }
 
 if(!is.numeric(digits)) {
* stop("`digits` must be numeric", call. = FALSE)
 }
 
 round(fv / (1 + r)^n, digits = digits)
}

compute_pv("1000", ".05", 10)
## Error: `fv` must be numeric
```

```r
# stopifnot
compute_pv <- function(fv, r, n, digits = 2) {
 
* stopifnot(is.numeric(fv), is.numeric(r), is.numeric(n), is.numeric(digits))
 
 round(fv / (1 + r)^n, digits = digits)
}

compute_pv("1000", ".05", 10)
## Error in compute_pv("1000", ".05", 10): is.numeric(fv) is not TRUE
```
]
]

---
# `warning()`

.pull-left[
* `warnings()` are rarely used but...

* can be useful to signal .blue[non-terminating] concerns
   - hyperparameter clashes in a grid search
   - an extremely long compute time and the user may want to re-assess
   - deprecated functions
   - NAs, NULLs, Infs exist, which result in NA output

```r
lubridate::mdy(c("2-1-2019", "1995"))
## Warning: 1 failed to parse.
## [1] "2019-02-01" NA
```

]

.pull-right[

```r
present_value <- function(fv, r, n, digits = 2) {
 
* warning("`present_value()` is deprecated. Use `compute_pv()`", call. = FALSE)
 stopifnot(is.numeric(fv), is.numeric(r), is.numeric(n), is.numeric(digits))
 
 round(fv / (1 + r)^n, digits = digits)
}

present_value(1000, .05, 10)
## Warning: `present_value()` is deprecated. Use `compute_pv()`
## [1] 613.91
```

]

---
# `message()`

.pull-left[
* designed to be informational

* use them to tell the user that you’ve done something on their behalf

* may represent:
]

---
# `message()`

.pull-left[
* designed to be informational

* use them to tell the user that you’ve done something on their behalf

* may represent:
   - default argument that may be non-trivial
   
]

.pull-right[

```r
library(ggplot2)

ggplot(iris, aes(Sepal.Length)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

]

---
# `message()`

.pull-left[
* designed to be informational

* use them to tell the user that you’ve done something on their behalf

* may represent:
   - default argument that may be non-trivial
   - how information was interpreted by the function
   
]

.pull-right[

```r
df <- readr::read_csv("data/Month-01.csv")
## Parsed with column specification:
## cols(
## Account_ID = col_integer(),
## Transaction_Timestamp = col_datetime(format = ""),
## Factor_A = col_integer(),
## Factor_B = col_integer(),
## Factor_C = col_character(),
## Factor_D = col_integer(),
## Factor_E = col_character(),
## Response = col_integer(),
## Transaction_Status = col_character(),
## Month = col_character()
## )
```

]

---
# `message()`

.pull-left[
* designed to be informational

* use them to tell the user that you’ve done something on their behalf

* may represent:
   - default argument that may be non-trivial
   - how information was interpreted by the function
   - words of encouragement!
   
]

.pull-right[
 
<img src="images/gitr-example.png" width="663" style="display: block; margin: auto;" />

]

---
# `message()`

.pull-left[
* designed to be informational

* use them to tell the user that you’ve done something on their behalf

* may represent:
   - default argument that may be non-trivial
   - how information was interpreted by the function
   - words of encouragement!
   - or even industry/business best practices

]

.pull-right[

```r
compute_pv <- function(fv, r, n, digits = 2) {
 
 stopifnot(is.numeric(fv), is.numeric(r), is.numeric(n), is.numeric(digits))
 
* if(r < 0.01 || r > 0.25) {
* message("Your interest rate exceeds the normal ranges of 1-25%")
* }
 
 round(fv / (1 + r)^n, digits = digits)
}

compute_pv(1000, .26, 10)
## Your interest rate exceeds the normal ranges of 1-25%
## [1] 99.15
```

]

---
# `message()`

.pull-left[
* designed to be informational

* use them to tell the user that you’ve done something on their behalf

* may represent:
   - default argument that may be non-trivial
   - how information was interpreted by the function
   - words of encouragement!
   - or even industry/business best practices
   
* .blue[Always provide an option to suppress messages]

]

.pull-right[

```r
*compute_pv <- function(fv, r, n, digits = 2, quiet = FALSE) {
 
 stopifnot(is.numeric(fv), is.numeric(r), is.numeric(n), is.numeric(digits))
 
* if(!quiet) {
 if(r < 0.01 || r > 0.25) { 
 message("Your interest rate exceeds the normal ranges of 1-25%") 
 }
 # other messages here...
 }
 
 round(fv / (1 + r)^n, digits = digits)
}

# default provides message
compute_pv(1000, .26, 10)
## Your interest rate exceeds the normal ranges of 1-25%
## [1] 99.15

# user can silence
compute_pv(1000, .26, 10, quiet = TRUE)
## [1] 99.15
```

]

---
class: yourturn
# Your Turn!

.pull-left[

Let's go back to our `rescale()` function:

```r
rescale <- function(x, digits = 2){
 rng <- range(x, na.rm = TRUE)
 scaled <- (x - rng[1]) / (rng[2] - rng[1])
 round(scaled, digits = digits)
}
```
]

---
class: yourturn
# Your Turn!

.scrollable90[
.pull-left[

Let's go back to our `rescale()` function:

```r
rescale <- function(x, digits = 2){
* rng <- range(x, na.rm = TRUE)
 scaled <- (x - rng[1]) / (rng[2] - rng[1])
 round(scaled, digits = digits)
}
```

* Move the `na.rm = TRUE` option to the function arguments so that the user can control whether or not to remove missing values.
* Add a `warning()` that tells the user NAs are present (if they use `na.rm = FALSE`)
* Add a `message()` that tells the user how many NAs were removed
* add a missing value to your test_vector and test out your revised function

```r
test_vector <- c(NA, test_vector)
```

]

.pull-right[

Your results should look something like this:

```r
# results in an error
rescale(letters)
## Error: `x` must be a numeric atomic vector

# results in warning
rescale(test_vector, na.rm = FALSE)
## Warning: There are 1 NAs. Remove them with `na.rm = TRUE`
##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

# results in message
rescale(test_vector)
## 1 NAs were removed.
##  [1] 0.27 0.82 0.40 0.92 0.98 0.00 0.53 0.93 0.56 0.45 1.00 0.45 0.69 0.58
## [15] 0.07 0.94 0.22 0.00 0.31 1.00
```

]
]

---
class: yourturn
# Your Turn!
.scrollable90[
.pull-left[

Let's go back to our `rescale()` function:

```r
rescale <- function(x, digits = 2){
* rng <- range(x, na.rm = TRUE)
 scaled <- (x - rng[1]) / (rng[2] - rng[1])
 round(scaled, digits = digits)
}
```

```r
test_vector <- c(NA, test_vector)
```

]

.pull-right[

```r
rescale <- function(x, digits = 2, na.rm = TRUE, quiet = FALSE){
 # make sure x an numeric atomic vector
 if(!is.atomic(x) || !is.numeric(x)) {
 stop("`x` must be a numeric atomic vector", call. = FALSE)
 }
 
 # compute number of missing values
 missing_n <- sum(is.na(x))
 
 # warning if na.rm = FALSE and NAs are present
 if(!na.rm) {
 warning(paste(
 "There are", 
 missing_n,
 "NAs. Remove them with `na.rm = TRUE`"),
 call. = FALSE)
 }
 
 # message to report how many NAs were removed
 if(na.rm) {
 x <- x[!is.na(x)]
 if(!quiet) {
 message(paste(missing_n, "NAs were removed."))
 }
 }
 
 # compute scaling
 rng <- range(x)
 scaled <- (x - rng[1]) / (rng[2] - rng[1])
 round(scaled, digits = digits)
}

# results in an error
rescale(letters)
## Error: `x` must be a numeric atomic vector

# results in warning
rescale(test_vector, na.rm = FALSE)
## Warning: There are 1 NAs. Remove them with `na.rm = TRUE`
##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

# results in message
rescale(test_vector)
## 1 NAs were removed.
##  [1] 0.27 0.82 0.40 0.92 0.98 0.00 0.53 0.93 0.56 0.45 1.00 0.45 0.69 0.58
## [15] 0.07 0.94 0.22 0.00 0.31 1.00
```
]
]

---
# Lazy evaluation

.pull-left[
.font120[
* In R, function arguments are .blue[__lazily evaluated: *only evaluated if accessed*__]
]
]

.pull-right[

```r
h01 <- function(x) {
 10
}

h01(x = stop("This is an error!"))
## [1] 10
```
]

---
# Lazy evaluation

.pull-left[
* In R, function arguments are .blue[__lazily evaluated: *only evaluated if accessed*__]

* This provides you with lots of flexibility in when and how to evaluate inputs

]

.pull-right[

```r
h02 <- function(x, y, z) {
 if(missing(z)) {
 sum(x, y)
 } else {
 sum(x, y, z)
 }
}

# z does not exist
h02(x = 1, y = 2)
## [1] 3

# z does exist
h02(x = 1, y = 2, z = 3)
## [1] 6
```
]

---
# Lazy evaluation

.pull-left[
* In R, function arguments are .blue[__lazily evaluated: *only evaluated if accessed*__]

* This provides you with lots of flexibility in when and how to evaluate inputs

* You need to be careful with how you use this, as it even allows you to use arguments outside of your function environment

]

.pull-right[

```r
h03 <- function(x, y) {
 if(exists("z")) {
 x + y + z
 } else {
 x + y
 }
}

# z does not exist
h03(x = 1, y = 2)
## [1] 3

# z exists in global environment
z <- 3
h03(x = 1, y = 2)
## [1] 6
```
]

---
# Lazy evaluation

.pull-left[
* In R, function arguments are .blue[__lazily evaluated: *only evaluated if accessed*__]

* This provides you with lots of flexibility in when and how to evaluate inputs

* You need to be careful with how you use this, as it even allows you to use arguments outside of your function environment

* One of the more common applications is to .blue[use `NULL` to indicate that a variable is not required but can be used if supplied]

]

.pull-right[

```r
*compute_pv <- function(fv, r, n, multiplier = NULL) {
 
 # compute present value
 pv <- round(fv / (1 + r)^n)
 
* # apply multiplier if its supplied
* if(is.null(multiplier)) {
* pv
* } else {
* pv * multiplier
* }
}

# w/o multiplier
compute_pv(1000, .05, 10)
## [1] 614

# with multiplier
compute_pv(1000, .05, 10, multiplier = 1.25)
## [1] 767.5
```
]

---
# Lexical scoping

.pull-left[
* R uses a nested environment structure

* a function has:
   - its own environment
   - plus one or more parent environments
   - always a fresh start
   
* a function will search:
   1. itself
   2. through hierarchical parental environments
   
* a function will never:
   - use a parameter from a parent environment when it exists in the current environment
   - search within another function unless it's its parent
]

.pull-right[

]

---
# Pure functions

.pull-left[

* This is why `h03()` in the previous example worked

* However, we now have an ___impure___ function

* .blue.bold[Pure function]:
   - returns the same result if given the same arguments 
   - does not cause any observable side effects

]

---
# Pure functions

.pull-left[

* This is why `h03()` in the previous example worked

* However, we now have an impure function

* .blue.bold[Pure function]:
   - ___returns the same result if given the same arguments___ 
   - does not cause any observable side effects

.center.bold.red[_Constency is our friend!_]

]

.pull-right[
.code70[

```r
# h03() changes depending on z's value
# outside of the function call
z <- 4
h03(x = 1, y = 2)
## [1] 7

z <- 5
h03(x = 1, y = 2)
## [1] 8

# if your function is dependent on randomness
# allow users to set seed
get_n_samples <- function(x, n, seed = NULL) {
 if (!is.null(seed)) {
 set.seed(seed)
 }
 sample(x, size = n)
}

x <- 1:10

# allows user to always get same results
get_n_samples(x, 5, seed = 123)
## [1] 3 8 4 7 6
get_n_samples(x, 5, seed = 123)
## [1] 3 8 4 7 6
```
]
]

---
# Pure functions

.pull-left[

* This is why `h03()` in the previous example worked

* However, we now have an impure function

* .blue.bold[Pure function]:
   - returns the same result if given the same arguments 
   - ___does not cause any observable side effects___

.center.bold.red[_Don't change things the user doesn't want or expect!_]

]

.pull-right[

```r
get_n_samples <- function(x, n, seed = NULL) {
 if (!is.null(seed)) {
 set.seed(seed)
 }
 
 samples <- sample(x, size = n)
 
* # overwrite x
* x <<- samples
 
* # change options to print 8 decimals
* options(digits = 2)
 
 samples
}

x <- seq(.0001, 1, by = 0.001)

get_n_samples(x, 5, seed = 123)
## [1] 0.29 0.79 0.41 0.88 0.94

x
## [1] 0.29 0.79 0.41 0.88 0.94
```

]

---
# Pure functions

.font140[
Take-away:

* functions should do one thing

* functions should do that one thing consistently

* functions should not do anything else
]

---
# Distributing your function(s)

.pull-left[
Two main ways to distribute your functions:

.font120[
1. using the `source()` function to read an R script 
   - easy to get started
   - all functions/objects in the sourced .R script will be imported to your global environment
   - does not scale well
]   
]

.pull-right[

```r
source("my-functions/custom-stat-functions.R")

ls()
## [1] "skewness"  "std_dev"   "std_error" "variance"

purrr::map_dbl(mtcars, skewness)
##   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb 
##  0.65 -0.19  0.41  0.77  0.28  0.45  0.39  0.26  0.39  0.56  1.12
```

]

---
# Distributing your function(s)

.pull-left[
Two main ways to distribute your functions:

1. using the `source()` function to read an R script 
   - easy to get started
   - all functions/objects in the sourced .R script will be imported to your global environment
   - does not scale well

2. writing a package (package writing will not be covered)
   - once you have a few functions that serve a common use
   - hard to get started
   - scales beautifully
]

.pull-right[

```r
library(gitr)
lsf.str("package:gitr")
## git_access_token : function ()  
## git_add : function (files = "all", verbose = TRUE)  
## git_clone : function (repo, path = ".", branch = NULL)  
## git_commit : function (message, verbose = TRUE)  
## git_file_size : function (files = "all", sort = TRUE)  
## git_log : function (limit = Inf, detailed = TRUE, author = NULL, stats = FALSE, 
##     show_diff = FALSE, message = NULL, file = NULL)  
## git_pull : function (force = FALSE)  
## git_push : function (force = FALSE)  
## git_rdone : function (message, files = "all", force = FALSE, verbose = TRUE)  
## git_status : function (verbose = FALSE)  
## install_internal_pkg : function (package, github_acct = "Analytics-Tools", host = "github.8451.com/api/v3", 
##     auth_token = NULL, ...)  
## launch_github : function (...)
```

]

---
# Bells & whistles

.pull-left[

We can do ___many___ other things with functions such as:
.font120[
* add progress bars
]
]

.pull-right[

```r
import_monthly_data <- function(files, quiet = FALSE) {
 
 if(!quiet) {
 pb <- progress_estimated(length(files))
 }
 
 special_import <- function(x) {
 if (quiet) {
 suppressMessages(readr::read_csv(x))
 } else {
 pb$tick()$print()
 suppressMessages(readr::read_csv(x))
 }
 }
 
 purrr::map_dfr(paste0("data/", files), special_import)
}

import_monthly_data(list.files("data"))
## |=================================================================|100% ~0 s remaining 
## # A tibble: 698,159 x 10
## Account_ID Transaction_Timest… Factor_A Factor_B Factor_C Factor_D Factor_E Response
## <int> <dttm> <int> <int> <chr> <int> <chr> <int>
## 1 5 2009-01-08 00:16:41 2 6 VI 20 A 1020
## 2 16 2009-01-20 22:40:08 2 6 VI 20 H 1020
## 3 28 2009-01-19 13:24:55 2 6 VI 21 NULL 1020
## 4 40 2009-01-05 16:10:58 2 6 VI 20 H 1020
## 5 62 2009-01-21 19:13:13 2 6 VI 20 B 1020
## 6 64 2009-01-01 18:53:02 7 6 MC 20 NULL 1020
## 7 69 2009-01-08 00:15:19 2 6 VI 20 H 1020
## 8 69 2009-01-19 09:33:22 2 6 VI 20 H 1020
## 9 70 2009-01-05 12:07:47 2 6 VI 20 B 1020
## 10 79 2009-01-07 19:41:18 7 6 MC 20 NULL 1020
## # … with 698,149 more rows, and 2 more variables: Transaction_Status <chr>, Month <chr>
```

]

---
# Bells & whistles

.pull-left[

We can do ___many___ other things with functions such as:

* add progress bars

* take any number of additional arguments with .font120[`...`] to:
   - pass arguments on to internal arguments

]

.pull-right[

```r
plot_strongest_corr <- function(data, response, ...) {
 # find strong correlation
 vars <- setdiff(names(data), response)
 correlations <- sapply(data[, vars], function(x) cor(data[, response], x))
 x <- names(which.max(abs(correlations)))
 
 # plot relationship
 plot(data[, x], data[, response], xlab = x, ylab = response, ...)
}

plot_strongest_corr(mtcars, "mpg", pch = 19, col = "blue", main = "Some cool title")
```

]

---
# Bells & whistles

.pull-left[

We can do ___many___ other things with functions such as:

* add progress bars

* take any number of additional arguments with .font120[`...`] to:
   - pass arguments on to internal arguments
   - or accept unknown number of inputs

]

.pull-right[

```r
select <- function(.data, ...) {
 # convert ... to a list and then to a string
 arguments <- substitute(...())
 vars <- paste(arguments)
 
 # index for 
 .data[, vars]
}

select(mtcars, mpg, wt)
##                     mpg  wt
## Mazda RX4            21 2.6
## Mazda RX4 Wag        21 2.9
## Datsun 710           23 2.3
## Hornet 4 Drive       21 3.2
## Hornet Sportabout    19 3.4
## Valiant              18 3.5
## Duster 360           14 3.6
## Merc 240D            24 3.2
## Merc 230             23 3.1
## Merc 280             19 3.4
## Merc 280C            18 3.4
## Merc 450SE           16 4.1
## Merc 450SL           17 3.7
## Merc 450SLC          15 3.8
## Cadillac Fleetwood   10 5.2
## Lincoln Continental  10 5.4
## Chrysler Imperial    15 5.3
## Fiat 128             32 2.2
## Honda Civic          30 1.6
## Toyota Corolla       34 1.8
## Toyota Corona        22 2.5
## Dodge Challenger     16 3.5
## AMC Javelin          15 3.4
## Camaro Z28           13 3.8
## Pontiac Firebird     19 3.8
## Fiat X1-9            27 1.9
## Porsche 914-2        26 2.1
## Lotus Europa         30 1.5
## Ford Pantera L       16 3.2
## Ferrari Dino         20 2.8
## Maserati Bora        15 3.6
## Volvo 142E           21 2.8
```

]

---
# Bells & whistles

.scrollable90[
.pull-left[

We can do ___many___ other things with functions such as:

* add progress bars

* take any number of additional arguments with .font120[`...`]

* improve efficiency with C++

]

.pull-right[

```r
# create our own sum function
sumR <- function(x) {
 total <- 0
 for (i in seq_along(x)) {
 total <- total + x[i]
 }
 total
}

# use Rcpp to write a sum function in C++
library(Rcpp)
cppFunction('double sumC(NumericVector x) {
 int n = x.size();
 double total = 0;
 for(int i = 0; i < n; ++i) {
 total += x[i];
 }
 return total;
}')

x <- runif(1e3)
bench::mark(
 sum(x),
 sumC(x),
 sumR(x)
)[1:6]
## # A tibble: 3 x 6
## expression min mean median max `itr/sec`
## <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl>
## 1 sum(x) 950ns 1.14µs 968ns 21.39µs 877949.
## 2 sumC(x) 2.12µs 4.42µs 4.54µs 1.09ms 226007.
## 3 sumR(x) 35.82µs 39.34µs 37.27µs 256.51µs 25421.
```
]
]

---
# Bells & whistles

.scrollable90[
.pull-left[

We can do ___many___ other things with functions such as:

* add progress bars

* take any number of additional arguments with .font120[`...`]

* improve efficiency with C++

.center.bold[.content-box-gray[_And so much more!_]]

]

.pull-right[

```r
# create our own sum function
sumR <- function(x) {
 total <- 0
 for (i in seq_along(x)) {
 total <- total + x[i]
 }
 total
}

x <- runif(1e3)
bench::mark(
 sum(x),
 sumC(x),
 sumR(x)
)[1:6]
## # A tibble: 3 x 6
## expression min mean median max `itr/sec`
## <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl>
## 1 sum(x) 950ns 1.27µs 995ns 37.5µs 784899.
## 2 sumC(x) 2.12µs 3.67µs 2.51µs 1.24ms 272609.
## 3 sumR(x) 35.63µs 38.98µs 37.22µs 132.34µs 25655.
```
]
]

---
class: yourturn
# Time permitting

Practice writing the following functions and test them on the given test vector

.pull-left[

```r
# test vector
set.seed(123)
x <- rlnorm(100)

# create a function that computes the variance of x
(1 / (length(x) - 1)) * sum((x - mean(x))^2)

# create a function that computes the standard deviation of x
sqrt((1 / (length(x) - 1)) * sum((x - mean(x))^2))

# create a function that computes the standard error of x
var_x <- (1 / (length(x) - 1)) * sum((x - mean(x))^2)
sqrt(var_x / length(x))

# create a function that computes the skewness of x
n <- length(x)
v <- var(x)
m <- mean(x)
third.moment <- (1 / (n - 2)) * sum((x - m)^3)
third.moment / (var(x)^(3 / 2))
```

]

.pull-right[

```r
# test vector
set.seed(123)
x <- rlnorm(100)

# variance function output
variance(x)
## [1] 3

# stardard dev output
std_dev(x)
## [1] 1.7

# standard error output
std_error(x)
## [1] 0.17

# skewness output
skewness(x)
## [1] 2.3
```

]

---
# Learning more

.center.font150.bold[Advanced R: https://adv-r.hadley.nz/]

---
# Questions