Writing Functions
"Writing good functions is a lifetime journey." - Hadley Wickham
Why functions are good
Anatomy of a function
Defining functions
Defining your arguments
Checking arguments and other conditions
Lazy evaluation
Lexical scoping
Distributing your function(s)
Learning more
A function centralizes a common task to a single, abstract method
This helps to:
A function centralizes a common task to a single, abstract method
This helps to:
Consider the following from Hadley Wickham’s book R for Data Science. Where's the error?
df <- tibble::tibble( a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))df$b <- (df$b - min(df$b, na.rm = TRUE)) / (max(df$b, na.rm = TRUE) - min(df$b, na.rm = TRUE))df$c <- (df$c - min(df$c, na.rm = TRUE)) / (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))df$d <- (df$d - min(df$d, na.rm = TRUE)) / (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))
A function centralizes a common task to a single, abstract method
This helps to:
Consider the following from Hadley Wickham’s book R for Data Science. Where's the error?
df <- tibble::tibble( a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))df$b <- (df$b - min(df$b, na.rm = TRUE)) / (max(df$b, na.rm = TRUE) - min(df$a, na.rm = TRUE))df$c <- (df$c - min(df$c, na.rm = TRUE)) / (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))df$d <- (df$d - min(df$d, na.rm = TRUE)) / (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))
Base R
Base R
Base R
I will use some packages throughout but these are simply for examples. You will be notified but sometimes you just need to learn to base R.
Name
arguments
body
environment
my_fun <- function(arg1, arg2) { << body >> }
Name
arguments
body
environment
pv <- function(fv, r, n = 5) { fv / (1 + r)^n}
args(pv)## function (fv, r, n = 5) ## NULL
body(pv)## {## fv/(1 + r)^n## }
environment(pv)## <environment: R_GlobalEnv>
Name
arguments
body
environment
pv <- function(fv, r, n = 5) { fv / (1 + r)^n}
args(pv)## function (fv, r, n = 5) ## NULL
body(pv)## {## fv/(1 + r)^n## }
environment(pv)## <environment: R_GlobalEnv>
Note how the environment is our global environment
Identify the arguments, body, and environment of:
read.csv
dplyr::add_count
sum
Identify the arguments, body, and environment of:
read.csv
dplyr::add_count
sum
# #1 read.csvargs(read.csv)## function (file, header = TRUE, sep = ",", quote = "\"", dec = ".", ## fill = TRUE, comment.char = "", ...) ## NULLbody(read.csv)## read.table(file = file, header = header, sep = sep, quote = quote, ## dec = dec, fill = fill, comment.char = comment.char, ...)environment(read.csv)## <environment: namespace:utils>
Identify the arguments, body, and environment of:
read.csv
dplyr::add_count
sum
# #2 dplyr::add_countargs(dplyr::add_count)## function (x, ..., wt = NULL, sort = FALSE) ## NULLbody(dplyr::add_count)## {## g <- group_vars(x)## grouped <- group_by(x, ..., add = TRUE)## out <- add_tally(grouped, wt = !!enquo(wt), sort = sort)## grouped_df(out, g)## }environment(dplyr::add_count)## <environment: namespace:dplyr>
Identify the arguments, body, and environment of:
read.csv
dplyr::add_count
sum
# #3 sumargs(sum)## function (..., na.rm = FALSE) ## NULLbody(sum) # NULL because its a primitive## NULLenvironment(sum) # NULL because its a primitive## NULL
sum()
is a primitive function
We define a function with <-
just like we define any other R object
Use informative names; strive to use verbs when possible
present_value <- function() {}
compute_pv <- function() {}
We define a function with <-
just like we define any other R object
Use informative names; strive to use verbs when possible
We can define a function with no arguments; however, this is rarely useful
present_value <- function() {}
compute_pv <- function() {}
compute_pv <- function() { 1000 / (1 + 0.05)^10}compute_pv()## [1] 614
We define a function with <-
just like we define any other R object
Use informative names; strive to use verbs when possible
We can define a function with no arguments; however, this is rarely useful
present_value <- function() {}
compute_pv <- function() {}
compute_pv <- function() { 1000 / (1 + 0.05)^10}compute_pv()## [1] 614
We want a function that does one thing consistently well, yet provide flexibility in the inputs provided.
We define a function with <-
just like we define any other R object
Use informative names; strive to use verbs when possible
We can define a function with no arguments; however, this is rarely useful
present_value <- function() {}
compute_pv <- function() {}
compute_pv <- function() { 1000 / (1 + 0.05)^10}
compute_pv <- function(fv, r, n) { fv / (1 + r)^n}compute_pv(fv = 1000, r = .05, n = 10)## [1] 614compute_pv(fv = 2000, r = .08, n = 5)## [1] 1361
Define a function titled ratio that takes arguments x
and y
and returns their ratio, x / y
Call ratio()
with arguments 3 and 4
Define a function titled ratio that takes arguments x
and y
and returns their ratio, x / y
Call ratio()
with arguments 3 and 4
ratio <- function(x, y) { x / y}ratio(3, 4)## [1] 0.75
Many ways to call arguments:
compute_pv(fv = 1000, r = .05, n = 10)## [1] 613.9133
Many ways to call arguments:
Using argument names
Positional matching
compute_pv(fv = 1000, r = .05, n = 10)## [1] 613.9133
compute_pv(1000, .05, 10)## [1] 613.9133
Many ways to call arguments:
Using argument names
Positional matching
Must use names if you change order
...otherwise error or incorrect computation will occur
compute_pv(fv = 1000, r = .05, n = 10)## [1] 613.9133
compute_pv(1000, .05, 10)## [1] 613.9133
compute_pv(r = .05, fv = 1000, n = 10)## [1] 613.9133
compute_pv(.05, 1000, 10)## [1] 4.950274e-32
Many ways to call arguments:
Using argument names
Positional matching
Must use names if you change order
...otherwise error or incorrect computation will occur
missing arguments results in error
compute_pv(fv = 1000, r = .05, n = 10)## [1] 613.9133
compute_pv(1000, .05, 10)## [1] 613.9133
compute_pv(r = .05, fv = 1000, n = 10)## [1] 613.9133
compute_pv(.05, 1000, 10)## [1] 4.950274e-32
compute_pv(1000, .05)## Error in compute_pv(1000, 0.05): argument "n" is missing, with no default
Ordering arguments in your functions is important:
pipe (%>%
) operator
positional matching
# bad --> rounding to digits is least important parametercompute_pv <- function(digits, r, n, fv) { << body >>}# better but could still be improved --> does a user always need to specify digits to round to?compute_pv <- function(fv, r, n, digits) { << body >>}
As our functions require more inputs...
many of them can likely do with defaults
Usually, the first 1-3 arguments are not preset but the rest are
Good defaults are largely based on
compute_pv <- function(fv, r, n, digits = 2) { round(fv / (1 + r)^n, digits = digits)}# by default, will round to 2 digitscompute_pv(1000, .05, 10)## [1] 613.91# to adjust, specify by positioncompute_pv(1000, .05, 10, 0)## [1] 614# or by namecompute_pv(1000, .05, 10, digits = 0)## [1] 614
(df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
Can you write a function called rescale()
that takes argument x
and executes this code?
Test it on the following vector:
set.seed(123)test_vector <- runif(20, min = 25, max = 40)
(df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
Can you write a function called rescale()
that takes argument x
and executes this code?
Test it on the following vector:
set.seed(123)test_vector <- runif(20, min = 25, max = 40)
rescale <- function(x){ rng <- range(x, na.rm = TRUE) (x - rng[1]) / (rng[2] - rng[1])}rescale(test_vector)## [1] 0.2684 0.8158 0.4011 0.9193 0.9821 0.0038 0.5313 0.9296 0.5568 0.4532## [11] 1.0000 0.4496 0.6947 0.5800 0.0665 0.9377 0.2230 0.0000 0.3125 0.9975
Now add an argument to rescale()
that allows you to round the output to a specified decimal.
Set the default to 2.
Now add an argument to rescale()
that allows you to round the output to a specified decimal.
Set the default to 2.
rescale <- function(x, digits = 2){ rng <- range(x, na.rm = TRUE) scaled <- (x - rng[1]) / (rng[2] - rng[1]) round(scaled, digits = digits)}# defaultrescale(test_vector)## [1] 0.27 0.82 0.40 0.92 0.98 0.00 0.53 0.93 0.56 0.45 1.00 0.45 0.69 0.58## [15] 0.07 0.94 0.22 0.00 0.31 1.00# 3 digitsrescale(test_vector, digits = 3)## [1] 0.268 0.816 0.401 0.919 0.982 0.004 0.531 0.930 0.557 0.453 1.000## [12] 0.450 0.695 0.580 0.067 0.938 0.223 0.000 0.312 0.997
We've created a function that allows users to specify their inputs
But how do we ensure they provide us with the right kind of inputs?
Or what if we want to be able to provide some sort of feedback to the user?
There are several ways to signal conditions to function users:
We've created a function that allows users to specify their inputs
But how do we ensure they provide us with the right kind of inputs?
Or what if we want to be able to provide some sort of feedback to the user?
There are several ways to signal conditions to function users:
stop()
& stopifnot()
: signal an error (no way for a function to continue and execution must stop)We've created a function that allows users to specify their inputs
But how do we ensure they provide us with the right kind of inputs?
Or what if we want to be able to provide some sort of feedback to the user?
There are several ways to signal conditions to function users:
stop()
& stopifnot()
: signal an error (no way for a function to continue and execution must stop)
warning()
: signal a warning (something has gone wrong but the function has been able to at least partially recover.)
We've created a function that allows users to specify their inputs
But how do we ensure they provide us with the right kind of inputs?
Or what if we want to be able to provide some sort of feedback to the user?
There are several ways to signal conditions to function users:
stop()
& stopifnot()
: signal an error (no way for a function to continue and execution must stop)
warning()
: signal a warning (something has gone wrong but the function has been able to at least partially recover.)
message()
: signal an informative message (function works fine but user should be informed of something.)
The most common you will use are stop()
& message()
stop()
& stopifnot()
The stop
functions are most commonly used to check for proper inputs
but can be used to stop the function procedures because the user's environment is not properly established
Weigh the benefits of highly custom, informative error messages (via stop
) versus short to the point (via stopifnot
)
Execution will stop at the first violation
# stopcompute_pv <- function(fv, r, n, digits = 2) { if(!is.numeric(fv)) { stop("`fv` must be numeric", call. = FALSE) } if(!is.numeric(r)) { stop("`r` must be numeric", call. = FALSE) } if(!is.numeric(n)) { stop("`n` must be numeric", call. = FALSE) } if(!is.numeric(digits)) { stop("`digits` must be numeric", call. = FALSE) } round(fv / (1 + r)^n, digits = digits)}compute_pv("1000", ".05", 10)## Error: `fv` must be numeric
# stopifnotcompute_pv <- function(fv, r, n, digits = 2) { stopifnot(is.numeric(fv), is.numeric(r), is.numeric(n), is.numeric(digits)) round(fv / (1 + r)^n, digits = digits)}compute_pv("1000", ".05", 10)## Error in compute_pv("1000", ".05", 10): is.numeric(fv) is not TRUE
warning()
warnings()
are rarely used but...
can be useful to signal non-terminating concerns
lubridate::mdy(c("2-1-2019", "1995"))## Warning: 1 failed to parse.## [1] "2019-02-01" NA
warning()
warnings()
are rarely used but...
can be useful to signal non-terminating concerns
lubridate::mdy(c("2-1-2019", "1995"))## Warning: 1 failed to parse.## [1] "2019-02-01" NA
present_value <- function(fv, r, n, digits = 2) { warning("`present_value()` is deprecated. Use `compute_pv()`", call. = FALSE) stopifnot(is.numeric(fv), is.numeric(r), is.numeric(n), is.numeric(digits)) round(fv / (1 + r)^n, digits = digits)}present_value(1000, .05, 10)## Warning: `present_value()` is deprecated. Use `compute_pv()`## [1] 613.91
message()
designed to be informational
use them to tell the user that you’ve done something on their behalf
may represent:
message()
designed to be informational
use them to tell the user that you’ve done something on their behalf
may represent:
library(ggplot2)ggplot(iris, aes(Sepal.Length)) + geom_histogram()## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
message()
designed to be informational
use them to tell the user that you’ve done something on their behalf
may represent:
df <- readr::read_csv("data/Month-01.csv")## Parsed with column specification:## cols(## Account_ID = col_integer(),## Transaction_Timestamp = col_datetime(format = ""),## Factor_A = col_integer(),## Factor_B = col_integer(),## Factor_C = col_character(),## Factor_D = col_integer(),## Factor_E = col_character(),## Response = col_integer(),## Transaction_Status = col_character(),## Month = col_character()## )
message()
designed to be informational
use them to tell the user that you’ve done something on their behalf
may represent:
message()
designed to be informational
use them to tell the user that you’ve done something on their behalf
may represent:
compute_pv <- function(fv, r, n, digits = 2) { stopifnot(is.numeric(fv), is.numeric(r), is.numeric(n), is.numeric(digits)) if(r < 0.01 || r > 0.25) { message("Your interest rate exceeds the normal ranges of 1-25%") } round(fv / (1 + r)^n, digits = digits)}compute_pv(1000, .26, 10)## Your interest rate exceeds the normal ranges of 1-25%## [1] 99.15
message()
designed to be informational
use them to tell the user that you’ve done something on their behalf
may represent:
Always provide an option to suppress messages
compute_pv <- function(fv, r, n, digits = 2, quiet = FALSE) { stopifnot(is.numeric(fv), is.numeric(r), is.numeric(n), is.numeric(digits)) if(!quiet) { if(r < 0.01 || r > 0.25) { message("Your interest rate exceeds the normal ranges of 1-25%") } # other messages here... } round(fv / (1 + r)^n, digits = digits)}# default provides messagecompute_pv(1000, .26, 10)## Your interest rate exceeds the normal ranges of 1-25%## [1] 99.15# user can silencecompute_pv(1000, .26, 10, quiet = TRUE)## [1] 99.15
Let's go back to our rescale()
function:
rescale <- function(x, digits = 2){ rng <- range(x, na.rm = TRUE) scaled <- (x - rng[1]) / (rng[2] - rng[1]) round(scaled, digits = digits)}
Let's go back to our rescale()
function:
rescale <- function(x, digits = 2){ rng <- range(x, na.rm = TRUE) scaled <- (x - rng[1]) / (rng[2] - rng[1]) round(scaled, digits = digits)}
na.rm = TRUE
option to the function arguments so that the user can control whether or not to remove missing values.warning()
that tells the user NAs are present (if they use na.rm = FALSE
)message()
that tells the user how many NAs were removedtest_vector <- c(NA, test_vector)
Your results should look something like this:
# results in an errorrescale(letters)## Error: `x` must be a numeric atomic vector# results in warningrescale(test_vector, na.rm = FALSE)## Warning: There are 1 NAs. Remove them with `na.rm = TRUE`## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA# results in messagerescale(test_vector)## 1 NAs were removed.## [1] 0.27 0.82 0.40 0.92 0.98 0.00 0.53 0.93 0.56 0.45 1.00 0.45 0.69 0.58## [15] 0.07 0.94 0.22 0.00 0.31 1.00
Let's go back to our rescale()
function:
rescale <- function(x, digits = 2){ rng <- range(x, na.rm = TRUE) scaled <- (x - rng[1]) / (rng[2] - rng[1]) round(scaled, digits = digits)}
na.rm = TRUE
option to the function arguments so that the user can control whether or not to remove missing values.warning()
that tells the user NAs are present (if they use na.rm = FALSE
)message()
that tells the user how many NAs were removedtest_vector <- c(NA, test_vector)
rescale <- function(x, digits = 2, na.rm = TRUE, quiet = FALSE){ # make sure x an numeric atomic vector if(!is.atomic(x) || !is.numeric(x)) { stop("`x` must be a numeric atomic vector", call. = FALSE) } # compute number of missing values missing_n <- sum(is.na(x)) # warning if na.rm = FALSE and NAs are present if(!na.rm) { warning(paste( "There are", missing_n, "NAs. Remove them with `na.rm = TRUE`"), call. = FALSE) } # message to report how many NAs were removed if(na.rm) { x <- x[!is.na(x)] if(!quiet) { message(paste(missing_n, "NAs were removed.")) } } # compute scaling rng <- range(x) scaled <- (x - rng[1]) / (rng[2] - rng[1]) round(scaled, digits = digits)}# results in an errorrescale(letters)## Error: `x` must be a numeric atomic vector# results in warningrescale(test_vector, na.rm = FALSE)## Warning: There are 1 NAs. Remove them with `na.rm = TRUE`## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA# results in messagerescale(test_vector)## 1 NAs were removed.## [1] 0.27 0.82 0.40 0.92 0.98 0.00 0.53 0.93 0.56 0.45 1.00 0.45 0.69 0.58## [15] 0.07 0.94 0.22 0.00 0.31 1.00
h01 <- function(x) { 10}h01(x = stop("This is an error!"))## [1] 10
In R, function arguments are lazily evaluated: only evaluated if accessed
This provides you with lots of flexibility in when and how to evaluate inputs
h02 <- function(x, y, z) { if(missing(z)) { sum(x, y) } else { sum(x, y, z) }}# z does not existh02(x = 1, y = 2)## [1] 3# z does existh02(x = 1, y = 2, z = 3)## [1] 6
In R, function arguments are lazily evaluated: only evaluated if accessed
This provides you with lots of flexibility in when and how to evaluate inputs
You need to be careful with how you use this, as it even allows you to use arguments outside of your function environment
h03 <- function(x, y) { if(exists("z")) { x + y + z } else { x + y }}# z does not existh03(x = 1, y = 2)## [1] 3# z exists in global environmentz <- 3h03(x = 1, y = 2)## [1] 6
In R, function arguments are lazily evaluated: only evaluated if accessed
This provides you with lots of flexibility in when and how to evaluate inputs
You need to be careful with how you use this, as it even allows you to use arguments outside of your function environment
One of the more common applications is to use NULL
to indicate that a variable is not required but can be used if supplied
compute_pv <- function(fv, r, n, multiplier = NULL) { # compute present value pv <- round(fv / (1 + r)^n) # apply multiplier if its supplied if(is.null(multiplier)) { pv } else { pv * multiplier }}# w/o multipliercompute_pv(1000, .05, 10)## [1] 614# with multipliercompute_pv(1000, .05, 10, multiplier = 1.25)## [1] 767.5
R uses a nested environment structure
a function has:
a function will search:
a function will never:
This is why h03()
in the previous example worked
However, we now have an impure function
Pure function:
This is why h03()
in the previous example worked
However, we now have an impure function
Pure function:
Constency is our friend!
# h03() changes depending on z's value# outside of the function callz <- 4h03(x = 1, y = 2)## [1] 7z <- 5h03(x = 1, y = 2)## [1] 8# if your function is dependent on randomness# allow users to set seedget_n_samples <- function(x, n, seed = NULL) { if (!is.null(seed)) { set.seed(seed) } sample(x, size = n)}x <- 1:10# allows user to always get same resultsget_n_samples(x, 5, seed = 123)## [1] 3 8 4 7 6get_n_samples(x, 5, seed = 123)## [1] 3 8 4 7 6
This is why h03()
in the previous example worked
However, we now have an impure function
Pure function:
Don't change things the user doesn't want or expect!
get_n_samples <- function(x, n, seed = NULL) { if (!is.null(seed)) { set.seed(seed) } samples <- sample(x, size = n) # overwrite x x <<- samples # change options to print 8 decimals options(digits = 2) samples}x <- seq(.0001, 1, by = 0.001)get_n_samples(x, 5, seed = 123)## [1] 0.29 0.79 0.41 0.88 0.94x## [1] 0.29 0.79 0.41 0.88 0.94
Take-away:
functions should do one thing
functions should do that one thing consistently
functions should not do anything else
Two main ways to distribute your functions:
source()
function to read an R script source("my-functions/custom-stat-functions.R")ls()## [1] "skewness" "std_dev" "std_error" "variance"purrr::map_dbl(mtcars, skewness)## mpg cyl disp hp drat wt qsec vs am gear carb ## 0.65 -0.19 0.41 0.77 0.28 0.45 0.39 0.26 0.39 0.56 1.12
Two main ways to distribute your functions:
using the source()
function to read an R script
writing a package (package writing will not be covered)
library(gitr)lsf.str("package:gitr")## git_access_token : function () ## git_add : function (files = "all", verbose = TRUE) ## git_clone : function (repo, path = ".", branch = NULL) ## git_commit : function (message, verbose = TRUE) ## git_file_size : function (files = "all", sort = TRUE) ## git_log : function (limit = Inf, detailed = TRUE, author = NULL, stats = FALSE, ## show_diff = FALSE, message = NULL, file = NULL) ## git_pull : function (force = FALSE) ## git_push : function (force = FALSE) ## git_rdone : function (message, files = "all", force = FALSE, verbose = TRUE) ## git_status : function (verbose = FALSE) ## install_internal_pkg : function (package, github_acct = "Analytics-Tools", host = "github.8451.com/api/v3", ## auth_token = NULL, ...) ## launch_github : function (...)
We can do many other things with functions such as:
import_monthly_data <- function(files, quiet = FALSE) { if(!quiet) { pb <- progress_estimated(length(files)) } special_import <- function(x) { if (quiet) { suppressMessages(readr::read_csv(x)) } else { pb$tick()$print() suppressMessages(readr::read_csv(x)) } } purrr::map_dfr(paste0("data/", files), special_import)}import_monthly_data(list.files("data"))## |=================================================================|100% ~0 s remaining ## # A tibble: 698,159 x 10## Account_ID Transaction_Timest… Factor_A Factor_B Factor_C Factor_D Factor_E Response## <int> <dttm> <int> <int> <chr> <int> <chr> <int>## 1 5 2009-01-08 00:16:41 2 6 VI 20 A 1020## 2 16 2009-01-20 22:40:08 2 6 VI 20 H 1020## 3 28 2009-01-19 13:24:55 2 6 VI 21 NULL 1020## 4 40 2009-01-05 16:10:58 2 6 VI 20 H 1020## 5 62 2009-01-21 19:13:13 2 6 VI 20 B 1020## 6 64 2009-01-01 18:53:02 7 6 MC 20 NULL 1020## 7 69 2009-01-08 00:15:19 2 6 VI 20 H 1020## 8 69 2009-01-19 09:33:22 2 6 VI 20 H 1020## 9 70 2009-01-05 12:07:47 2 6 VI 20 B 1020## 10 79 2009-01-07 19:41:18 7 6 MC 20 NULL 1020## # … with 698,149 more rows, and 2 more variables: Transaction_Status <chr>, Month <chr>
We can do many other things with functions such as:
add progress bars
take any number of additional arguments with ...
to:
plot_strongest_corr <- function(data, response, ...) { # find strong correlation vars <- setdiff(names(data), response) correlations <- sapply(data[, vars], function(x) cor(data[, response], x)) x <- names(which.max(abs(correlations))) # plot relationship plot(data[, x], data[, response], xlab = x, ylab = response, ...)}plot_strongest_corr(mtcars, "mpg", pch = 19, col = "blue", main = "Some cool title")
We can do many other things with functions such as:
add progress bars
take any number of additional arguments with ...
to:
select <- function(.data, ...) { # convert ... to a list and then to a string arguments <- substitute(...()) vars <- paste(arguments) # index for .data[, vars]}select(mtcars, mpg, wt)## mpg wt## Mazda RX4 21 2.6## Mazda RX4 Wag 21 2.9## Datsun 710 23 2.3## Hornet 4 Drive 21 3.2## Hornet Sportabout 19 3.4## Valiant 18 3.5## Duster 360 14 3.6## Merc 240D 24 3.2## Merc 230 23 3.1## Merc 280 19 3.4## Merc 280C 18 3.4## Merc 450SE 16 4.1## Merc 450SL 17 3.7## Merc 450SLC 15 3.8## Cadillac Fleetwood 10 5.2## Lincoln Continental 10 5.4## Chrysler Imperial 15 5.3## Fiat 128 32 2.2## Honda Civic 30 1.6## Toyota Corolla 34 1.8## Toyota Corona 22 2.5## Dodge Challenger 16 3.5## AMC Javelin 15 3.4## Camaro Z28 13 3.8## Pontiac Firebird 19 3.8## Fiat X1-9 27 1.9## Porsche 914-2 26 2.1## Lotus Europa 30 1.5## Ford Pantera L 16 3.2## Ferrari Dino 20 2.8## Maserati Bora 15 3.6## Volvo 142E 21 2.8
We can do many other things with functions such as:
add progress bars
take any number of additional arguments with ...
improve efficiency with C++
# create our own sum functionsumR <- function(x) { total <- 0 for (i in seq_along(x)) { total <- total + x[i] } total}# use Rcpp to write a sum function in C++library(Rcpp)cppFunction('double sumC(NumericVector x) { int n = x.size(); double total = 0; for(int i = 0; i < n; ++i) { total += x[i]; } return total;}')x <- runif(1e3)bench::mark( sum(x), sumC(x), sumR(x))[1:6]## # A tibble: 3 x 6## expression min mean median max `itr/sec`## <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl>## 1 sum(x) 950ns 1.14µs 968ns 21.39µs 877949.## 2 sumC(x) 2.12µs 4.42µs 4.54µs 1.09ms 226007.## 3 sumR(x) 35.82µs 39.34µs 37.27µs 256.51µs 25421.
We can do many other things with functions such as:
add progress bars
take any number of additional arguments with ...
improve efficiency with C++
And so much more!
# create our own sum functionsumR <- function(x) { total <- 0 for (i in seq_along(x)) { total <- total + x[i] } total}# use Rcpp to write a sum function in C++library(Rcpp)cppFunction('double sumC(NumericVector x) { int n = x.size(); double total = 0; for(int i = 0; i < n; ++i) { total += x[i]; } return total;}')x <- runif(1e3)bench::mark( sum(x), sumC(x), sumR(x))[1:6]## # A tibble: 3 x 6## expression min mean median max `itr/sec`## <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl>## 1 sum(x) 950ns 1.27µs 995ns 37.5µs 784899.## 2 sumC(x) 2.12µs 3.67µs 2.51µs 1.24ms 272609.## 3 sumR(x) 35.63µs 38.98µs 37.22µs 132.34µs 25655.
Practice writing the following functions and test them on the given test vector
# test vectorset.seed(123)x <- rlnorm(100)# create a function that computes the variance of x(1 / (length(x) - 1)) * sum((x - mean(x))^2)# create a function that computes the standard deviation of xsqrt((1 / (length(x) - 1)) * sum((x - mean(x))^2))# create a function that computes the standard error of xvar_x <- (1 / (length(x) - 1)) * sum((x - mean(x))^2)sqrt(var_x / length(x))# create a function that computes the skewness of xn <- length(x)v <- var(x)m <- mean(x)third.moment <- (1 / (n - 2)) * sum((x - m)^3)third.moment / (var(x)^(3 / 2))
# test vectorset.seed(123)x <- rlnorm(100)# variance function outputvariance(x)## [1] 3# stardard dev outputstd_dev(x)## [1] 1.7# standard error outputstd_error(x)## [1] 0.17# skewness outputskewness(x)## [1] 2.3
Why functions are good
Anatomy of a function
Defining functions
Defining your arguments
Checking arguments and other conditions
Lazy evaluation
Lexical scoping
Distributing your function(s)
Learning more
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |