# Dealing with Numbers

In this section you will learn the basics of working with numbers in R. This includes understanding

## Numeric Types (integer vs. double)

The two most common numeric classes used in R are integer and double (for double precision floating point numbers). R automatically converts between these two classes when needed for mathematical purposes. As a result, it’s feasible to use R and perform analyses for years without specifying these differences.

### Creating Integer and Double Vectors

By default, when you create a numeric vector using the c() function it will produce a vector of double precision numeric values. To create a vector of integers using c() you must specify explicity by placing an L directly after each number.

# create a string of double-precision values
dbl_var <- c(1, 2.5, 4.5)
dbl_var
## [1] 1.0 2.5 4.5

# placing an L after the values creates a string of integers
int_var <- c(1L, 6L, 10L)
int_var
## [1]  1  6 10


### Checking for Numeric Type

To check whether a vector is made up of integer or double values:

# identifies the vector type (double, integer, logical, or character)
typeof(dbl_var)
## [1] "double"

typeof(int_var)
## [1] "integer"


### Converting Between Integer and Double Values

By default, if you read in data that has no decimal points or you create numeric values using the x <- 1:10 method the numeric values will be coded as integer. If you want to change a double to an integer or vice versa you can specify one of the following:

# converts integers to double-precision values
as.double(int_var)
## [1]  1  6 10

# identical to as.double()
as.numeric(int_var)
## [1]  1  6 10

# converts doubles to integers
as.integer(dbl_var)
## [1] 1 2 4


## Generating Non-random Numbers

There are a few R operators and functions that are especially useful for creating vectors of non-random numbers. These functions provide multiple ways for generating sequences of numbers.

### Specifing Numbers within a Sequence

To explicitly specify numbers in a sequence you can use the colon : operator to specify all integers between two specified numbers or the combine c() function to explicity specify all numbers in the sequence.

# create a vector of integers between 1 and 10
1:10
##  [1]  1  2  3  4  5  6  7  8  9 10

# create a vector consisting of 1, 5, and 10
c(1, 5, 10)
## [1]  1  5 10

# save the vector of integers between 1 and 10 as object x
x <- 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10


### Generating Regular Sequences

A generalization of : is the seq() function, which generates a sequence of numbers with a specified arithmetic progression.

# generate a sequence of numbers from 1 to 21 by increments of 2
seq(from = 1, to = 21, by = 2)
##  [1]  1  3  5  7  9 11 13 15 17 19 21

# generate a sequence of numbers from 1 to 21 that has 15 equal incremented
# numbers
seq(0, 21, length.out = 15)
##  [1]  0.0  1.5  3.0  4.5  6.0  7.5  9.0 10.5 12.0 13.5 15.0 16.5 18.0 19.5
## [15] 21.0


### Generating Repeated Sequences

The rep() function allows us to conveniently repeat specified constants into long vectors. This function allows for collated and non-collated repetitions.

# replicates the values in x a specified number of times in a collated fashion
rep(1:4, times = 2)
## [1] 1 2 3 4 1 2 3 4

# replicates the values in x in an uncollated fashion
rep(1:4, each = 2)
## [1] 1 1 2 2 3 3 4 4


## Generating Random Numbers

Simulation is a common practice in data analysis. Sometimes your analysis requires the implementation of a statistical procedure that requires random number generation or sampling (i.e. Monte Carlo simulation, bootstrap sampling, etc). R comes with a set of pseudo-random number generators that allow you to simulate the most common probability distributions such as:

### Uniform numbers

To generate random numbers from a uniform distribution you can use the runif() function. Alternatively, you can use sample() to take a random sample using with or without replacements.

# generate n random numbers between the default values of 0 and 1
runif(n)

# generate n random numbers between 0 and 25
runif(n, min = 0, max = 25)

# generate n random numbers between 0 and 25 (with replacement)
sample(0:25, n, replace = TRUE)

# generate n random numbers between 0 and 25 (without replacement)
sample(0:25, n, replace = FALSE)


For example, to generate 25 random numbers between the values 0 and 10:

runif(25, min = 0, max = 10)
##  [1] 9.2494720 1.0276421 9.6061007 7.4582455 8.3666868 0.8090925 7.5638221
##  [8] 4.2810155 2.5850736 9.7962788 6.1705894 0.7037997 9.5056240 4.7589622
## [15] 7.9750129 5.3932881 5.1624935 1.2704098 8.7064680 8.6649293 0.1049461
## [22] 1.4827342 2.7337917 7.5236131 3.9803653


For each non-uniform probability distribution there are four primary functions available to generate random numbers, density (aka probability mass function), cumulative density, and quantiles. The prefixes for these functions are:

• r: random number generation
• d: density or probability mass function
• p: cumulative distribution
• q: quantiles

### Normal Distribution Numbers

The normal (or Gaussian) distribution is the most common and well know distribution. Within R, the normal distribution functions are written as norm().

# generate n random numbers from a normal distribution with given mean & st. dev.
rnorm(n, mean = 0, sd = 1)

# generate CDF probabilities for value(s) in vector q
pnorm(q, mean = 0, sd = 1)

# generate quantile for probabilities in vector p
qnorm(p, mean = 0, sd = 1)

# generate density function probabilites for value(s) in vector x
dnorm(x, mean = 0, sd = 1)


For example, to generate 25 random numbers from a normal distribution with mean = 100 and standard deviation = 15:

x <- rnorm(25, mean = 100, sd = 15)
x
##  [1] 107.84214 101.10742  73.67151 113.94035 108.47938  77.48445  73.02016
##  [8]  81.02323 101.64169 112.67715 105.28478  92.35393  85.96284 108.83169
## [15]  88.71057 115.13657 141.69830  99.91198 118.69664 110.61667  83.20282
## [22] 113.91008 109.10879  93.45276 109.01996

summary(x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   73.02   88.71  105.30  101.10  110.60  141.70


You can also pass a vector of values. For instance, say you want to know the CDF probabilities for each value in the vector x created above:

pnorm(x, mean = 100, sd = 15)
##  [1] 0.69944664 0.52942643 0.03960976 0.82364789 0.71406244 0.06667308
##  [7] 0.03603657 0.10291447 0.54357552 0.80098468 0.63770038 0.30511760
## [13] 0.17468526 0.72199534 0.22583658 0.84353778 0.99728111 0.49765904
## [19] 0.89369904 0.76045844 0.13139693 0.82312464 0.72815841 0.33124331
## [25] 0.72619004


### Binomial Distribution Numbers

This is conventionally interpreted as the number of successes in size = x trials and with prob = p probability of success:

# generate a vector of length n displaying the number of successes from a trial
# size = 100 with a probabilty of success = 0.5
rbinom(n, size = 100, prob = 0.5)

# generate CDF probabilities for value(s) in vector q
pbinom(q, size = 100, prob = 0.5)

# generate quantile for probabilities in vector p
qbinom(p, size = 100, prob = 0.5)

# generate density function probabilites for value(s) in vector x
dbinom(x, size = 100, prob = 0.5)


### Poisson Distribution Numbers

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occuring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.

# generate a vector of length n displaying the random number of events occuring
# when lambda (mean rate) equals 4.
rpois(n, lambda = 4)

# generate CDF probabilities for value(s) in vector q when lambda (mean rate)
# equals 4.
ppois(q, lambda = 4)

# generate quantile for probabilities in vector p when lambda (mean rate)
# equals 4.
qpois(p, lambda = 4)

# generate density function probabilites for value(s) in vector x when lambda
# (mean rate) equals 4.
dpois(x, lambda = 4)


### Exponential Distribution Numbers

The Exponential probability distribution describes the time between events in a Poisson process.

# generate a vector of length n with rate = 1
rexp(n, rate = 1)

# generate CDF probabilities for value(s) in vector q when rate = 4.
pexp(q, rate = 1)

# generate quantile for probabilities in vector p when rate = 4.
qexp(p, rate = 1)

# generate density function probabilites for value(s) in vector x when rate = 4.
dexp(x, rate = 1)


### Gamma Distribution Numbers

The Gamma probability distribution is related to the Beta distribution and arises naturally in processes for which the waiting times between Poisson distributed events are relevant.

# generate a vector of length n with shape parameter = 1
rgamma(n, shape = 1)

# generate CDF probabilities for value(s) in vector q when shape parameter = 1.
pgamma(q, shape = 1)

# generate quantile for probabilities in vector p when shape parameter = 1.
qgamma(p, shape = 1)

# generate density function probabilites for value(s) in vector x when shape
# parameter = 1.
dgamma(x, shape = 1)


## Setting Seed Values

If you want to generate a sequence of random numbers and then be able to reproduce that same sequence of random numbers later you can set the random number seed generator with set.seed(). This is a critical aspect of reproducible research.

For example, we can reproduce a random generation of 10 values from a normal distribution:

set.seed(197)
rnorm(n = 10, mean = 0, sd = 1)
##  [1]  0.6091700 -1.4391423  2.0703326  0.7089004  0.6455311  0.7290563
##  [7] -0.4658103  0.5971364 -0.5135480 -0.1866703

set.seed(197)
rnorm(n = 10, mean = 0, sd = 1)
##  [1]  0.6091700 -1.4391423  2.0703326  0.7089004  0.6455311  0.7290563
##  [7] -0.4658103  0.5971364 -0.5135480 -0.1866703


## Comparing Numeric Values

There are multiple ways to compare numeric values and vectors. This includes logical operators along with testing for exact equality and also near equality.

### Comparison Operators

The normal binary operators allow you to compare numeric values and provides the answer in logical form:

x < y     # is x less than y
x > y     # is x greater than y
x <= y    # is x less than or equal to y
x >= y    # is x greater than or equal to y
x == y    # is x equal to y
x != y    # is x not equal to y


These operations can be used for single number comparison:

x <- 9
y <- 10

x == y
## [1] FALSE


and also for comparison of numbers within vectors:

x <- c(1, 4, 9, 12)
y <- c(4, 4, 9, 13)

x == y
## [1] FALSE  TRUE  TRUE FALSE


Note that logical values TRUE and FALSE equate to 1 and 0 respectively. So if you want to identify the number of equal values in two vectors you can wrap the operation in the sum() function:

# How many pairwise equal values are in vectors x and y
sum(x == y)
## [1] 2


If you need to identify the location of pairwise equalities in two vectors you can wrap the operation in the which() function:

# Where are the pairwise equal values located in vectors x and y
which(x == y)
## [1] 2 3


### Exact Equality

To test if two objects are exactly equal:

x <- c(4, 4, 9, 12)
y <- c(4, 4, 9, 13)

identical(x, y)
## [1] FALSE

x <- c(4, 4, 9, 12)
y <- c(4, 4, 9, 12)

identical(x, y)
## [1] TRUE


### Floating Point Comparison

Sometimes you wish to test for ‘near equality’. The all.equal() function allows you to test for equality with a difference tolerance of 1.5e-8.

x <- c(4.00000005, 4.00000008)
y <- c(4.00000002, 4.00000006)

all.equal(x, y)
## [1] TRUE


If the difference is greater than the tolerance level the function will return the mean relative difference:

x <- c(4.005, 4.0008)
y <- c(4.002, 4.0006)

all.equal(x, y)
## [1] "Mean relative difference: 0.0003997102"


## Rounding numeric Values

There are many ways of rounding to the nearest integer, up, down, or toward a specified decimal place. Assuming we have the following vector x:

x <- (1, 1.35, 1.7, 2.05, 2.4, 2.75, 3.1, 3.45, 3.8, 4.15, 4.5, 4.85, 5.2, 5.55, 5.9)


The following illustrates the common ways to round x:

# Round to the nearest integer
round(x)
##  [1] 1 1 2 2 2 3 3 3 4 4 4 5 5 6 6

# Round up
ceiling(x)
##  [1] 1 2 2 3 3 3 4 4 4 5 5 5 6 6 6

# Round down
floor(x)
##  [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5

# Round to a specified decimal
round(x, digits = 1)
##  [1] 1.0 1.4 1.7 2.0 2.4 2.8 3.1 3.4 3.8 4.2 4.5 4.8 5.2 5.5 5.9