We can exclude missing values in a couple different ways. First, if we want to exclude missing values from mathematical operations use the na.rm = TRUE
argument. If you do not exclude these values most functions will return an NA
.
# A vector with missing values
x <- c(1:4, NA, 6:7, NA)
# including NA values will produce an NA output
mean(x)
## [1] NA
# excluding NA values will calculate the mathematical operation for all non-missing values
mean(x, na.rm = TRUE)
## [1] 3.833333
We may also desire to subset our data to obtain complete observations, those observations (rows) in our data that contain no missing data. We can do this a few different ways.
# data frame with missing values
df <- data.frame(col1 = c(1:3, NA),
col2 = c("this", NA,"is", "text"),
col3 = c(TRUE, FALSE, TRUE, TRUE),
col4 = c(2.5, 4.2, 3.2, NA),
stringsAsFactors = FALSE)
df
## col1 col2 col3 col4
## 1 1 this TRUE 2.5
## 2 2 <NA> FALSE 4.2
## 3 3 is TRUE 3.2
## 4 NA text TRUE NA
First, to find complete cases we can leverage the complete.cases()
function which returns a logical vector identifying rows which are complete cases. So in the following case rows 1 and 3 are complete cases. We can use this information to subset our data frame which will return the rows which complete.cases()
found to be TRUE
.
complete.cases(df)
## [1] TRUE FALSE TRUE FALSE
# subset with complete.cases to get complete cases
df[complete.cases(df), ]
## col1 col2 col3 col4
## 1 1 this TRUE 2.5
## 3 3 is TRUE 3.2
# or subset with `!` operator to get incomplete cases
df[!complete.cases(df), ]
## col1 col2 col3 col4
## 2 2 <NA> FALSE 4.2
## 4 NA text TRUE NA
An shorthand alternative is to simply use na.omit()
to omit all rows containing missing values.
# or use na.omit() to get same as above
na.omit(df)
## col1 col2 col3 col4
## 1 1 this TRUE 2.5
## 3 3 is TRUE 3.2