↩

Set Operations for Character Strings

There are also base R functions that allows for assessing the set union, intersection, difference, equality, and membership of two vectors. I also cover sorting character strings.

Set Union

To obtain the elements of the union between two character vectors use union():

set_1 <- c("lagunitas", "bells", "dogfish", "summit", "odell")
set_2 <- c("sierra", "bells", "harpoon", "lagunitas", "founders")

union(set_1, set_2)
## [1] "lagunitas" "bells"     "dogfish"   "summit"    "odell"     "sierra"   
## [7] "harpoon"   "founders"

Set Intersection

To obtain the common elements of two character vectors use intersect():

intersect(set_1, set_2)
## [1] "lagunitas" "bells"

Identifying Different Elements

To obtain the non-common elements, or the difference, of two character vectors use setdiff():

# returns elements in set_1 not in set_2
setdiff(set_1, set_2)
## [1] "dogfish" "summit"  "odell"

# returns elements in set_2 not in set_1
setdiff(set_2, set_1)
## [1] "sierra"   "harpoon"  "founders"

Testing for Element Equality

To test if two vectors contain the same elements regardless of order use setequal():

set_3 <- c("woody", "buzz", "rex")
set_4 <- c("woody", "andy", "buzz")
set_5 <- c("andy", "buzz", "woody")

setequal(set_3, set_4)
## [1] FALSE

setequal(set_4, set_5)
## [1] TRUE

Testing for Exact Equality

To test if two character vectors are equal in content and order use identical():

set_6 <- c("woody", "andy", "buzz")
set_7 <- c("andy", "buzz", "woody")
set_8 <- c("woody", "andy", "buzz")

identical(set_6, set_7)
## [1] FALSE

identical(set_6, set_8)
## [1] TRUE

Identifying if Elements are Contained in a String

To test if an element is contained within a character vector use is.element() or %in%:

good <- "andy"
bad <- "sid"

is.element(good, set_8)
## [1] TRUE

good %in% set_8
## [1] TRUE

bad %in% set_8
## [1] FALSE

Sorting a String

To sort a character vector use sort():

sort(set_8)
## [1] "andy"  "buzz"  "woody"

sort(set_8, decreasing = TRUE)
## [1] "woody" "buzz"  "andy"

UC Business Analytics R Programming Guide