---
title: "purrr <-> base R"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{purrr <-> base R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5,
  fig.align = "center"
)
options(tibble.print_min = 6, tibble.print_max = 6)

modern_r <- getRversion() >= "4.1.0"
```

# Introduction

This vignette compares purrr's functionals to their base R equivalents, focusing primarily on the map family and related functions. It helps those familiar with base R better understand what purrr does, and shows purrr users how they might express the same ideas in base R code. We'll start with a rough overview of the major differences, give a quick translation guide, and then show a few examples.

```{r setup}
library(purrr)
library(tibble)
```

## Key differences

There are two primary differences between the base apply family and the purrr map family: purrr functions are named more consistently, and they more fully explore the space of input and output variants. In more detail:

- purrr functions consistently use `.` as a prefix to avoid [inadvertently matching arguments](https://adv-r.hadley.nz/functionals.html#argument-names) of the purrr function instead of the function that you're trying to call. Base functions use a variety of techniques, such as upper case argument names (e.g. `lapply(X, FUN, ...)`), or require anonymous functions (e.g. `Map()`).
- All map functions are type stable: you can predict the type of the output using only a little information about the inputs. In contrast, the base functions `sapply()` and `mapply()` automatically simplify their output, making the return value hard to predict.
- The map functions all start with the data, followed by the function, then any additional constant arguments. Most base apply functions also follow this pattern, but `mapply()` starts with the function, and `Map()` has no way to supply additional constant arguments.
- purrr functions provide all combinations of input and output variants, and include variants specifically for the common two-argument case.

## Direct translations

The following sections give a high-level translation between base R commands and their purrr equivalents. See the function documentation for details.

### `Map` functions

Here `x` denotes a vector and `f` denotes a function.

| Output | Input | Base R | purrr |
|------------------|------------------|------------------|-------------------|
| List | 1 vector | `lapply()` | `map()` |
| List | 2 vectors | `mapply()`, `Map()` | `map2()` |
| List | \>2 vectors | `mapply()`, `Map()` | `pmap()` |
| Atomic vector of desired type | 1 vector | `vapply()` | `map_lgl()` (logical), `map_int()` (integer), `map_dbl()` (double), `map_chr()` (character), `map_raw()` (raw) |
| Atomic vector of desired type | 2 vectors | `mapply()`, `Map()`, then `is.*()` to check type | `map2_lgl()` (logical), `map2_int()` (integer), `map2_dbl()` (double), `map2_chr()` (character), `map2_raw()` (raw) |
| Atomic vector of desired type | \>2 vectors | `mapply()`, `Map()`, then `is.*()` to check type | `pmap_lgl()` (logical), `pmap_int()` (integer), `pmap_dbl()` (double), `pmap_chr()` (character), `pmap_raw()` (raw) |
| Side effect only | 1 vector | loops | `walk()` |
| Side effect only | 2 vectors | loops | `walk2()` |
| Side effect only | \>2 vectors | loops | `pwalk()` |
| Data frame (`rbind` outputs) | 1 vector | `lapply()` then `rbind()` | `map_dfr()` |
| Data frame (`rbind` outputs) | 2 vectors | `mapply()`/`Map()` then `rbind()` | `map2_dfr()` |
| Data frame (`rbind` outputs) | \>2 vectors | `mapply()`/`Map()` then `rbind()` | `pmap_dfr()` |
| Data frame (`cbind` outputs) | 1 vector | `lapply()` then `cbind()` | `map_dfc()` |
| Data frame (`cbind` outputs) | 2 vectors | `mapply()`/`Map()` then `cbind()` | `map2_dfc()` |
| Data frame (`cbind` outputs) | \>2 vectors | `mapply()`/`Map()` then `cbind()` | `pmap_dfc()` |
| Any | Vector and its names | `l/s/vapply(seq_along(x), function(i) f(x[[i]], names(x)[i]))` or `mapply/Map(f, x, names(x))` | `imap()`, `imap_*()` (`lgl`, `dbl`, `dfr`, etc., just like for `map()`, `map2()`, and `pmap()`) |
| Any | Selected elements of the vector | `x[index] <- lapply(x[index], f)` | `map_if()`, `map_at()` |
| List | Recursively apply to list within list | `rapply()` | `map_depth()` |
| List | List only | `lapply()` | `lmap()`, `lmap_at()`, `lmap_if()` |

### Extractor shorthands

Since a common use case for map functions is extracting components from lists, purrr provides a handful of shorthands for various uses of `[[`.

| Description | base R | purrr |
|-------------------|--------------------------|---------------------------|
| Extract by name | `` lapply(x, `[[`, "a") `` | `map(x, "a")` |
| Extract by position | `` lapply(x, `[[`, 3) `` | `map(x, 3)` |
| Extract deeply | `lapply(x, \(y) y[[1]][["x"]][[3]])` | `map(x, list(1, "x", 3))` |
| Extract with default value | `lapply(x, function(y) tryCatch(y[[3]], error = function(e) NA))` | `map(x, 3, .default = NA)` |

### Predicates

Here `p`, a predicate, denotes a function that returns `TRUE` or `FALSE` indicating whether an object fulfills a criterion, e.g. `is.character()`.

| Description | base R | purrr |
|-----------------------------|--------------------|-----------------------|
| Find a matching element | `Find(p, x)` | `detect(x, p)` |
| Find position of matching element | `Position(p, x)` | `detect_index(x, p)` |
| Do all elements of a vector satisfy a predicate? | `all(sapply(x, p))` | `every(x, p)` |
| Do any elements of a vector satisfy a predicate? | `any(sapply(x, p))` | `some(x, p)` |
| Does a list contain an object? | `any(sapply(x, identical, obj))` | `has_element(x, obj)` |
| Keep elements that satisfy a predicate | `x[sapply(x, p)]` | `keep(x, p)` |
| Discard elements that satisfy a predicate | `x[!sapply(x, p)]` | `discard(x, p)` |
| Negate a predicate function | `Negate(p)` or `function(x) !p(x)` | `negate(p)` |

### Other vector transforms

| Description | base R | purrr |
|-----------------------------|--------------------|-----------------------|
| Accumulate intermediate results of a vector reduction | `Reduce(f, x, accumulate = TRUE)` | `accumulate(x, f)` |
| Recursively combine two lists | `c(X, Y)`, but more complicated to merge recursively | `list_merge()`, `list_modify()` |
| Reduce a list to a single value by iteratively applying a binary function | `Reduce(f, x)` | `reduce(x, f)` |

## Examples

### Varying inputs

#### One input

Suppose we would like to generate a list of samples of size 5 from normal distributions with different means:

```{r}
means <- 1:4
```

There's little difference when generating the samples:

- Base R uses `lapply()`:

  ```{r}
  set.seed(2020)
  samples <- lapply(means, rnorm, n = 5, sd = 1)
  str(samples)
  ```

- purrr uses `map()`:

  ```{r}
  set.seed(2020)
  samples <- map(means, rnorm, n = 5, sd = 1)
  str(samples)
  ```

#### Two inputs

Let's make the example a little more complicated by also varying the standard deviations:

```{r}
means <- 1:4
sds <- 1:4
```

- This is relatively tricky in base R because we have to adjust a number of `mapply()`'s defaults.

  ```{r}
  set.seed(2020)
  samples <- mapply(
    rnorm,
    mean = means,
    sd = sds,
    MoreArgs = list(n = 5),
    SIMPLIFY = FALSE
  )
  str(samples)
  ```

  Alternatively, we could use `Map()`, which doesn't simplify but also doesn't take any constant arguments, so we need to use an anonymous function:

  ```{r}
  samples <- Map(function(...) rnorm(..., n = 5), mean = means, sd = sds)
  ```

  In R 4.1 and up, you could use the shorter anonymous function form:

  ```{r, eval = modern_r}
  samples <- Map(\(...) rnorm(..., n = 5), mean = means, sd = sds)
  ```

- Working with a pair of vectors is a common situation, so purrr provides the `map2()` family of functions:

  ```{r}
  set.seed(2020)
  samples <- map2(means, sds, rnorm, n = 5)
  str(samples)
  ```

#### Any number of inputs

We can make the challenge still more complex by also varying the number of samples:

```{r}
ns <- 4:1
```

- Using base R's `Map()` becomes more straightforward because there are no constant arguments.

  ```{r}
  set.seed(2020)
  samples <- Map(rnorm, mean = means, sd = sds, n = ns)
  str(samples)
  ```

- In purrr, we need to switch from `map2()` to `pmap()`, which takes a list of any number of arguments.

  ```{r}
  set.seed(2020)
  samples <- pmap(list(mean = means, sd = sds, n = ns), rnorm)
  str(samples)
  ```

### Outputs

Given the samples, imagine we want to compute their medians. A median is a single number, so we want the output to be a numeric vector rather than a list.

- There are two options in base R: `vapply()` or `sapply()`. `vapply()` requires you to specify the output type (so it is relatively verbose), but will always return a numeric vector. `sapply()` is concise, but if you supply an empty list you'll get a list instead of a numeric vector.

  ```{r}
  # type stable
  medians <- vapply(samples, median, FUN.VALUE = numeric(1L))
  medians

  # not type stable
  medians <- sapply(samples, median)
  ```

- purrr is a little more compact because we can use `map_dbl()`.

  ```{r}
  medians <- map_dbl(samples, median)
  medians
  ```

What if we want just the side effect, such as a plot or a file output, but not the returned values?

- In base R we can either use a for loop or hide the results of `lapply()`.

  ```{r, fig.show='hide'}
  # for loop
  for (s in samples) {
    hist(s, xlab = "value", main = "")
  }

  # lapply
  invisible(lapply(samples, function(s) {
    hist(s, xlab = "value", main = "")
  }))
  ```

- In purrr, we can use `walk()`.

  ```{r, fig.show='hide'}
  walk(samples, ~ hist(.x, xlab = "value", main = ""))
  ```

### Pipes

You can join multiple steps together using either the magrittr pipe:

```{r}
set.seed(2020)
means %>%
  map(rnorm, n = 5, sd = 1) %>%
  map_dbl(median)
```

Or the base R pipe:

```{r, eval = modern_r}
set.seed(2020)
means |>
  lapply(rnorm, n = 5, sd = 1) |>
  sapply(median)
```

(And of course you can mix and match the piping style with either base R or purrr.)

The pipe is particularly compelling when working with longer transformations. For example, the following code splits `mtcars` up by `cyl`, fits a linear model of `mpg` on `wt` to each piece, extracts the coefficients, and extracts the first one (the intercept).

```{r, eval = modern_r}
mtcars %>%
  split(mtcars$cyl) %>%
  map(\(df) lm(mpg ~ wt, data = df)) %>%
  map(coef) %>%
  map_dbl(1)
```
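
For comparison, here is one way the same `mtcars` pipeline might be written with base R functions only. Treat it as a sketch rather than the canonical translation: the intermediate names (`by_cyl`, `models`, `coefs`, `intercepts`) are our own choice, and `function(df)` is used instead of `\(df)` so the chunk also runs on R versions before 4.1. `lapply()` stands in for `map()`, and `vapply()` with `` `[[` `` stands in for `map_dbl(1)`.

```{r}
# Split mtcars into one data frame per number of cylinders (4, 6, 8)
by_cyl <- split(mtcars, mtcars$cyl)

# Fit mpg ~ wt within each group; lapply() plays the role of map()
models <- lapply(by_cyl, function(df) lm(mpg ~ wt, data = df))

# Extract the coefficient vector of each model
coefs <- lapply(models, coef)

# Take the first coefficient (the intercept) of each model as a double;
# vapply() plays the role of map_dbl(1) and keeps the group names
intercepts <- vapply(coefs, `[[`, FUN.VALUE = numeric(1), 1)
intercepts
```

Without a pipe, each step leaves behind a named intermediate object. That object can be inspected on its own, at the cost of some extra naming.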