Package 'forcats' reference manual

Title:	Tools for Working with Categorical Variables (Factors)
Description:	Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, 'anonymising', and manually 'recoding').
Authors:	Hadley Wickham [aut, cre], Posit Software, PBC [cph, fnd]
Maintainer:	Hadley Wickham <[email protected]>
License:	MIT + file LICENSE
Version:	1.0.0.9000
Built:	2025-03-24 04:38:40 UTC
Source:	https://github.com/tidyverse/forcats

Convert input to a factor

Description

Compared to base R, when x is a character, this function creates levels in the order in which they appear, which will be the same on every platform. (Base R sorts in the current locale which can vary from place to place.) When x is numeric, the ordering is based on the numeric value and consistent with base R.

Usage

as_factor(x, ...)

## S3 method for class 'factor'
as_factor(x, ...)

## S3 method for class 'character'
as_factor(x, ...)

## S3 method for class 'numeric'
as_factor(x, ...)

## S3 method for class 'logical'
as_factor(x, ...)

Arguments

`x`	Object to coerce to a factor.
`...`	Other arguments passed down to method.

Details

This is a generic function.

Examples

# Character object
x <- c("a", "z", "g")
as_factor(x)
as.factor(x)

# Character object containing numbers
y <- c("1.1", "11", "2.2", "22")
as_factor(y)
as.factor(y)

# Numeric object
z <- as.numeric(y)
as_factor(z)
as.factor(z)

Create a factor

Description

fct() is a stricter version of factor() that errors if your specification of levels is inconsistent with the values in x.

Usage

fct(x = character(), levels = NULL, na = character())

Arguments

`x`	A character vector. Values must occur in either `levels` or `na`.
`levels`	A character vector of known levels. If not supplied, will be computed from the unique values of `x`, in the order in which they occur.
`na`	A character vector of values that should become missing values.

Value

A factor.

Examples

# Use factors when you know the set of possible values a variable might take
x <- c("A", "O", "O", "AB", "A")
fct(x, levels = c("O", "A", "B", "AB"))

# If you don't specify the levels, fct() will create them from the data,
# in the order that they're seen
fct(x)


# Differences with base R -----------------------------------------------
# factor() silently generates NAs
x <- c("a", "b", "c")
factor(x, levels = c("a", "b"))
# fct() errors
try(fct(x, levels = c("a", "b")))
# Unless you explicitly list the value in `na`:
fct(x, levels = c("a", "b"), na = "c")

# factor() sorts default levels:
factor(c("y", "x"))
# fct() uses in order of appearance:
fct(c("y", "x"))

Anonymise factor levels

Description

Replaces factor levels with arbitrary numeric identifiers. Neither the values nor the order of the levels are preserved.

Usage

fct_anon(f, prefix = "")

Arguments

`f`	A factor.
`prefix`	A character prefix to insert in front of the random labels.

Examples

gss_cat$relig %>% fct_count()
gss_cat$relig %>%
  fct_anon() %>%
  fct_count()
gss_cat$relig %>%
  fct_anon("X") %>%
  fct_count()
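
As a small illustration of what anonymising keeps and what it discards, the sketch below checks that the set of counts is unchanged while the labels are replaced by prefixed identifiers. The factor `f` and the prefix "id" are made up for this example. Because the mapping from old to new levels is random, only properties that hold for any mapping are inspected.

```r
library(forcats)

# Hypothetical input: three levels with distinct counts
f <- factor(c("x", "x", "x", "y", "y", "z"))
anon <- fct_anon(f, prefix = "id")

# Labels are replaced and assigned at random ...
levels(anon)

# ... but the counts themselves survive (compare after sorting)
sort(as.integer(table(f)))
sort(as.integer(table(anon)))
```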

Concatenate factors, combining levels

Description

This is a useful way of patching together factors from multiple sources that really should have the same levels but don't.

Usage

fct_c(...)

Arguments

`...`	<`dynamic-dots`> Individual factors. Uses tidy dots, so you can splice in a list of factors with `!!!`.

Examples

fa <- factor("a")
fb <- factor("b")
fab <- factor(c("a", "b"))

c(fa, fb, fab)
fct_c(fa, fb, fab)

# You can also pass a list of factors with !!!
fs <- list(fa, fb, fab)
fct_c(!!!fs)

Collapse factor levels into manually defined groups

Description

Collapse factor levels into manually defined groups

Usage

fct_collapse(.f, ..., other_level = NULL, group_other = "DEPRECATED")

Arguments

`.f`	A factor (or character vector).
`...`	<`dynamic-dots`> A series of named character vectors. The levels in each vector will be replaced with the name.
`other_level`	Value of level used for "other" values. Always placed at end of levels.
`group_other`	Deprecated. Replace all levels not named in `...` with "Other"?

Examples

fct_count(gss_cat$partyid)

partyid2 <- fct_collapse(gss_cat$partyid,
  missing = c("No answer", "Don't know"),
  other = "Other party",
  rep = c("Strong republican", "Not str republican"),
  ind = c("Ind,near rep", "Independent", "Ind,near dem"),
  dem = c("Not str democrat", "Strong democrat")
)
fct_count(partyid2)
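
The `other_level` argument is not exercised above. A minimal sketch, using a made-up four-level factor, of how levels that are not named in `...` can be gathered into a single trailing level:

```r
library(forcats)

# Hypothetical factor with four levels
f <- factor(c("a", "b", "c", "d", "a"))

# Merge "a" and "b"; every level not named goes to "Other",
# which is placed at the end of the levels
g <- fct_collapse(f, ab = c("a", "b"), other_level = "Other")
levels(g)
table(g)
```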

Count entries in a factor

Description

Count entries in a factor

Usage

fct_count(f, sort = FALSE, prop = FALSE)

Arguments

`f`	A factor (or character vector).
`sort`	If `TRUE`, sort the result so that the most common values float to the top.
`prop`	If `TRUE`, also compute the proportion of the total that each value accounts for.

Value

A tibble with columns f and n, plus p if prop is TRUE.

Examples

f <- factor(sample(letters)[rpois(1000, 10)])
table(f)
fct_count(f)
fct_count(f, sort = TRUE)
fct_count(f, sort = TRUE, prop = TRUE)

Combine levels from two or more factors to create a new factor

Description

Computes a factor whose levels are all the combinations of the levels of the input factors.

Usage

fct_cross(..., sep = ":", keep_empty = FALSE)

Arguments

`...`	<`dynamic-dots`> Additional factors or character vectors.
`sep`	A character string to separate the levels
`keep_empty`	If TRUE, keep combinations with no observations as levels

Value

The new factor

Examples

fruit <- factor(c("apple", "kiwi", "apple", "apple"))
colour <- factor(c("green", "green", "red", "green"))
eaten <- c("yes", "no", "yes", "no")
fct_cross(fruit, colour)
fct_cross(fruit, colour, eaten)
fct_cross(fruit, colour, keep_empty = TRUE)
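
The `sep` argument is not shown above. The sketch below reuses the same fruit and colour data with an arbitrary separator, and counts the levels with and without `keep_empty`:

```r
library(forcats)

fruit <- factor(c("apple", "kiwi", "apple", "apple"))
colour <- factor(c("green", "green", "red", "green"))

# Join the level names with " / " instead of the default ":"
crossed <- fct_cross(fruit, colour, sep = " / ")
as.character(crossed)

# keep_empty = TRUE adds the unobserved kiwi/red combination as a level
nlevels(crossed)
nlevels(fct_cross(fruit, colour, sep = " / ", keep_empty = TRUE))
```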

Drop unused levels

Description

Compared to base::droplevels(), does not drop NA levels that have values.

Usage

fct_drop(f, only = NULL)

Arguments

`f`	A factor (or character vector).
`only`	A character vector restricting the set of levels to be dropped. If supplied, only levels that have no entries and appear in this vector will be removed.

Examples

f <- factor(c("a", "b"), levels = c("a", "b", "c"))
f
fct_drop(f)

# Set only to restrict which levels to drop
fct_drop(f, only = "a")
fct_drop(f, only = "c")

Add additional levels to a factor

Description

Add additional levels to a factor

Usage

fct_expand(f, ..., after = Inf)

Arguments

`f`	A factor (or character vector).
`...`	Additional levels to add to the factor. Levels that already exist will be silently ignored.
`after`	Where should the new values be placed?

Examples

f <- factor(sample(letters[1:3], 20, replace = TRUE))
f
fct_expand(f, "d", "e", "f")
fct_expand(f, letters[1:6])
fct_expand(f, "Z", after = 0)

Reorder factor levels by first appearance, frequency, or numeric order

Description

This family of functions changes only the order of the levels.

fct_inorder(): by the order in which they first appear.
fct_infreq(): by number of observations with each level (largest first)
fct_inseq(): by numeric value of level.

Usage

fct_inorder(f, ordered = NA)

fct_infreq(f, w = NULL, ordered = NA)

fct_inseq(f, ordered = NA)

Arguments

`f`	A factor
`ordered`	A logical which determines the "ordered" status of the output factor. `NA` preserves the existing status of the factor.
`w`	An optional numeric vector giving weights for frequency of each value (not level) in f.

Examples

f <- factor(c("b", "b", "a", "c", "c", "c"))
f
fct_inorder(f)
fct_infreq(f)

f <- factor(1:3, levels = c("3", "2", "1"))
f
fct_inseq(f)
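
The weights argument `w` of fct_infreq() is not covered above. A minimal sketch with made-up weights, showing that a single heavily weighted observation can change which level comes first:

```r
library(forcats)

# "b" is the most common value by plain count
f <- factor(c("a", "b", "b", "c"))
levels(fct_infreq(f))

# With weights (one per value of f, not per level), the single
# "a" observation carries more total weight than the two "b"s
w <- c(10, 1, 1, 1)
levels(fct_infreq(f, w = w))
```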

Lump uncommon factor levels together into "other"

Description

A family for lumping together levels that meet some criteria.

fct_lump_min(): lumps levels that appear fewer than min times.
fct_lump_prop(): lumps levels that appear fewer than (or equal to) prop * n times.
fct_lump_n(): lumps all levels except for the n most frequent (or least frequent if n < 0).
fct_lump_lowfreq(): lumps together the least frequent levels, ensuring that "other" is still the smallest level.

fct_lump() exists primarily for historical reasons, as it automatically picks between these different methods depending on its arguments. We no longer recommend that you use it.

Usage

fct_lump(
  f,
  n,
  prop,
  w = NULL,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)

fct_lump_min(f, min, w = NULL, other_level = "Other")

fct_lump_prop(f, prop, w = NULL, other_level = "Other")

fct_lump_n(
  f,
  n,
  w = NULL,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)

fct_lump_lowfreq(f, w = NULL, other_level = "Other")

Arguments

`f`	A factor (or character vector).
`n`	Positive `n` preserves the most common `n` values. Negative `n` preserves the least common `-n` values. If there are ties, you will get at least `abs(n)` values.
`prop`	Positive `prop` lumps values which do not appear at least `prop` of the time. Negative `prop` lumps values that do not appear at most `-prop` of the time.
`w`	An optional numeric vector giving weights for frequency of each value (not level) in f.
`other_level`	Value of level used for "other" values. Always placed at end of levels.
`ties.method`	A character string specifying how ties are treated. See `rank()` for details.
`min`	Preserve levels that appear at least `min` number of times.

Examples

x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))
x %>% table()
x %>%
  fct_lump_n(3) %>%
  table()
x %>%
  fct_lump_prop(0.10) %>%
  table()
x %>%
  fct_lump_min(5) %>%
  table()
x %>%
  fct_lump_lowfreq() %>%
  table()

x <- factor(letters[rpois(100, 5)])
x
table(x)
table(fct_lump_lowfreq(x))

# Use positive values to collapse the rarest
fct_lump_n(x, n = 3)
fct_lump_prop(x, prop = 0.1)

# Use negative values to collapse the most common
fct_lump_n(x, n = -3)
fct_lump_prop(x, prop = -0.1)

# Use weighted frequencies
w <- c(rep(2, 50), rep(1, 50))
fct_lump_n(x, n = 5, w = w)

# Use ties.method to control how tied factors are collapsed
fct_lump_n(x, n = 6)
fct_lump_n(x, n = 6, ties.method = "max")

# Use fct_lump_min() to lump together all levels with fewer than `min` values
table(fct_lump_min(x, min = 10))
table(fct_lump_min(x, min = 15))

Test for presence of levels in a factor

Description

Do any of lvls occur in f? Compared to %in%, this function validates lvls to ensure that they're actually present in f. In other words, x %in% "not present" will return FALSE, but fct_match(x, "not present") will throw an error.

Usage

fct_match(f, lvls)

Arguments

`f`	A factor (or character vector).
`lvls`	A character vector specifying levels to look for.

Value

A logical vector

Examples

table(fct_match(gss_cat$marital, c("Married", "Divorced")))

# Compare to %in%, misspelled levels throw an error
table(gss_cat$marital %in% c("Maried", "Davorced"))
## Not run: 
table(fct_match(gss_cat$marital, c("Maried", "Davorced")))

## End(Not run)
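
Since the failing call above is marked as not run, the following sketch shows one way to observe the difference safely. It uses a small made-up factor and captures the error with tryCatch() rather than letting it stop the script; the exact wording of the message is not assumed.

```r
library(forcats)

f <- factor(c("a", "b", "c"))

# %in% quietly returns all FALSE for a level that does not exist
f %in% "z"

# fct_match() signals an error instead; capture its message for inspection
result <- tryCatch(
  fct_match(f, "z"),
  error = function(e) conditionMessage(e)
)
result

# For levels that do exist, the result is an ordinary logical vector
fct_match(f, c("a", "c"))
```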

Convert between `NA` values and `NA` levels

Description

There are two ways to represent missing values in factors: in the values and in the levels. NAs in the values are most useful for data analysis (since is.na() returns what you expect), but because the NA is not explicitly recorded in the levels, there's no way to control its position (it's almost always displayed last or not at all). Putting the NAs in the levels allows you to control its display, at the cost of losing accurate is.na() reporting.

(It is possible to have a factor with missing values in both the values and the levels but it requires some explicit gymnastics and we don't recommend it.)

Usage

fct_na_value_to_level(f, level = NA)

fct_na_level_to_value(f, extra_levels = NULL)

Arguments

`f`	A factor (or character vector).
`level`	Optionally, instead of converting the `NA` values to an `NA` level, convert it to a level with this value.
`extra_levels`	Optionally, a character vector giving additional levels that should also be converted to `NA` values.

Examples

# Most factors store NAs in the values:
f1 <- fct(c("a", "b", NA, "c", "b", NA))
levels(f1)
as.integer(f1)
is.na(f1)

# But it's also possible to store them in the levels
f2 <- fct_na_value_to_level(f1)
levels(f2)
as.integer(f2)
is.na(f2)

# If needed, you can convert back to NAs in the values:
f3 <- fct_na_level_to_value(f2)
levels(f3)
as.integer(f3)
is.na(f3)
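
The `level` argument is not used above. A minimal sketch, with an arbitrary label text, of the trade-off described in the Description: a labelled missing level can be positioned like any other level, but is.na() no longer flags those entries.

```r
library(forcats)

f <- fct(c("a", "b", NA, "a"))

# Give the missing values a visible label (the label text is arbitrary)
g <- fct_na_value_to_level(f, level = "(Missing)")
levels(g)
table(g)

# Once it is an ordinary level it can be moved, e.g. to the front
levels(fct_relevel(g, "(Missing)"))

# The cost: is.na() reports the missing entry in f but not in g
sum(is.na(f))
sum(is.na(g))
```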

Manually replace levels with "other"

Description

Manually replace levels with "other"

Usage

fct_other(f, keep, drop, other_level = "Other")

Arguments

f

A factor (or character vector).

keep, drop

Pick one of keep and drop:

keep will preserve listed levels, replacing all others with other_level.
drop will replace listed levels with other_level, keeping all others as is.

other_level

Value of level used for "other" values. Always placed at end of levels.

Examples

x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))

fct_other(x, keep = c("A", "B"))
fct_other(x, drop = c("A", "B"))

Change factor levels by hand

Description

Change factor levels by hand

Usage

fct_recode(.f, ...)

Arguments

`.f`	A factor (or character vector).
`...`	<`dynamic-dots`> A sequence of named character vectors where the name gives the new level, and the value gives the old level. Levels not otherwise mentioned will be left as is. Levels can be removed by naming them `NULL`.

Examples

x <- factor(c("apple", "bear", "banana", "dear"))
fct_recode(x, fruit = "apple", fruit = "banana")

# If you make a mistake you'll get a warning
fct_recode(x, fruit = "apple", fruit = "bananana")

# If you name the level NULL it will be removed
fct_recode(x, NULL = "apple", fruit = "banana")

# Wrap the left hand side in quotes if it contains special characters
fct_recode(x, "an apple" = "apple", "a bear" = "bear")

# When passing a named vector to rename levels use !!! to splice
x <- factor(c("apple", "bear", "banana", "dear"))
levels <- c(fruit = "apple", fruit = "banana")
fct_recode(x, !!!levels)

Relabel factor levels with a function, collapsing as necessary

Description

Relabel factor levels with a function, collapsing as necessary

Usage

fct_relabel(.f, .fun, ...)

Arguments

.f

A factor (or character vector).

.fun

A function to be applied to each level. Must accept one character argument and return a character vector of the same length as its input.

You can also use ~ to create a function as shorthand (in the style of purrr). ~ paste(., "x") is equivalent to function(.) paste(., "x").

...

Additional arguments to .fun.

Examples

gss_cat$partyid %>% fct_count()
gss_cat$partyid %>%
  fct_relabel(~ gsub(",", ", ", .x)) %>%
  fct_count()

convert_income <- function(x) {
  regex <- "^(?:Lt |)[$]([0-9]+).*$"
  is_range <- grepl(regex, x)
  num_income <- as.numeric(gsub(regex, "\\1", x[is_range]))
  num_income <- trunc(num_income / 5000) * 5000
  x[is_range] <- paste0("Gt $", num_income)
  x
}
fct_count(gss_cat$rincome)
convert_income(levels(gss_cat$rincome))
rincome2 <- fct_relabel(gss_cat$rincome, convert_income)
fct_count(rincome2)
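
The "collapsing as necessary" behaviour can be seen more directly on a small made-up factor. In this sketch the extra arguments after `.fun` are passed on to substr(), and levels that end up with the same new label are merged:

```r
library(forcats)

f <- factor(c("apple", "avocado", "banana", "blueberry", "cherry"))

# substr(levels, 1, 1) keeps only the first letter of each level,
# so "apple"/"avocado" and "banana"/"blueberry" collapse together
initials <- fct_relabel(f, substr, 1, 1)
levels(initials)
table(initials)

# The same relabelling written as a purrr-style formula
levels(fct_relabel(f, ~ substr(.x, 1, 1)))
```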

Reorder factor levels by hand

Description

This is a generalisation of stats::relevel() that allows you to move any number of levels to any location.

Usage

fct_relevel(.f, ..., after = 0L)

Arguments

.f

A factor (or character vector).

...

Either a function (or formula), or character levels.

A function will be called with the current levels as input, and the return value (which must be a character vector) will be used to relevel the factor.

Any levels not mentioned will be left in their existing order, by default after the explicitly mentioned levels. Supports tidy dots.

after

Where should the new values be placed?

Examples

f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))
fct_relevel(f)
fct_relevel(f, "a")
fct_relevel(f, "b", "a")

# Move to the third position
fct_relevel(f, "a", after = 2)

# Relevel to the end
fct_relevel(f, "a", after = Inf)
fct_relevel(f, "a", after = 3)

# Relevel with a function
fct_relevel(f, sort)
fct_relevel(f, sample)
fct_relevel(f, rev)

# Using 'Inf' allows you to relevel to the end when the number
# of levels is unknown or variable (e.g. vectorised operations)
df <- forcats::gss_cat[, c("rincome", "denom")]
lapply(df, levels)

df2 <- lapply(df, fct_relevel, "Don't know", after = Inf)
lapply(df2, levels)

# You'll get a warning if the levels don't exist
fct_relevel(f, "e")

Reorder factor levels by sorting along another variable

Description

fct_reorder() is useful for 1d displays where the factor is mapped to position; fct_reorder2() for 2d displays where the factor is mapped to a non-position aesthetic. last2() and first2() are helpers for fct_reorder2(); last2() finds the last value of y when sorted by x; first2() finds the first value.

Usage

fct_reorder(
  .f,
  .x,
  .fun = median,
  ...,
  .na_rm = NULL,
  .default = Inf,
  .desc = FALSE
)

fct_reorder2(
  .f,
  .x,
  .y,
  .fun = last2,
  ...,
  .na_rm = NULL,
  .default = -Inf,
  .desc = TRUE
)

last2(.x, .y)

first2(.x, .y)

Arguments

`.f`	A factor (or character vector).
`.x`, `.y`	The levels of `.f` are reordered so that the values of `.fun(.x)` (for `fct_reorder()`) and `.fun(.x, .y)` (for `fct_reorder2()`) are in ascending order.
`.fun`	A summary function. It should take one vector for `fct_reorder()`, and two vectors for `fct_reorder2()`, and return a single value.
`...`	Other arguments passed on to `.fun`.
`.na_rm`	Should `fct_reorder()` remove missing values? If `NULL`, the default, will remove missing values with a warning. Set to `FALSE` to preserve `NA`s (if your `.fun` already handles them) and `TRUE` to remove silently.
`.default`	What default value should we use for `.fun` for empty levels? Use this to control where empty levels appear in the output.
`.desc`	Order in descending order? Note the default is different between `fct_reorder` and `fct_reorder2`, in order to match the default ordering of factors in the legend.

Examples

# fct_reorder() -------------------------------------------------------------
# Useful when a categorical variable is mapped to position
boxplot(Sepal.Width ~ Species, data = iris)
boxplot(Sepal.Width ~ fct_reorder(Species, Sepal.Width), data = iris)

# or with ggplot2
library(ggplot2)
ggplot(iris, aes(fct_reorder(Species, Sepal.Width), Sepal.Width)) +
  geom_boxplot()

# fct_reorder2() -------------------------------------------------------------
# Useful when a categorical variable is mapped to color, size, shape etc

chks <- subset(ChickWeight, as.integer(Chick) < 10)
chks <- transform(chks, Chick = fct_shuffle(Chick))

# Without reordering it's hard to match line to legend
ggplot(chks, aes(Time, weight, colour = Chick)) +
  geom_point() +
  geom_line()

# With reordering it's much easier
ggplot(chks, aes(Time, weight, colour = fct_reorder2(Chick, Time, weight))) +
  geom_point() +
  geom_line() +
  labs(colour = "Chick")
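
The plots above show the effect visually. As a non-graphical sketch, the level order can also be inspected directly. The data below are made up, and the last two calls apply last2() and first2() on their own to show what they return.

```r
library(forcats)

# Hypothetical data: three groups with two measurements each
f <- factor(c("a", "a", "b", "b", "c", "c"))
x <- c(3, 5, 1, 2, 9, 10)

# Default: ascending by the median of x within each level
levels(fct_reorder(f, x))

# Descending order, using the maximum as the summary instead
levels(fct_reorder(f, x, .fun = max, .desc = TRUE))

# last2() and first2() pick the y value at the largest / smallest x
last2(c(1, 3, 2), c(10, 30, 20))
first2(c(1, 3, 2), c(10, 30, 20))
```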

Reverse order of factor levels

Description

This is sometimes useful when plotting a factor.

Usage

fct_rev(f)

Arguments

`f`	A factor (or character vector).

Examples

f <- factor(c("a", "b", "c"))
fct_rev(f)

Shift factor levels to left or right, wrapping around at end

Description

This is useful when the levels of an ordered factor are actually cyclical, with different conventions on the starting point.

Usage

fct_shift(f, n = 1L)

Arguments

`f`	A factor.
`n`	Positive values shift to the left; negative values shift to the right.

Examples

x <- factor(
  c("Mon", "Tue", "Wed"),
  levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
  ordered = TRUE
)
x
fct_shift(x)
fct_shift(x, 2)
fct_shift(x, -1)

Randomly permute factor levels

Description

Randomly permute factor levels

Usage

fct_shuffle(f)

Arguments

`f`	A factor (or character vector).

Examples

f <- factor(c("a", "b", "c"))
fct_shuffle(f)
fct_shuffle(f)

Unify the levels in a list of factors

Description

Unify the levels in a list of factors

Usage

fct_unify(fs, levels = lvls_union(fs))

Arguments

`fs`	A list of factors
`levels`	Set of levels to apply to every factor. Defaults to the union of all factor levels.

Examples

fs <- list(factor("a"), factor("b"), factor(c("a", "b")))
fct_unify(fs)

Unique values of a factor, as a factor

Description

fct_unique() extracts the complete set of possible values from the levels of the factor, rather than looking at the actual values, like unique().

fct_unique() only uses the values of f in one way: it looks for implicit missing values so that they can be included in the result.

Usage

fct_unique(f)

Arguments

f

A factor.

Value

A factor.

Examples

f <- fct(letters[rpois(100, 10)])
unique(f)     # in order of appearance
fct_unique(f) # in order of levels

f <- fct(letters[rpois(100, 2)], letters[1:20])
unique(f)     # levels that appear in data
fct_unique(f) # all possible levels
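
The handling of implicit missing values described above is not shown in these examples. A minimal sketch on a made-up factor with one NA in the values and one unused level, assuming a forcats version that behaves as the Description states:

```r
library(forcats)

# One missing value, stored in the values rather than the levels;
# "c" is declared as a level but never occurs
f <- factor(c("b", "a", NA, "b"), levels = c("a", "b", "c"))

# All declared levels are returned, including the unused "c",
# plus an NA entry for the implicit missing value
u <- fct_unique(f)
u
length(u)
anyNA(u)
```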

A sample of categorical variables from the General Social Survey

Description

A sample of categorical variables from the General Social Survey

Usage

gss_cat

Format

year: year of survey, 2000–2014 (every other year)
age: age. Maximum age truncated to 89.
marital: marital status
race: race
rincome: reported income
partyid: party affiliation
relig: religion
denom: denomination
tvhours: hours per day watching tv

Source

Downloaded from https://gssdataexplorer.norc.org/.

Examples

gss_cat

fct_count(gss_cat$relig)
fct_count(fct_lump(gss_cat$relig))

Low-level functions for manipulating levels

Description

lvls_reorder leaves values as they are, but changes the order. lvls_revalue changes the values of existing levels; there must be one new level for each old level. lvls_expand expands the set of levels; the new levels must include the old levels.

Usage

lvls_reorder(f, idx, ordered = NA)

lvls_revalue(f, new_levels)

lvls_expand(f, new_levels)

Arguments

`f`	A factor (or character vector).
`idx`	An integer index, with one integer for each existing level.
`ordered`	A logical which determines the "ordered" status of the output factor. `NA` preserves the existing status of the factor.
`new_levels`	A character vector of new levels.

Details

These functions are less helpful than the higher-level fct_ functions, but are safer than the very low-level manipulation of levels directly, because they are more specific, and hence can more carefully check their arguments.

Examples

f <- factor(c("a", "b", "c"))
lvls_reorder(f, 3:1)
lvls_revalue(f, c("apple", "banana", "carrot"))
lvls_expand(f, c("a", "b", "c", "d"))
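
The argument checking mentioned in Details can be seen by passing inputs that break the stated constraints. This sketch captures the errors with tryCatch() so the script keeps running; the exact wording of the messages is not assumed.

```r
library(forcats)

f <- factor(c("a", "b", "c"))

# lvls_revalue() needs exactly one new level per existing level;
# two new levels for three old ones is rejected
msg_revalue <- tryCatch(
  lvls_revalue(f, c("apple", "banana")),
  error = function(e) conditionMessage(e)
)
msg_revalue

# lvls_expand() requires the new levels to include all old ones;
# "c" is missing here, so this is rejected too
msg_expand <- tryCatch(
  lvls_expand(f, c("a", "b", "d")),
  error = function(e) conditionMessage(e)
)
msg_expand
```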

Find all levels in a list of factors

Description

Find all levels in a list of factors

Usage

lvls_union(fs)

Arguments

`fs`	A list of factors.

Examples

fs <- list(factor("a"), factor("b"), factor(c("a", "b")))
lvls_union(fs)
