Title: | Tools for Working with Categorical Variables (Factors) |
---|---|
Description: | Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, 'anonymising', and manually 'recoding'). |
Authors: | Hadley Wickham [aut, cre], Posit Software, PBC [cph, fnd] |
Maintainer: | Hadley Wickham |
License: | MIT + file LICENSE |
Version: | 1.0.0.9000 |
Built: | 2024-11-24 04:55:56 UTC |
Source: | https://github.com/tidyverse/forcats |
as_factor() converts its input to a factor. Compared to base R, when x is a character vector, it creates levels in the order in which they appear, which will be the same on every platform. (Base R sorts in the current locale, which can vary from place to place.) When x is numeric, the ordering is based on the numeric value and is consistent with base R.
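The ordering difference described above can be illustrated in base R. This is a comparison sketch only, not how as_factor() is implemented. Taking levels from unique() gives first-appearance order, while plain factor() sorts them:

```r
# Character input: levels by first appearance vs. base R's default sort.
x <- c("a", "z", "g")

by_appearance <- factor(x, levels = unique(x)) # first-appearance order
by_sort <- factor(x)                            # base R: sorted in the current locale

levels(by_appearance) # "a" "z" "g"
levels(by_sort)       # "a" "g" "z"
```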
as_factor(x, ...)

## S3 method for class 'factor'
as_factor(x, ...)

## S3 method for class 'character'
as_factor(x, ...)

## S3 method for class 'numeric'
as_factor(x, ...)

## S3 method for class 'logical'
as_factor(x, ...)
Argument | Description |
---|---|
x | Object to coerce to a factor. |
... | Other arguments passed down to the method. |
This is a generic function.
# Character object
x <- c("a", "z", "g")
as_factor(x)
as.factor(x)

# Character object containing numbers
y <- c("1.1", "11", "2.2", "22")
as_factor(y)
as.factor(y)

# Numeric object
z <- as.numeric(y)
as_factor(z)
as.factor(z)
fct() is a stricter version of factor() that errors if your specification of levels is inconsistent with the values in x.
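The sketch below shows the kind of check fct() performs. It is written in base R, and `strict_factor()` is a hypothetical helper, not part of forcats. Its error message and internals are assumptions; only the documented behaviour is mirrored:
- values listed in `na` become missing;
- any other value absent from `levels` triggers an error.

```r
# Hypothetical helper, not part of forcats: a base-R sketch of the
# consistency check that fct() is documented to perform.
strict_factor <- function(x, levels = NULL, na = character()) {
  x[x %in% na] <- NA                 # values listed in `na` become missing
  seen <- unique(x[!is.na(x)])
  if (is.null(levels)) {
    levels <- seen                   # default: order of first appearance
  }
  bad <- setdiff(seen, levels)
  if (length(bad) > 0) {
    stop("All values of `x` must appear in `levels` or `na`. Problem values: ",
         paste(bad, collapse = ", "))
  }
  factor(x, levels = levels)
}

x <- c("a", "b", "c")
f <- strict_factor(x, levels = c("a", "b"), na = "c")
levels(f) # "a" "b"
is.na(f)  # FALSE FALSE TRUE

# Base factor() would silently turn "c" into NA; the sketch errors instead.
res <- tryCatch(strict_factor(x, levels = c("a", "b")), error = function(e) "error")
```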
fct(x = character(), levels = NULL, na = character())
Argument | Description |
---|---|
x | A character vector. Values must occur in either levels or na. |
levels | A character vector of known levels. If not supplied, will be computed from the unique values of x, in the order in which they appear. |
na | A character vector of values that should become missing values. |
A factor.
# Use factors when you know the set of possible values a variable might take
x <- c("A", "O", "O", "AB", "A")
fct(x, levels = c("O", "A", "B", "AB"))

# If you don't specify the levels, fct() will create them from the data
# in the order that they're seen
fct(x)

# Differences with base R -----------------------------------------------
# factor() silently generates NAs
x <- c("a", "b", "c")
factor(x, levels = c("a", "b"))
# fct() errors
try(fct(x, levels = c("a", "b")))
# Unless you explicitly supply NA:
fct(x, levels = c("a", "b"), na = "c")

# factor() sorts default levels:
factor(c("y", "x"))
# fct() uses order of appearance:
fct(c("y", "x"))
fct_anon() replaces factor levels with arbitrary numeric identifiers. Neither the values nor the order of the levels are preserved.
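As a rough base-R sketch of the idea, `anon_levels()` shuffles the level order and then renames each level to a zero-padded number behind an optional prefix. It is a hypothetical helper, not the forcats implementation, and the padding scheme is an assumption. The grouping of observations is kept, but the original labels and their order are not:

```r
# Hypothetical helper, not part of forcats: shuffle the level order, then
# replace each level with a zero-padded number behind an optional prefix.
anon_levels <- function(f, prefix = "") {
  n <- nlevels(f)
  f <- factor(f, levels = sample(levels(f)))       # random level order
  width <- nchar(as.character(n))                  # e.g. 2 digits for 10-99 levels
  levels(f) <- paste0(prefix, formatC(seq_len(n), width = width, flag = "0"))
  f
}

set.seed(1)
f <- factor(c("x", "y", "y", "z"))
a <- anon_levels(f, prefix = "X")
levels(a) # "X1" "X2" "X3"
table(a)  # same counts as table(f), under new labels
```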
fct_anon(f, prefix = "")
Argument | Description |
---|---|
f | A factor. |
prefix | A character prefix to insert in front of the random labels. |
gss_cat$relig %>% fct_count()
gss_cat$relig %>% fct_anon() %>% fct_count()
gss_cat$relig %>% fct_anon("X") %>% fct_count()
fct_c() concatenates factors, combining their levels. This is a useful way of patching together factors from multiple sources that really should have the same levels but don't.
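The core idea can be sketched in base R. The level set becomes the union of all input levels, and the values are rebuilt from their character labels so that codes from different factors cannot be confused. `combine_factors()` is a hypothetical helper, not part of forcats:

```r
# Hypothetical helper, not part of forcats: concatenate factors by
# taking the union of their levels and rebuilding from the character values.
combine_factors <- function(...) {
  fs <- list(...)
  all_levels <- unique(unlist(lapply(fs, levels)))
  factor(unlist(lapply(fs, as.character)), levels = all_levels)
}

fa <- factor("a")
fb <- factor("b")
fab <- factor(c("a", "b"))
combined <- combine_factors(fa, fb, fab)
levels(combined)       # "a" "b"
as.character(combined) # "a" "b" "a" "b"
```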
fct_c(...)
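Argument | Description |
---|---|
... | Dynamic dots: the individual factors to combine. Uses tidy dots, so a list of factors can be spliced in with !!!. |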
fa <- factor("a")
fb <- factor("b")
fab <- factor(c("a", "b"))

c(fa, fb, fab)
fct_c(fa, fb, fab)

# You can also pass a list of factors with !!!
fs <- list(fa, fb, fab)
fct_c(!!!fs)
Collapse factor levels into manually defined groups
fct_collapse(.f, ..., other_level = NULL, group_other = "DEPRECATED")
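Argument | Description |
---|---|
.f | A factor (or character vector). |
... | Dynamic dots: a series of named character vectors. The levels in each vector will be replaced with the name. |
other_level | Value of level used for "other" values. Always placed at end of levels. |
group_other | Deprecated. Replace all levels not named in ... with "Other"? |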
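For comparison, base R's `levels<-` also accepts a named list that maps old levels into named groups. This is a base-R analogue, not how fct_collapse() works internally. One difference: in base R any level *not* listed becomes NA, whereas fct_collapse() leaves unmentioned levels as they are.

```r
# Base-R analogue: `levels<-` with a named list collapses old levels into
# groups named by the list. Unlisted levels would become NA in base R.
f <- factor(c("Strong republican", "Independent",
              "Strong democrat", "Not str democrat"))
g <- f
levels(g) <- list(
  rep = "Strong republican",
  ind = "Independent",
  dem = c("Not str democrat", "Strong democrat")
)
levels(g)       # "rep" "ind" "dem"
as.character(g) # "rep" "ind" "dem" "dem"
```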
fct_count(gss_cat$partyid)

partyid2 <- fct_collapse(gss_cat$partyid,
  missing = c("No answer", "Don't know"),
  other = "Other party",
  rep = c("Strong republican", "Not str republican"),
  ind = c("Ind,near rep", "Independent", "Ind,near dem"),
  dem = c("Not str democrat", "Strong democrat")
)
fct_count(partyid2)
Count entries in a factor
fct_count(f, sort = FALSE, prop = FALSE)
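Argument | Description |
---|---|
f | A factor (or character vector). |
sort | If TRUE, sort the result so that the most common values come first. |
prop | If TRUE, add a column p giving the proportion of observations in each level. |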
A tibble with columns f and n, plus p if prop is TRUE.
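The following base-R sketch tabulates the same quantities into a plain data frame instead of a tibble. `count_levels()` is a hypothetical helper, not part of forcats, and it ignores missing values for simplicity:

```r
# Hypothetical helper, not part of forcats: count each level, optionally
# sort by frequency and add proportions. NA values are not counted here.
count_levels <- function(f, sort = FALSE, prop = FALSE) {
  n <- as.vector(table(f))
  out <- data.frame(f = factor(levels(f), levels = levels(f)), n = n)
  if (sort) out <- out[order(out$n, decreasing = TRUE), , drop = FALSE]
  if (prop) out$p <- out$n / sum(out$n)
  rownames(out) <- NULL
  out
}

f <- factor(c("b", "a", "b", "c", "b", "a"))
res <- count_levels(f, sort = TRUE, prop = TRUE)
res # b: 3, a: 2, c: 1, with p = n / 6
```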
f <- factor(sample(letters)[rpois(1000, 10)])
table(f)
fct_count(f)
fct_count(f, sort = TRUE)
fct_count(f, sort = TRUE, prop = TRUE)
fct_cross() computes a factor whose levels are all the combinations of the levels of the input factors.
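Base R's interaction() is a related tool, shown here for comparison only. Its level ordering may differ from that of fct_cross(), and `drop = TRUE` plays roughly the role of `keep_empty = FALSE`:

```r
fruit <- factor(c("apple", "kiwi", "apple", "apple"))
colour <- factor(c("green", "green", "red", "green"))

# Observed combinations only (unobserved "kiwi:red" is dropped)
crossed <- interaction(fruit, colour, sep = ":", drop = TRUE)
as.character(crossed) # "apple:green" "kiwi:green" "apple:red" "apple:green"
nlevels(crossed)      # 3

# Every combination kept as a level, even if unobserved
all_combos <- interaction(fruit, colour, sep = ":")
nlevels(all_combos)   # 4
```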
fct_cross(..., sep = ":", keep_empty = FALSE)
Argument | Description |
---|---|
... | Dynamic dots: additional factors or character vectors. |
sep | A character string to separate the levels. |
keep_empty | If TRUE, keep combinations with no observations as levels. |
The new factor
fruit <- factor(c("apple", "kiwi", "apple", "apple"))
colour <- factor(c("green", "green", "red", "green"))
eaten <- c("yes", "no", "yes", "no")
fct_cross(fruit, colour)
fct_cross(fruit, colour, eaten)
fct_cross(fruit, colour, keep_empty = TRUE)
fct_drop() drops unused levels. Compared to base::droplevels(), it does not drop NA levels that have values.
fct_drop(f, only = NULL)
Argument | Description |
---|---|
f | A factor (or character vector). |
only | A character vector restricting the set of levels to be dropped. If supplied, only levels that have no entries and appear in this vector will be removed. |
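The `only` rule can be sketched in base R: find the unused levels, intersect them with `only` if it is supplied, and rebuild the factor without them. `drop_unused()` is a hypothetical helper, not part of forcats, and it does not attempt fct_drop()'s special handling of NA levels:

```r
# Hypothetical helper, not part of forcats: drop unused levels,
# optionally restricted to those named in `only`.
drop_unused <- function(f, only = NULL) {
  unused <- setdiff(levels(f), unique(as.character(f)))
  if (!is.null(only)) unused <- intersect(unused, only)
  factor(f, levels = setdiff(levels(f), unused))
}

f <- factor(c("a", "b"), levels = c("a", "b", "c"))
levels(drop_unused(f))             # "a" "b"
levels(drop_unused(f, only = "a")) # "a" "b" "c" ("a" is used, so nothing drops)
levels(drop_unused(f, only = "c")) # "a" "b"
```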
See fct_expand() to add additional levels to a factor.
f <- factor(c("a", "b"), levels = c("a", "b", "c"))
f
fct_drop(f)

# Set only to restrict which levels to drop
fct_drop(f, only = "a")
fct_drop(f, only = "c")
Add additional levels to a factor
fct_expand(f, ..., after = Inf)
Argument | Description |
---|---|
f | A factor (or character vector). |
... | Additional levels to add to the factor. Levels that already exist will be silently ignored. |
after | Where should the new values be placed? |
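The behaviour described in this table can be sketched in base R with append(). Existing levels are skipped, and `after` is capped at the number of levels so that `Inf` means "at the end". `expand_levels()` is a hypothetical helper, not part of forcats:

```r
# Hypothetical helper, not part of forcats: add new levels at position `after`.
expand_levels <- function(f, new, after = Inf) {
  new <- setdiff(new, levels(f))                   # existing levels are ignored
  lv <- append(levels(f), new, after = min(after, nlevels(f)))
  factor(f, levels = lv)
}

f <- factor(c("a", "b", "c"))
levels(expand_levels(f, c("d", "a")))    # "a" "b" "c" "d"
levels(expand_levels(f, "Z", after = 0)) # "Z" "a" "b" "c"
```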
See fct_drop() to drop unused factor levels.
f <- factor(sample(letters[1:3], 20, replace = TRUE))
f
fct_expand(f, "d", "e", "f")
fct_expand(f, letters[1:6])
fct_expand(f, "Z", after = 0)
This family of functions changes only the order of the levels:

fct_inorder(): by the order in which they first appear.
fct_infreq(): by number of observations with each level (largest first).
fct_inseq(): by numeric value of level.
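Each of the three orderings can be approximated in base R by computing a new level vector and passing it to factor(). This is a comparison sketch, not the forcats implementation, and it does not reproduce tie-breaking or weights (`w`):

```r
f <- factor(c("b", "b", "a", "c", "c", "c"))

# First appearance (like fct_inorder()): b, a, c
in_order <- factor(f, levels = unique(as.character(f)))

# Most frequent first (like fct_infreq()): c (3), b (2), a (1)
counts <- table(f)
in_freq <- factor(f, levels = names(counts)[order(counts, decreasing = TRUE)])

# Numeric value of the level labels (like fct_inseq()): 1, 2, 3
g <- factor(1:3, levels = c("3", "2", "1"))
in_seq <- factor(g, levels = levels(g)[order(as.numeric(levels(g)))])
```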
fct_inorder(f, ordered = NA)

fct_infreq(f, w = NULL, ordered = NA)

fct_inseq(f, ordered = NA)
Argument | Description |
---|---|
f | A factor. |
ordered | A logical which determines the "ordered" status of the output factor. |
w | An optional numeric vector giving weights for frequency of each value (not level) in f. |
f <- factor(c("b", "b", "a", "c", "c", "c"))
f
fct_inorder(f)
fct_infreq(f)

f <- factor(1:3, levels = c("3", "2", "1"))
f
fct_inseq(f)
A family for lumping together levels that meet some criteria.

fct_lump_min(): lumps levels that appear fewer than min times.
fct_lump_prop(): lumps levels that appear in fewer than (or equal to) prop * n times.
fct_lump_n(): lumps all levels except for the n most frequent (or least frequent if n < 0).
fct_lump_lowfreq(): lumps together the least frequent levels, ensuring that "other" is still the smallest level.

fct_lump() exists primarily for historical reasons, as it automatically picks between these different methods depending on its arguments. We no longer recommend that you use it.
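As an illustration of the simplest rule, `lump_min()` is a base-R sketch of the fct_lump_min() idea. It is a hypothetical helper, not part of forcats, and it ignores weights. It keeps every level with at least `min` observations and relabels the rest as `other_level`, which goes at the end of the levels:

```r
# Hypothetical helper, not part of forcats: lump levels seen fewer than
# `min` times into `other_level`, placed last. Weights are not supported.
lump_min <- function(f, min, other_level = "Other") {
  counts <- table(f)
  keep <- names(counts)[counts >= min]
  out <- as.character(f)
  lumped <- !is.na(out) & !(out %in% keep)
  if (!any(lumped)) return(f)                  # nothing to lump
  out[lumped] <- other_level
  factor(out, levels = c(keep, other_level))
}

x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))
table(lump_min(x, 5)) # A 40, B 10, C 5, D 27, Other 5
```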
fct_lump(
  f,
  n,
  prop,
  w = NULL,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)

fct_lump_min(f, min, w = NULL, other_level = "Other")

fct_lump_prop(f, prop, w = NULL, other_level = "Other")

fct_lump_n(
  f,
  n,
  w = NULL,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)

fct_lump_lowfreq(f, w = NULL, other_level = "Other")
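Argument | Description |
---|---|
f | A factor (or character vector). |
n | Positive n preserves the most common n values. Negative n preserves the least common -n values. If there are ties, you will get at least abs(n) values. |
prop | Positive prop lumps values which do not appear at least prop of the time. Negative prop lumps values that do not appear at most -prop of the time. |
w | An optional numeric vector giving weights for frequency of each value (not level) in f. |
other_level | Value of level used for "other" values. Always placed at end of levels. |
ties.method | A character string specifying how ties are treated. See rank() for details. |
min | Preserve levels that appear at least min number of times. |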
See fct_other() to convert specified levels to other.
x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))
x %>% table()
x %>% fct_lump_n(3) %>% table()
x %>% fct_lump_prop(0.10) %>% table()
x %>% fct_lump_min(5) %>% table()
x %>% fct_lump_lowfreq() %>% table()

x <- factor(letters[rpois(100, 5)])
x
table(x)
table(fct_lump_lowfreq(x))

# Use positive values to collapse the rarest
fct_lump_n(x, n = 3)
fct_lump_prop(x, prop = 0.1)

# Use negative values to collapse the most common
fct_lump_n(x, n = -3)
fct_lump_prop(x, prop = -0.1)

# Use weighted frequencies
w <- c(rep(2, 50), rep(1, 50))
fct_lump_n(x, n = 5, w = w)

# Use ties.method to control how tied factors are collapsed
fct_lump_n(x, n = 6)
fct_lump_n(x, n = 6, ties.method = "max")

# Use fct_lump_min() to lump together all levels with fewer than `min` values
table(fct_lump_min(x, min = 10))
table(fct_lump_min(x, min = 15))
fct_match() tests whether any of lvls occur in f. Compared to %in%, this function validates lvls to ensure that they're actually present in f. In other words, x %in% "not present" will return FALSE, but fct_match(x, "not present") will throw an error.
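The validation step can be sketched in base R. `match_levels()` is a hypothetical helper, not the forcats implementation, and its error message is an assumption. It refuses to run %in% if any requested level is missing from the factor:

```r
# Hypothetical helper, not part of forcats: like %in%, but first checks
# that every requested level really exists in `f`.
match_levels <- function(f, lvls) {
  bad <- setdiff(lvls, levels(f))
  if (length(bad) > 0) {
    stop("Levels not present in `f`: ", paste(bad, collapse = ", "))
  }
  f %in% lvls
}

f <- factor(c("Married", "Divorced", "Widowed"))
match_levels(f, "Married") # TRUE FALSE FALSE
f %in% "Maried"            # FALSE FALSE FALSE: the typo goes unnoticed
res <- tryCatch(match_levels(f, "Maried"), error = function(e) "error")
```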
fct_match(f, lvls)
Argument | Description |
---|---|
f | A factor (or character vector). |
lvls | A character vector specifying levels to look for. |
A logical vector
table(fct_match(gss_cat$marital, c("Married", "Divorced")))

# Unlike %in%, which silently returns FALSE, fct_match() throws an error
# for misspelled levels
table(gss_cat$marital %in% c("Maried", "Davorced"))
## Not run:
table(fct_match(gss_cat$marital, c("Maried", "Davorced")))
## End(Not run)
NA values and NA levels

There are two ways to represent missing values in factors: in the values and in the levels. NAs in the values are most useful for data analysis (since is.na() returns what you expect), but because the NA is not explicitly recorded in the levels, there's no way to control its position (it's almost always displayed last or not at all). Putting the NAs in the levels allows you to control its display, at the cost of losing accurate is.na() reporting.
(It is possible to have a factor with missing values in both the values and the levels but it requires some explicit gymnastics and we don't recommend it.)
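The same two representations exist in base R. This comparison uses base R's addNA(), not the forcats functions documented here. Once NA is stored as a level, is.na() no longer reports those entries as missing:

```r
f1 <- factor(c("a", "b", NA, "c", "b", NA))
levels(f1) # "a" "b" "c"
is.na(f1)  # FALSE FALSE TRUE FALSE FALSE TRUE

f2 <- addNA(f1) # NA becomes an explicit level
levels(f2)      # "a" "b" "c" NA
is.na(f2)       # all FALSE: the values now point at the NA level

f3 <- factor(as.character(f2)) # back to NA values
identical(f3, f1)              # TRUE
```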
fct_na_value_to_level(f, level = NA)

fct_na_level_to_value(f, extra_levels = NULL)
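Argument | Description |
---|---|
f | A factor (or character vector). |
level | Optionally, instead of converting the NA values to an NA level, convert them to a level with this value. |
extra_levels | Optionally, a character vector giving additional levels that should also be converted to NA values. |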
# Most factors store NAs in the values:
f1 <- fct(c("a", "b", NA, "c", "b", NA))
levels(f1)
as.integer(f1)
is.na(f1)

# But it's also possible to store them in the levels
f2 <- fct_na_value_to_level(f1)
levels(f2)
as.integer(f2)
is.na(f2)

# If needed, you can convert back to NAs in the values:
f3 <- fct_na_level_to_value(f2)
levels(f3)
as.integer(f3)
is.na(f3)
Manually replace levels with "other"
fct_other(f, keep, drop, other_level = "Other")
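Argument | Description |
---|---|
f | A factor (or character vector). |
keep, drop | Pick one of keep and drop: keep will preserve the listed levels, replacing all others with other_level; drop will replace the listed levels with other_level, keeping all others as they are. |
other_level | Value of level used for "other" values. Always placed at end of levels. |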
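The `keep` variant can be sketched in base R. `other_keep()` is a hypothetical helper, not part of forcats. Every value whose level is not in `keep` is relabelled as `other_level`, which is placed last:

```r
# Hypothetical helper, not part of forcats: keep the listed levels and
# replace every other level with `other_level`.
other_keep <- function(f, keep, other_level = "Other") {
  out <- as.character(f)
  out[!is.na(out) & !(out %in% keep)] <- other_level
  factor(out, levels = c(intersect(levels(f), keep), other_level))
}

x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))
table(other_keep(x, keep = c("A", "B"))) # A 40, B 10, Other 37
```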
See fct_lump() to automatically convert the rarest (or most common) levels to "other".
x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))
fct_other(x, keep = c("A", "B"))
fct_other(x, drop = c("A", "B"))
Change factor levels by hand
fct_recode(.f, ...)
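Argument | Description |
---|---|
.f | A factor (or character vector). |
... | Dynamic dots: a sequence of named character vectors where the name gives the new level and the value gives the old level. Levels not otherwise mentioned will be left as is. Levels can be removed by naming them NULL. |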
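A base-R sketch of the renaming rule follows. `recode_levels()` is a hypothetical helper, not part of forcats, and it does not support removal via NULL. It relies on the fact that assigning duplicated labels with `levels<-` merges those levels, so two old levels can share one new name:

```r
# Hypothetical helper, not part of forcats: rename levels using
# new = "old" pairs; duplicated new names merge into one level.
recode_levels <- function(f, ...) {
  map <- c(...) # names = new levels, values = old levels
  unknown <- setdiff(map, levels(f))
  if (length(unknown) > 0) {
    warning("Unknown levels: ", paste(unknown, collapse = ", "))
  }
  lv <- levels(f)
  idx <- match(lv, map)
  lv[!is.na(idx)] <- names(map)[idx[!is.na(idx)]]
  levels(f) <- lv # base R merges duplicated labels
  f
}

x <- factor(c("apple", "bear", "banana", "dear"))
y <- recode_levels(x, fruit = "apple", fruit = "banana")
levels(y)       # "fruit" "bear" "dear"
as.character(y) # "fruit" "bear" "fruit" "dear"
```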
x <- factor(c("apple", "bear", "banana", "dear"))
fct_recode(x, fruit = "apple", fruit = "banana")

# If you make a mistake you'll get a warning
fct_recode(x, fruit = "apple", fruit = "bananana")

# If you name the level NULL it will be removed
fct_recode(x, NULL = "apple", fruit = "banana")

# Wrap the left hand side in quotes if it contains spaces or other special characters
fct_recode(x, "an apple" = "apple", "a bear" = "bear")

# When passing a named vector to rename levels use !!! to splice
x <- factor(c("apple", "bear", "banana", "dear"))
levels <- c(fruit = "apple", fruit = "banana")
fct_recode(x, !!!levels)
Relabel factor levels with a function, collapsing as necessary
fct_relabel(.f, .fun, ...)
Argument | Description |
---|---|
.f | A factor (or character vector). |
.fun | A function to be applied to each level. Must accept one character argument and return a character vector of the same length as its input. You can also use ~ to create an anonymous function (in the style of purrr). |
... | Additional arguments to .fun. |
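The relabel-then-collapse behaviour can be sketched in base R. `relabel_levels()` is a hypothetical helper, not part of forcats, and it does not accept ~ formulas. It applies a function to the level labels, checks the result's shape, and lets `levels<-` merge any labels that become identical:

```r
# Hypothetical helper, not part of forcats: relabel levels with a function;
# labels that become identical are collapsed into one level.
relabel_levels <- function(f, fun, ...) {
  new <- fun(levels(f), ...)
  stopifnot(is.character(new), length(new) == nlevels(f))
  levels(f) <- new # duplicated labels collapse into a single level
  f
}

party <- factor(c("Ind,near rep", "Independent", "Ind,near dem"),
                levels = c("Ind,near rep", "Independent", "Ind,near dem"))
spaced <- relabel_levels(party, function(x) gsub(",", ", ", x, fixed = TRUE))
levels(spaced) # "Ind, near rep" "Independent" "Ind, near dem"

cased <- relabel_levels(factor(c("a", "A", "b")), tolower)
levels(cased)  # "a" "b": "a" and "A" collapse
```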
gss_cat$partyid %>% fct_count()
gss_cat$partyid %>%
  fct_relabel(~ gsub(",", ", ", .x)) %>%
  fct_count()

convert_income <- function(x) {
  regex <- "^(?:Lt |)[$]([0-9]+).*$"
  is_range <- grepl(regex, x)
  num_income <- as.numeric(gsub(regex, "\\1", x[is_range]))
  num_income <- trunc(num_income / 5000) * 5000
  x[is_range] <- paste0("Gt $", num_income)
  x
}
fct_count(gss_cat$rincome)
convert_income(levels(gss_cat$rincome))
rincome2 <- fct_relabel(gss_cat$rincome, convert_income)
fct_count(rincome2)
fct_relevel() is a generalisation of stats::relevel() that allows you to move any number of levels to any location.
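Moving several levels to an arbitrary position can be sketched in base R with append(). `move_levels()` is a hypothetical helper, not part of forcats, and it does not accept a function in place of levels. It removes the named levels, then reinserts them after position `after`, which is capped so that `Inf` means "at the end":

```r
# Hypothetical helper, not part of forcats: move `lvls` to just after
# position `after` among the remaining levels.
move_levels <- function(f, lvls, after = 0L) {
  unknown <- setdiff(lvls, levels(f))
  if (length(unknown) > 0) {
    warning("Unknown levels: ", paste(unknown, collapse = ", "))
    lvls <- intersect(lvls, levels(f))
  }
  rest <- setdiff(levels(f), lvls)
  factor(f, levels = append(rest, lvls, after = min(after, length(rest))))
}

f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))
levels(move_levels(f, "a"))              # "a" "b" "c" "d"
levels(move_levels(f, "a", after = 2))   # "b" "c" "a" "d"
levels(move_levels(f, "a", after = Inf)) # "b" "c" "d" "a"
```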
fct_relevel(.f, ..., after = 0L)
Argument | Description |
---|---|
.f | A factor (or character vector). |
... | Either a function (or formula), or character levels. A function will be called with the current levels as input, and the return value (which must be a character vector) will be used to relevel the factor. Any levels not mentioned will be left in their existing order, by default after the explicitly mentioned levels. Supports tidy dots. |
after | Where should the new values be placed? |
f <- factor(c("a", "b", "c", "d"), levels = c("b", "c", "d", "a"))
fct_relevel(f)
fct_relevel(f, "a")
fct_relevel(f, "b", "a")

# Move to the third position
fct_relevel(f, "a", after = 2)

# Relevel to the end
fct_relevel(f, "a", after = Inf)
fct_relevel(f, "a", after = 3)

# Relevel with a function
fct_relevel(f, sort)
fct_relevel(f, sample)
fct_relevel(f, rev)

# Using 'Inf' allows you to relevel to the end when the number
# of levels is unknown or variable (e.g. vectorised operations)
df <- forcats::gss_cat[, c("rincome", "denom")]
lapply(df, levels)

df2 <- lapply(df, fct_relevel, "Don't know", after = Inf)
lapply(df2, levels)

# You'll get a warning if the levels don't exist
fct_relevel(f, "e")
fct_reorder() is useful for 1d displays where the factor is mapped to position; fct_reorder2() for 2d displays where the factor is mapped to a non-position aesthetic. last2() and first2() are helpers for fct_reorder2(): last2() finds the last value of y when sorted by x; first2() finds the first value.
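The core of fct_reorder() can be sketched in base R:
1. Summarise x within each level (median by default).
2. Sort the levels by that summary.

`reorder_by()` is a hypothetical helper, not part of forcats, and it omits the `.na_rm` and `.default` handling. Base R's stats::reorder() offers a comparable built-in.

```r
# Hypothetical helper, not part of forcats: order levels by a per-level summary.
reorder_by <- function(f, x, fun = median, desc = FALSE) {
  stat_by_level <- tapply(x, f, fun) # one summary value per level
  factor(f, levels = names(stat_by_level)[order(stat_by_level, decreasing = desc)])
}

f <- factor(c("a", "a", "b", "b", "c", "c"))
x <- c(5, 7, 1, 3, 10, 12)
levels(reorder_by(f, x))              # "b" "a" "c" (medians 2, 6, 11)
levels(reorder_by(f, x, desc = TRUE)) # "c" "a" "b"
```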
fct_reorder(
  .f,
  .x,
  .fun = median,
  ...,
  .na_rm = NULL,
  .default = Inf,
  .desc = FALSE
)

fct_reorder2(
  .f,
  .x,
  .y,
  .fun = last2,
  ...,
  .na_rm = NULL,
  .default = -Inf,
  .desc = TRUE
)

last2(.x, .y)

first2(.x, .y)
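Argument | Description |
---|---|
.f | A factor (or character vector). |
.x, .y | The levels of .f are reordered so that the values of .fun(.x) (for fct_reorder()) and .fun(.x, .y) (for fct_reorder2()) are in ascending order. |
.fun | A summary function. It should take one vector for fct_reorder(), and two vectors for fct_reorder2(), and return a single value. |
... | Other arguments passed on to .fun. |
.na_rm | Should fct_reorder() remove missing values? If NULL, the default, missing values are removed with a warning. Set to FALSE to preserve NAs (if .fun already handles them) and TRUE to remove them silently. |
.default | What default value should we use for .fun(.x) for empty levels? Use this to control where empty levels appear in the output. |
.desc | Order in descending order? Note the default is different between fct_reorder() and fct_reorder2(), in order to match the default ordering of factors in the legend. |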
# fct_reorder() -------------------------------------------------------------
# Useful when a categorical variable is mapped to position
boxplot(Sepal.Width ~ Species, data = iris)
boxplot(Sepal.Width ~ fct_reorder(Species, Sepal.Width), data = iris)

# or with
library(ggplot2)
ggplot(iris, aes(fct_reorder(Species, Sepal.Width), Sepal.Width)) +
  geom_boxplot()

# fct_reorder2() -------------------------------------------------------------
# Useful when a categorical variable is mapped to color, size, shape etc

chks <- subset(ChickWeight, as.integer(Chick) < 10)
chks <- transform(chks, Chick = fct_shuffle(Chick))

# Without reordering it's hard to match line to legend
ggplot(chks, aes(Time, weight, colour = Chick)) +
  geom_point() +
  geom_line()

# With reordering it's much easier
ggplot(chks, aes(Time, weight, colour = fct_reorder2(Chick, Time, weight))) +
  geom_point() +
  geom_line() +
  labs(colour = "Chick")
fct_rev() reverses the order of factor levels. This is sometimes useful when plotting a factor.
fct_rev(f)
Argument | Description |
---|---|
f | A factor (or character vector). |
f <- factor(c("a", "b", "c"))
fct_rev(f)
fct_shift() shifts factor levels to the left or right, wrapping around at the end. This is useful when the levels of an ordered factor are actually cyclical, with different conventions on the starting point.
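A cyclic shift is modular index arithmetic. In the base-R sketch below, `shift_levels()` is a hypothetical helper, not part of forcats. A positive `n` moves the first `n` levels to the end (a left shift). Because R's %% always returns a non-negative result for a positive modulus, a negative `n` wraps the other way:

```r
# Hypothetical helper, not part of forcats: rotate the level order by n.
shift_levels <- function(f, n = 1L) {
  lv <- levels(f)
  idx <- (seq_along(lv) + n - 1L) %% length(lv) + 1L # wrap-around indices
  factor(f, levels = lv[idx], ordered = is.ordered(f))
}

x <- factor(
  c("Mon", "Tue", "Wed"),
  levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
  ordered = TRUE
)
levels(shift_levels(x))     # "Mon" "Tue" "Wed" "Thu" "Fri" "Sat" "Sun"
levels(shift_levels(x, -1)) # "Sat" "Sun" "Mon" "Tue" "Wed" "Thu" "Fri"
```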
fct_shift(f, n = 1L)
Argument | Description |
---|---|
f | A factor. |
n | Positive values shift to the left; negative values shift to the right. |
x <- factor(
  c("Mon", "Tue", "Wed"),
  levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
  ordered = TRUE
)
x
fct_shift(x)
fct_shift(x, 2)
fct_shift(x, -1)
Randomly permute factor levels
fct_shuffle(f)
Argument | Description |
---|---|
f | A factor (or character vector). |
f <- factor(c("a", "b", "c"))
fct_shuffle(f)
fct_shuffle(f)
Unify the levels in a list of factors
fct_unify(fs, levels = lvls_union(fs))
Argument | Description |
---|---|
fs | A list of factors. |
levels | Set of levels to apply to every factor. Defaults to the union of all factor levels. |
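In base R, unification amounts to computing the union of levels once and re-leveling every factor against it. This is a comparison sketch using only base functions, not the forcats implementation:

```r
fs <- list(factor("a"), factor("b"), factor(c("a", "b")))

# Union of all levels, in order of first appearance across the list
all_levels <- unique(unlist(lapply(fs, levels)))

# Re-level each factor against the shared set; values are unchanged
unified <- lapply(fs, factor, levels = all_levels)
lapply(unified, levels) # every element has levels "a" "b"
```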
fs <- list(factor("a"), factor("b"), factor(c("a", "b")))
fct_unify(fs)
fct_unique() extracts the complete set of possible values from the levels of the factor, rather than looking at the actual values, like unique() does.

fct_unique() only uses the values of f in one way: it looks for implicit missing values so that they can be included in the result.
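A base-R sketch of this behaviour follows. `unique_levels()` is a hypothetical helper, not part of forcats. It returns one entry per level in level order, and appends an NA only when the data contain implicit missing values:

```r
# Hypothetical helper, not part of forcats: every level once, in level
# order, plus NA if `f` contains missing values.
unique_levels <- function(f) {
  vals <- levels(f)
  if (anyNA(f)) vals <- c(vals, NA)
  factor(vals, levels = levels(f))
}

f <- factor(c("b", NA, "a"), levels = c("a", "b", "c"))
unique(f)        # b, NA, a: values actually present, in order of appearance
unique_levels(f) # a, b, c, NA: all possible levels, plus the missing value
```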
fct_unique(f)
Argument | Description |
---|---|
f | A factor. |
A factor.
f <- fct(letters[rpois(100, 10)])
unique(f)     # in order of appearance
fct_unique(f) # in order of levels

f <- fct(letters[rpois(100, 2)], letters[1:20])
unique(f)     # levels that appear in data
fct_unique(f) # all possible levels
A sample of categorical variables from the General Social Survey.
gss_cat
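Variable | Description |
---|---|
year | year of survey, 2000–2014 (every other year) |
age | age. Maximum age truncated to 89. |
marital | marital status |
race | race |
rincome | reported income |
partyid | party affiliation |
relig | religion |
denom | denomination |
tvhours | hours per day watching tv |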
Downloaded from https://gssdataexplorer.norc.org/.
gss_cat
fct_count(gss_cat$relig)
fct_count(fct_lump(gss_cat$relig))
lvls_reorder() leaves values as they are, but changes the order.
lvls_revalue() changes the values of existing levels; there must be one new level for each old level.
lvls_expand() expands the set of levels; the new levels must include the old levels.
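For orientation, here are rough base-R counterparts of the three operations. They are comparison only: unlike the lvls_ functions, these base idioms perform none of the careful argument checks described in the text.

```r
f <- factor(c("a", "b", "c"))

# Like lvls_reorder(f, 3:1): same values, levels in a new order
reordered <- factor(f, levels = levels(f)[3:1])

# Like lvls_revalue(): one new name for each existing level
revalued <- f
levels(revalued) <- c("apple", "banana", "carrot")

# Like lvls_expand(): a superset of the old levels
expanded <- factor(f, levels = c("a", "b", "c", "d"))
```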
lvls_reorder(f, idx, ordered = NA)

lvls_revalue(f, new_levels)

lvls_expand(f, new_levels)
Argument | Description |
---|---|
f | A factor (or character vector). |
idx | An integer index, with one integer for each existing level. |
ordered | A logical which determines the "ordered" status of the output factor. |
new_levels | A character vector of new levels. |
These functions are less helpful than the higher-level fct_ functions, but are safer than the very low-level manipulation of levels directly, because they are more specific, and hence can more carefully check their arguments.
f <- factor(c("a", "b", "c"))
lvls_reorder(f, 3:1)
lvls_revalue(f, c("apple", "banana", "carrot"))
lvls_expand(f, c("a", "b", "c", "d"))
Find all levels in a list of factors
lvls_union(fs)
Argument | Description |
---|---|
fs | A list of factors. |
fs <- list(factor("a"), factor("b"), factor(c("a", "b")))
lvls_union(fs)