---
title: "Nested data"
output: rmarkdown::html_vignette
description: |
  A nested data frame contains a list-column of data frames. It's an
  alternative way of representing grouped data that works particularly well
  when you're modelling.
vignette: >
  %\VignetteIndexEntry{Nested data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup, message = FALSE}
library(tidyr)
library(dplyr)
library(purrr)
```
## Basics
A nested data frame is a data frame where one (or more) of the columns is a list of data frames. You can create simple nested data frames by hand:
```{r}
df1 <- tibble(
  g = c(1, 2, 3),
  data = list(
    tibble(x = 1, y = 2),
    tibble(x = 4:5, y = 6:7),
    tibble(x = 10)
  )
)
df1
```
(It is possible to create list-columns in regular data frames, not just in tibbles, but it's considerably more work because the default behaviour of `data.frame()` is to treat lists as lists of columns.)
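
As a rough illustration of that extra work, the sketch below uses base R only. The object names (`spread`, `by_assignment`, `by_asis`) are made up for this example. It first shows `data.frame()` spreading a plain list into separate columns, then two common workarounds: assigning the list after construction, and wrapping it in `I()`.

```{r}
# A bare list passed to data.frame() is spread into one column per element
spread <- data.frame(g = 1:2, data = list(a = 1:2, b = 3:4))
names(spread)

inner <- list(
  data.frame(x = 1, y = 2),
  data.frame(x = 4:5, y = 6:7)
)

# Workaround 1: create the data frame first, then assign the list-column
by_assignment <- data.frame(g = c(1, 2))
by_assignment$data <- inner

# Workaround 2: protect the list with I() so it is kept as a single column
by_asis <- data.frame(g = c(1, 2), data = I(inner))

# Either way, each row now holds a whole data frame in `data`
by_assignment$data[[2]]
```
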
But more commonly you'll create them with `tidyr::nest()`:
```{r}
df2 <- tribble(
  ~g, ~x, ~y,
   1,  1,  2,
   2,  4,  6,
   2,  5,  7,
   3, 10, NA
)
df2 %>% nest(data = c(x, y))
```
`nest()` specifies which variables should be nested inside; an alternative is to use `dplyr::group_by()` to describe which variables should be kept outside.
```{r}
df2 %>% group_by(g) %>% nest()
```
I think nesting is easiest to understand in connection to grouped data: each row in the output corresponds to one _group_ in the input. We'll see shortly that this is particularly convenient when you have other per-group objects.
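
One way to check the row-per-group correspondence is to compare the nested output against ordinary group counts. This is a small sketch that rebuilds the toy data under a new name (`rows`, chosen only for this example) so the chunk runs on its own:

```{r}
library(tidyr)
library(dplyr)
library(purrr)

# Same values as the toy data above, rebuilt so this chunk is self-contained
rows <- tribble(
  ~g, ~x, ~y,
   1,  1,  2,
   2,  4,  6,
   2,  5,  7,
   3, 10, NA
)

nested <- rows %>% nest(data = c(x, y))

# One output row per distinct value of g
nrow(nested) == n_distinct(rows$g)

# The number of rows in each nested data frame matches the group sizes
group_sizes <- map_int(nested$data, nrow)
group_sizes
count(rows, g)$n
```
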
The opposite of `nest()` is `unnest()`. You give it the name of a list-column containing data frames, and it row-binds the data frames together, repeating the outer columns the right number of times to line up.
```{r}
df1 %>% unnest(data)
```
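
Because `unnest()` undoes `nest()`, nesting and then unnesting should give back the flat data. The sketch below checks this on a rebuilt copy of the toy data (the name `flat` is just for this example). Note that it relies on the rows of each group already being adjacent in the input; if groups were interleaved, the round trip would return the same rows in group order instead.

```{r}
library(tidyr)
library(dplyr)

# Flat toy data, with the rows of each group already adjacent
flat <- tribble(
  ~g, ~x, ~y,
   1,  1,  2,
   2,  4,  6,
   2,  5,  7,
   3, 10, NA
)

# Nest the x and y columns, then immediately unnest them again
round_trip <- flat %>%
  nest(data = c(x, y)) %>%
  unnest(data)

# Compare as plain data frames to sidestep tibble-specific methods
all.equal(as.data.frame(round_trip), as.data.frame(flat))
```
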
## Nested data and models
Nested data is a great fit for problems where you have one of _something_ for each group. A common place this arises is when you're fitting multiple models.
```{r}
mtcars_nested <- mtcars %>%
  group_by(cyl) %>%
  nest()
mtcars_nested
```
Once you have a list of data frames, it's very natural to produce a list of models:
```{r}
mtcars_nested <- mtcars_nested %>%
  mutate(model = map(data, function(df) lm(mpg ~ wt, data = df)))
mtcars_nested
```
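
With the models stored in a list-column, per-group summaries can be pulled out with the typed `map_*()` variants. The following is a minimal sketch rather than part of the workflow above: it rebuilds the models from scratch, uses `nest(data = -cyl)` so the result is ungrouped (a choice made for this example), and the column names `r_squared` and `slope` are made up for illustration.

```{r}
library(tidyr)
library(dplyr)
library(purrr)

model_summaries <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(
    # One linear model per group of cyl
    model = map(data, function(df) lm(mpg ~ wt, data = df)),
    # map_dbl() returns a plain numeric column instead of a list-column
    r_squared = map_dbl(model, function(m) summary(m)$r.squared),
    slope = map_dbl(model, function(m) coef(m)[["wt"]])
  )

model_summaries %>% select(cyl, r_squared, slope)
```
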
And then you could even produce a list of predictions:
```{r}
mtcars_nested <- mtcars_nested %>%
  mutate(pred = map(model, predict))
mtcars_nested
```
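
To line the predictions up with the observations they belong to, the data and prediction list-columns can be unnested together. This is a sketch under a few assumptions made for the example: the pipeline is rebuilt from scratch with `nest(data = -cyl)` so it is ungrouped, `unname()` drops the row labels that `predict()` attaches, and `resid` is a column name invented here.

```{r}
library(tidyr)
library(dplyr)
library(purrr)

with_pred <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(
    model = map(data, function(df) lm(mpg ~ wt, data = df)),
    # One numeric vector of fitted values per group
    pred = map(model, function(m) unname(predict(m)))
  )

# Unnest both list-columns at once; within each row they have the same length
flat_pred <- with_pred %>%
  select(cyl, data, pred) %>%
  unnest(c(data, pred)) %>%
  mutate(resid = mpg - pred)

flat_pred %>% select(cyl, mpg, wt, pred, resid)
```
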
This workflow works particularly well in conjunction with [broom](https://broom.tidymodels.org/), which makes it easy to turn models into tidy data frames that can then be `unnest()`ed to get back to flat data frames. You can see a bigger example in the [broom and dplyr vignette](https://broom.tidymodels.org/articles/broom_and_dplyr.html).
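
The shape of that workflow can be sketched without broom by using a hand-written stand-in. The helper `coef_frame()` below is hypothetical and only returns terms and estimates; broom's own functions return much richer summaries. The point is just to show a model list-column becoming a list-column of data frames, which `unnest()` then flattens.

```{r}
library(tidyr)
library(dplyr)
library(purrr)

# Hypothetical helper: a minimal stand-in for a model tidier
coef_frame <- function(m) {
  tibble(term = names(coef(m)), estimate = unname(coef(m)))
}

coefs <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(
    model = map(data, function(df) lm(mpg ~ wt, data = df)),
    # Each model becomes a small data frame of coefficients
    coefs = map(model, coef_frame)
  ) %>%
  select(cyl, coefs) %>%
  unnest(coefs)

coefs
```
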