Title: | A Grammar of Nested Data Manipulation |
---|---|
Description: | Provides functions for manipulating nested data frames in a list-column using 'dplyr' <https://dplyr.tidyverse.org/> syntax. Rather than unnesting, then manipulating a data frame, 'nplyr' allows users to manipulate each nested data frame directly. 'nplyr' is a wrapper for 'dplyr' functions that provide tools for common data manipulation steps: filtering rows, selecting columns, summarising grouped data, among others. |
Authors: | Mark Rieke [aut],
Bolívar Aponte Rolón [cre] |
Maintainer: | Bolívar Aponte Rolón <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.0 |
Built: | 2025-03-02 06:20:30 UTC |
Source: | https://github.com/jibarozzo/nplyr |
A toy dataset containing 500 responses to a job satisfaction survey. The responses were randomly generated using the Qualtrics survey platform.
job_survey
A data frame with 500 rows and 6 variables:
name of survey
respondent age
city the respondent resides in
field that the respondent works in
respondent's job satisfaction (on a scale from extremely satisfied to extremely dissatisfied)
respondent's annual salary, in thousands of dollars
nest_arrange() orders the rows of nested data frames by the values of
selected columns.
nest_arrange(.data, .nest_data, ..., .by_group = FALSE)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
... |
Variables, or functions of variables. Use |
.by_group |
If |
nest_arrange() is largely a wrapper for dplyr::arrange() and maintains
the functionality of arrange() within each nested data frame. For more
information on arrange(), please refer to the documentation in dplyr.
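As a rough illustration of what "wrapper" means here, the sketch below compares nest_arrange() with a hand-written equivalent that maps dplyr::arrange() over the list-column. The small data set and the names df, nested, via_nplyr, and via_map are illustrative assumptions, and the package's actual implementation may differ in its details.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(nplyr)

# Hypothetical toy data: two groups with unsorted values
df <- tibble(
  grp = c("a", "a", "b", "b"),
  x   = c(2, 1, 4, 3)
)
nested <- df %>% nest(data = -grp)

# Using the nplyr verb
via_nplyr <- nested %>% nest_arrange(data, x)

# Hand-written equivalent: apply arrange() to each element of the list-column
via_map <- nested %>% mutate(data = map(data, ~ arrange(.x, x)))

# Both approaches should sort `x` within each nested data frame
identical(via_nplyr$data[[1]]$x, via_map$data[[1]]$x)
```

The same pattern (mapping a dplyr or tidyr verb over each element of .nest_data) is a reasonable mental model for the other nest_*() verbs in this manual.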
An object of the same type as .data. Each object in the column .nest_data
will also be of the same type as the input. Each object in .nest_data has
the following properties:
All rows appear in the output, but (usually) in a different place.
Columns are not modified.
Groups are not modified.
Data frame attributes are preserved.
Other single table verbs: nest_filter(), nest_mutate(), nest_rename(), nest_select(), nest_slice(), nest_summarise()
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# sort each nested data frame by population, ascending
gm_nest %>% nest_arrange(country_data, pop)

# sort each nested data frame by population, descending
gm_nest %>% nest_arrange(country_data, dplyr::desc(pop))
nest_count() lets you quickly count the unique values of one or more
variables within each nested data frame. nest_count() results in a summary
with one row for each set of variables to count by. nest_add_count() is
equivalent with the exception that it retains all rows and adds a new column
with group-wise counts.
nest_count(.data, .nest_data, ..., wt = NULL, sort = FALSE, name = NULL)
nest_add_count(.data, .nest_data, ..., wt = NULL, sort = FALSE, name = NULL)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
... |
Variables to group by. |
wt |
Frequency weights.
Can be
|
sort |
If |
name |
The name of the new column in the output. |
nest_count() and nest_add_count() are largely wrappers for
dplyr::count() and dplyr::add_count() and maintain the functionality of
count() and add_count() within each nested data frame. For more
information on count() and add_count(), please refer to the documentation
in dplyr.
An object of the same type as .data. Each object in the column .nest_data
will also be of the same type as the input. nest_count() and
nest_add_count() group each object in .nest_data transiently, so the
output returned in .nest_data will have the same groups as the input.
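To make the difference between the two functions concrete, the following sketch counts a variable in a small hand-built nested tibble. The data and the names df, nested, counted, and added are illustrative assumptions; the count column name n is the dplyr default used when name = NULL.

```r
library(dplyr)
library(tidyr)
library(nplyr)

# Hypothetical toy data: group "a" has three rows but only two distinct items
df <- tibble(
  grp  = c("a", "a", "a", "b"),
  item = c("u", "u", "v", "w")
)
nested <- df %>% nest(data = -grp)

# nest_count() collapses each nested data frame to one row per distinct item
counted <- nested %>% nest_count(data, item)

# nest_add_count() keeps every row and appends the count as a new column
added <- nested %>% nest_add_count(data, item)

nrow(counted$data[[1]])  # rows after summarising group "a"
nrow(added$data[[1]])    # rows after adding counts to group "a"
names(added$data[[1]])   # original columns plus the count column
```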
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# count the number of times each country appears in each nested tibble
gm_nest %>% nest_count(country_data, country)
gm_nest %>% nest_add_count(country_data, country)

# count the sum of population for each country in each nested tibble
gm_nest %>% nest_count(country_data, country, wt = pop)
gm_nest %>% nest_add_count(country_data, country, wt = pop)
nest_distinct() selects only unique/distinct rows in a nested data frame.
nest_distinct(.data, .nest_data, ..., .keep_all = FALSE)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
... |
Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, will use all variables. |
.keep_all |
If |
nest_distinct() is largely a wrapper for dplyr::distinct() and maintains
the functionality of distinct() within each nested data frame. For more
information on distinct(), please refer to the documentation in dplyr.
An object of the same type as .data. Each object in the column .nest_data
will also be of the same type as the input. Each object in .nest_data has
the following properties:
Rows are a subset of the input but appear in the same order.
Columns are not modified if ... is empty or .keep_all is TRUE.
Otherwise, nest_distinct() first calls dplyr::mutate() to create new
columns within each object in .nest_data.
Groups are not modified.
Data frame attributes are preserved.
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

gm_nest %>% nest_distinct(country_data, country)
gm_nest %>% nest_distinct(country_data, country, year)
nest_drop_na() drops rows containing missing values from each data frame
in a column of nested data frames.
nest_drop_na(.data, .nest_data, ...)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
... |
Columns within |
nest_drop_na() is a wrapper for tidyr::drop_na() and maintains the functionality
of drop_na() within each nested data frame. For more information on drop_na(),
please refer to the documentation in 'tidyr'.
An object of the same type as .data. Each object in the column .nest_data
will have rows dropped according to the presence of NAs.
Other tidyr verbs: nest_extract(), nest_fill(), nest_replace_na(), nest_separate(), nest_unite()
gm <- gapminder::gapminder

# randomly insert NAs into the dataframe & nest
set.seed(123)
gm <- gm %>% dplyr::mutate(pop = dplyr::if_else(runif(nrow(gm)) >= 0.9, NA_integer_, pop))
gm_nest <- gm %>% tidyr::nest(country_data = -continent)

# drop rows where an NA exists in column `pop`
gm_nest %>% nest_drop_na(country_data, pop)
nest_extract() is used to extract capturing groups from a column in a nested
data frame using regular expressions into a new column. If the groups don't
match, or the input is NA, the output will be NA.
nest_extract(
  .data,
  .nest_data,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
col |
Column name or position within This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions). |
into |
Names of new variables to create as character vector.
Use |
regex |
A string representing a regular expression used to extract the
desired values. There should be one group (defined by |
remove |
If |
convert |
If NB: this will cause string |
... |
Additional arguments passed on to |
nest_extract() is a wrapper for tidyr::extract() and maintains the functionality
of extract() within each nested data frame. For more information on extract(),
please refer to the documentation in 'tidyr'.
An object of the same type as .data. Each object in the column .nest_data
will have new columns created according to the capture groups specified in
the regular expression.
Other tidyr verbs: nest_drop_na(), nest_fill(), nest_replace_na(), nest_separate(), nest_unite()
set.seed(123)
gm <- gapminder::gapminder
gm <- gm %>%
  dplyr::mutate(comb = sample(c(NA, "a-b", "a-d", "b-c", "d-e"), size = nrow(gm), replace = TRUE))
gm_nest <- gm %>% tidyr::nest(country_data = -continent)

gm_nest %>%
  nest_extract(
    country_data,
    col = comb,
    into = c("var1","var2"),
    regex = "([[:alnum:]]+)-([[:alnum:]]+)"
  )
nest_fill() fills missing values in selected columns of nested data frames
using the next or previous entry in each column.
nest_fill(
  .data,
  .nest_data,
  ...,
  .direction = c("down", "up", "downup", "updown")
)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
... |
< |
.direction |
Direction in which to fill missing values. Currently either "down" (the default), "up", "downup" (i.e. first down and then up) or "updown" (first up and then down). |
nest_fill() is a wrapper for tidyr::fill() and maintains the functionality
of fill() within each nested data frame. For more information on fill(),
please refer to the documentation in 'tidyr'.
An object of the same type as .data. Each object in the column .nest_data
will have the chosen columns filled in the direction specified by .direction.
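The effect of .direction is easiest to see on a column that starts with a missing value. The sketch below uses a small hand-built tibble; the data and the names df, nested, down, and downup are illustrative assumptions.

```r
library(dplyr)
library(tidyr)
library(nplyr)

# Hypothetical toy data: `x` has a leading NA and an interior NA
df <- tibble(grp = "a", x = c(NA, 1, NA, 4))
nested <- df %>% nest(data = -grp)

# "down" carries the previous value forward; the leading NA has no
# previous value, so it stays missing
down <- nested %>% nest_fill(data, x, .direction = "down")

# "downup" fills down first, then fills any remaining NAs upward
downup <- nested %>% nest_fill(data, x, .direction = "downup")

down$data[[1]]$x
downup$data[[1]]$x
```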
Other tidyr verbs: nest_drop_na(), nest_extract(), nest_replace_na(), nest_separate(), nest_unite()
set.seed(123)
gm <- gapminder::gapminder %>%
  dplyr::mutate(pop = dplyr::if_else(runif(dplyr::n()) >= 0.9, NA_integer_, pop))
gm_nest <- gm %>% tidyr::nest(country_data = -continent)

gm_nest %>% nest_fill(country_data, pop, .direction = "down")
nest_filter() is used to subset nested data frames, retaining all rows that
satisfy your conditions. To be retained, the row must produce a value of
TRUE for all conditions. Note that when a condition evaluates to NA the
row will be dropped, unlike base subsetting with [.
nest_filter() subsets the rows within .nest_data, applying the
expressions in ... to the column values to determine which rows should be
retained. It can be applied to both grouped and ungrouped data.
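The NA behaviour described above can be checked with a small sketch. The data and the names df, nested, kept, inner, and base_rows are illustrative assumptions; the base comparison is done on a plain data.frame, since tibbles define their own subsetting method.

```r
library(dplyr)
library(tidyr)
library(nplyr)

# Hypothetical toy data: the condition `x > 1` is FALSE, NA, TRUE
df <- tibble(grp = "a", x = c(1, NA, 3))
nested <- df %>% nest(data = -grp)

# nest_filter() keeps only rows where the condition is TRUE
kept <- nested %>% nest_filter(data, x > 1)
nrow(kept$data[[1]])

# Base `[` on a data.frame returns an extra all-NA row for the NA condition
inner <- as.data.frame(nested$data[[1]])
base_rows <- inner[inner$x > 1, , drop = FALSE]
nrow(base_rows)
```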
nest_filter(.data, .nest_data, ..., .preserve = FALSE)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
... |
Expressions that return a logical value, and are defined in terms
of the variables in |
.preserve |
Relevant when |
nest_filter() is largely a wrapper for dplyr::filter() and maintains the
functionality of filter() within each nested data frame. For more
information on filter(), please refer to the documentation in dplyr.
An object of the same type as .data. Each object in the column .nest_data
will also be of the same type as the input. Each object in .nest_data has
the following properties:
Rows are a subset of the input, but appear in the same order.
Columns are not modified.
The number of groups may be reduced (if .preserve is not TRUE).
Data frame attributes are preserved.
Other single table verbs: nest_arrange(), nest_mutate(), nest_rename(), nest_select(), nest_slice(), nest_summarise()
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# apply a filter
gm_nest %>% nest_filter(country_data, year > 1972)

# apply multiple filters
gm_nest %>% nest_filter(country_data, year > 1972, pop < 10000000)

# apply a filter on grouped data
gm_nest %>%
  nest_group_by(country_data, country) %>%
  nest_filter(country_data, pop > mean(pop))
nest_group_by() takes a set of nested tbls and converts it to a set of
nested grouped tbls where operations are performed "by group".
nest_ungroup() removes grouping.
nest_group_by(.data, .nest_data, ..., .add = FALSE, .drop = TRUE)
nest_ungroup(.data, .nest_data, ...)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
... |
In |
.add |
When |
.drop |
Drop groups formed by factor levels that don't appear in the
data? The default is |
nest_group_by() and nest_ungroup() are largely wrappers for
dplyr::group_by() and dplyr::ungroup() and maintain the functionality of
group_by() and ungroup() within each nested data frame. For more
information on group_by() or ungroup(), please refer to the documentation
in dplyr.
An object of the same type as .data. Each object in the column .nest_data
will be returned as a grouped data frame with class grouped_df, unless the
combination of ... and .add yields an empty set of grouping columns, in
which case a tibble will be returned.
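One way to confirm the grouping state of the nested data frames is to inspect their class directly. The sketch below does this on a small hand-built tibble; the data and the names df, nested, grouped, ungrouped, is_grouped, and still_grouped are illustrative assumptions.

```r
library(dplyr)
library(tidyr)
library(nplyr)

# Hypothetical toy data with one outer group and an inner key `k`
df <- tibble(id = "a", k = c("p", "p", "q"), v = 1:3)
nested <- df %>% nest(data = -id)

# Group each nested data frame by `k`
grouped <- nested %>% nest_group_by(data, k)
is_grouped <- inherits(grouped$data[[1]], "grouped_df")

# Remove the grouping again
ungrouped <- grouped %>% nest_ungroup(data)
still_grouped <- inherits(ungrouped$data[[1]], "grouped_df")

is_grouped
still_grouped
```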
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# grouping doesn't change .nest_data, just .nest_data class:
gm_nest_grouped <- gm_nest %>% nest_group_by(country_data, year)
gm_nest_grouped

# It changes how it acts with other nplyr verbs:
gm_nest_grouped %>%
  nest_summarise(
    country_data,
    lifeExp = mean(lifeExp),
    pop = mean(pop),
    gdpPercap = mean(gdpPercap)
  )

# ungrouping removes variable groups:
gm_nest_grouped %>% nest_ungroup(country_data)
nest_mutate() adds new variables to and preserves existing ones within
the nested data frames in .nest_data.
nest_transmute() adds new variables to and drops existing ones from the
nested data frames in .nest_data.
nest_mutate(.data, .nest_data, ...)
nest_transmute(.data, .nest_data, ...)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
... |
Name-value pairs. The name gives the name of the column in the output. The value can be:
|
nest_mutate() and nest_transmute() are largely wrappers for
dplyr::mutate() and dplyr::transmute() and maintain the functionality of
mutate() and transmute() within each nested data frame. For more
information on mutate() or transmute(), please refer to the documentation
in dplyr.
An object of the same type as .data. Each object in the column .nest_data
will also be of the same type as the input. Each object in .nest_data has
the following properties:
For nest_mutate():
Columns from each object in .nest_data will be preserved according to
the .keep argument.
Existing columns that are modified by ... will always be returned in
their original location.
New columns created through ... will be placed according to the
.before and .after arguments.
For nest_transmute():
Columns created or modified through ... will be returned in the order
specified by ....
Unmodified grouping columns will be placed at the front.
The number of rows is not affected.
Columns given the value NULL will be removed.
Groups will be recomputed if a grouping variable is mutated.
Data frame attributes will be preserved.
Other single table verbs: nest_arrange(), nest_filter(), nest_rename(), nest_select(), nest_slice(), nest_summarise()
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# add or modify columns:
gm_nest %>%
  nest_mutate(
    country_data,
    lifeExp = NULL,
    gdp = gdpPercap * pop,
    pop = pop/1000000
  )

# use dplyr::across() to apply transformation to multiple columns
gm_nest %>%
  nest_mutate(
    country_data,
    dplyr::across(c(lifeExp:gdpPercap), mean)
  )

# nest_transmute() drops unused columns when mutating:
gm_nest %>%
  nest_transmute(
    country_data,
    country = country,
    year = year,
    pop = pop/1000000
  )
nest_nest_join() returns all rows and columns in .nest_data with a new
nested-df column that contains all matches from y. When there is no match,
the list contains a 0-row tibble.
nest_nest_join(
  .data,
  .nest_data,
  y,
  by = NULL,
  copy = FALSE,
  keep = FALSE,
  name = NULL,
  ...
)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
y |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
by |
A character vector of variables to join by or a join specification
created with If To join on different variables between the objects in To join by multiple variables, use a vector with length >1. For example,
To perform a cross-join, generating all combinations of each object in
|
copy |
If |
keep |
Should the join keys from both |
name |
The name of the list column nesting joins create. If |
... |
One or more unquoted expressions separated by commas. Variable
names can be used as if they were positions in the data frame, so expressions
like |
nest_nest_join() is largely a wrapper around dplyr::nest_join() and
maintains the functionality of nest_join() within each nested data frame.
For more information on nest_join(), please refer to the documentation in
dplyr.
An object of the same type as .data. Each object in the column .nest_data
will also be of the same type as the input.
Other joins: nest-filter-joins, nest-mutate-joins
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)
gm_codes <- gapminder::country_codes

gm_nest %>% nest_nest_join(country_data, gm_codes, by = "country")
nest_relocate() changes column positions within a nested data frame, using
the same syntax as nest_select() or dplyr::select() to make it easy to
move blocks of columns at once.
nest_relocate(.data, .nest_data, ..., .before = NULL, .after = NULL)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
... |
Columns to move. |
.before , .after
|
Destination of columns selected by |
nest_relocate() is largely a wrapper for dplyr::relocate() and maintains
the functionality of relocate() within each nested data frame. For more
information on relocate(), please refer to the documentation in dplyr.
An object of the same type as .data. Each object in the column .nest_data
will also be of the same type as the input. Each object in .nest_data has
the following properties:
Rows are not affected.
The same columns appear in the output, but (usually) in a different place.
Data frame attributes are preserved.
Groups are not affected.
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

gm_nest %>% nest_relocate(country_data, year)
gm_nest %>% nest_relocate(country_data, pop, .after = year)
nest_rename() changes the names of individual variables using
new_name = old_name syntax; nest_rename_with() renames columns using a
function.
nest_rename(.data, .nest_data, ...)
nest_rename_with(.data, .nest_data, .fn, .cols = dplyr::everything(), ...)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
... |
For For |
.fn |
A function used to transform the selected |
.cols |
Columns to rename; defaults to all columns. |
nest_rename() and nest_rename_with() are largely wrappers for
dplyr::rename() and dplyr::rename_with() and maintain the functionality
of rename() and rename_with() within each nested data frame. For more
information on rename() or rename_with(), please refer to the
documentation in dplyr.
An object of the same type as .data. Each object in the column .nest_data
will also be of the same type as the input. Each object in .nest_data has
the following properties:
Rows are not affected.
Column names are changed; column order is preserved.
Data frame attributes are preserved.
Groups are updated to reflect new names.
Other single table verbs: nest_arrange(), nest_filter(), nest_mutate(), nest_select(), nest_slice(), nest_summarise()
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

gm_nest %>% nest_rename(country_data, population = pop)
gm_nest %>% nest_rename_with(country_data, stringr::str_to_lower)
nest_replace_na() is used to replace missing values in selected columns of
nested data frames using values specified by column.
nest_replace_na(.data, .nest_data, replace, ...)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
replace |
A list of values, with one value for each column in that has |
... |
Additional arguments for |
nest_replace_na() is a wrapper for tidyr::replace_na() and maintains the functionality
of replace_na() within each nested data frame. For more information on replace_na(),
please refer to the documentation in 'tidyr'.
An object of the same type as .data. Each object in the column .nest_data
will have NAs replaced in the specified columns.
Other tidyr verbs: nest_drop_na(), nest_extract(), nest_fill(), nest_separate(), nest_unite()
set.seed(123)
gm <- gapminder::gapminder %>%
  dplyr::mutate(pop = dplyr::if_else(runif(dplyr::n()) >= 0.9, NA_integer_, pop))
gm_nest <- gm %>% tidyr::nest(country_data = -continent)

gm_nest %>% nest_replace_na(.nest_data = country_data, replace = list(pop = -500))
nest_select() selects (and optionally renames) variables in nested data
frames, using a concise mini-language that makes it easy to refer to
variables based on their name (e.g., a:f selects all columns from a on
the left to f on the right). You can also use predicate functions like
is.numeric to select variables based on their properties.
nest_select(.data, .nest_data, ...)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
... |
One or more unquoted expressions separated by commas. Variable
names can be used as if they were positions in the data frame, so expressions
like |
nest_select() is largely a wrapper for dplyr::select() and maintains the
functionality of select() within each nested data frame. For more
information on select(), please refer to the documentation in dplyr.
An object of the same type as .data. Each object in the column .nest_data
will also be of the same type as the input. Each object in .nest_data has
the following properties:
Rows are not affected.
Output columns are a subset of input columns, potentially with a different
order. Columns will be renamed if new_name = old_name form is used.
Data frame attributes are preserved.
Groups are maintained; you can't select off grouping variables.
Other single table verbs: nest_arrange(), nest_filter(), nest_mutate(), nest_rename(), nest_slice(), nest_summarise()
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

gm_nest %>% nest_select(country_data, country, year, pop)
gm_nest %>% nest_select(country_data, dplyr::where(is.numeric))
nest_separate() is used to separate a single character column into multiple
columns using a regular expression or a vector of character positions in a
list of nested data frames.
nest_separate(
  .data,
  .nest_data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  extra = "warn",
  fill = "warn",
  ...
)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
col |
Column name or position within. Must be present in all data frames
in This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions). |
into |
Names of new variables to create as character vector.
Use |
sep |
Separator between columns. If character, If numeric, |
remove |
If |
convert |
If NB: this will cause string |
extra |
If
|
fill |
If
|
... |
Additional arguments passed on to |
nest_separate() is a wrapper for tidyr::separate() and maintains the functionality
of separate() within each nested data frame. For more information on separate(),
please refer to the documentation in 'tidyr'.
An object of the same type as .data. Each object in the column .nest_data
will have the specified column split according to the regular expression or
the vector of character positions.
Other tidyr verbs: nest_drop_na(), nest_extract(), nest_fill(), nest_replace_na(), nest_unite()
set.seed(123)
gm <- gapminder::gapminder %>% dplyr::mutate(comb = paste(continent, year, sep = "-"))
gm_nest <- gm %>% tidyr::nest(country_data = -continent)

gm_nest %>%
  nest_separate(
    country_data,
    col = comb,
    into = c("var1","var2"),
    sep = "-"
  )
nest_slice() lets you index rows in nested data frames by their (integer)
locations. It allows you to select, remove, and duplicate rows. It is
accompanied by a number of helpers for common use cases:
nest_slice_head() and nest_slice_tail() select the first or last rows
of each nested data frame in .nest_data.
nest_slice_sample() randomly selects rows from each data frame in
.nest_data.
nest_slice_min() and nest_slice_max() select the rows with the lowest
or highest values of a variable within each nested data frame in
.nest_data.
If .nest_data is a grouped data frame, the operation will be performed on
each group, so that (e.g.) nest_slice_head(df, nested_dfs, n = 5) will
return the first five rows in each group for each nested data frame.
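The per-group behaviour can be sketched on a small hand-built tibble: after grouping the nested data frame, nest_slice_head() takes rows from each group rather than from the data frame as a whole. The data and the names df, nested, and res are illustrative assumptions.

```r
library(dplyr)
library(tidyr)
library(nplyr)

# Hypothetical toy data: one nested data frame with inner groups "p" and "q"
df <- tibble(id = "a", k = c("p", "p", "p", "q", "q"), v = 1:5)
nested <- df %>% nest(data = -id)

# Group the nested data frame by `k`, then take the first two rows per group
res <- nested %>%
  nest_group_by(data, k) %>%
  nest_slice_head(data, n = 2)

# Two groups, two rows each
nrow(res$data[[1]])
res$data[[1]]$v
```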
nest_slice(.data, .nest_data, ..., .preserve = FALSE)
nest_slice_head(.data, .nest_data, ...)
nest_slice_tail(.data, .nest_data, ...)
nest_slice_min(.data, .nest_data, order_by, ..., with_ties = TRUE)
nest_slice_max(.data, .nest_data, order_by, ..., with_ties = TRUE)
nest_slice_sample(.data, .nest_data, ..., weight_by = NULL, replace = FALSE)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
... |
For Provide either positive values to keep, or negative values to drop. The values provided must be either all positive or all negative. Indices beyond the number of rows in the input are silently ignored. For Additionally:
|
.preserve |
Relevant when |
order_by |
Variable or function of variables to order by. |
with_ties |
Should ties be kept together? The default, |
weight_by |
Sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1. |
replace |
Should sampling be performed with ( |
nest_slice() and its helpers are largely wrappers for dplyr::slice() and
its helpers and maintain the functionality of slice() and its helpers
within each nested data frame. For more information on slice() or its
helpers, please refer to the documentation in dplyr.
An object of the same type as .data
. Each object in the column .nest_data
will also be of the same type as the input. Each object in .nest_data
has
the following properties:
Each row may appear 0, 1, or many times in the output.
Columns are not modified.
Groups are not modified.
Data frame attributes are preserved.
Other single table verbs:
nest_arrange()
,
nest_filter()
,
nest_mutate()
,
nest_rename()
,
nest_select()
,
nest_summarise()
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# select the 1st, 3rd, and 5th rows in each data frame in country_data
gm_nest %>% nest_slice(country_data, 1, 3, 5)

# or select all but the 1st, 3rd, and 5th rows:
gm_nest %>% nest_slice(country_data, -1, -3, -5)

# first and last rows based on existing order:
gm_nest %>% nest_slice_head(country_data, n = 5)
gm_nest %>% nest_slice_tail(country_data, n = 5)

# rows with minimum and maximum values of a variable:
gm_nest %>% nest_slice_min(country_data, lifeExp, n = 5)
gm_nest %>% nest_slice_max(country_data, lifeExp, n = 5)

# randomly select rows with or without replacement:
gm_nest %>% nest_slice_sample(country_data, n = 5)
gm_nest %>% nest_slice_sample(country_data, n = 5, replace = TRUE)
nest_summarise()
creates a new set of nested data frames. Each will have
one (or more) rows for each combination of grouping variables; if there are
no grouping variables, the output will have a single row summarising all
observations in .nest_data
. Each nested data frame will contain one column
for each grouping variable and one column for each of the summary statistics
that you have specified.
nest_summarise()
and nest_summarize()
are synonyms.
nest_summarise(.data, .nest_data, ..., .groups = NULL)
nest_summarize(.data, .nest_data, ..., .groups = NULL)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
... |
Name-value pairs of functions. The name will be the name of the variable in the result. The value can be:
|
.groups |
|
nest_summarise()
is largely a wrapper for dplyr::summarise()
and
maintains the functionality of summarise()
within each nested data frame.
For more information on summarise()
, please refer to the documentation in
dplyr
.
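To illustrate the wrapper relationship described above, here is a rough sketch that compares nest_summarise() with applying dplyr::summarise() to each nested data frame by hand. The toy data (scores, scores_nest) and the by_hand object are assumptions made for illustration, and the hand-rolled version is only a conceptual stand-in, not the package's actual implementation.

```r
library(nplyr)

# Hypothetical toy data: three scores in class "x", two in class "y"
scores <- tibble::tibble(
  class = c("x", "x", "x", "y", "y"),
  score = c(1, 2, 3, 10, 20)
)
scores_nest <- tidyr::nest(scores, class_data = -class)

# Summarise every nested data frame in one call
via_nplyr <- nest_summarise(
  scores_nest,
  class_data,
  n = dplyr::n(),
  mean_score = mean(score)
)

# Conceptual equivalent: apply dplyr::summarise() to each list element
by_hand <- scores_nest
by_hand$class_data <- lapply(
  by_hand$class_data,
  function(df) dplyr::summarise(df, n = dplyr::n(), mean_score = mean(score))
)

# Each nested data frame is now a single summary row
via_nplyr$class_data[[1]]
via_nplyr$class_data[[2]]
```

Because the nested data frames are ungrouped, each summary collapses to a single row holding n and mean_score.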
An object of the same type as .data
. Each object in the column .nest_data
will usually be of the same type as the input. Each object in .nest_data
has
the following properties:
The rows come from the underlying group_keys().
The columns are a combination of the grouping keys and the summary expressions that you provide.
The grouping structure is controlled by the .groups
argument; the output
may be another grouped_df, a tibble, or a rowwise data frame.
Data frame attributes are not preserved, because nest_summarise()
fundamentally creates a new data frame for each object in .nest_data
.
Other single table verbs:
nest_arrange()
,
nest_filter()
,
nest_mutate()
,
nest_rename()
,
nest_select()
,
nest_slice()
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# a summary applied to an ungrouped tbl returns a single row
gm_nest %>%
  nest_summarise(
    country_data,
    n = dplyr::n(),
    median_pop = median(pop)
  )

# usually, you'll want to group first
gm_nest %>%
  nest_group_by(country_data, country) %>%
  nest_summarise(
    country_data,
    n = dplyr::n(),
    median_pop = median(pop)
  )
nest_unite()
is used to combine multiple columns into one in a column of
nested data frames.
nest_unite(.data, .nest_data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
col |
The name of the new column, as a string or symbol. This argument is passed by expression and supports
quasiquotation (you can unquote strings
and symbols). The name is captured from the expression with
|
... |
Columns to unite. |
sep |
Separator to use between values. |
remove |
If |
na.rm |
If |
nest_unite()
is a wrapper for tidyr::unite()
and maintains the functionality
of unite()
within each nested data frame. For more information on unite(),
please refer to the documentation in 'tidyr'.
An object of the same type as .data
. Each object in the column .nest_data
will have a new column created as a combination of existing columns.
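The effect of sep and remove can be sketched with a small made-up example. The people and people_nest objects and their column names are hypothetical. The snippet assumes that nest_unite() forwards sep and remove to tidyr::unite() unchanged, as the wrapper description suggests.

```r
library(nplyr)

# Hypothetical toy data nested by team
people <- tibble::tibble(
  team  = c("a", "a", "b"),
  first = c("Ada", "Alan", "Grace"),
  last  = c("Lovelace", "Turing", "Hopper")
)
people_nest <- tidyr::nest(people, members = -team)

# Default remove = TRUE: first and last are replaced by full_name
united <- nest_unite(
  people_nest, members,
  col = full_name, first, last,
  sep = " "
)
united$members[[1]]$full_name

# remove = FALSE keeps the input columns alongside the new one
kept <- nest_unite(
  people_nest, members,
  col = full_name, first, last,
  sep = " ", remove = FALSE
)
names(kept$members[[1]])
```

With remove = TRUE, each nested data frame ends up with only the full_name column. With remove = FALSE, first and last are retained as well.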
Other tidyr verbs:
nest_drop_na()
,
nest_extract()
,
nest_fill()
,
nest_replace_na()
,
nest_separate()
set.seed(123)
gm <- gapminder::gapminder
gm_nest <- gm %>% tidyr::nest(country_data = -continent)

gm_nest %>%
  nest_unite(country_data, col = comb, year, pop)
Nested filtering joins filter rows from .nest_data
based on the presence or
absence of matches in y
:
nest_semi_join()
returns all rows from .nest_data
with a match in y
.
nest_anti_join()
returns all rows from .nest_data
without a match in y
.
nest_semi_join(.data, .nest_data, y, by = NULL, copy = FALSE, ...)
nest_anti_join(.data, .nest_data, y, by = NULL, copy = FALSE, ...)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
y |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
by |
A character vector of variables to join by or a join specification
created with If To join on different variables between the objects in To join by multiple variables, use a vector with length >1. For example,
To perform a cross-join, generating all combinations of each object in
|
copy |
If |
... |
One or more unquoted expressions separated by commas. Variable
names can be used as if they were positions in the data frame, so expressions
like |
nest_semi_join()
and nest_anti_join()
are largely wrappers for
dplyr::semi_join()
and dplyr::anti_join()
and maintain the functionality
of semi_join()
and anti_join()
within each nested data frame. For more
information on semi_join()
or anti_join()
, please refer to the
documentation in dplyr
.
An object of the same type as .data
. Each object in the column .nest_data
will also be of the same type as the input. Each object in .nest_data
has
the following properties:
Rows are a subset of the input, but appear in the same order.
Columns are not modified.
Data frame attributes are preserved.
Groups are taken from .nest_data
. The number of groups may be reduced.
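The row and column properties listed above can be checked on a small made-up example. The sales, sales_nest, and catalog objects are hypothetical and exist only for this sketch. Each nested data frame is filtered against the same y.

```r
library(nplyr)

# Hypothetical toy data nested by region
sales <- tibble::tibble(
  region  = c("east", "east", "west", "west"),
  product = c("a", "b", "a", "c"),
  units   = c(10, 20, 30, 40)
)
sales_nest <- tidyr::nest(sales, region_data = -region)

# y: the products to match against
catalog <- tibble::tibble(product = c("a", "b"))

# Keep rows with a match in catalog, or rows without one
in_catalog     <- nest_semi_join(sales_nest, region_data, catalog, by = "product")
not_in_catalog <- nest_anti_join(sales_nest, region_data, catalog, by = "product")

# Row counts per nested data frame (east, west)
semi_rows <- vapply(in_catalog$region_data, nrow, integer(1))
anti_rows <- vapply(not_in_catalog$region_data, nrow, integer(1))

# Columns are unchanged: nothing from catalog is added
names(in_catalog$region_data[[1]])
```

In this sketch the semi join keeps two rows for east and one for west, while the anti join keeps none for east and one (product "c") for west.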
Other joins:
nest-mutate-joins
,
nest_nest_join()
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)
gm_codes <- gapminder::country_codes %>% dplyr::slice_sample(n = 10)

gm_nest %>% nest_semi_join(country_data, gm_codes, by = "country")
gm_nest %>% nest_anti_join(country_data, gm_codes, by = "country")
Nested mutating joins add columns from y
to each of the nested data frames
in .nest_data
, matching observations based on the keys. There are four
nested mutating joins:
nest_inner_join()
only keeps observations from .nest_data
that have a
matching key in y
.
The most important property of an inner join is that unmatched rows in either input are not included in the result.
There are three outer joins that keep observations that appear in at least one of the data frames:
nest_left_join()
keeps all observations in .nest_data
.
nest_right_join()
keeps all observations in y
.
nest_full_join()
keeps all observations in .nest_data
and y
.
nest_inner_join(
  .data, .nest_data, y,
  by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = FALSE
)
nest_left_join(
  .data, .nest_data, y,
  by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = FALSE
)
nest_right_join(
  .data, .nest_data, y,
  by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = FALSE
)
nest_full_join(
  .data, .nest_data, y,
  by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = FALSE
)
.data |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
.nest_data |
A list-column containing data frames |
y |
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr). |
by |
A character vector of variables to join by or a join specification
created with If To join on different variables between the objects in To join by multiple variables, use a vector with length >1. For example,
To perform a cross-join, generating all combinations of each object in
|
copy |
If |
suffix |
If there are non-joined duplicate variables in |
... |
Other parameters passed onto methods. Includes:
|
keep |
Should the join keys from both |
nest_inner_join()
, nest_left_join()
, nest_right_join()
, and
nest_full_join()
are largely wrappers for dplyr::inner_join()
,
dplyr::left_join()
, dplyr::right_join()
, and dplyr::full_join()
and
maintain the functionality of these verbs within each nested data frame. For
more information on inner_join()
, left_join()
, right_join()
, or
full_join()
, please refer to the documentation in
dplyr
.
An object of the same type as .data
. Each object in the column .nest_data
will also be of the same type as the input. The order of the rows and columns
of each object in .nest_data
is preserved as much as possible. Each object
in .nest_data
has the following properties:
For nest_inner_join()
, a subset of rows in each object in .nest_data
.
For nest_left_join()
, all rows in each object in .nest_data
.
For nest_right_join()
, a subset of rows in each object in .nest_data
,
followed by unmatched y
rows.
For nest_full_join()
, all rows in each object in .nest_data
, followed
by unmatched y
rows.
Output columns include all columns from each .nest_data
and all non-key
columns from y
. If keep = TRUE
, the key columns from y
are included
as well.
If non-key columns in any object in .nest_data and y have the same name,
suffixes are added to disambiguate. If keep = TRUE and key columns in
.nest_data and y have the same name, suffixes are added to disambiguate
these as well.
If keep = FALSE
, output columns included in by
are coerced to their
common type between the objects in .nest_data
and y
.
Groups are taken from .nest_data
.
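The row and column properties listed above can be sketched with a small made-up example. The sales, sales_nest, and prices objects and the count_rows() helper are hypothetical and exist only for illustration. The last call assumes that keep = TRUE is forwarded to the underlying dplyr join, as the wrapper description suggests.

```r
library(nplyr)

# Hypothetical toy data nested by region
sales <- tibble::tibble(
  region  = c("east", "east", "west", "west"),
  product = c("a", "b", "a", "c"),
  units   = c(10, 20, 30, 40)
)
sales_nest <- tidyr::nest(sales, region_data = -region)

# y: product "d" has no match in any nested data frame
prices <- tibble::tibble(
  product = c("a", "b", "d"),
  price   = c(1.5, 2, 9.9)
)

# Illustrative helper: rows per nested data frame (east, west)
count_rows <- function(nested) vapply(nested$region_data, nrow, integer(1))

inner <- nest_inner_join(sales_nest, region_data, prices, by = "product")
left  <- nest_left_join(sales_nest, region_data, prices, by = "product")
right <- nest_right_join(sales_nest, region_data, prices, by = "product")
full  <- nest_full_join(sales_nest, region_data, prices, by = "product")

count_rows(inner)  # matched rows only
count_rows(left)   # every row of each nested data frame
count_rows(right)  # matched rows plus unmatched rows of prices
count_rows(full)   # every row from both sides

# keep = TRUE retains both key columns, disambiguated by suffix
kept <- nest_inner_join(
  sales_nest, region_data, prices,
  by = "product", keep = TRUE
)
names(kept$region_data[[1]])
```

For the west data frame, product "c" has no price, so it is dropped by the inner and right joins and kept (with a missing price) by the left and full joins.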
Other joins:
nest-filter-joins
,
nest_nest_join()
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)
gm_codes <- gapminder::country_codes

gm_nest %>% nest_inner_join(country_data, gm_codes, by = "country")
gm_nest %>% nest_left_join(country_data, gm_codes, by = "country")
gm_nest %>% nest_right_join(country_data, gm_codes, by = "country")
gm_nest %>% nest_full_join(country_data, gm_codes, by = "country")
A toy dataset containing 750 responses to a personal satisfaction survey. The responses were randomly generated using the Qualtrics survey platform.
personal_survey
A data frame with 750 rows and 6 variables:
name of survey
respondent age
city the respondent resides in
field that the respondent works in
respondent's personal life satisfaction (on a scale from extremely satisfied to extremely dissatisfied)
open text response elaborating on personal life satisfaction