| Title: | A 'dplyr' Back End for Databases |
|---|---|
| Description: | A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features works with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author. |
| Authors: | Hadley Wickham [aut, cre], Maximilian Girlich [aut], Edgar Ruiz [aut], Posit Software, PBC [cph, fnd] |
| Maintainer: | Hadley Wickham <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 2.5.2.9000 |
| Built: | 2026-05-24 06:51:09 UTC |
| Source: | https://github.com/tidyverse/dbplyr |
Use .sql$foo(x, y) to make it clear that you're calling the SQL
foo() function, not the R foo() function. This also makes it easier to
reduce R CMD check notes in packages; just import .sql from dbplyr with
e.g. @importFrom dbplyr .sql.
Note that .sql itself does nothing and is just NULL; it is automatically
removed when dbplyr translates your R code to SQL.
.sql.sql
library(dplyr, warn.conflicts = FALSE) db <- lazy_frame(x = 1, y = 2) db |> mutate(z = .sql$CUMULATIVE_SUM(x, 1))library(dplyr, warn.conflicts = FALSE) db <- lazy_frame(x = 1, y = 2) db |> mutate(z = .sql$CUMULATIVE_SUM(x, 1))
This is an method for the dplyr arrange() generic. It generates
the ORDER BY clause of the SQL query. It also affects the
window_order() of windowed expressions in mutate.tbl_lazy().
Note that ORDER BY clauses can not generally appear in subqueries, which
means that you should arrange() as late as possible in your pipelines.
## S3 method for class 'tbl_lazy' arrange(.data, ..., .by_group = FALSE)## S3 method for class 'tbl_lazy' arrange(.data, ..., .by_group = FALSE)
.data |
A lazy data frame backed by a database query. |
... |
< |
.by_group |
If |
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
Unlike R, most databases sorts NA (NULLs) at the front. You can
can override this behaviour by explicitly sorting on is.na(x).
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(a = c(3, 4, 1, 2), b = c(5, 1, 2, NA)) db |> arrange(a) |> show_query() # Note that NAs are sorted first db |> arrange(b) # override by sorting on is.na() first db |> arrange(is.na(b), b)library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(a = c(3, 4, 1, 2), b = c(5, 1, 2, NA)) db |> arrange(a) |> show_query() # Note that NAs are sorted first db |> arrange(b) # override by sorting on is.na() first db |> arrange(is.na(b), b)
This backend supports Microsoft Access databases, typically accessed via
odbc. Use dialect_access() with lazy_frame() to see simulated SQL without
connecting to a live database.
Key differences for this backend are:
SELECT uses TOP, not LIMIT
Non-standard types and mathematical functions
String concatenation uses &
No ANALYZE equivalent
TRUE and FALSE converted to 1 and 0
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_access() simulate_access()dialect_access() simulate_access()
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(x = 1, y = 2, z = "a", con = dialect_access()) lf |> head() lf |> mutate(y = as.numeric(y), z = sqrt(x^2 + 10)) lf |> mutate(a = paste0(z, " times"))library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(x = 1, y = 2, z = "a", con = dialect_access()) lf |> head() lf |> mutate(y = as.numeric(y), z = sqrt(x^2 + 10)) lf |> mutate(a = paste0(z, " times"))
This backend supports databases accessed via AdbiConnection created by
adbi::adbi() and DBI::dbConnect(). dbplyr automatically detects the
underlying database type by querying the ADBC driver's vendor name and
uses the appropriate SQL dialect.
The following vendors are recognized:
PostgreSQL and PostgreSQL-compatible systems (CrateDB, CockroachDB,
Citus, Neon, ParadeDB, TimescaleDB, Yellowbrick, YugabyteDB, CedarDB):
dialect_postgres()
MySQL and MySQL-compatible systems (MariaDB, TiDB, Vitess): the
appropriate dialect (dialect_mariadb() for MariaDB, dialect_mysql()
otherwise)
SQLite: dialect_sqlite()
Microsoft SQL Server: dialect_mssql()
Snowflake: dialect_snowflake()
Amazon Redshift: dialect_redshift()
If your database is not recognized, dbplyr will fall back to a generic ODBC
dialect. In this case, or if dbplyr guesses wrong, you can use
with_dialect() to choose a specific dialect.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
# ADBC connections require the adbi package and an ADBC driver. # Once connected, dbplyr automatically detects the database type: # # library(adbi) # library(dplyr, warn.conflicts = FALSE) # # con <- DBI::dbConnect(adbi::adbi("adbcsqlite"), uri = ":memory:") # tbl(con, "my_table") |> filter(x > 1)# ADBC connections require the adbi package and an ADBC driver. # Once connected, dbplyr automatically detects the database type: # # library(adbi) # library(dplyr, warn.conflicts = FALSE) # # con <- DBI::dbConnect(adbi::adbi("adbcsqlite"), uri = ":memory:") # tbl(con, "my_table") |> filter(x > 1)
This is the base dialect for ANSI compliant SQL, forming the foundation
of all other dialects. Use dialect_ansi() with lazy_frame() to see
simulated SQL without connecting to a live database.
See vignette("translation-function") for a list of functions that are
translated.
dialect_ansi() simulate_dbi(class = character(), ...)dialect_ansi() simulate_dbi(class = character(), ...)
class, ...
|
No longer used. |
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_ansi()) lf |> transmute(x = mean(b, na.rm = TRUE)) lf |> transmute(x = log(b), y = log(b, base = 2))library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_ansi()) lf |> transmute(x = mean(b, na.rm = TRUE)) lf |> transmute(x = log(b), y = log(b, base = 2))
This backend supports IBM DB2 databases, typically accessed via ODBC.
Use dialect_db2() with lazy_frame() to see simulated SQL without
connecting to a live database.
Key differences for this backend are:
Uses FETCH FIRST n ROWS ONLY instead of LIMIT n
Uses double quotes for identifier quoting
paste() uses ||
DB2-specific data type names for casts (e.g. VARCHAR(255), DOUBLE)
Date component extraction via YEAR(), MONTH(), ..., DAYOFYEAR(),
DAYOFWEEK(), QUARTER(), WEEK()
str_flatten() uses LISTAGG
Statistical summaries sd(), var(), cor(), cov()
runif() translates to RAND()
Regular expression functions (DB2 11.1+)
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_db2()dialect_db2()
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_db2()) lf |> head() lf |> transmute(x = paste0(d, " times")) lf |> summarise(x = sd(b, na.rm = TRUE))library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_db2()) lf |> head() lf |> transmute(x = paste0(d, " times")) lf |> summarise(x = sd(b, na.rm = TRUE))
This backend supports SAP HANA databases, typically accessed via
HDBConnection created by DBI::dbConnect(). Use dialect_hana() with
lazy_frame() to see simulated SQL without connecting to a live database.
Key differences for this backend are:
Temporary tables get # prefix and use LOCAL TEMPORARY COLUMN.
No table analysis performed in copy_to().
paste() uses ||
Note that you can't create new boolean columns from logical expressions;
you need to wrap with explicit ifelse: ifelse(x > y, TRUE, FALSE).
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_hana() simulate_hana()dialect_hana() simulate_hana()
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_hana()) lf |> transmute(x = paste0(d, " times"))library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_hana()) lf |> transmute(x = paste0(d, " times"))
This backend supports Apache Hive, typically accessed via odbc. Use
dialect_hive() with lazy_frame() to see simulated SQL without connecting
to a live database.
Key differences for this backend are a scattering of custom translations provided by users.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_hive() simulate_hive()dialect_hive() simulate_hive()
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, d = 2, c = "z", con = dialect_hive()) lf |> transmute(x = cot(b)) lf |> transmute(x = bitwShiftL(c, 1L)) lf |> transmute(x = str_replace_all(c, "a", "b")) lf |> summarise(x = median(d, na.rm = TRUE)) lf |> summarise(x = var(c, na.rm = TRUE))library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, d = 2, c = "z", con = dialect_hive()) lf |> transmute(x = cot(b)) lf |> transmute(x = bitwShiftL(c, 1L)) lf |> transmute(x = str_replace_all(c, "a", "b")) lf |> summarise(x = median(d, na.rm = TRUE)) lf |> summarise(x = var(c, na.rm = TRUE))
This backend supports Apache Impala, typically accessed via odbc. Use
dialect_impala() with lazy_frame() to see simulated SQL without
connecting to a live database.
Key differences for this backend are a scattering of custom translations provided by users, mostly focussed on bitwise operations.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_impala() simulate_impala()dialect_impala() simulate_impala()
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_impala()) lf |> transmute(X = bitwNot(bitwOr(b, c)))library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_impala()) lf |> transmute(X = bitwNot(bitwOr(b, c)))
This backend supports databases accessed via JDBCConnection created by
RJDBC::JDBC() and DBI::dbConnect(). dbplyr automatically detects the
underlying database type from the JDBC connection class and uses the
appropriate SQL dialect.
The following databases are recognized via their JDBC connection class:
PostgreSQL (org.postgresql.*): dialect_postgres()
MySQL (com.mysql.*): dialect_mysql()
MariaDB (org.mariadb.*): dialect_mariadb()
SQLite (org.sqlite.*): dialect_sqlite()
Oracle (oracle.*): dialect_oracle()
SQL Server (com.microsoft.sqlserver.*): dialect_mssql()
IBM DB2 (com.ibm.db2.*): dialect_db2()
SAP HANA (com.sap.db.*): dialect_hana()
Teradata (com.teradata.*): dialect_teradata()
Apache Hive (org.apache.hive.*): dialect_hive()
Apache Spark (org.apache.spark.* or com.simba.spark.*): dialect_spark_sql()
Snowflake (net.snowflake.*): dialect_snowflake()
Impala (com.cloudera.impala.*): dialect_impala()
Amazon Redshift (com.amazon.redshift.*): dialect_redshift()
If your database is not recognized, dbplyr will fall back to a generic ODBC dialect. Please file an issue if you'd like support for additional databases.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
with_dialect() to use a different dialect if dbplyr guesses
incorrectly, or a more specific translation is available.
# JDBC connections require the RJDBC package and a JDBC driver JAR file. # Once connected, dbplyr automatically detects the database type: # # library(RJDBC) # library(dplyr, warn.conflicts = FALSE) # # drv <- JDBC("org.postgresql.Driver", "postgresql.jar") # con <- dbConnect(drv, "jdbc:postgresql://localhost/mydb", "user", "password") # tbl(con, "my_table") |> filter(x > 1)# JDBC connections require the RJDBC package and a JDBC driver JAR file. # Once connected, dbplyr automatically detects the database type: # # library(RJDBC) # library(dplyr, warn.conflicts = FALSE) # # drv <- JDBC("org.postgresql.Driver", "postgresql.jar") # con <- dbConnect(drv, "jdbc:postgresql://localhost/mydb", "user", "password") # tbl(con, "my_table") |> filter(x > 1)
This backend supports Microsoft SQL Server, typically accessed via odbc. Use
dialect_mssql() with lazy_frame() to see simulated SQL without connecting
to a live database.
Key differences for this backend are:
SELECT uses TOP not LIMIT
Automatically prefixes # to create temporary tables. Add the prefix
yourself to avoid the message.
String basics: paste(), substr(), nchar()
Custom types for as.* functions
Lubridate extraction functions, year(), month(), day() etc
Semi-automated bit <-> boolean translation (see below)
stringr functions str_detect(), str_starts(), str_ends() with
fixed() patterns work on all versions; regular expression patterns
require SQL Server 2025+ (version 17.0)
stringr functions str_replace(), str_replace_all(), str_remove(),
str_remove_all(), str_extract(), and str_count() require SQL Server
2025+ (version 17.0)
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_mssql(version = "15.0") simulate_mssql(version = "15.0")dialect_mssql(version = "15.0") simulate_mssql(version = "15.0")
version |
Version of MS SQL to simulate. Currently, 11.0 and above
will use |
SQL server uses two incompatible types to represent TRUE and FALSE
values:
The BOOLEAN type is the result of logical comparisons (e.g. x > y)
and can be used WHERE but not to create new columns in SELECT.
https://learn.microsoft.com/en-us/sql/t-sql/language-elements/comparison-operators-transact-sql
The BIT type is a special type of numeric column used to store
TRUE and FALSE values, but can't be used in WHERE clauses.
https://learn.microsoft.com/en-us/sql/t-sql/data-types/bit-transact-sql?view=sql-server-ver15
dbplyr does its best to automatically create the correct type when needed, but can't do it 100% correctly because it does not have a full type inference system. This means that you many need to manually do conversions from time to time.
To convert from bit to boolean use x == 1
To convert from boolean to bit use as.logical(if(x, 0, 1))
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_mssql()) lf |> head() lf |> transmute(x = paste(b, c, d)) # Can use boolean as is: lf |> filter(c > d) # Need to convert from boolean to bit: lf |> transmute(x = c > d) # Can use boolean as is: lf |> transmute(x = ifelse(c > d, "c", "d"))library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_mssql()) lf |> head() lf |> transmute(x = paste(b, c, d)) # Can use boolean as is: lf |> filter(c > d) # Need to convert from boolean to bit: lf |> transmute(x = c > d) # Can use boolean as is: lf |> transmute(x = ifelse(c > d, "c", "d"))
This backend supports MySQL and MariaDB databases, typically accessed via
MySQLConnection or MariaDBConnection created by DBI::dbConnect(). Use
dialect_mysql() with lazy_frame() to see simulated SQL without connecting
to a live database.
Key differences for this backend are:
paste() uses CONCAT_WS()
String translations for str_detect(), str_locate(), and
str_replace_all()
Clear error message for unsupported full joins
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_mariadb() dialect_mysql() simulate_mysql() simulate_mariadb()dialect_mariadb() dialect_mysql() simulate_mysql() simulate_mariadb()
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_mysql()) lf |> transmute(x = paste0(d, " times"))library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_mysql()) lf |> transmute(x = paste0(d, " times"))
This backend supports databases accessed via OdbcConnection created by
DBI::dbConnect(). Use dialect_odbc() with lazy_frame() to see simulated
SQL without connecting to a live database.
Key differences for this backend are minor translations for common data types.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_odbc() simulate_odbc()dialect_odbc() simulate_odbc()
with_dialect() to use a different dialect if dbplyr guesses
incorrectly, or a more specific translation is available.
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, d = 2, c = "z", con = dialect_odbc()) lf |> transmute(x = as.numeric(b)) lf |> transmute(x = as.integer(b)) lf |> transmute(x = as.character(b))library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, d = 2, c = "z", con = dialect_odbc()) lf |> transmute(x = as.numeric(b)) lf |> transmute(x = as.integer(b)) lf |> transmute(x = as.character(b))
This backend supports Oracle databases, typically accessed via
OraConnection created by DBI::dbConnect(). Use dialect_oracle() with
lazy_frame() to see simulated SQL without connecting to a live database.
Key differences for this backend are:
Use FETCH FIRST instead of LIMIT
Custom types
paste() uses ||
Custom subquery generation (no AS)
setdiff() uses MINUS instead of EXCEPT
Note that versions of Oracle prior to 23c have limited supported for
TRUE and FALSE and you may need to use 1 and 0 instead.
See https://oracle-base.com/articles/23/boolean-data-type-23 for
more details.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_oracle() simulate_oracle()dialect_oracle() simulate_oracle()
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_oracle()) lf |> transmute(x = paste0(c, " times")) lf |> setdiff(lf)library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_oracle()) lf |> transmute(x = paste0(c, " times")) lf |> setdiff(lf)
This backend supports PostgreSQL databases, typically accessed via a
PqConnection created by DBI::dbConnect(). Use dialect_postgres() with
lazy_frame() to see simulated SQL without connecting to a live database.
Key differences for this backend are:
Many stringr functions
lubridate date-time extraction functions
More standard statistical summaries
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_postgres() simulate_postgres()dialect_postgres() simulate_postgres()
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_postgres()) lf |> summarise(x = sd(b, na.rm = TRUE)) lf |> summarise(y = cor(b, c), z = cov(b, c))library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_postgres()) lf |> summarise(x = sd(b, na.rm = TRUE)) lf |> summarise(y = cor(b, c), z = cov(b, c))
This backend supports Amazon Redshift databases, typically accessed via
a RedshiftConnection created by DBI::dbConnect(). Use dialect_redshift()
with lazy_frame() to see simulated SQL without connecting to a live
database.
Base translations come from PostgreSQL backend. There are generally few differences, apart from string manipulation.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_redshift() simulate_redshift()dialect_redshift() simulate_redshift()
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_redshift()) lf |> transmute(x = paste(c, " times")) lf |> transmute(x = substr(c, 2, 3)) lf |> transmute(x = str_replace_all(c, "a", "z"))library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_redshift()) lf |> transmute(x = paste(c, " times")) lf |> transmute(x = substr(c, 2, 3)) lf |> transmute(x = str_replace_all(c, "a", "z"))
This backend supports Snowflake databases, typically accessed via odbc. Use
dialect_snowflake() with lazy_frame() to see simulated SQL without
connecting to a live database.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_snowflake() simulate_snowflake()dialect_snowflake() simulate_snowflake()
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_snowflake()) lf |> transmute(x = paste0(d, " times"))library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_snowflake()) lf |> transmute(x = paste0(d, " times"))
This backend supports Databricks Spark SQL, typically accessed via the
Databricks ODBC or JDBC connector. Use dialect_spark_sql() with
lazy_frame() to see simulated SQL without connecting to a live database.
Key differences for this backend are better translation of statistical
aggregate functions (e.g. var(), median()) and use of temporary views
instead of temporary tables when copying data.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_spark_sql() simulate_spark_sql()dialect_spark_sql() simulate_spark_sql()
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, d = 2, c = "z", con = dialect_spark_sql()) lf |> summarise(x = median(d, na.rm = TRUE)) lf |> summarise(x = var(c, na.rm = TRUE), .by = d) lf |> mutate(x = first(c)) lf |> mutate(x = first(c), .by = d)library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, d = 2, c = "z", con = dialect_spark_sql()) lf |> summarise(x = median(d, na.rm = TRUE)) lf |> summarise(x = var(c, na.rm = TRUE), .by = d) lf |> mutate(x = first(c)) lf |> mutate(x = first(c), .by = d)
This backend supports SQLite databases, typically accessed via
a SQLiteConnection created by DBI::dbConnect(). Use dialect_sqlite()
with lazy_frame() to see simulated SQL without connecting to a live
database.
Key differences for this backend are:
Uses non-standard LOG() function
Date-time extraction functions from lubridate
Custom median translation
Right and full joins are simulated using left joins
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_sqlite() simulate_sqlite()dialect_sqlite() simulate_sqlite()
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_sqlite()) lf |> transmute(x = paste(c, " times")) lf |> transmute(x = log(b), y = log(b, base = 2))library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_sqlite()) lf |> transmute(x = paste(c, " times")) lf |> transmute(x = log(b), y = log(b, base = 2))
This backend supports Teradata databases, typically accessed via odbc. Use
dialect_teradata() with lazy_frame() to see simulated SQL without
connecting to a live database.
Key differences for this backend are:
Uses TOP instead of LIMIT
Selection of user supplied translations
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
dialect_teradata() simulate_teradata()dialect_teradata() simulate_teradata()
library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_teradata()) lf |> head()library(dplyr, warn.conflicts = FALSE) lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_teradata()) lf |> head()
Combine multiple lazy queries into a single query using UNION ALL.
This is a convenient wrapper around purrr::reduce(tables, union_all).
Like dplyr::bind_rows() (and unlike UNION ALL), bind_queries() will
automatically align columns based on their name, and fill in any missing
columns with missing values.
bind_queries(...)bind_queries(...)
... |
<dynamic-dots> Lazy tables to combine. |
A lazy query.
lf1 <- lazy_frame(x = 1, y = "a") lf2 <- lazy_frame(x = 2, y = "b") bind_queries(lf1, lf2) lf3 <- lazy_frame(y = "c", x = 3, z = 10) bind_queries(lf2, lf3) # If you already have a list, you can use splice operator queries <- list(lf1, lf2) bind_queries(!!!queries)lf1 <- lazy_frame(x = 1, y = "a") lf2 <- lazy_frame(x = 2, y = "b") bind_queries(lf1, lf2) lf3 <- lazy_frame(y = "c", x = 3, z = 10) bind_queries(lf2, lf3) # If you already have a list, you can use splice operator queries <- list(lf1, lf2) bind_queries(!!!queries)
collapse() forces computation of a lazy query by wrapping it in a subquery.
This is not generally needed, but can be useful if you need to work around
database/dbplyr limitations.
## S3 method for class 'tbl_sql' collapse(x, ...)## S3 method for class 'tbl_sql' collapse(x, ...)
x |
A lazy data frame backed by a database query. |
... |
Ignored. |
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(a = c(3, 4, 1, 2), b = c(5, 1, 2, NA)) db |> filter(a <= 2) |> show_query() db |> filter(a <= 2) |> collapse() |> show_query()library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(a = c(3, 4, 1, 2), b = c(5, 1, 2, NA)) db |> filter(a <= 2) |> show_query() db |> filter(a <= 2) |> collapse() |> show_query()
collect() executes the query and retrieves the results into a local tibble.
This brings all the data from the database into R's memory, which is useful
once you've done as much as possible in the database, and now need to use
R functions.
## S3 method for class 'tbl_sql' collect( x, ..., n = Inf, warn_incomplete = TRUE, sql_options = NULL, cte = deprecated() )## S3 method for class 'tbl_sql' collect( x, ..., n = Inf, warn_incomplete = TRUE, sql_options = NULL, cte = deprecated() )
x |
A lazy data frame backed by a database query. |
... |
Ignored. |
n |
Number of rows to fetch. Defaults to |
warn_incomplete |
Warn if |
sql_options |
SQL rendering options generated by |
cte |
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(a = c(3, 4, 1, 2), b = c(5, 1, 2, NA)) db |> filter(a <= 2) |> collect()library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(a = c(3, 4, 1, 2), b = c(5, 1, 2, NA)) db |> filter(a <= 2) |> collect()
Turns implicit missing values into explicit missing values. This is a method
for the tidyr::complete() generic.
## S3 method for class 'tbl_lazy' complete(data, ..., fill = list())## S3 method for class 'tbl_lazy' complete(data, ..., fill = list())
data |
A lazy data frame backed by a database query. |
... |
Specification of columns to expand. See tidyr::expand for more details. |
fill |
A named list that for each variable supplies a single value to use instead of NA for missing combinations. |
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
df <- memdb_frame( group = c(1:2, 1), item_id = c(1:2, 2), item_name = c("a", "b", "b"), value1 = 1:3, value2 = 4:6 ) df |> tidyr::complete(group, nesting(item_id, item_name)) # You can also choose to fill in missing values df |> tidyr::complete(group, nesting(item_id, item_name), fill = list(value1 = 0))df <- memdb_frame( group = c(1:2, 1), item_id = c(1:2, 2), item_name = c("a", "b", "b"), value1 = 1:3, value2 = 4:6 ) df |> tidyr::complete(group, nesting(item_id, item_name)) # You can also choose to fill in missing values df |> tidyr::complete(group, nesting(item_id, item_name), fill = list(value1 = 0))
compute() executes the query and stores the results in a new remote table.
This is useful when you want to cache intermediate results for reuse or to
improve performance by avoiding repeated computation of complex queries.
## S3 method for class 'tbl_sql' compute( x, name = NULL, temporary = TRUE, overwrite = FALSE, unique_indexes = list(), indexes = list(), analyze = TRUE, ..., sql_options = NULL, cte = deprecated() )## S3 method for class 'tbl_sql' compute( x, name = NULL, temporary = TRUE, overwrite = FALSE, unique_indexes = list(), indexes = list(), analyze = TRUE, ..., sql_options = NULL, cte = deprecated() )
x |
A lazy data frame backed by a database query. |
name |
Name of new remote table. Use a string to create the table
in the current catalog/schema. Use |
temporary |
if |
overwrite |
If |
unique_indexes |
a list of character vectors. Each element of the list will create a new unique index over the specified column(s). Duplicate rows will result in failure. |
indexes |
a list of character vectors. Each element of the list will create a new index. |
analyze |
if |
... |
Ignored. |
sql_options |
SQL rendering options generated by |
cte |
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(a = c(3, 4, 1, 2), b = c(5, 1, 2, NA)) db |> filter(a <= 2) |> show_query() db |> filter(a <= 2) |> compute() |> show_query()library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(a = c(3, 4, 1, 2), b = c(5, 1, 2, NA)) db |> filter(a <= 2) |> show_query() db |> filter(a <= 2) |> compute() |> show_query()
This is an alternative to copy_to() that does not need write access and
is faster for small data.
copy_inline(con, df, types = NULL)copy_inline(con, df, types = NULL)
con |
A database connection. |
df |
A local data frame. The data is written directly in the SQL query so it should be small. |
types |
A named character vector of SQL data types to use for the columns.
The data types are backend specific. For example for Postgres this could
be |
It writes the data directly in the SQL query via the VALUES clause.
A tbl_lazy.
copy_to() to copy the data into a new database table.
df <- data.frame(x = 1:3, y = c("a", "b", "c")) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_inline(con, df) copy_inline(con, df) |> dplyr::show_query()df <- data.frame(x = 1:3, y = c("a", "b", "c")) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_inline(con, df) copy_inline(con, df) |> dplyr::show_query()
This is an implementation of the dplyr copy_to() generic and it mostly
a wrapper around DBI::dbWriteTable().
It is useful for copying small amounts of data to a database for examples, experiments, and joins. By default, it creates temporary tables which are only visible within the current connection to the database.
## S3 method for class 'src_sql' copy_to( dest, df, name = deparse(substitute(df)), overwrite = FALSE, types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL, analyze = TRUE, ..., in_transaction = TRUE )## S3 method for class 'src_sql' copy_to( dest, df, name = deparse(substitute(df)), overwrite = FALSE, types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL, analyze = TRUE, ..., in_transaction = TRUE )
dest |
remote data source |
df |
A local data frame, a |
name |
Name of new remote table. Use a string to create the table
in the current catalog/schema. Use |
overwrite |
If |
types |
a character vector giving variable types to use for the columns. See https://www.sqlite.org/datatype3.html for available types. |
temporary |
if |
unique_indexes |
a list of character vectors. Each element of the list will create a new unique index over the specified column(s). Duplicate rows will result in failure. |
indexes |
a list of character vectors. Each element of the list will create a new index. |
analyze |
if |
... |
other parameters passed to methods. |
in_transaction |
Should the table creation be wrapped in a transaction? This typically makes things faster, but you may want to suppress if the database doesn't support transactions, or you're wrapping in a transaction higher up (and your database doesn't support nested transactions.) |
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
copy_inline() to use small data in an SQL query without actually
writing to a table.
library(dplyr, warn.conflicts = FALSE) df <- data.frame(x = 1:5, y = letters[5:1]) db <- copy_to(memdb(), df) db df2 <- data.frame(y = c("a", "d"), fruit = c("apple", "date")) # copy_to() is called automatically if you set copy = TRUE # in the join functions db |> left_join(df2, copy = TRUE)library(dplyr, warn.conflicts = FALSE) df <- data.frame(x = 1:5, y = letters[5:1]) db <- copy_to(memdb(), df) db df2 <- data.frame(y = c("a", "d"), fruit = c("apple", "date")) # copy_to() is called automatically if you set copy = TRUE # in the join functions db |> left_join(df2, copy = TRUE)
These are methods for the dplyr dplyr::count() and tally() generics. They
wrap up group_by.tbl_lazy(), summarise.tbl_lazy() and, optionally,
arrange.tbl_lazy().
## S3 method for class 'tbl_lazy' count(x, ..., wt = NULL, sort = FALSE, name = NULL) ## S3 method for class 'tbl_lazy' add_count(x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = NULL) ## S3 method for class 'tbl_lazy' tally(x, wt = NULL, sort = FALSE, name = NULL)## S3 method for class 'tbl_lazy' count(x, ..., wt = NULL, sort = FALSE, name = NULL) ## S3 method for class 'tbl_lazy' add_count(x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = NULL) ## S3 method for class 'tbl_lazy' tally(x, wt = NULL, sort = FALSE, name = NULL)
x |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
... |
< |
wt |
<
|
sort |
If |
name |
The name of the new column in the output. If omitted, it will default to |
.drop |
Not supported for lazy tables. |
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(g = c(1, 1, 1, 2, 2), x = c(4, 3, 6, 9, 2)) db |> count(g) |> show_query() db |> count(g, wt = x) |> show_query() db |> count(g, wt = x, sort = TRUE) |> show_query()library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(g = c(1, 1, 1, 2, 2), x = c(4, 3, 6, 9, 2)) db |> count(g) |> show_query() db |> count(g, wt = x) |> show_query() db |> count(g, wt = x, sort = TRUE) |> show_query()
This is a method for the tidyr uncount() generic. It uses a temporary
table, so your database user needs permissions to create one.
dbplyr_uncount(data, weights, .remove = TRUE, .id = NULL)dbplyr_uncount(data, weights, .remove = TRUE, .id = NULL)
data |
A lazy data frame backed by a database query. |
weights |
A vector of weights. Evaluated in the context of |
.remove |
If |
.id |
Supply a string to create a new variable which gives a unique identifier for each created row. |
df <- memdb_frame(x = c("a", "b"), n = c(1, 2)) dbplyr_uncount(df, n) dbplyr_uncount(df, n, .id = "id") # You can also use constants dbplyr_uncount(df, 2) # Or expressions dbplyr_uncount(df, 2 / n)df <- memdb_frame(x = c("a", "b"), n = c(1, 2)) dbplyr_uncount(df, n) dbplyr_uncount(df, n, .id = "id") # You can also use constants dbplyr_uncount(df, 2) # Or expressions dbplyr_uncount(df, 2 / n)
These are methods for the dplyr generics slice_min(), slice_max(), and
slice_sample(). They are translated to SQL using dplyr::filter() and
window functions (ROWNUMBER, MIN_RANK, or CUME_DIST depending on
arguments). slice(), slice_head(), and slice_tail() are not supported
since database tables have no intrinsic order.
If data is grouped, the operation will be performed on each group so that
(e.g.) slice_min(db, x, n = 3) will select the three rows with the smallest
value of x in each group.
## S3 method for class 'tbl_lazy' slice_min( .data, order_by, ..., n, prop, by = NULL, with_ties = TRUE, na_rm = TRUE ) ## S3 method for class 'tbl_lazy' slice_max( .data, order_by, ..., n, by = NULL, prop, with_ties = TRUE, na_rm = TRUE ) ## S3 method for class 'tbl_lazy' slice_sample(.data, ..., n, prop, by = NULL, weight_by = NULL, replace = FALSE)## S3 method for class 'tbl_lazy' slice_min( .data, order_by, ..., n, prop, by = NULL, with_ties = TRUE, na_rm = TRUE ) ## S3 method for class 'tbl_lazy' slice_max( .data, order_by, ..., n, by = NULL, prop, with_ties = TRUE, na_rm = TRUE ) ## S3 method for class 'tbl_lazy' slice_sample(.data, ..., n, prop, by = NULL, weight_by = NULL, replace = FALSE)
.data |
A lazy data frame backed by a database query. |
order_by |
Variable or function of variables to order by. |
... |
Not used. |
n, prop
|
Provide either If |
by |
< |
with_ties |
Should ties be kept together? The default, |
na_rm |
Should missing values in |
weight_by, replace
|
Not supported for database backends. |
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = 1:3, y = c(1, 1, 2)) db |> slice_min(x) |> show_query() db |> slice_max(x) |> show_query() db |> slice_sample() |> show_query() db |> group_by(y) |> slice_min(x) |> show_query() # By default, ties are includes so you may get more rows # than you expect db |> slice_min(y, n = 1) db |> slice_min(y, n = 1, with_ties = FALSE) # Non-integer group sizes are rounded down db |> slice_min(x, prop = 0.5)library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = 1:3, y = c(1, 1, 2)) db |> slice_min(x) |> show_query() db |> slice_max(x) |> show_query() db |> slice_sample() |> show_query() db |> group_by(y) |> slice_min(x) |> show_query() # By default, ties are includes so you may get more rows # than you expect db |> slice_min(y, n = 1) db |> slice_min(y, n = 1, with_ties = FALSE) # Non-integer group sizes are rounded down db |> slice_min(x, prop = 0.5)
This is a method for the dplyr distinct() generic. It adds the
DISTINCT clause to the SQL query.
## S3 method for class 'tbl_lazy' distinct(.data, ..., .keep_all = FALSE)## S3 method for class 'tbl_lazy' distinct(.data, ..., .keep_all = FALSE)
.data |
A lazy data frame backed by a database query. |
... |
< |
.keep_all |
If |
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = c(1, 1, 2, 2), y = c(1, 2, 1, 1)) db |> distinct() |> show_query() db |> distinct(x) |> show_query()library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = c(1, 1, 2, 2), y = c(1, 2, 1, 1)) db |> distinct() |> show_query() db |> distinct(x) |> show_query()
do() is deprecated. Instead of do() you should use collect() and then
your favourite combination of purrr and dplyr functions.
## S3 method for class 'tbl_sql' do(.data, ..., .chunk_size = 10000L)## S3 method for class 'tbl_sql' do(.data, ..., .chunk_size = 10000L)
.data |
a tbl |
... |
Expressions to apply to each group. If named, results will be
stored in a new column. If unnamed, must return a data frame. You can
use |
.chunk_size |
The size of each chunk to pull into R. If this number is too big, the process will be slow because R has to allocate and free a lot of memory. If it's too small, it will be slow, because of the overhead of talking to the database. |
escape() turns R values into SQL literals. It implements double dispatch
via two sets of generics: first escape() dispatches on the class of x,
then that method calls sql_escape_ident(), sql_escape_logical(), etc,
which dispatch on con.
These generics translate individual values into SQL. The core
generics are DBI::dbQuoteIdentifier() and DBI::dbQuoteString()
for quoting identifiers and strings, but dbplyr needs additional
tools for inserting logical, date, date-time, and raw values into
queries.
escape(x, parens = NA, collapse = " ", con = NULL) sql_escape_ident(con, x) sql_escape_logical(con, x) sql_escape_date(con, x) sql_escape_datetime(con, x) sql_escape_string(con, x) sql_escape_raw(con, x) sql_vector(x, parens = NA, collapse = " ", con = NULL)escape(x, parens = NA, collapse = " ", con = NULL) sql_escape_ident(con, x) sql_escape_logical(con, x) sql_escape_date(con, x) sql_escape_datetime(con, x) sql_escape_string(con, x) sql_escape_raw(con, x) sql_vector(x, parens = NA, collapse = " ", con = NULL)
x |
An object to escape. Existing sql vectors will be left as is,
character vectors are escaped with single quotes, numeric vectors have
trailing |
parens, collapse
|
Controls behaviour when multiple values are supplied.
Default behaviour: lists are always wrapped in parens and separated by commas, identifiers are separated by commas and never wrapped, atomic vectors are separated by spaces and wrapped in parens if needed. |
con |
A sql_dialect object or database connection. Connections are supported for backward compatibility. |
A sql vector.
Other generic:
db-sql,
db_connection_describe(),
db_copy_to()
con <- dialect_ansi() # Doubles vs. integers escape(1:5, con = con) escape(c(1, 5.4), con = con) # String vs known sql vs. sql identifier escape("X", con = con) escape(sql("X"), con = con) escape(ident("X"), con = con) # Escaping is idempotent escape("X", con = con) escape(escape("X", con = con), con = con) # Database specific generics sql_escape_logical(con, c(TRUE, FALSE, NA)) sql_escape_date(con, Sys.Date()) sql_escape_date(con, Sys.time()) sql_escape_raw(con, charToRaw("hi"))con <- dialect_ansi() # Doubles vs. integers escape(1:5, con = con) escape(c(1, 5.4), con = con) # String vs known sql vs. sql identifier escape("X", con = con) escape(sql("X"), con = con) escape(ident("X"), con = con) # Escaping is idempotent escape("X", con = con) escape(escape("X", con = con), con = con) # Database specific generics sql_escape_logical(con, c(TRUE, FALSE, NA)) sql_escape_date(con, Sys.Date()) sql_escape_date(con, Sys.time()) sql_escape_raw(con, charToRaw("hi"))
This is a method for the tidyr::expand generics. It doesn't sort the
result explicitly, so the order might be different to what expand()
returns for data frames.
## S3 method for class 'tbl_lazy' expand(data, ..., .name_repair = "check_unique")## S3 method for class 'tbl_lazy' expand(data, ..., .name_repair = "check_unique")
data |
A lazy data frame backed by a database query. |
... |
Specification of columns to expand. See tidyr::expand for more details. |
.name_repair |
Treatment of problematic column names:
This argument is passed on as |
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
fruits <- memdb_frame( type = c("apple", "orange", "apple", "orange", "orange", "orange"), year = c(2010, 2010, 2012, 2010, 2010, 2012), size = c("XS", "S", "M", "S", "S", "M"), weights = rnorm(6) ) # All possible combinations --------------------------------------- fruits |> tidyr::expand(type) fruits |> tidyr::expand(type, size) # Only combinations that already appear in the data --------------- fruits |> tidyr::expand(nesting(type, size))fruits <- memdb_frame( type = c("apple", "orange", "apple", "orange", "orange", "orange"), year = c(2010, 2010, 2012, 2010, 2010, 2012), size = c("XS", "S", "M", "S", "S", "M"), weights = rnorm(6) ) # All possible combinations --------------------------------------- fruits |> tidyr::expand(type) fruits |> tidyr::expand(type, size) # Only combinations that already appear in the data --------------- fruits |> tidyr::expand(nesting(type, size))
Fill in missing values with previous or next value
## S3 method for class 'tbl_lazy' fill(.data, ..., .direction = c("down", "up", "updown", "downup"))## S3 method for class 'tbl_lazy' fill(.data, ..., .direction = c("down", "up", "updown", "downup"))
.data |
A lazy data frame backed by a database query. |
... |
Columns to fill. |
.direction |
Direction in which to fill missing values. Currently
either "down" (the default) or "up". Note that "up" does not work when
|
library(dplyr, warn.conflicts = FALSE) squirrels <- tibble::tribble( ~group, ~name, ~role, ~n_squirrels, ~ n_squirrels2, 1, "Sam", "Observer", NA, 1, 1, "Mara", "Scorekeeper", 8, NA, 1, "Jesse", "Observer", NA, NA, 1, "Tom", "Observer", NA, 4, 2, "Mike", "Observer", NA, NA, 2, "Rachael", "Observer", NA, 6, 2, "Sydekea", "Scorekeeper", 14, NA, 2, "Gabriela", "Observer", NA, NA, 3, "Derrick", "Observer", NA, NA, 3, "Kara", "Scorekeeper", 9, 10, 3, "Emily", "Observer", NA, NA, 3, "Danielle", "Observer", NA, NA ) squirrels$id <- 1:12 squirrels_db <- copy_to(memdb(), squirrels) squirrels_db |> window_order(id) |> tidyr::fill(n_squirrels, n_squirrels2)library(dplyr, warn.conflicts = FALSE) squirrels <- tibble::tribble( ~group, ~name, ~role, ~n_squirrels, ~ n_squirrels2, 1, "Sam", "Observer", NA, 1, 1, "Mara", "Scorekeeper", 8, NA, 1, "Jesse", "Observer", NA, NA, 1, "Tom", "Observer", NA, 4, 2, "Mike", "Observer", NA, NA, 2, "Rachael", "Observer", NA, 6, 2, "Sydekea", "Scorekeeper", 14, NA, 2, "Gabriela", "Observer", NA, NA, 3, "Derrick", "Observer", NA, NA, 3, "Kara", "Scorekeeper", 9, 10, 3, "Emily", "Observer", NA, NA, 3, "Danielle", "Observer", NA, NA ) squirrels$id <- 1:12 squirrels_db <- copy_to(memdb(), squirrels) squirrels_db |> window_order(id) |> tidyr::fill(n_squirrels, n_squirrels2)
These are methods for the dplyr dplyr::filter() and dplyr::filter_out()
generics. They generate the WHERE clause of the SQL query.
filter() is translated directly to WHERE, which already matches dplyr's
behaviour of treating NA like FALSE (SQL's three-valued logic drops
NULL rows from WHERE).
filter_out() requires an additional step, where the combined condition
is wrapped in is_distinct_from(., TRUE), which is then translated using
the backend (e.g. to IS DISTINCT FROM on PostgreSQL, IS NOT on SQLite).
This ensures that the SQL translation matches dplyr's semantics.
## S3 method for class 'tbl_lazy' filter(.data, ..., .by = NULL, .preserve = FALSE) ## S3 method for class 'tbl_lazy' filter_out(.data, ..., .by = NULL, .preserve = FALSE)## S3 method for class 'tbl_lazy' filter(.data, ..., .by = NULL, .preserve = FALSE) ## S3 method for class 'tbl_lazy' filter_out(.data, ..., .by = NULL, .preserve = FALSE)
.data |
A lazy data frame backed by a database query. |
... |
< |
.by |
< |
.preserve |
Not supported by this method. |
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = c(2, NA, 5, NA, 10), y = 1:5) db |> filter(x < 5) |> show_query() db |> filter_out(x < 5) |> show_query() db |> filter(is.na(x)) |> show_query()library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = c(2, NA, 5, NA, 10), y = 1:5) db |> filter(x < 5) |> show_query() db |> filter_out(x < 5) |> show_query() db |> filter(is.na(x)) |> show_query()
RETURNING rowsget_returned_rows() extracts the RETURNING rows produced by
rows_insert(), rows_append(), rows_update(), rows_upsert(),
or rows_delete() if these are called with the returning argument.
An error is raised if this information is not available.
has_returned_rows() checks if x has stored RETURNING rows produced by
rows_insert(), rows_append(), rows_update(), rows_upsert(),
or rows_delete().
get_returned_rows(x) has_returned_rows(x)get_returned_rows(x) has_returned_rows(x)
x |
A lazy tbl. |
For get_returned_rows(), a tibble.
For has_returned_rows(), a scalar logical.
library(dplyr) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") DBI::dbExecute(con, "CREATE TABLE Info ( id INTEGER PRIMARY KEY AUTOINCREMENT, number INTEGER )") info <- tbl(con, "Info") rows1 <- copy_inline(con, data.frame(number = c(1, 5))) rows_insert(info, rows1, conflict = "ignore", in_place = TRUE) info # If the table has an auto incrementing primary key, you can use # the returning argument + `get_returned_rows()` its value rows2 <- copy_inline(con, data.frame(number = c(13, 27))) info <- rows_insert( info, rows2, conflict = "ignore", in_place = TRUE, returning = id ) info get_returned_rows(info)library(dplyr) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") DBI::dbExecute(con, "CREATE TABLE Info ( id INTEGER PRIMARY KEY AUTOINCREMENT, number INTEGER )") info <- tbl(con, "Info") rows1 <- copy_inline(con, data.frame(number = c(1, 5))) rows_insert(info, rows1, conflict = "ignore", in_place = TRUE) info # If the table has an auto incrementing primary key, you can use # the returning argument + `get_returned_rows()` its value rows2 <- copy_inline(con, data.frame(number = c(13, 27))) info <- rows_insert( info, rows2, conflict = "ignore", in_place = TRUE, returning = id ) info get_returned_rows(info)
This is a method for the dplyr group_by() generic. It is translated to
the GROUP BY clause of the SQL query when used with
summarise() and to the PARTITION BY clause of
window functions when used with mutate().
## S3 method for class 'tbl_lazy' group_by(.data, ..., .add = FALSE, .drop = TRUE)## S3 method for class 'tbl_lazy' group_by(.data, ..., .add = FALSE, .drop = TRUE)
.data |
A lazy data frame backed by a database query. |
... |
< |
.add |
When |
.drop |
Not supported by this method. |
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(g = c(1, 1, 1, 2, 2), x = c(4, 3, 6, 9, 2)) db |> group_by(g) |> summarise(n()) |> show_query() db |> group_by(g) |> mutate(x2 = x / sum(x, na.rm = TRUE)) |> show_query()library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(g = c(1, 1, 1, 2, 2), x = c(4, 3, 6, 9, 2)) db |> group_by(g) |> summarise(n()) |> show_query() db |> group_by(g) |> mutate(x2 = x / sum(x, na.rm = TRUE)) |> show_query()
This is a method for the head() generic. It is usually translated to the
LIMIT clause of the SQL query. Because LIMIT is not an official part of
the SQL specification, some database use other clauses like TOP or
FETCH ROWS.
Note that databases don't really have a sense of row order, so what "first"
means is subject to interpretation. Most databases will respect ordering
performed with arrange(), but it's not guaranteed. tail() is not
supported at all because the situation is even murkier for the "last" rows.
Additionally, LIMIT clauses can not generally appear in subqueries, which
means that you should use head() as late as possible in your pipelines.
## S3 method for class 'tbl_lazy' head(x, n = 6L, ...)## S3 method for class 'tbl_lazy' head(x, n = 6L, ...)
x |
A lazy data frame backed by a database query. |
n |
Number of rows to return |
... |
Not used. |
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = 1:100) db |> head() |> show_query() # Pretend we have data in a SQL server database db2 <- lazy_frame(x = 1:100, con = dialect_mssql()) db2 |> head() |> show_query()library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = 1:100) db |> head() |> show_query() # Pretend we have data in a SQL server database db2 <- lazy_frame(x = 1:100, con = dialect_mssql()) db2 |> head() |> show_query()
These are methods for the dplyr generics dplyr::intersect(),
dplyr::union(), and dplyr::setdiff(). They are translated to
INTERSECT, UNION, and EXCEPT respectively.
## S3 method for class 'tbl_lazy' intersect(x, y, copy = "none", ..., all = FALSE) ## S3 method for class 'tbl_lazy' union(x, y, copy = "none", ..., all = FALSE) ## S3 method for class 'tbl_lazy' union_all(x, y, copy = "none", ...) ## S3 method for class 'tbl_lazy' setdiff(x, y, copy = "none", ..., all = FALSE)## S3 method for class 'tbl_lazy' intersect(x, y, copy = "none", ..., all = FALSE) ## S3 method for class 'tbl_lazy' union(x, y, copy = "none", ..., all = FALSE) ## S3 method for class 'tbl_lazy' union_all(x, y, copy = "none", ...) ## S3 method for class 'tbl_lazy' setdiff(x, y, copy = "none", ..., all = FALSE)
x, y
|
A pair of lazy data frames backed by database queries. |
copy |
If
|
... |
Must be empty. |
all |
If |
These are methods for the dplyr join generics. They are translated to the following SQL queries:
inner_join(x, y): SELECT * FROM x JOIN y ON x.a = y.a
left_join(x, y): SELECT * FROM x LEFT JOIN y ON x.a = y.a
right_join(x, y): SELECT * FROM x RIGHT JOIN y ON x.a = y.a
full_join(x, y): SELECT * FROM x FULL JOIN y ON x.a = y.a
semi_join(x, y): SELECT * FROM x WHERE EXISTS (SELECT 1 FROM y WHERE x.a = y.a)
anti_join(x, y): SELECT * FROM x WHERE NOT EXISTS (SELECT 1 FROM y WHERE x.a = y.a)
## S3 method for class 'tbl_lazy' inner_join( x, y, by = NULL, copy = "none", suffix = NULL, ..., keep = NULL, na_matches = c("never", "na"), multiple = NULL, unmatched = "drop", relationship = NULL, sql_on = NULL, auto_index = FALSE, x_as = NULL, y_as = NULL ) ## S3 method for class 'tbl_lazy' left_join( x, y, by = NULL, copy = "none", suffix = NULL, ..., keep = NULL, na_matches = c("never", "na"), multiple = NULL, unmatched = "drop", relationship = NULL, sql_on = NULL, auto_index = FALSE, x_as = NULL, y_as = NULL ) ## S3 method for class 'tbl_lazy' right_join( x, y, by = NULL, copy = "none", suffix = NULL, ..., keep = NULL, na_matches = c("never", "na"), multiple = NULL, unmatched = "drop", relationship = NULL, sql_on = NULL, auto_index = FALSE, x_as = NULL, y_as = NULL ) ## S3 method for class 'tbl_lazy' full_join( x, y, by = NULL, copy = "none", suffix = NULL, ..., keep = NULL, na_matches = c("never", "na"), multiple = NULL, relationship = NULL, sql_on = NULL, auto_index = FALSE, x_as = NULL, y_as = NULL ) ## S3 method for class 'tbl_lazy' cross_join( x, y, ..., copy = "none", suffix = c(".x", ".y"), x_as = NULL, y_as = NULL ) ## S3 method for class 'tbl_lazy' semi_join( x, y, by = NULL, copy = "none", ..., na_matches = c("never", "na"), sql_on = NULL, auto_index = FALSE, x_as = NULL, y_as = NULL ) ## S3 method for class 'tbl_lazy' anti_join( x, y, by = NULL, copy = "none", ..., na_matches = c("never", "na"), sql_on = NULL, auto_index = FALSE, x_as = NULL, y_as = NULL )## S3 method for class 'tbl_lazy' inner_join( x, y, by = NULL, copy = "none", suffix = NULL, ..., keep = NULL, na_matches = c("never", "na"), multiple = NULL, unmatched = "drop", relationship = NULL, sql_on = NULL, auto_index = FALSE, x_as = NULL, y_as = NULL ) ## S3 method for class 'tbl_lazy' left_join( x, y, by = NULL, copy = "none", suffix = NULL, ..., keep = NULL, na_matches = c("never", "na"), multiple = NULL, unmatched = "drop", relationship = NULL, sql_on = NULL, auto_index = FALSE, x_as = NULL, y_as = NULL ) ## S3 method for class 'tbl_lazy' right_join( x, y, by = NULL, copy = "none", suffix = NULL, ..., keep = NULL, na_matches = c("never", "na"), multiple = NULL, unmatched = "drop", relationship = NULL, sql_on = NULL, auto_index = FALSE, x_as = NULL, y_as = NULL ) ## S3 method for class 'tbl_lazy' full_join( x, y, by = NULL, copy = "none", suffix = NULL, ..., keep = NULL, na_matches = c("never", "na"), multiple = NULL, relationship = NULL, sql_on = NULL, auto_index = FALSE, x_as = NULL, y_as = NULL ) ## S3 method for class 'tbl_lazy' cross_join( x, y, ..., copy = "none", suffix = c(".x", ".y"), x_as = NULL, y_as = NULL ) ## S3 method for class 'tbl_lazy' semi_join( x, y, by = NULL, copy = "none", ..., na_matches = c("never", "na"), sql_on = NULL, auto_index = FALSE, x_as = NULL, y_as = NULL ) ## S3 method for class 'tbl_lazy' anti_join( x, y, by = NULL, copy = "none", ..., na_matches = c("never", "na"), sql_on = NULL, auto_index = FALSE, x_as = NULL, y_as = NULL )
x, y
|
A pair of lazy data frames backed by database queries. |
by |
A join specification created with If To join on different variables between To join by multiple variables, use a
For simple equality joins, you can alternatively specify a character vector
of variable names to join by. For example, To perform a cross-join, generating all combinations of |
copy |
If
|
suffix |
If there are non-joined duplicate variables in |
... |
Other parameters passed onto methods. |
keep |
Should the join keys from both
|
na_matches |
Should NA (NULL) values match one another?
The default, "never", is how databases usually work. |
multiple, unmatched
|
Unsupported in database backends. As a workaround for multiple use a unique key and for unmatched a foreign key constraint. |
relationship |
Unsupported in database backends. |
sql_on |
A custom join predicate as an SQL expression.
Usually joins use column equality, but you can perform more complex
queries by supply |
auto_index |
if |
x_as, y_as
|
Alias to use for |
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
library(dplyr, warn.conflicts = FALSE) band_db <- copy_to(memdb(), dplyr::band_members) instrument_db <- copy_to(memdb(), dplyr::band_instruments) band_db |> left_join(instrument_db) |> show_query() # Can join with local data frames by setting copy = TRUE band_db |> left_join(dplyr::band_instruments, copy = TRUE) # Unlike R, joins in SQL don't usually match NAs (NULLs) db <- memdb_frame(x = c(1, 2, NA)) label <- memdb_frame(x = c(1, NA), label = c("one", "missing")) db |> left_join(label, by = "x") # But you can activate R's usual behaviour with the na_matches argument db |> left_join(label, by = "x", na_matches = "na") # By default, joins are equijoins, but you can use `sql_on` to # express richer relationships db1 <- memdb_frame(x = 1:5) db2 <- memdb_frame(x = 1:3, y = letters[1:3]) db1 |> left_join(db2) |> show_query() db1 |> left_join(db2, sql_on = "LHS.x < RHS.x") |> show_query()library(dplyr, warn.conflicts = FALSE) band_db <- copy_to(memdb(), dplyr::band_members) instrument_db <- copy_to(memdb(), dplyr::band_instruments) band_db |> left_join(instrument_db) |> show_query() # Can join with local data frames by setting copy = TRUE band_db |> left_join(dplyr::band_instruments, copy = TRUE) # Unlike R, joins in SQL don't usually match NAs (NULLs) db <- memdb_frame(x = c(1, 2, NA)) label <- memdb_frame(x = c(1, NA), label = c("one", "missing")) db |> left_join(label, by = "x") # But you can activate R's usual behaviour with the na_matches argument db |> left_join(label, by = "x", na_matches = "na") # By default, joins are equijoins, but you can use `sql_on` to # express richer relationships db1 <- memdb_frame(x = 1:5) db2 <- memdb_frame(x = 1:3, y = letters[1:3]) db1 |> left_join(db2) |> show_query() db1 |> left_join(db2, sql_on = "LHS.x < RHS.x") |> show_query()
This is a helper function that retrieves the most recent SQL query generated by dbplyr, which can be useful for debugging.
last_sql()last_sql()
A SQL string, or NULL if no query has been generated yet.
library(dplyr, warn.conflicts = FALSE) df <- lazy_frame(x = 1:3) df |> filter(x > 1) last_sql()library(dplyr, warn.conflicts = FALSE) df <- lazy_frame(x = 1:3) df |> filter(x > 1) last_sql()
memdb() creates a temporary in-memory database that disappears when the
R session ends. It's a convenient way to learn about and experiment with
dbplyr without having to connect to a "real" database.
memdb_frame() works like tibble::tibble(), but instead of creating a new
data frame in R, it creates a table in memdb(). local_memdb_frame()
is like memdb_frame() but the table will be automatically deleted when
the current scope ends. It's useful for tests. But beware: this function
will overwrite an existing table of the same name.
memdb() memdb_frame(.name = unique_table_name(), ...) local_memdb_frame(.name = unique_table_name(), ..., frame = caller_env())memdb() memdb_frame(.name = unique_table_name(), ...) local_memdb_frame(.name = unique_table_name(), ..., frame = caller_env())
.name |
Name of table in database: defaults to a random name that's unlikely to conflict with an existing table. |
... |
< Arguments are evaluated sequentially.
You can refer to previously created elements directly or using the rlang::.data
pronoun.
To refer explicitly to objects in the calling environment, use |
frame |
The created table is bound to this execution frame and will be deleted when it ends. For expert use only. |
library(dplyr) # use memdb_frame() to create a new database table df <- memdb_frame(x = runif(100), y = runif(100)) df |> arrange(x) df |> arrange(x) |> show_query() # Use memdb() + copy_to() to copy an existing data frame iris_db <- copy_to(memdb(), iris) iris_dblibrary(dplyr) # use memdb_frame() to create a new database table df <- memdb_frame(x = runif(100), y = runif(100)) df |> arrange(x) df |> arrange(x) |> show_query() # Use memdb() + copy_to() to copy an existing data frame iris_db <- copy_to(memdb(), iris) iris_db
These are methods for the dplyr mutate() and transmute() generics.
They are translated to computed expressions in the SELECT clause of
the SQL query.
## S3 method for class 'tbl_lazy' mutate( .data, ..., .by = NULL, .order = NULL, .frame = NULL, .keep = c("all", "used", "unused", "none"), .before = NULL, .after = NULL )## S3 method for class 'tbl_lazy' mutate( .data, ..., .by = NULL, .order = NULL, .frame = NULL, .keep = c("all", "used", "unused", "none"), .before = NULL, .after = NULL )
.data |
A lazy data frame backed by a database query. |
... |
< |
.by |
< |
.order |
< |
.frame |
A length-2 numeric vector specifying the bounds for
window function frames. The first element is the lower bound (use |
.keep |
Control which columns from
|
.before, .after
|
< |
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = 1:5, y = 5:1) db |> mutate(a = (x + y) / 2, b = sqrt(x^2L + y^2L)) |> show_query() # dbplyr automatically creates subqueries as needed db |> mutate(x1 = x + 1, x2 = x1 * 2) |> show_query() # `.order` and `.frame` control window functions db <- memdb_frame(g = c(1, 1, 2, 2, 2), x = c(5, 3, 1, 4, 2)) db |> mutate(rolling_sum = sum(x), .by = g, .order = x, .frame = c(-2, 2)) |> show_query()library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = 1:5, y = 5:1) db |> mutate(a = (x + y) / 2, b = sqrt(x^2L + y^2L)) |> show_query() # dbplyr automatically creates subqueries as needed db |> mutate(x1 = x + 1, x2 = x1 * 2) |> show_query() # `.order` and `.frame` control window functions db <- memdb_frame(g = c(1, 1, 2, 2, 2), x = c(5, 3, 1, 4, 2)) db |> mutate(rolling_sum = sum(x), .by = g, .order = x, .frame = c(-2, 2)) |> show_query()
pivot_longer() "lengthens" data, increasing the number of rows and
decreasing the number of columns. The inverse transformation is
tidyr::pivot_wider().
Learn more in vignette("pivot", "tidyr").
While most functionality is identical there are some differences to
pivot_longer() on local data frames:
the output is sorted differently/not explicitly,
the coercion of mixed column types is left to the database,
values_ptypes NOT supported.
Note that build_longer_spec() and pivot_longer_spec() do not work with
remote tables.
## S3 method for class 'tbl_lazy' pivot_longer( data, cols, ..., cols_vary, names_to = "name", names_prefix = NULL, names_sep = NULL, names_pattern = NULL, names_ptypes = NULL, names_transform = NULL, names_repair = "check_unique", values_to = "value", values_drop_na = FALSE, values_ptypes, values_transform = NULL )## S3 method for class 'tbl_lazy' pivot_longer( data, cols, ..., cols_vary, names_to = "name", names_prefix = NULL, names_sep = NULL, names_pattern = NULL, names_ptypes = NULL, names_transform = NULL, names_repair = "check_unique", values_to = "value", values_drop_na = FALSE, values_ptypes, values_transform = NULL )
data |
A data frame to pivot. |
cols |
Columns to pivot into longer format. |
... |
Additional arguments passed on to methods. |
cols_vary |
Unsupported; included for compatibility with the generic. |
names_to |
A string specifying the name of the column to create
from the data stored in the column names of |
names_prefix |
A regular expression used to remove matching text from the start of each variable name. |
names_sep, names_pattern
|
If |
names_ptypes |
A list of column name-prototype pairs. |
names_transform, values_transform
|
A list of column name-function pairs. |
names_repair |
What happens if the output has invalid column names? |
values_to |
A string specifying the name of the column to create
from the data stored in cell values. If |
values_drop_na |
If |
values_ptypes |
Not supported. |
The SQL translation basically works as follows:
split the specification by its key columns i.e. by variables crammed into the column names.
for each part in the split specification transmute() data into the
following columns
id columns i.e. columns that are not pivotted
key columns
value columns i.e. columns that are pivotted
combine all the parts with union_all()
# See vignette("pivot") for examples and explanation # Simplest case where column names are character data memdb_frame( id = c("a", "b"), x = 1:2, y = 3:4 ) |> tidyr::pivot_longer(-id)# See vignette("pivot") for examples and explanation # Simplest case where column names are character data memdb_frame( id = c("a", "b"), x = 1:2, y = 3:4 ) |> tidyr::pivot_longer(-id)
pivot_wider() "widens" data, increasing the number of columns and
decreasing the number of rows. The inverse transformation is
pivot_longer(). Learn more in vignette("pivot", "tidyr").
pivot_wider() on database tables comes with some caveats, please make
sure to read below for details.
## S3 method for class 'tbl_lazy' pivot_wider( data, ..., id_cols = NULL, id_expand = FALSE, names_from = name, names_prefix = "", names_sep = "_", names_glue = NULL, names_sort = FALSE, names_vary = "fastest", names_expand = FALSE, names_repair = "check_unique", values_from = value, values_fill = NULL, values_fn = ~max(.x, na.rm = TRUE), unused_fn = NULL ) dbplyr_pivot_wider_spec( data, spec, ..., names_repair = "check_unique", id_cols = NULL, id_expand = FALSE, values_fill = NULL, values_fn = ~max(.x, na.rm = TRUE), unused_fn = NULL, error_call = current_env() )## S3 method for class 'tbl_lazy' pivot_wider( data, ..., id_cols = NULL, id_expand = FALSE, names_from = name, names_prefix = "", names_sep = "_", names_glue = NULL, names_sort = FALSE, names_vary = "fastest", names_expand = FALSE, names_repair = "check_unique", values_from = value, values_fill = NULL, values_fn = ~max(.x, na.rm = TRUE), unused_fn = NULL ) dbplyr_pivot_wider_spec( data, spec, ..., names_repair = "check_unique", id_cols = NULL, id_expand = FALSE, values_fill = NULL, values_fn = ~max(.x, na.rm = TRUE), unused_fn = NULL, error_call = current_env() )
data |
A lazy data frame backed by a database query. |
... |
Unused; included for compatibility with generic. |
id_cols |
A set of columns that uniquely identifies each observation. |
id_expand |
Unused; included for compatibility with the generic. |
names_from, values_from
|
A pair of
arguments describing which column (or columns) to get the name of the
output column ( If |
names_prefix |
String added to the start of every variable name. |
names_sep |
If |
names_glue |
Instead of |
names_sort |
Should the column names be sorted? If |
names_vary |
When
|
names_expand |
Should the values in the |
names_repair |
What happens if the output has invalid column names? |
values_fill |
Optionally, a (scalar) value that specifies what each
|
values_fn |
A function, the default is |
unused_fn |
Optionally, a function applied to summarize the values from
the unused columns (i.e. columns not identified by The default drops all unused columns from the result. This can be a named list if you want to apply different aggregations to different unused columns.
This is similar to grouping by the |
spec |
A specification data frame. This is useful for more complex pivots because it gives you greater control on how metadata stored in the columns become column names in the result. Must be a data frame containing character |
error_call |
The execution environment of a currently
running function, e.g. |
pivot_wider() is eagerNote that pivot_wider() cannot be lazy because we need to look
at the data to figure out what the new column names will be.
If you have a long-running query you have two options:
Temporarily store the result of the query via compute().
Create a spec before and use dbplyr_pivot_wider_spec() - dbplyr's version
of tidyr::pivot_wider_spec().
values_fn
The big difference to pivot_wider() for local data frames is that
values_fn must not be NULL. By default it is max() which yields
the same results as for local data frames if three conditions are true:
The combination of id_cols and value uniquely identify an observation.
The column has a comparable type (e.g. numeric, date-time, or (for most databases) string).
values_fill is NULL.
If either the second or third condition is not met, you must supply a
custom values_fn. Unfortunately there is no generally available alternative
and you'll need to look for something database specific, like FIRST()
or ANY_VALUE().
The translation to SQL code basically works as follows:
Get unique keys in names_from column.
For each key value generate an expression of the form:
values_fn( CASE WHEN (`names from column` == `key value`) THEN (`value column`) END ) AS `output column`
Group data by id columns.
Summarise the grouped data with the expressions from step 2.
memdb_frame( id = 1, key = c("x", "y"), value = 1:2 ) |> tidyr::pivot_wider( id_cols = id, names_from = key, values_from = value )memdb_frame( id = 1, key = c("x", "y"), value = 1:2 ) |> tidyr::pivot_wider( id_cols = id, names_from = key, values_from = value )
This is a method for the dplyr pull() generic. It evaluates the query
retrieving just the specified column.
## S3 method for class 'tbl_sql' pull(.data, var = -1, name = NULL, ...)## S3 method for class 'tbl_sql' pull(.data, var = -1, name = NULL, ...)
.data |
A lazy data frame backed by a database query. |
var |
A variable specified as:
The default returns the last column (on the assumption that's the column you've created most recently). This argument is taken by expression and supports quasiquotation (you can unquote column names and column locations). |
name |
An optional parameter that specifies the column to be used
as names for a named vector. Specified in a similar manner as |
... |
< |
A vector of data.
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = 1:5, y = 5:1) db |> mutate(z = x + y * 2) |> pull()library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = 1:5, y = 5:1) db |> mutate(z = x + y * 2) |> pull()
remote_name() gives the unescaped name of the remote table, or NULL if it
is a query (created by sql()) or already escape (created by ident_q()).
remote_table() gives the remote table or the query.
remote_query() gives the text of the query, and remote_query_plan()
the query plan (as computed by the remote database). remote_src() and
remote_con() give the dplyr source and DBI connection respectively.
remote_name(x, null_if_local = TRUE) remote_table(x, null_if_local = TRUE) remote_src(x) remote_con(x) remote_query(x, sql_options = NULL, cte = deprecated()) remote_query_plan(x, ...)remote_name(x, null_if_local = TRUE) remote_table(x, null_if_local = TRUE) remote_src(x) remote_con(x) remote_query(x, sql_options = NULL, cte = deprecated()) remote_query_plan(x, ...)
x |
Remote table, currently must be a |
null_if_local |
Return |
sql_options |
SQL rendering options generated by |
cte |
|
... |
Additional arguments passed on to methods. |
remote_name() returns a string, or NULL if not a remote table, or not
applicable. For example, computed queries do not have a "name".
remote_table() returns a sql() object containing the (correctly
quoted) table identifier, or the custom query if the table was created
from sql(). Returns NULL if applied to a lazy query that no longer
corresponds to a single table (e.g. after a filter() or a join), or if
applied to a local lazy table.
mf <- memdb_frame(x = 1:5, y = 5:1, .name = "blorp") remote_name(mf) remote_src(mf) remote_con(mf) remote_query(mf) mf2 <- dplyr::filter(mf, x > 3) remote_name(mf2) remote_src(mf2) remote_con(mf2) remote_query(mf2)mf <- memdb_frame(x = 1:5, y = 5:1, .name = "blorp") remote_name(mf) remote_src(mf) remote_con(mf) remote_query(mf) mf2 <- dplyr::filter(mf, x > 3) remote_name(mf2) remote_src(mf2) remote_con(mf2) remote_query(mf2)
This is a method for the tidyr::replace_na() generic.
## S3 method for class 'tbl_lazy' replace_na(data, replace = list(), ...)## S3 method for class 'tbl_lazy' replace_na(data, replace = list(), ...)
data |
A pair of lazy data frame backed by database queries. |
replace |
A named list of values, with one value for each column that has NA values to be replaced. |
... |
Unused; included for compatibility with generic. |
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
df <- memdb_frame(x = c(1, 2, NA), y = c("a", NA, "b")) df |> tidyr::replace_na(list(x = 0, y = "unknown"))df <- memdb_frame(x = c(1, 2, NA), y = c("a", NA, "b")) df |> tidyr::replace_na(list(x = 0, y = "unknown"))
These are methods for the dplyr rows_insert(), rows_append(),
rows_update(), rows_patch(), rows_upsert(), and rows_delete()
generics.
When in_place = TRUE these verbs do not generate SELECT queries, but
instead directly modify the underlying data using INSERT, UPDATE, or
DELETE operators. This will require that you have write access to
the database: the connection needs permission to insert, modify or delete
rows, but not to alter the structure of the table.
The default, in_place = FALSE, generates equivalent lazy tables (using
SELECT queries) that allow previewing the result without actually
modifying the underlying table on the database.
## S3 method for class 'tbl_lazy' rows_insert( x, y, by = NULL, ..., conflict = c("error", "ignore"), copy = "none", in_place = FALSE, returning = NULL, method = NULL ) ## S3 method for class 'tbl_lazy' rows_append(x, y, ..., copy = "none", in_place = FALSE, returning = NULL) ## S3 method for class 'tbl_lazy' rows_update( x, y, by = NULL, ..., unmatched = c("error", "ignore"), copy = "none", in_place = FALSE, returning = NULL ) ## S3 method for class 'tbl_lazy' rows_patch( x, y, by = NULL, ..., unmatched = c("error", "ignore"), copy = "none", in_place = FALSE, returning = NULL ) ## S3 method for class 'tbl_lazy' rows_upsert( x, y, by = NULL, ..., copy = "none", in_place = FALSE, returning = NULL, method = NULL ) ## S3 method for class 'tbl_lazy' rows_delete( x, y, by = NULL, ..., unmatched = c("error", "ignore"), copy = "none", in_place = FALSE, returning = NULL )## S3 method for class 'tbl_lazy' rows_insert( x, y, by = NULL, ..., conflict = c("error", "ignore"), copy = "none", in_place = FALSE, returning = NULL, method = NULL ) ## S3 method for class 'tbl_lazy' rows_append(x, y, ..., copy = "none", in_place = FALSE, returning = NULL) ## S3 method for class 'tbl_lazy' rows_update( x, y, by = NULL, ..., unmatched = c("error", "ignore"), copy = "none", in_place = FALSE, returning = NULL ) ## S3 method for class 'tbl_lazy' rows_patch( x, y, by = NULL, ..., unmatched = c("error", "ignore"), copy = "none", in_place = FALSE, returning = NULL ) ## S3 method for class 'tbl_lazy' rows_upsert( x, y, by = NULL, ..., copy = "none", in_place = FALSE, returning = NULL, method = NULL ) ## S3 method for class 'tbl_lazy' rows_delete( x, y, by = NULL, ..., unmatched = c("error", "ignore"), copy = "none", in_place = FALSE, returning = NULL )
x |
A lazy table.
For |
y |
A lazy table, data frame, or data frame extensions (e.g. a tibble). |
by |
An unnamed character vector giving the key columns. The key columns
must exist in both By default, we use the first column in |
... |
Other parameters passed onto methods. |
conflict |
For One of:
|
copy |
If
|
in_place |
Should |
returning |
Columns to return. See |
method |
A string specifying the method to use. This is only relevant for
|
unmatched |
For One of:
|
A new tbl_lazy of the modified data.
With in_place = FALSE, the result is a lazy query that prints visibly,
because the purpose of this operation is to preview the results.
With in_place = TRUE, x is returned invisibly,
because the purpose of this operation is the side effect of modifying rows
in the table behind x.
library(dplyr) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") DBI::dbExecute(con, "CREATE TABLE Ponies ( id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, cutie_mark TEXT )") ponies <- tbl(con, "Ponies") applejack <- copy_inline(con, data.frame( name = "Apple Jack", cutie_mark = "three apples" )) # The default behavior is to generate a SELECT query rows_insert(ponies, applejack, conflict = "ignore") # And the original table is left unchanged: ponies # You can also choose to modify the table with in_place = TRUE: rows_insert(ponies, applejack, conflict = "ignore", in_place = TRUE) # In this case `rows_insert()` returns nothing and the underlying # data is modified ponieslibrary(dplyr) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") DBI::dbExecute(con, "CREATE TABLE Ponies ( id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, cutie_mark TEXT )") ponies <- tbl(con, "Ponies") applejack <- copy_inline(con, data.frame( name = "Apple Jack", cutie_mark = "three apples" )) # The default behavior is to generate a SELECT query rows_insert(ponies, applejack, conflict = "ignore") # And the original table is left unchanged: ponies # You can also choose to modify the table with in_place = TRUE: rows_insert(ponies, applejack, conflict = "ignore", in_place = TRUE) # In this case `rows_insert()` returns nothing and the underlying # data is modified ponies
These are methods for the dplyr select(), rename(), and relocate()
generics. They generate the SELECT clause of the SQL query.
These functions do not support predicate functions, i.e. you can
not use where(is.numeric) to select all numeric variables.
## S3 method for class 'tbl_lazy' select(.data, ...) ## S3 method for class 'tbl_lazy' rename(.data, ...) ## S3 method for class 'tbl_lazy' rename_with(.data, .fn, .cols = everything(), ...) ## S3 method for class 'tbl_lazy' relocate(.data, ..., .before = NULL, .after = NULL)## S3 method for class 'tbl_lazy' select(.data, ...) ## S3 method for class 'tbl_lazy' rename(.data, ...) ## S3 method for class 'tbl_lazy' rename_with(.data, .fn, .cols = everything(), ...) ## S3 method for class 'tbl_lazy' relocate(.data, ..., .before = NULL, .after = NULL)
.data |
A lazy data frame backed by a database query. |
... |
< |
.fn |
A function used to transform the selected |
.cols |
< |
.before, .after
|
< |
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = 1, y = 2, z = 3) db |> select(-y) |> show_query() db |> relocate(z) |> show_query() db |> rename(first = x, last = z) |> show_query()library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = 1, y = 2, z = 3) db |> select(-y) |> show_query() db |> relocate(z) |> show_query() db |> rename(first = x, last = z) |> show_query()
show_sql() displays the SQL query that will be dispatched to the database;
explain() displays both the SQL query and the query plan.
## S3 method for class 'tbl_lazy' show_query(x, ..., use_colour = TRUE, sql_options = NULL, cte = deprecated()) ## S3 method for class 'tbl_sql' explain(x, ...)## S3 method for class 'tbl_lazy' show_query(x, ..., use_colour = TRUE, sql_options = NULL, cte = deprecated()) ## S3 method for class 'tbl_sql' explain(x, ...)
x |
An object to explain |
... |
For |
use_colour |
Should the output be coloured? |
sql_options |
SQL rendering options generated by |
cte |
Use sql() to declare that a string is literal SQL and should be used
as is, without quoting.
sql(...) is.sql(x)sql(...) is.sql(x)
... |
Character vectors that will be combined into a single SQL vector. |
x |
Object to check if it is an sql object. |
library(dplyr, warn.conflicts = FALSE) # sql() just adds a class sql("x + 1") is.sql(sql("x + 1")) # You can use it when you need to insert some literal SQL in a query db <- memdb_frame(x = 1:3) db |> mutate(y = sql("CAST(x as VARCHAR)"))library(dplyr, warn.conflicts = FALSE) # sql() just adds a class sql("x + 1") is.sql(sql("x + 1")) # You can use it when you need to insert some literal SQL in a query db <- memdb_frame(x = 1:3) db |> mutate(y = sql("CAST(x as VARCHAR)"))
The dialect system allows multiple database connection classes to share SQL generation code. A dialect object encapsulates the SQL syntax rules for a particular database, independent of the connection mechanism.
sql_dialect() returns the dialect for a connection. For connections
that haven't implemented a dialect method, returns the connection itself
for backward compatibility.
new_sql_dialect() creates a new dialect object. This is primarily
intended for dbplyr backend authors.
sql_dialect(con) new_sql_dialect( dialect, quote_identifier, has_window_clause = FALSE, has_table_alias_with_as = TRUE, has_star_table_prefix = FALSE )sql_dialect(con) new_sql_dialect( dialect, quote_identifier, has_window_clause = FALSE, has_table_alias_with_as = TRUE, has_star_table_prefix = FALSE )
con |
A database connection. |
dialect |
A string giving the dialect name (e.g., "postgres", "mysql"). |
quote_identifier |
A function that quotes identifiers. Should accept a character vector and return a sql vector. |
has_window_clause |
Does the backend support named window
definitions (the |
has_table_alias_with_as |
Does the backend support using |
has_star_table_prefix |
Does the backend require table prefixes when
selecting all columns in single-table queries (e.g., |
sql_dialect() returns a dialect object (class sql_dialect) or
the connection itself for backward compatibility.
new_sql_dialect() returns a dialect object with class
c("sql_dialect_{name}", "sql_dialect").
For backward compatibility, all sql_ generics (and a handful of others)
call sql_dialect() on the con argument in order to dispatch further on
the dialect object, if possible:
sql_generic <- function(con, arg1, arg2, ...) {
UseMethod("sql_generic", sql_dialect(con))
}
Unfortunately, due to the way that UseMethod() works, this uses
sql_dialect(con) to control which method is selected, but still passes
the original con to the method. This means that if you are implementing
a method for a dialect and need to access dialect properties, you must call
sql_dialect(con) again inside the method.
# Create a custom dialect my_dialect <- new_sql_dialect( "custom", quote_identifier = function(x) sql_quote(x, "`"), has_window_clause = TRUE ) class(my_dialect)# Create a custom dialect my_dialect <- new_sql_dialect( "custom", quote_identifier = function(x) sql_quote(x, "`"), has_window_clause = TRUE ) class(my_dialect)
sql_glue() and sql_glue2() are designed to help dbplyr extenders
generate custom SQL. They differ only in whether or not they require
a connection. sql_glue() retrieves the ambient connection, making it
suitable for use inside sql_translation() methods; sql_glue2() requires
a connection, making it suitable for use inside all other sql_ methods.
As the name suggests, these functions use glue syntax to make it
easy to mix fixed SQL with varying user inputs. The default glue syntax, {x},
will escape x using the database connection. If there are multiple values
in x, they'll be collapsed into a single string with ,. If you want them
to be wrapped in (), use a * suffix, e.g. {x*}.
You can also use type markers to control how the value is treated:
{.sql x}: x is literal SQL that should be interpolated as
is, without additional escaping. x must be a string.
{.tbl x}: x is a table identifier like a string, I(), or one of
the older forms like DBI::Id() or in_schema().
{.id x}: x is a generic identifier, e.g. for a column or index.
x must be a character vector
sql_glue(sql, envir = parent.frame()) sql_glue2(con, sql, envir = parent.frame())sql_glue(sql, envir = parent.frame()) sql_glue2(con, sql, envir = parent.frame())
sql |
A string to interpolate. |
envir |
Environment to evaluate |
con |
A sql_dialect object or database connection. Connections are supported for backward compatibility. |
An SQL string.
con <- dialect_ansi() tbl <- "my_table" sql_glue2(con, "SELECT * FROM {.tbl tbl}") # Values are properly escaped name <- "Robert'); DROP TABLE students;--" sql_glue2(con, "INSERT INTO students (name) VALUES ({name})") # Control wrapping with * x <- c("name", "age", "grade") sql_glue2(con, "SELECT {.id x} FROM students") sql_glue2(con, "SELECT * WHERE variable IN {x*}")con <- dialect_ansi() tbl <- "my_table" sql_glue2(con, "SELECT * FROM {.tbl tbl}") # Values are properly escaped name <- "Robert'); DROP TABLE students;--" sql_glue2(con, "INSERT INTO students (name) VALUES ({name})") # Control wrapping with * x <- c("name", "age", "grade") sql_glue2(con, "SELECT {.id x} FROM students") sql_glue2(con, "SELECT * WHERE variable IN {x*}")
Customise SQL generation with these three options.
sql_options(cte = FALSE, use_star = TRUE, qualify_all_columns = FALSE)sql_options(cte = FALSE, use_star = TRUE, qualify_all_columns = FALSE)
cte |
If |
use_star |
If |
qualify_all_columns |
If |
A <dbplyr_sql_options> object.
library(dplyr, warn.conflicts = FALSE) lf1 <- lazy_frame(key = 1, a = 1, b = 2) lf2 <- lazy_frame(key = 1, a = 1, c = 3) result <- left_join(lf1, lf2, by = "key") |> filter(c >= 3) show_query(result) sql_options <- sql_options(cte = TRUE, qualify_all_columns = TRUE) show_query(result, sql_options = sql_options)library(dplyr, warn.conflicts = FALSE) lf1 <- lazy_frame(key = 1, a = 1, b = 2) lf2 <- lazy_frame(key = 1, a = 1, c = 3) result <- left_join(lf1, lf2, by = "key") |> filter(c >= 3) show_query(result) sql_options <- sql_options(cte = TRUE, qualify_all_columns = TRUE) show_query(result, sql_options = sql_options)
These functions generate the SQL used in rows_*(in_place = TRUE).
sql_query_insert( con, table, from, insert_cols, by, ..., conflict = c("error", "ignore"), returning_cols = NULL, method = NULL ) sql_query_append(con, table, from, insert_cols, ..., returning_cols = NULL) sql_query_update_from( con, table, from, by, update_values, ..., returning_cols = NULL ) sql_query_upsert( con, table, from, by, update_cols, ..., returning_cols = NULL, method = NULL ) sql_query_delete(con, table, from, by, ..., returning_cols = NULL)sql_query_insert( con, table, from, insert_cols, by, ..., conflict = c("error", "ignore"), returning_cols = NULL, method = NULL ) sql_query_append(con, table, from, insert_cols, ..., returning_cols = NULL) sql_query_update_from( con, table, from, by, update_values, ..., returning_cols = NULL ) sql_query_upsert( con, table, from, by, update_cols, ..., returning_cols = NULL, method = NULL ) sql_query_delete(con, table, from, by, ..., returning_cols = NULL)
con |
A sql_dialect object or database connection. Connections are supported for backward compatibility. |
table |
Table to update. Must be a table identifier.
Use a string to refer to tables in the current schema/catalog or
|
from |
Table or query that contains the new data. Either a table identifier or SQL. |
insert_cols |
Names of columns to insert. |
by |
An unnamed character vector giving the key columns. The key columns
must exist in both By default, we use the first column in |
... |
Other parameters passed onto methods. |
conflict |
For One of:
|
returning_cols |
Optional. Names of columns to return. |
method |
Optional. The method to use. |
update_values |
A named SQL vector that specify how to update the columns. |
update_cols |
Names of columns to update. |
Insert Methods
"where_not_exists"The default for most databases.
INSERT INTO x_name SELECT * FROM y WHERE NOT EXISTS <match on by columns>
"on_conflict"Supported by:
Postgres
SQLite
This method uses the ON CONFLICT clause and therefore requires a unique
index on the columns specified in by.
Upsert Methods
"merge"The upsert method according to the SQL standard. It uses the MERGE statement
MERGE INTO x_name USING y ON <match on by columns> WHEN MATCHED THEN UPDATE SET ... WHEN NOT MATCHED THEN INSERT ...
"on_conflict"Supported by:
Postgres
SQLite
This method uses the ON CONFLICT clause and therefore requires a unique
index on the columns specified in by.
"cte_update"Supported by:
Postgres
SQLite
Oracle
The classical way to upsert in Postgres and SQLite before support for
ON CONFLICT was added. The update is done in a CTE clause and the unmatched
values are then inserted outside of the CTE.
A SQL query.
sql_query_upsert( con = dialect_postgres(), table = "airlines", from = "df", by = "carrier", update_cols = "name" )sql_query_upsert( con = dialect_postgres(), table = "airlines", from = "df", by = "carrier", update_cols = "name" )
These functions help you create custom aggregate SQL translations when
implementing a new backend. They are typically used within sql_translator()
to define how R aggregate functions should be translated to SQL.
sql_aggregate() creates a SQL aggregate function translator for functions
with a single argument and an optional na.rm parameter (e.g., SUM(),
AVG()).
sql_aggregate_2() creates a SQL aggregate function translator for
functions with exactly two arguments (e.g., CORR(), COVAR_SAMP()).
sql_aggregate_n() creates a SQL aggregate function translator for
functions with any number of arguments and an optional na.rm parameter
(e.g., LEAST(), GREATEST()).
sql_not_supported() creates a function that throws an informative error
when a function is not supported in SQL.
sql_check_na_rm() is a helper that you can use in aggregate functions
to direct the user towards setting na.rm = TRUE.
sql_aggregate(f, f_r = f) sql_aggregate_2(f) sql_aggregate_n(f, f_r = f) sql_check_na_rm(na.rm) sql_not_supported(f)sql_aggregate(f, f_r = f) sql_aggregate_2(f) sql_aggregate_n(f, f_r = f) sql_check_na_rm(na.rm) sql_not_supported(f)
f |
The name of the SQL function as a string. |
f_r |
The name of the R function being translated as a string. |
na.rm |
Logical indicating whether missing values should be removed.
In SQL, missing values are always removed in aggregate functions, so this
function will warn if |
Other SQL translation helpers:
sql_translation_scalar,
sql_translation_string,
sql_translation_window,
sql_variant()
These functions help you create custom scalar SQL translations when
implementing a new backend. They are typically used within sql_translator()
to define how R functions should be translated to SQL.
sql_infix() creates SQL infix operators like +, -, *, /.
sql_prefix() creates SQL prefix functions like ABS(), SQRT().
sql_cast() creates SQL cast expressions like CAST(x AS type).
sql_try_cast() creates SQL try_cast expressions (for safe casting).
sql_log() creates a SQL logarithm function with optional base.
sql_cot() creates a SQL cotangent function (as 1 / TAN(x)).
sql_runif() creates a SQL expression for generating uniform random
numbers.
sql_infix(f, pad = TRUE) sql_prefix(f, n = NULL) sql_cast(type) sql_try_cast(type) sql_log() sql_cot() sql_runif(rand_expr, n = n(), min = 0, max = 1)sql_infix(f, pad = TRUE) sql_prefix(f, n = NULL) sql_cast(type) sql_try_cast(type) sql_log() sql_cot() sql_runif(rand_expr, n = n(), min = 0, max = 1)
f |
The name of the SQL function as a string. |
pad |
If |
n |
For |
type |
SQL type name as a string. |
rand_expr |
An string giving an SQL expression that generates a
random number between 0 and 1, e.g. |
min, max
|
Range of random values. |
Other SQL translation helpers:
sql_translation_agg,
sql_translation_string,
sql_translation_window,
sql_variant()
These functions help you create custom string SQL translations when
implementing a new backend. They are typically used within sql_translator()
to define how R string functions should be translated to SQL.
sql_substr() creates a SQL substring function translator that converts
R's substr(x, start, stop) to SQL's SUBSTR(x, start, length).
sql_str_sub() creates a SQL substring function translator that handles
stringr's str_sub() with support for negative indices.
sql_paste() creates a SQL paste function using CONCAT_WS() or similar.
sql_paste_infix() creates a SQL paste function using an infix operator
like ||.
sql_substr(f = "SUBSTR") sql_str_sub(subset_f = "SUBSTR", length_f = "LENGTH", optional_length = TRUE) sql_paste(default_sep, f = "CONCAT_WS") sql_paste_infix(default_sep, op, cast = sql_cast("text"))sql_substr(f = "SUBSTR") sql_str_sub(subset_f = "SUBSTR", length_f = "LENGTH", optional_length = TRUE) sql_paste(default_sep, f = "CONCAT_WS") sql_paste_infix(default_sep, op, cast = sql_cast("text"))
f |
The name of the SQL function as a string. |
subset_f |
The name of the SQL substring function. |
length_f |
The name of the SQL string length function. |
optional_length |
Whether the length argument is optional in the SQL substring function. |
default_sep |
The default separator for paste operations. |
op |
The SQL operator to use for infix paste operations. |
cast |
A function to cast values to strings. |
Other SQL translation helpers:
sql_translation_agg,
sql_translation_scalar,
sql_translation_window,
sql_variant()
These functions help you create custom window SQL translations when
implementing a new backend. They are typically used within sql_translator()
to define how R window functions should be translated to SQL.
win_over() makes it easy to generate the window function specification.
win_absent(), win_rank(), win_aggregate(), and win_cumulative()
provide helpers for constructing common types of window functions.
win_current_group() and win_current_order() allow you to access
the grouping and order context set up by group_by() and arrange().
win_over( expr, partition = NULL, order = NULL, frame = NULL, con = sql_current_con() ) win_rank(f, empty_order = FALSE) win_aggregate(f) win_aggregate_2(f) win_cumulative(f) win_absent(f) win_current_group() win_current_order() win_current_frame()win_over( expr, partition = NULL, order = NULL, frame = NULL, con = sql_current_con() ) win_rank(f, empty_order = FALSE) win_aggregate(f) win_aggregate_2(f) win_cumulative(f) win_absent(f) win_current_group() win_current_order() win_current_frame()
expr |
The window expression. |
partition |
Variables to partition over. |
order |
Variables to order by. |
frame |
A numeric vector of length two defining the frame. |
con |
A sql_dialect object or database connection. Connections are supported for backward compatibility. |
f |
The name of an SQL function as a string. |
empty_order |
A logical value indicating whether to order by NULL if
|
Other SQL translation helpers:
sql_translation_agg,
sql_translation_scalar,
sql_translation_string,
sql_variant()
con <- dialect_ansi() win_over(sql("avg(x)"), con = con) win_over(sql("avg(x)"), "y", con = con) win_over(sql("avg(x)"), order = "y", con = con) win_over(sql("avg(x)"), order = c("x", "y"), con = con) win_over(sql("avg(x)"), frame = c(-Inf, 0), order = "y", con = con)con <- dialect_ansi() win_over(sql("avg(x)"), con = con) win_over(sql("avg(x)"), "y", con = con) win_over(sql("avg(x)"), order = "y", con = con) win_over(sql("avg(x)"), order = c("x", "y"), con = con) win_over(sql("avg(x)"), frame = c(-Inf, 0), order = "y", con = con)
sql_variant() creates a SQL variant, a list of translators for scalar,
aggregate, and window functions. sql_translator() creates a translator,
an environment containing R to SQL translations. When creating a backend,
you'll use these functions to customize how R functions are converted to
SQL.
Learn more in vignette("new-backend").
sql_variant( scalar = sql_translator(), aggregate = sql_translator(), window = sql_translator() ) sql_translator(..., .funs = list(), .parent = new.env(parent = emptyenv())) base_scalar base_agg base_win base_no_win base_odbc_scalar base_odbc_agg base_odbc_winsql_variant( scalar = sql_translator(), aggregate = sql_translator(), window = sql_translator() ) sql_translator(..., .funs = list(), .parent = new.env(parent = emptyenv())) base_scalar base_agg base_win base_no_win base_odbc_scalar base_odbc_agg base_odbc_win
scalar, aggregate, window
|
The three families of functions that an SQL variant can supply. |
..., .funs
|
Named functions, used to add custom converters from standard
R functions to SQL functions. Specify individually in |
.parent |
The SQL variant that this variant should inherit from.
Defaults to |
dbplyr provides the following base translators that implement standard SQL semantics:
base_scalar - scalar functions and operators
base_agg - aggregate functions
base_win - window functions
base_no_win - versions of window functions that throw errors
Other SQL translation helpers:
sql_translation_agg,
sql_translation_scalar,
sql_translation_string,
sql_translation_window
# An example of adding some mappings for the statistical functions that # postgresql provides: http://bit.ly/K5EdTn postgres_agg <- sql_translator(.parent = base_agg, cor = sql_aggregate_2("CORR"), cov = sql_aggregate_2("COVAR_SAMP"), sd = sql_aggregate("STDDEV_SAMP", "sd"), var = sql_aggregate("VAR_SAMP", "var") ) # Next we have to simulate a connection that uses this variant con <- new_sql_dialect("test", quote = \(x) sql_quote(x, '"')) sql_translation.sql_dialect_test <- function(x) { sql_variant( base_scalar, postgres_agg, base_no_win ) } translate_sql(cor(x, y), con = con, window = FALSE) translate_sql(sd(income / years, na.rm = TRUE), con = con, window = FALSE)# An example of adding some mappings for the statistical functions that # postgresql provides: http://bit.ly/K5EdTn postgres_agg <- sql_translator(.parent = base_agg, cor = sql_aggregate_2("CORR"), cov = sql_aggregate_2("COVAR_SAMP"), sd = sql_aggregate("STDDEV_SAMP", "sd"), var = sql_aggregate("VAR_SAMP", "var") ) # Next we have to simulate a connection that uses this variant con <- new_sql_dialect("test", quote = \(x) sql_quote(x, '"')) sql_translation.sql_dialect_test <- function(x) { sql_variant( base_scalar, postgres_agg, base_no_win ) } translate_sql(cor(x, y), con = con, window = FALSE) translate_sql(sd(income / years, na.rm = TRUE), con = con, window = FALSE)
This is a method for the dplyr summarise() generic. It generates the
SELECT clause of the SQL query, and generally needs to be combined with
group_by().
## S3 method for class 'tbl_lazy' summarise(.data, ..., .by = NULL, .groups = NULL)## S3 method for class 'tbl_lazy' summarise(.data, ..., .by = NULL, .groups = NULL)
.data |
A lazy data frame backed by a database query. |
... |
< |
.by |
< |
.groups |
When In addition, a message informs you of that choice, unless the result is ungrouped,
the option "dplyr.summarise.inform" is set to |
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(g = c(1, 1, 1, 2, 2), x = c(4, 3, 6, 9, 2)) db |> summarise(n()) |> show_query() db |> group_by(g) |> summarise(n()) |> show_query()library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(g = c(1, 1, 1, 2, 2), x = c(4, 3, 6, 9, 2)) db |> summarise(n()) |> show_query() db |> group_by(g) |> summarise(n()) |> show_query()
Use tbl() to create a SQL query backed by a database. Manipulating this
object with dplyr verbs then builds up a SQL query that will only be executed
when you explicitly ask for it, either by printing the object, calling
collect() to bring the data back to R or calling compute() to create a
new table in the database. You can see the query without executing it with
show_query().
Learn more in vignette("dbplyr").
## S3 method for class 'src_dbi' tbl(src, from, vars = NULL, ...)## S3 method for class 'src_dbi' tbl(src, from, vars = NULL, ...)
src |
A |
from |
Either a table identifier or a literal Use a string to identify a table in the current schema/catalog or
|
vars |
Optionally, provide a character vector of column names. If
not supplied, will be retrieved from the database by running a simple
query. This argument is mainly useful for better performance when creating
many |
... |
Passed on to |
library(dplyr) # Connect to a temporary in-memory SQLite database and add some data con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_to(con, mtcars) # To retrieve a single table from a source, use `tbl()` mtcars_db <- con |> tbl("mtcars") mtcars_db # Use `I()` for qualified table names con |> tbl(I("temp.mtcars")) |> head(1) # You can also pass raw SQL if you want a more sophisticated query con |> tbl(sql("SELECT * FROM mtcars WHERE cyl = 8")) |> head(1) # But in most cases, you'll rely on dbplyr to construct the SQL: mtcars_db |> filter(vs == 1) |> summarise(mpg = mean(mpg, na.rm = TRUE), .by = cyl) |> show_query()library(dplyr) # Connect to a temporary in-memory SQLite database and add some data con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_to(con, mtcars) # To retrieve a single table from a source, use `tbl()` mtcars_db <- con |> tbl("mtcars") mtcars_db # Use `I()` for qualified table names con |> tbl(I("temp.mtcars")) |> head(1) # You can also pass raw SQL if you want a more sophisticated query con |> tbl(sql("SELECT * FROM mtcars WHERE cyl = 8")) |> head(1) # But in most cases, you'll rely on dbplyr to construct the SQL: mtcars_db |> filter(vs == 1) |> summarise(mpg = mean(mpg, na.rm = TRUE), .by = cyl) |> show_query()
dbplyr translates commonly used base functions including logical
(!, &, |), arithmetic (^), and comparison (!=) operators, as well
as common summary (mean(), var()), and transformation (log())
functions. All other functions will be preserved as is. R's infix functions
(e.g. %like%) will be converted to their SQL equivalents (e.g. LIKE).
Learn more in vignette("translation-function").
translate_sql( ..., con, vars_group = NULL, vars_order = NULL, vars_frame = NULL, window = TRUE ) translate_sql_( dots, con, vars_group = NULL, vars_order = NULL, vars_frame = NULL, window = TRUE, context = list() )translate_sql( ..., con, vars_group = NULL, vars_order = NULL, vars_frame = NULL, window = TRUE ) translate_sql_( dots, con, vars_group = NULL, vars_order = NULL, vars_frame = NULL, window = TRUE, context = list() )
..., dots
|
Expressions to translate. |
con |
A sql_dialect object or database connection. Connections are supported for backward compatibility. |
vars_group, vars_order, vars_frame
|
Parameters used in the |
window |
Use |
context |
Use to carry information for special translation cases. For example, MS SQL needs a different conversion for is.na() in WHERE vs. SELECT clauses. Expects a list. |
con <- dialect_ansi() # Regular maths is translated in a very straightforward way translate_sql(x + 1, con = con) translate_sql(sin(x) + tan(y), con = con) # Note that all variable names are escaped translate_sql(like == "x", con = con) # In ANSI SQL: "" quotes variable _names_, '' quotes strings # Logical operators are converted to their sql equivalents translate_sql(x < 5 & !(y >= 5), con = con) # xor() doesn't have a direct SQL equivalent translate_sql(xor(x, y), con = con) # If is translated into case when translate_sql(if (x > 5) "big" else "small", con = con) # Infix functions are passed onto SQL with % removed translate_sql(first %like% "Had%", con = con) translate_sql(first %is% NA, con = con) translate_sql(first %in% c("John", "Roger", "Robert"), con = con) # And be careful if you really want integers translate_sql(x == 1, con = con) translate_sql(x == 1L, con = con) # If you have an already quoted object, use translate_sql_: x <- quote(y + 1 / sin(t)) translate_sql_(list(x), con = dialect_ansi()) # Windowed translation -------------------------------------------- # Known window functions automatically get OVER() translate_sql(mpg > mean(mpg), con = con) # Suppress this with window = FALSE translate_sql(mpg > mean(mpg), window = FALSE, con = con) # vars_group controls partition: translate_sql(mpg > mean(mpg), vars_group = "cyl", con = con) # and vars_order controls ordering for those functions that need it translate_sql(cumsum(mpg), con = con) translate_sql(cumsum(mpg), vars_order = "mpg", con = con)con <- dialect_ansi() # Regular maths is translated in a very straightforward way translate_sql(x + 1, con = con) translate_sql(sin(x) + tan(y), con = con) # Note that all variable names are escaped translate_sql(like == "x", con = con) # In ANSI SQL: "" quotes variable _names_, '' quotes strings # Logical operators are converted to their sql equivalents translate_sql(x < 5 & !(y >= 5), con = con) # xor() doesn't have a direct SQL equivalent translate_sql(xor(x, y), con = con) # If is translated into case when translate_sql(if (x > 5) "big" else "small", con = con) # Infix functions are passed onto SQL with % removed translate_sql(first %like% "Had%", con = con) translate_sql(first %is% NA, con = con) translate_sql(first %in% c("John", "Roger", "Robert"), con = con) # And be careful if you really want integers translate_sql(x == 1, con = con) translate_sql(x == 1L, con = con) # If you have an already quoted object, use translate_sql_: x <- quote(y + 1 / sin(t)) translate_sql_(list(x), con = dialect_ansi()) # Windowed translation -------------------------------------------- # Known window functions automatically get OVER() translate_sql(mpg > mean(mpg), con = con) # Suppress this with window = FALSE translate_sql(mpg > mean(mpg), window = FALSE, con = con) # vars_group controls partition: translate_sql(mpg > mean(mpg), vars_group = "cyl", con = con) # and vars_order controls ordering for those functions that need it translate_sql(cumsum(mpg), con = con) translate_sql(cumsum(mpg), vars_order = "mpg", con = con)
These allow you to override the PARTITION BY and ORDER BY clauses
of window functions generated by grouped mutates.
window_order(.data, ...) window_frame(.data, from = -Inf, to = Inf)window_order(.data, ...) window_frame(.data, from = -Inf, to = Inf)
.data |
A lazy data frame backed by a database query. |
... |
Variables to order by |
from, to
|
Bounds of the frame. |
library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(g = rep(1:2, each = 5), y = runif(10), z = 1:10) db |> window_order(y) |> mutate(z = cumsum(y)) |> show_query() db |> group_by(g) |> window_frame(-3, 0) |> window_order(z) |> mutate(z = sum(y)) |> show_query()library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(g = rep(1:2, each = 5), y = runif(10), z = 1:10) db |> window_order(y) |> mutate(z = cumsum(y)) |> show_query() db |> group_by(g) |> window_frame(-3, 0) |> window_order(z) |> mutate(z = sum(y)) |> show_query()
with_dialect() overrides the default dialect assigned to a connection.
This is useful when dbplyr guesses the dialect incorrectly, which is most
likely to occur with ODBC/JDBC/ADBC backends.
with_dialect(con, dialect)with_dialect(con, dialect)
con |
A database connection (class |
dialect |
A dialect object created by a |
A connection object that uses the specified dialect for SQL generation.
# Wrap an in-memory SQLite connection to use Postgres dialect con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") pg_con <- with_dialect(con, dialect_postgres()) # SQL generation uses Postgres syntax lf <- lazy_frame(x = 1, con = pg_con) lf |> dplyr::mutate(y = sd(x)) DBI::dbDisconnect(con)# Wrap an in-memory SQLite connection to use Postgres dialect con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") pg_con <- with_dialect(con, dialect_postgres()) # SQL generation uses Postgres syntax lf <- lazy_frame(x = 1, con = pg_con) lf |> dplyr::mutate(y = sd(x)) DBI::dbDisconnect(con)