This vignette discusses why you might use dbplyr instead of writing SQL yourself, and what to do when dbplyr’s built-in translations can’t create the SQL that you need.
One simple nicety of dbplyr is that it automatically generates subqueries when you use a freshly created variable in the same mutate() call:
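library(dplyr)
library(dbplyr)
# Assumed setup: an in-memory SQLite table with columns x and y
# (memdb_frame() requires RSQLite; the values are illustrative).
mf <- memdb_frame(x = 1, y = 2)
mf %>%
  mutate(
    a = y * x,
    b = a ^ 2
  ) %>%
  show_query()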
#> <SQL>
#> SELECT `q01`.*, POWER(`a`, 2.0) AS `b`
#> FROM (
#> SELECT `dbplyr_yvg9c16PdX`.*, `y` * `x` AS `a`
#> FROM `dbplyr_yvg9c16PdX`
#> ) AS `q01`
In general, it’s much easier to work iteratively in dbplyr than in raw SQL. You can give intermediate queries names and reuse them in multiple places. If you have a common operation that you want to apply to many queries, you can wrap it up in a function. It’s also easy to chain count() to the end of any query to check that the results are about what you expect.
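As a minimal sketch of that workflow, the code below names an intermediate query, reuses it through a small helper function, and finishes each query with count(). It assumes RSQLite is installed so that memdb_frame() can create an in-memory SQLite table; the table obs, its data, and the helper keep_above() are hypothetical, made up for illustration.

```r
library(dplyr)
library(dbplyr)

# Hypothetical data: a small in-memory SQLite table (requires RSQLite).
obs <- memdb_frame(x = c(1, 2, 3, 4), y = c(2, 2, 5, 5))

# Name an intermediate query; no SQL is run until results are requested.
with_ratio <- obs %>% mutate(ratio = x / y)

# Hypothetical helper wrapping an operation you might apply to many queries.
# {{ col }} lets the caller pass a bare column name.
keep_above <- function(query, col, cutoff) {
  query %>% filter({{ col }} > cutoff)
}

# Reuse the same named query in two places.
high_ratio <- with_ratio %>% keep_above(ratio, 0.5)
high_x <- with_ratio %>% keep_above(x, 2)

# Chain count() onto any query as a quick sanity check.
high_ratio %>% count() %>% collect()
high_x %>% count() %>% collect()
```

Because filter() refers to ratio, a column created by the preceding mutate(), dbplyr nests the first query as a subquery, just as in the mutate() example above.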
dbplyr aims to translate the most common R functions to their SQL equivalents, allowing you to ignore the vagaries of the SQL dialect that you’re working with, so you can focus on the data analysis problem at hand. But different backends have different capabilities, and sometimes there are SQL functions that don’t have exact equivalents in R. In those cases, you’ll need to write SQL code directly.
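To see the dialect-specific translation on a small scale, translate_sql() renders a single R expression against a simulated connection, with no database required. The sketch below assumes that your installed dbplyr version provides simulate_sqlite() and simulate_postgres(). It renders the same paste0() call for both dialects. The exact SQL printed can change between dbplyr versions, so no output is shown here.

```r
library(dbplyr)

# The same R expression rendered for two simulated backends;
# no live database connection is needed.
sqlite_sql <- translate_sql(paste0(x, y), con = simulate_sqlite())
postgres_sql <- translate_sql(paste0(x, y), con = simulate_postgres())

sqlite_sql
postgres_sql
```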
Any function that dbplyr doesn’t know about will be left as is:
mf %>%
  mutate(z = foofify(x, y)) %>%
  show_query()
#> <SQL>
#> SELECT `dbplyr_yvg9c16PdX`.*, foofify(`x`, `y`) AS `z`
#> FROM `dbplyr_yvg9c16PdX`
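Pass-through is what lets you call database functions that dbplyr has no R translation for. As a sketch, assuming RSQLite is installed and using invented data, the code below calls SQLite's built-in INSTR(). That function returns the 1-based position of a substring, or 0 if the substring is absent. R has no INSTR() function for dbplyr to translate, so the call reaches SQLite unchanged.

```r
library(dplyr)
library(dbplyr)

# Hypothetical data: an in-memory SQLite table (requires RSQLite).
words <- memdb_frame(word = c("abc", "cab", "xyz"))

# INSTR() is a SQLite function, not an R one, so dbplyr leaves the call
# as is in the generated SQL and SQLite evaluates it.
located <- words %>% mutate(pos = INSTR(word, "b"))

located %>% show_query()
located %>% arrange(word) %>% collect()
```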
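Similarly, special infix functions (those whose names start and end with %) are translated to SQL infix operators with the surrounding % signs removed, so %LIKE% becomes LIKE:
mf %>%
  filter(x %LIKE% "%foo%") %>%
  show_query()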
#> <SQL>
#> SELECT `dbplyr_yvg9c16PdX`.*
#> FROM `dbplyr_yvg9c16PdX`
#> WHERE (`x` LIKE '%foo%')
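Because the operator is handed to the database, its behaviour follows the database's rules, not R's. The minimal sketch below assumes RSQLite is installed and uses invented data. By default, SQLite's LIKE ignores case for ASCII letters. Its GLOB operator is case-sensitive and uses * and ? as wildcards. %GLOB% is passed through in the same way as %LIKE%.

```r
library(dplyr)
library(dbplyr)

# Hypothetical data: an in-memory SQLite table (requires RSQLite).
words <- memdb_frame(word = c("food", "FOOT", "bar"))

# Rendered as SQL LIKE; SQLite's LIKE ignores ASCII case by default.
like_rows <- words %>% filter(word %LIKE% "%foo%") %>% collect()

# Rendered as SQL GLOB; SQLite's GLOB is case-sensitive.
glob_rows <- words %>% filter(word %GLOB% "*foo*") %>% collect()

like_rows
glob_rows
```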
SQL functions tend to have a greater variety of syntax than R functions, so some SQL expressions can’t be written directly as R code. To insert these into your own queries, wrap literal SQL in sql():
mf %>%
  transmute(factorial = sql("x!")) %>%
  show_query()
#> <SQL>
#> SELECT x! AS `factorial`
#> FROM `dbplyr_yvg9c16PdX`
mf %>%
  transmute(x_float = sql("CAST(x AS FLOAT)")) %>%
  show_query()
#> <SQL>
#> SELECT CAST(x AS FLOAT) AS `x_float`
#> FROM `dbplyr_yvg9c16PdX`
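The text inside sql() is pasted into the query verbatim. dbplyr neither quotes the column name nor checks the syntax, so the expression must be valid for your database. The sketch below assumes RSQLite is installed and uses invented data. It relies on SQLite's CAST, which truncates REAL values toward zero when converting them to INTEGER.

```r
library(dplyr)
library(dbplyr)

# Hypothetical data: an in-memory SQLite table (requires RSQLite).
nums <- memdb_frame(x = c(1.7, 2.2, -3.5))

# The string is inserted as is, so x appears unquoted in the SELECT clause.
truncated <- nums %>% transmute(x, x_int = sql("CAST(x AS INTEGER)"))

truncated %>% show_query()
res <- truncated %>% arrange(x) %>% collect()
res
```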
Learn more in vignette("translation-function").