duckspending_tbl()
and dplyr get you most of the way for ad-hoc queries. A few
situations want raw SQL:
AT (SNAPSHOT n),
QUALIFY, list/struct expressions) that dbplyr
doesn’t translate.The connection returned by duckspending_connection() is
a regular DBI connection, so DBI::dbGetQuery()
and friends work directly.
library(duckspending)
con <- duckspending_connection()
DBI::dbGetQuery(con, "
SELECT awarding_agency_code,
COUNT(*) AS n,
SUM(federal_action_obligation) AS total
FROM latest.source_procurement_transaction
WHERE fiscal_year = 2024
GROUP BY awarding_agency_code
ORDER BY total DESC
LIMIT 10
")Note the latest. schema prefix. Without it, you’d have
to USE latest.main first — but that changes per-session
state. Qualifying table names in the query is safer.
The interesting raw-SQL use case is comparing snapshots inside the database, without bouncing data through R:
con <- duckspending_connection(snapshots = c("20260206", "20260506"))
DBI::dbGetQuery(con, '
SELECT
feb.awarding_agency_code,
feb.total AS total_feb,
may.total AS total_may,
may.total - feb.total AS delta
FROM (
SELECT awarding_agency_code, SUM(federal_action_obligation) AS total
FROM "20260206".source_procurement_transaction
WHERE fiscal_year = 2024
GROUP BY awarding_agency_code
) AS feb
FULL OUTER JOIN (
SELECT awarding_agency_code, SUM(federal_action_obligation) AS total
FROM "20260506".source_procurement_transaction
WHERE fiscal_year = 2024
GROUP BY awarding_agency_code
) AS may USING (awarding_agency_code)
ORDER BY ABS(delta) DESC NULLS LAST
LIMIT 20
')The schema names are date strings ("20260206"). SQL
identifiers that begin with a digit must be
double-quoted, or DuckDB will parse them as numeric literals. So:
-- WRONG: parses as integer 20260206
SELECT * FROM 20260206.subaward_search LIMIT 5;
-- RIGHT: double-quoted identifier
SELECT * FROM "20260206".subaward_search LIMIT 5;Inside DBI::dbGetQuery() strings, use single quotes
around the SQL and literal double quotes around identifiers. R’s
'...' strings tolerate embedded " without
escaping, which is why the example above uses single-quoted R
strings:
If you’d rather use double-quoted R strings, escape the inner quotes:
For safer identifier interpolation use
DBI::dbQuoteIdentifier():
Always use parameterised queries for user-supplied values, never paste them into the SQL string:
dplyrRaw SQL is power-user mode. For most queries:
duckspending_tbl() + dplyr produces
equivalent SQL and is refactor-safe (no string concatenation).dbplyr::sql() lets you inject a SQL snippet inside an
otherwise-dplyr pipeline when only one expression needs raw
SQL.dplyr::show_query() prints the generated SQL — useful
for sanity checking what dbplyr is producing.duckspending_tbl("subaward_search") |>
filter(dbplyr::sql("list_contains(business_categories, 'small_business')")) |>
count() |>
show_query()Drop into raw SQL when dbplyr can’t express what you
want, when you’re porting a query verbatim from another tool, or when
the SQL is the clearer way to communicate intent.