-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Option to convert "enum" types to factor or character? #455
Comments
Thanks. Reprex: library(RPostgres)
con <- postgresDefault()
dbExecute(con, "CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy')")
#> [1] 0
dbExecute(con, "CREATE TABLE x (a mood)")
#> [1] 0
dbExecute(con, "INSERT INTO x VALUES ('sad')")
#> [1] 1
dbExecute(con, "INSERT INTO x VALUES ('ok')")
#> [1] 1
out <- dbGetQuery(con, "SELECT * FROM x")
dput(out)
#> structure(list(a = structure(c("sad", "ok"), class = "pq_NA")), row.names = c(NA,
#> -2L), class = "data.frame")
out <- dbGetQuery(con, "SELECT CAST(a AS text) FROM x")
dput(out)
#> structure(list(a = c("sad", "ok")), class = "data.frame", row.names = c(NA,
#> -2L))
dbExecute(con, "DROP TABLE IF EXISTS x")
#> [1] 0
dbExecute(con, "DROP TYPE IF EXISTS mood")
#> [1] 0 Created on 2024-01-22 with reprex v2.0.2 Are you requesting the removal of the Would you like to share the use case behind this request? |
She uses dbplyr, so she has little control over the underlying query and is eventually just calling It's mostly a problem when you start using it like a character vector with other parts of the tidyverse, like: x <- structure(c("sad", "ok"), class = "pq_NA")
vctrs::vec_c(x, "stuff")
#> Error in `vctrs::vec_c()`:
#> ! Can't combine `..1` <pq_NA> and `..2` <character>. It seems reasonable to me that we should try and map as many Postgres types to native R types as we can, and mapping an enum to a character or factor seemed pretty natural
My argument is that this extra class serves no purpose for her (and most users, I'd imagine), and it would be better off as a bare character vector or a factor |
That's not wrong. For now, a possible workaround might be something along the lines of collect <- function(x, ...) {
out <- dplyr::collect(x, ...)
out[] <- map(out, ~ if (is.character(.x)) unclass(.x) else .x)
out
} Would it help if all enum classes had a common superclass so that we could define casting/coercion operators for vctrs? Adding a special |
Is there a reason enums don't come through as factors? That feels like a 1:1 match to me |
No particular reason, but I'm not sure if the levels are available as part of the result. |
It seems like you look the labels up in an overarching |
It's interesting that library(RPostgres)
con <- postgresDefault()
dbExecute(con, "CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy')")
#> [1] 0
dbDisconnect(con)
con <- postgresDefault()
dbExecute(con, "CREATE TABLE x (a mood)")
#> [1] 0
dbExecute(con, "INSERT INTO x VALUES ('sad')")
#> [1] 1
dbExecute(con, "INSERT INTO x VALUES ('ok')")
#> [1] 1
out <- dbGetQuery(con, "SELECT * FROM x")
dput(out)
#> structure(list(a = structure(c("sad", "ok"), class = "pq_mood")), row.names = c(NA,
#> -2L), class = "data.frame")
out <- dbGetQuery(con, "SELECT CAST(a AS text) FROM x")
dput(out)
#> structure(list(a = c("sad", "ok")), class = "data.frame", row.names = c(NA,
#> -2L))
dbExecute(con, "DROP TABLE IF EXISTS x")
#> [1] 0
dbExecute(con, "DROP TYPE IF EXISTS mood")
#> [1] 0
dbDisconnect(con) Created on 2024-01-23 with reprex v2.0.2 And we already fetch a If the enums were factors, we would get cast and coercion behavior for free. @paleolimbot: How do you (intend to) handle enum types in ADBC for Postgres? |
It looks like they're not handled yet! library(adbcdrivermanager)
db <- adbc_database_init(
adbcpostgresql::adbcpostgresql(),
uri = "postgresql://localhost:5432/postgres?user=postgres&password=password")
db |>
execute_adbc("CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy')")
con <- db |> adbc_connection_init()
con |>
execute_adbc("CREATE TABLE x (a mood)") |>
execute_adbc("INSERT INTO x VALUES ('sad')") |>
execute_adbc("INSERT INTO x VALUES ('ok')")
con |>
read_adbc("SELECT * from x")
#> Error in adbc_statement_execute_query(stmt, stream, stream_join_parent = TRUE): [libpq] Column #1 ("a") has unknown type code 16385 Created on 2024-01-23 with reprex v2.0.2 |
Just kidding, I forgot about the part where we cache the types in the database. It's returned as binary but the COPY output is character, so that's how the driver will return it: library(adbcdrivermanager)
db <- adbc_database_init(
adbcpostgresql::adbcpostgresql(),
uri = "postgresql://localhost:5432/postgres?user=postgres&password=password")
db |>
execute_adbc("CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy')")
db |>
adbc_database_release()
# Have to recreate the database after because that's where we cache the types
con <- db <- adbc_database_init(
adbcpostgresql::adbcpostgresql(),
uri = "postgresql://localhost:5432/postgres?user=postgres&password=password"
) |>
adbc_connection_init()
con |>
execute_adbc("CREATE TABLE x (a mood)") |>
execute_adbc("INSERT INTO x VALUES ('sad')") |>
execute_adbc("INSERT INTO x VALUES ('ok')")
# Currently returned as binary (i.e., COPY output)
con |>
read_adbc("SELECT * from x") |>
as.data.frame() |>
dplyr::pull(a) |>
unclass() |>
vapply(rawToChar, character(1))
#> [1] "sad" "ok" Created on 2024-01-23 with reprex v2.0.2 |
Sorry for hijacking, but after apache/arrow-adbc#1485 it'll be: library(adbcdrivermanager)
db <- adbc_database_init(
adbcpostgresql::adbcpostgresql(),
uri = "postgresql://localhost:5432/postgres?user=postgres&password=password")
db |>
execute_adbc("CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy')")
db |>
adbc_database_release()
# Have to recreate the database after because that's where we cache the types
con <- db <- adbc_database_init(
adbcpostgresql::adbcpostgresql(),
uri = "postgresql://localhost:5432/postgres?user=postgres&password=password"
) |>
adbc_connection_init()
con |>
execute_adbc("CREATE TABLE x (a mood)") |>
execute_adbc("INSERT INTO x VALUES ('sad')") |>
execute_adbc("INSERT INTO x VALUES ('ok')")
# Returns as string because that's how the driver sees it
con |>
read_adbc("SELECT * from x") |>
as.data.frame()
#> a
#> 1 sad
#> 2 ok
# If you know the R type you can specify it
con |>
read_adbc("SELECT * from x") |>
nanoarrow::convert_array_stream(
to = data.frame(x = factor(levels = c("mood", "sad", "ok")))
) |>
str()
#> 'data.frame': 2 obs. of 1 variable:
#> $ x: Factor w/ 3 levels "mood","sad","ok": 2 3 Created on 2024-01-23 with reprex v2.0.2 |
@paleolimbot engaging with your hijacking (sorry, not sorry 😬): Would a reverse conversion, i.e. character/factor to an existing enum with Would an update cache function be useful (instead of having to (de)connect)? |
My wife has a scenario where she's pulling in a number of columns that end up with a class of
"pq_stuff_enum"
. She doesn't really care that its an enum, and would really like it to come through as a character or possibly a factor.It seems like
typcategory
has a category codeE
for enums:https://www.postgresql.org/docs/current/catalog-pg-type.html
I wonder if we can detect that when pulling the column in? Maybe around here?
RPostgres/src/PqResultImpl.cpp
Line 115 in f15f8f4
I could see a
dbConnect()
argument with something likeenum = c("enum", "character", "factor")
or something similar as a way to control how these are handledThe text was updated successfully, but these errors were encountered: