Unable to connect to Redshift using adbc_driver_postgresql #1563

Open
j0bekt01 opened this issue Feb 23, 2024 · 12 comments
Labels
Type: bug Something isn't working

Comments

@j0bekt01

j0bekt01 commented Feb 23, 2024

What happened?

I was attempting to create a connection to Redshift using adbc_driver_postgresql, but I get this error: IO: [libpq] Failed to build type mapping table: ERROR: column "typarray" does not exist in pg_type.
I am aware that Redshift is based on an earlier version of Postgres. Is it currently possible to connect to Redshift? If not, are there plans to do so?
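
For reference, a minimal sketch that triggers the error (the endpoint and credentials below are placeholders):

```python
# Minimal sketch of the failing call; endpoint and credentials are placeholders.
import adbc_driver_postgresql.dbapi

uri = "postgresql://user:password@my-cluster.redshift.example.com:5439/dev"

# The driver inspects pg_type at connection time to build its Arrow type
# mapping table, which is where the error above is raised.
conn = adbc_driver_postgresql.dbapi.connect(uri)
```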

Environment/Setup
- Kernel: 4.14.336-253.554.amzn2.x86_64
- Python 3.9.18
- adbc-driver-manager 0.10.0
- adbc-driver-postgresql 0.10.0
- pyarrow 14.0.2

j0bekt01 added the Type: bug label Feb 23, 2024
@lidavidm
Member

Hmm, it looks like Redshift speaks the PostgreSQL wire protocol but isn't quite compatible, so we'd have to handle it separately. Even then, maintaining CI would be a challenge, and it appears Redshift is based on PostgreSQL 8? We've had issues with older PostgreSQL versions in the past, too.

@lidavidm
Member

Specifically for the error here, we are querying the database metadata at startup to figure out which column types map to which Arrow types, but Redshift appears to not provide one of the required columns.
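
Illustratively, the startup logic depends on a catalog query along these lines (a simplified sketch, not the driver's exact query); it succeeds on stock PostgreSQL but fails on Redshift because its pg_type has no typarray column:

```python
# Illustrative sketch only; the driver's real startup query is more involved.
import adbc_driver_postgresql.dbapi

with adbc_driver_postgresql.dbapi.connect("postgresql://localhost:5432/postgres") as conn:
    with conn.cursor() as cur:
        # typarray holds the OID of each type's array type; Redshift's
        # pg_type catalog does not expose this column.
        cur.execute("SELECT oid, typname, typarray FROM pg_type LIMIT 5")
        print(cur.fetchall())
```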

@Sairam90

Sairam90 commented Oct 3, 2024

Any updates on this issue?

@paleolimbot
Member

We can probably solve the specific issue of not having the `typarray` column. I don't have a way to test on Redshift, but I would be happy to look into it given a way to do so!

@gtomitsuka

gtomitsuka commented Oct 4, 2024

We're blocked on a transition to ADBC by this; TextQL is happy to provide access to one of our demo data warehouses with its default sample database for a few weeks – it's serverless, so it may require a cold start sometimes, but it works very well. I assume you won't need more than a few thousand queries for this.

Who should I talk to about this? Feel free to reach out, [email protected].

I may be able to contribute myself in a month or so if the issue turns out to be much bigger than expected. This guide about the differences is pretty decent.

@paleolimbot
Member

I'll send you a note... I happen to be working on the Postgres driver right now and I'm happy to kick the tires a little to see if I can get it to work. To claim Redshift support long term, we would need a set of more permanent credentials that we can use in CI, but there are a few companies already doing this kind of testing that might be willing to share with us if we can demonstrate a POC 🙂

@gtomitsuka

gtomitsuka commented Oct 4, 2024

My CTO is currently on his honeymoon; I might be able to get long-term credentials for you too, but I would definitely need his approval first. Will keep you updated!

Responded to your email with credentials for you, might be able to get more for more people if needed.

@j0bekt01
Author

j0bekt01 commented Oct 4, 2024

Supporting Redshift would significantly boost the exposure and usage of this driver. In my experience working with databases, particularly Redshift, the primary bottleneck is extracting data quickly, and this support would greatly alleviate that issue. ConnectorX has gained popularity as the default engine for Polars, but it only supports read operations. For writing, one must use SQLAlchemy, which is less performant. Alternatively, you can use the Redshift COPY command, which involves dumping data to S3 via Arrow and then using the Redshift Data API to submit a COPY command. The ability to both read and write with the ADBC driver would be a game-changer.
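
For concreteness, that workaround looks roughly like this (bucket, cluster, table, and IAM role are placeholders, and error handling is omitted):

```python
# Hedged sketch of the S3 + COPY workaround; all resource names are placeholders.
import boto3
import pyarrow as pa
import pyarrow.parquet as pq

# 1. Write the Arrow data to a local Parquet file and stage it on S3.
table = pa.table({"id": [1, 2, 3]})
pq.write_table(table, "batch.parquet")
boto3.client("s3").upload_file("batch.parquet", "my-bucket", "staging/batch.parquet")

# 2. Ask Redshift to ingest the staged file via the Data API.
boto3.client("redshift-data").execute_statement(
    ClusterIdentifier="my-cluster",
    Database="dev",
    DbUser="admin",
    Sql=(
        "COPY my_table FROM 's3://my-bucket/staging/batch.parquet' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' FORMAT AS PARQUET"
    ),
)
```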
I am happy to help contribute in any way I can.

@WillAyd
Contributor

WillAyd commented Oct 4, 2024

> Alternatively, you can use the Redshift COPY command, which involves dumping data to S3 via Arrow and then using the Redshift Data API to submit a COPY command.

I believe you can also just use the AWS SDK for pandas (previously known as awswrangler), which does this automatically for you. That may be the best of your intermediate options, although yeah, having ADBC support in the near term is probably best.
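
As a rough sketch (the Glue connection, bucket, and table names below are placeholders):

```python
# Hedged sketch using awswrangler; the Glue connection, bucket, and table
# names are placeholders.
import awswrangler as wr
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3]})

con = wr.redshift.connect("my-glue-connection")
wr.redshift.copy(          # stages to S3 and issues COPY under the hood
    df=df,
    path="s3://my-bucket/staging/",
    con=con,
    table="my_table",
    schema="public",
)
con.close()
```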

@j0bekt01
Author

j0bekt01 commented Oct 4, 2024

> > Alternatively, you can use the Redshift COPY command, which involves dumping data to S3 via Arrow and then using the Redshift Data API to submit a COPY command.
>
> I believe you can also just use the AWS SDK for pandas (previously known as awswrangler), which does this automatically for you. That may be the best of your intermediate options, although yeah, having ADBC support in the near term is probably best.

Yes, you are correct; awswrangler handles exactly this internally. However, I prefer using Polars over pandas, and I know conversion between the two is possible (see the sketch below). I just believe that supporting a premier data warehouse like Redshift would solidify this driver as a strong alternative to JDBC/ODBC drivers.
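
For example, a minimal round-trip sketch:

```python
# Minimal sketch of round-tripping between Polars and pandas, e.g. to hand
# data to awswrangler and convert results back.
import pandas as pd
import polars as pl

pl_df = pl.DataFrame({"id": [1, 2, 3]})
pd_df = pl_df.to_pandas()      # awswrangler expects a pandas DataFrame
back = pl.from_pandas(pd_df)   # convert results back to Polars
```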

@WillAyd
Contributor

WillAyd commented Oct 4, 2024

Does the Apache project get any promotional credits from cloud providers? While having someone donate access is nice, it might be safest if we had a dedicated Redshift instance to connect to over OIDC (ditto for Google Cloud and the BigQuery driver).

This article is pretty old, but I wonder if it still applies:

https://aws.amazon.com/blogs/opensource/aws-promotional-credits-open-source-projects/

@lidavidm
Member

lidavidm commented Oct 7, 2024

I do not believe we get any credits, unfortunately.

paleolimbot added a commit that referenced this issue Nov 5, 2024
…dshift (#2219)

Just following up on #1563 to see if the missing `typarray` column is the only issue. Getting the details right might be a large project, but we might be able to support a basic connection without too much effort. Parameter binding and non-COPY result fetching seem to work... the default query fetch method (COPY) is not supported, `connection_get_info()` fails, and at a glance, `connection_get_objects()` might be returning incorrect results (and fails at the column depth).

``` r
library(adbcdrivermanager)

db <- adbc_database_init(
  adbcpostgresql::adbcpostgresql(),
  uri = Sys.getenv("ADBC_REDSHIFT_TEST_URI"),
  adbc.postgresql.load_array_types = FALSE
)

con <- db |> 
  adbc_connection_init()

stmt <- con |> 
  adbc_statement_init(adbc.postgresql.use_copy = FALSE)

stream <- nanoarrow::nanoarrow_allocate_array_stream()
stmt |> 
  adbc_statement_bind(data.frame(45)) |> 
  adbc_statement_set_sql_query("SELECT 1 + $1 as foofy, 'string' as foofy_string") |> 
  adbc_statement_execute_query(stream)
#> [1] -1

tibble::as_tibble(stream)
#> # A tibble: 1 × 2
#>   foofy foofy_string
#>   <dbl> <chr>       
#> 1    46 string
```

<sup>Created on 2024-10-04 with [reprex
v2.1.1](https://reprex.tidyverse.org)</sup>
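
A rough Python analogue of the reprex above might look like the following (untested sketch; the URI is a placeholder, and the option names are copied from the R example):

```python
# Untested sketch; URI is a placeholder, option names come from the R reprex.
import adbc_driver_postgresql.dbapi

conn = adbc_driver_postgresql.dbapi.connect(
    "postgresql://user:password@my-cluster.redshift.example.com:5439/dev",
    db_kwargs={"adbc.postgresql.load_array_types": "false"},
)
cur = conn.cursor()
# Disable COPY-based result fetching on the underlying statement.
cur.adbc_statement.set_options(**{"adbc.postgresql.use_copy": "false"})
cur.execute("SELECT 1 + 45 AS foofy, 'string' AS foofy_string")
print(cur.fetchall())
```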

---------

Co-authored-by: William Ayd <[email protected]>