[Python] Querying Dremio with the ADBC Flight SQL client #1559

Closed
maxfirman opened this issue Feb 21, 2024 · 15 comments
Labels
Type: question Usage question

Comments

@maxfirman

What would you like help with?

I'm attempting to query Dremio using the ADBC Flight SQL client, but my code hangs when attempting to return a result set.

The following code snippet will reproduce the issue:

from adbc_driver_flightsql.dbapi import connect


with connect(
    uri="grpc://localhost:32010",
    db_kwargs={"username": "dremio", "password": "dremio123"},
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("select 1 as foo")
        result = cursor.fetchall()  # <- hangs indefinitely
        print(result)

I have been testing against a local Dremio instance running in the standalone Docker image:

docker run -p 9047:9047 -p 31010:31010 -p 32010:32010 -p 45678:45678 dremio/dremio-oss

Note you will have to create an initial user through the Dremio UI: http://localhost:9047

Interestingly, I can see that the query is being executed successfully when I look at the jobs page in the Dremio UI. However, any attempt to subsequently retrieve the data using fetchall, fetchone, fetch_arrow_table, etc. hangs indefinitely.

@maxfirman maxfirman added the Type: question Usage question label Feb 21, 2024
@lidavidm
Member

Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import adbc_driver_flightsql.dbapi
>>> conn = adbc_driver_flightsql.dbapi.connect("grpc://localhost:32010", db_kwargs={"username": "dremio", "password": "dremio123"})
/home/lidavidm/temp/venv/lib/python3.11/site-packages/adbc_driver_manager/dbapi.py:307: Warning: Cannot disable autocommit; conn will not be DB-API 2.0 compliant
  warnings.warn(
>>> cur = conn.cursor()
>>> cur.execute("SELECT 1")
>>> cur.fetchall()
[(1,)]

What version of ADBC, Python, PyArrow, Dremio, etc. are you using, and on what platform?

Are you able to get a stack trace? (You can use py-spy)

This is the version of Dremio I have:

Build
    24.3.2-202401241821100032-d2d8a497
Edition
    Community Edition
Build Time
    01/24/2024 18:34:32
Change Hash
    d2d8a49790d59599d617f25f6020731f0260178d
Change Time
    01/24/2024 18:11:23

@maxfirman
Author

Thanks @lidavidm.

Curiously, I've just tested it on my home machine (Fedora) and it returns data as expected. The issue only seems to exist on my work machine (Ubuntu 22).

As far as I can see I have the same versions of adbc and dremio installed. One other difference is that my work machine uses brew as the package manager and my home machine uses dnf.

My instinct is that there is some network policy on my work machine that is blocking the connection, although I am able to authenticate and execute the query (it's just returning the results that hangs). I'm also able to query and retrieve results from Dremio on my work machine using the approach documented here: https://github.com/dremio-hub/arrow-flight-client-examples/blob/main/python/example.py#L33

I will work on getting a stack trace using py-spy.

@maxfirman
Author

[Image: py-spy profile]

@lidavidm
Member

Oh, sorry, py-spy has a dump command that should just give you a stack trace (and can be run in a separate shell against an existing process). That would directly show us where it got stuck.

@lidavidm
Member

--native would also help to show the C extension stack traces (which would hopefully illuminate a bit more)

@maxfirman
Author

Thanks. Unfortunately running the dump command requires sudo permissions, which I don't have.

I'm going to have to bother someone in our IT department to grant me temporary sudo permissions so I can get the trace dump.
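
Where `py-spy dump` is not an option because of permissions, a rough alternative is Python's standard-library `faulthandler` module, which can dump the Python-level stack of every thread from inside the process once a timeout elapses. This is only a sketch and was not suggested in the thread: `run_with_hang_dump` is a hypothetical helper name, and unlike `py-spy dump --native` it shows Python frames only, not the C extension frames.

```python
import faulthandler
import sys


def run_with_hang_dump(func, timeout_s=30.0, file=sys.stderr):
    """Call func(); if it is still running after timeout_s seconds,
    write the Python stack of all threads to `file`.

    The call is NOT interrupted (exit=False); it keeps running or hanging.
    `file` must be a real file object with a file descriptor.
    """
    # Arm a watchdog that dumps tracebacks once the timeout elapses.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=file)
    try:
        return func()
    finally:
        # Disarm the watchdog if func() returned before the timeout.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the hanging call, for example `run_with_hang_dump(cursor.fetchall, timeout_s=30)`, would print where the interpreter is blocked while the call itself keeps waiting.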

@lidavidm
Member

So, Dremio is telling the client to connect to "0.0.0.0":

>>> partitions, schema = cur.adbc_execute_partitions("SELECT 1")
>>> info = pyarrow.flight.FlightInfo.deserialize(partitions[0])
>>> info.endpoints[0].locations
[<pyarrow.flight.Location b'grpc+tcp://0.0.0.0:32010'>]

Is it possible that one machine routes this to localhost and the other (correctly?) drops this?

@lidavidm
Member

It's possible we should catch and handle this explicitly. There's also apache/arrow#40084 which is supposed to explicitly do what I think Dremio is trying to implicitly do.
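
To make the idea of handling this explicitly concrete, here is a minimal client-side sketch: detect an endpoint that advertises the wildcard address (`0.0.0.0` or `::`) and fall back to the address the client originally connected to. Both function names are hypothetical, and the fallback policy is an assumption for illustration; it is not what the ADBC driver or Dremio actually does.

```python
import ipaddress
from urllib.parse import urlsplit


def is_unspecified_location(uri):
    """Return True if a Flight location URI names the wildcard address
    (0.0.0.0 or ::), which is a listen address rather than a destination."""
    if isinstance(uri, bytes):  # the repr in the thread shows a bytes URI
        uri = uri.decode("utf-8")
    host = urlsplit(uri).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        return False  # a DNS name, not an IP literal


def effective_location(advertised_uri, connected_uri):
    """Assumed policy: reuse the original connection URI when the server
    advertises an unroutable wildcard address; otherwise trust the server."""
    if is_unspecified_location(advertised_uri):
        return connected_uri
    if isinstance(advertised_uri, bytes):
        return advertised_uri.decode("utf-8")
    return advertised_uri
```

With the values from this thread, `effective_location(b"grpc+tcp://0.0.0.0:32010", "grpc://localhost:32010")` would select the original `grpc://localhost:32010` address instead of the advertised one.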

@stevelorddremio

So, Dremio is telling the client to connect to "0.0.0.0":

>>> partitions, schema = cur.adbc_execute_partitions("SELECT 1")
>>> info = pyarrow.flight.FlightInfo.deserialize(partitions[0])
>>> info.endpoints[0].locations
[<pyarrow.flight.Location b'grpc+tcp://0.0.0.0:32010'>]

Is it possible that one machine routes this to localhost and the other (correctly?) drops this?

Yes, the behaviour will be as you described, which will likely be invalid.
This will be fixed post Dremio 24.3.3.

@maxfirman
Author

Thanks @lidavidm, that is definitely the issue. I look forward to testing the fix once Dremio 24.3.3 lands.

@maxfirman
Author

@stevelorddremio just to clarify: by "post Dremio 24.3.3", do you mean that the fix is included in 24.3.3 or in the next release after 24.3.3?

I'm assuming the latter, as I can't see any reference to this bug fix in the release notes, and I just tested against our cluster (running 24.3.3 Enterprise) and still see the same issue.

@stevelorddremio

@maxfirman correct. It will be the next release(s) after 24.3.3.

@mgross-ebner

@zeroshade Looks like the issue we discussed in apache/arrow#40090 will be fixed in Dremio.

@maxfirman
Author

I can confirm that the following now works against Dremio 24.3.4:

from adbc_driver_flightsql.dbapi import connect


with connect(
    uri="grpc+tls://<dremio-host>:32010",
    db_kwargs={
        "username": "<username>",
        "password": "<password>",
    },
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("select 1")
        result = cursor.fetchall()
        print(result)

Prints:

[(1,)]

@lidavidm
Member

Thanks for the follow-up!
