Slowness in query submission and response. #421
Comments
Could you share your code? Is the delay occurring at the connect or execute stage?
import os

from databricks import sql


def execute_databricks_query(query):
    """
    Execute a query in Databricks and return the result.
    Expects the following environment variables to be defined first:
    DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, DATABRICKS_TOKEN
    """
    # A new connection (session) is opened and closed on every call.
    with sql.connect(server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                     http_path=os.getenv("DATABRICKS_HTTP_PATH"),
                     access_token=os.getenv("DATABRICKS_TOKEN")) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            # cursor.execute("SELECT * FROM lineage_catalog_qa.admin.meta_control_table")
            result = cursor.fetchall()
    return result
I'm using the function above. The slowness happens at the connect stage; execution is fast, and then returning the response takes time again. Note: our environment uses AWS PrivateLink, and network traffic flows within the VPC to Databricks. This issue has been occurring for the past 1-2 weeks.
What version of Python are you using? I opened #422 the other day regarding slow connections and found it could be solved by using a different Python version. Either way, if you run the function in debug mode, have you traced the call stack to see where the delay is occurring?
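If stepping through a debugger is awkward, timing each stage separately can answer the same question. The following is a minimal sketch, not the connector's own tooling: `timed` and `profile_query` are hypothetical helper names, and `connect` is assumed to be any zero-argument callable that returns a DB-API connection (for this issue it would wrap the `sql.connect(...)` call from the function above).

```python
import time
from contextlib import closing, contextmanager


@contextmanager
def timed(label, timings):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def profile_query(connect, query):
    """Run `query` once and return (rows, per-stage timings in seconds).

    `connect` is a hypothetical zero-argument factory returning a
    DB-API connection; it is an assumption of this sketch.
    """
    timings = {}
    with timed("connect", timings):
        connection = connect()
    try:
        with closing(connection.cursor()) as cursor:
            with timed("execute", timings):
                cursor.execute(query)
            with timed("fetch", timings):
                rows = cursor.fetchall()
    finally:
        # Closing the session is timed too, since it also involves a round trip.
        with timed("close", timings):
            connection.close()
    return rows, timings
```

Printing the returned `timings` dictionary for each of the five queries would show whether the minutes are spent in connect, execute, fetch, or close.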
I was trying with Python 3.7 and later moved to Python 3.8, as I see Python 3.8 supports the latest sql-connector version, 3.3. Moving from Python 3.7 to 3.8 cut my query time in half. Also, of 5 mostly identical queries, the first 2 usually run quickly and then the delay starts. I have tried debug mode too and will try to share the debug logs here, if that helps.
@jaames-bentley -> here is the debug version of it. I see a few calls are quick and the others have a delay of at least 2-3 minutes. I will also try a later Python version (3.10+) to verify the same.
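For anyone wanting to capture a comparable log, here is a minimal sketch. It assumes the connector and the HTTP library underneath it log through Python's standard `logging` module; `enable_debug_logging` is a hypothetical helper name. Millisecond timestamps make multi-minute gaps between consecutive lines easy to spot.

```python
import logging


def enable_debug_logging():
    """Configure the root logger to emit DEBUG records to stderr with timestamps."""
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s.%(msecs)03d %(name)s %(levelname)s %(message)s",
        datefmt="%H:%M:%S",
        force=True,  # replace any handlers installed earlier (Python 3.8+)
    )
    return logging.getLogger()
```

Calling `enable_debug_logging()` before the first `sql.connect(...)` should be enough; loggers that have an explicit higher level set elsewhere would still need to be lowered individually.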
@shahrukh-shaik thank you for the log. It's actually interesting. I noticed "Starting new HTTPS connection (1): my-org-redacted-dev.cloud.databricks.com:443" lines. They actually come from the
Not sure, if you can see the above function for query execution, I'm using PS: I have tried Python 3.10 with both urllib3 <2 and >=2 and haven't seen any difference.
Yes, exactly
import os

from databricks import sql

# Open one connection when the module loads and reuse it for every query.
dbx_connection = sql.connect(server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                             http_path=os.getenv("DATABRICKS_HTTP_PATH"),
                             access_token=os.getenv("DATABRICKS_TOKEN"))


def execute_databricks_query(query, connection=dbx_connection):
    """
    Execute a query in Databricks and return the result.
    Expects the following environment variables to be defined first:
    DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, DATABRICKS_TOKEN
    """
    # Per-call connection commented out because of the slowness issue:
    # https://github.com/databricks/databricks-sql-python/issues/421
    # with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
    #                  http_path = os.getenv("DATABRICKS_HTTP_PATH"),
    #                  access_token = os.getenv("DATABRICKS_TOKEN")) as connection:
    # Use the `connection` argument so a caller-supplied connection is honored.
    with connection.cursor() as cursor:
        cursor.execute(query)
        # cursor.execute("SELECT * FROM lineage_catalog_qa.admin.meta_control_table")
        result = cursor.fetchall()
    return result
I changed the code to open the connection only once and close it at the end of the script using
if dbx_connection.open:
    dbx_connection.close()
I also tried without closing it, as I see in the source code, we have
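A variation on the shared-connection approach, as a hedged sketch only: the connection is created lazily on first use rather than at import time, and it is closed once at interpreter exit via `atexit`. `ConnectionHolder` and its `factory` argument are hypothetical names introduced for illustration; for this issue `factory` would wrap the `sql.connect(...)` call shown earlier.

```python
import atexit
from contextlib import closing


class ConnectionHolder:
    """Create one connection on first use and reuse it for every query."""

    def __init__(self, factory):
        # `factory`: zero-argument callable returning a DB-API connection
        # (an assumption of this sketch).
        self._factory = factory
        self._connection = None
        atexit.register(self.close)  # close once when the script exits

    def get(self):
        """Return the shared connection, opening it if needed."""
        if self._connection is None:
            self._connection = self._factory()
        return self._connection

    def close(self):
        """Close the shared connection if it is open; safe to call repeatedly."""
        if self._connection is not None:
            self._connection.close()
            self._connection = None

    def execute(self, query):
        """Run `query` on the shared connection and return all rows."""
        with closing(self.get().cursor()) as cursor:
            cursor.execute(query)
            return cursor.fetchall()
```

With this shape, the five queries in the script share a single session, and nothing connects until the first query actually runs.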
When running a Databricks query, query submission and response take a long time to return, although the actual runtime in the query history is under a second. Python gets the data back only after 3-5 minutes.
I'm running a sequence of 5 queries, each of which opens and closes a session, and the total delay reaches up to 15 minutes for work that should complete in 5-10 seconds. The workspace uses AWS PrivateLink.