The second part of a set of issues for getting Dagster + SQLMesh Metrics running. This one is to get SQLMesh running the Python models on Trino. If this somehow turns out to be performant enough on its own, without the additional work of running the metrics through the special pre-warmed DuckDB cache + rolling query runner, then we can still accomplish all the metrics work without spending as much time developing a more complex option.
Sadly, after spending quite a few hours on this (there were some setup things to fix and then bugs in the Python models), running this with just the Python models as-is on Trino is still not enough. Some observations from the runs:
We can't seem to saturate the requests to trino.
Queries for each rolling day take on the order of seconds (1-10s each). A 10-year period is about 3,650 daily queries, so run one after another this would take up to ~10 hours.
This is vastly slower than our duckdb pre-warmed cache implementation
Trino has some limitations with query text size. This might cause issues with the dataframe writing.
We can adjust settings here to a point so this is less an immediate problem.
Still running into periodic errors.
I think this is due to the scaling mechanisms, so I may disable those for now until we get Prometheus and KEDA set up to be able to scale based on additional dimensions like HTTP requests.
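The runtime estimate in the observations above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: it assumes one query per rolling day, 365 days per year, and a perfectly even speedup from concurrency, none of which has been measured. The function name and the `concurrency` parameter are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def estimated_hours(years: int, seconds_per_query: float, concurrency: int = 1) -> float:
    """Rough wall-clock estimate for running one query per rolling day.

    Assumes 365 days per year (leap days ignored) and that `concurrency`
    queries run fully in parallel with no queueing overhead.
    """
    queries = years * 365
    return queries * seconds_per_query / concurrency / SECONDS_PER_HOUR


if __name__ == "__main__":
    # Observed per-query latency range from the runs: 1-10 seconds.
    for latency in (1, 10):
        hours = estimated_hours(10, latency)
        print(f"{latency}s per query, sequential: {hours:.1f} h")
```

At the slow end of the observed range the sequential total lands at roughly 10 hours, which is why being able to keep many queries in flight at once matters so much here.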
Places we will go from here to explore (still not using duckdb):
We should try to see if we can saturate the requests to Trino's workers. It would be nice to see if it's possible to make the workers actually queue queries. At least then we will know it's functioning at its limit.
We should use an external process to handle this test, as adding it directly to the SQLMesh Python model adds some complication. My current thought is to use an "Arrow Flight Protocol" compliant server that will stream table results to the caller. This is an open protocol used by Apache Arrow, so it's something we could easily use in other places. We would call the service from the Python model, stream the rows in, and periodically write the results out.
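The "stream the rows in and periodically write the results out" part could look roughly like the sketch below. It is not tied to any real service: `fake_batches` is a stand-in for whatever the Arrow Flight client would yield, and `stream_and_flush`, `write`, and `flush_rows` are hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[str, float]


def stream_and_flush(
    batches: Iterable[List[Row]],
    write: Callable[[List[Row]], None],
    flush_rows: int = 1000,
) -> int:
    """Consume streamed batches and write them out in larger chunks.

    Rows are buffered until at least `flush_rows` have accumulated, then
    handed to `write`. Any remainder is written when the stream ends.
    Returns the total number of rows seen.
    """
    buffer: List[Row] = []
    total = 0
    for batch in batches:
        buffer.extend(batch)
        total += len(batch)
        if len(buffer) >= flush_rows:
            write(buffer)
            buffer = []  # start a fresh list so the written chunk is not mutated
    if buffer:
        write(buffer)
    return total


def fake_batches(n_batches: int, rows_per_batch: int) -> Iterator[List[Row]]:
    """Stand-in for a streaming source (assumption: the real one would be a Flight stream)."""
    for b in range(n_batches):
        yield [(f"day-{b}", float(i)) for i in range(rows_per_batch)]


if __name__ == "__main__":
    chunks: List[List[Row]] = []
    seen = stream_and_flush(fake_batches(5, 400), chunks.append, flush_rows=1000)
    print(seen, [len(c) for c in chunks])
```

Buffering on the caller side keeps each write reasonably large, which may also help with the query text size limits noted earlier, since the flush threshold can be tuned independently of how the server chooses to batch its stream.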