-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add submit_taskgraph for a common UX #641
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
This looks like a better version of Can we make a note to deprecate that eventually? Maybe at least add a warning to |
Yes, it's a good idea to converge on one convenience function for running taskgraphs. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gspowley I left a few suggestions for how we could make this more Pythonic (for lack of a better term). Run silently by default and return a tuple, basically.
I like tqdm in command line programs but not in library functions so much, but that's my personal taste. I can see how this function and its visual feedback would be good for Notebook users.
How about run_taskgraph()
since this is much more than graph.compute()
?
Last suggestion: let's export this from tiledb.cloud.taskgraph
or tiledb.cloud.dag
, in a new runner.py or wrapper.py module.
|
||
# Display a link to the log in TileDB, if available | ||
if graph.server_graph_uuid: | ||
print( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's opt in to printing. I don't think a library function should print by default. I can't think of anything in Python's standard lib that prints except print()
.
Like if verbose and graph.server_graph_uuid:
|
||
if wait: | ||
# Display the progress bar in stdout, so the output is not red in a notebook | ||
pbar = tqdm( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, I think a progress bar should be explicitly opt-in and not conditioned on waiting. I can easily imagine wanting to waiting silently. How about if verbose: pbar = ...
?
except TimeoutError: | ||
# Still running | ||
pbar.set_description_str(str(graph.status)) | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused about what graph.wait()
does. Is the status set only when it returns? This is just a question stemming from my ignorance, not a blocker.
pbar.set_description_str(str(graph.status)) | ||
break | ||
|
||
return {"status": str(graph.status), "graph_id": str(graph.server_graph_uuid)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more natural for a Python function to return a tuple, no? More like standard lib functions is what i mean.
This PR adds
tiledb.cloud.utilities.submit_taskgraph
as a proposed common function to launch ingestion (and other) taskgraphs.If
tqdm
is available, a progress timer is displayed.Output while running:
Output when complete: