Raising Environment Error on Single Node Clusters #40
This addresses #39 - would appreciate some feedback as I'm not sure if this is a great idea or not.
Currently, if you start `client = dask_databricks.client()` on a single-node Databricks cluster, it'll allow you to, but will then hang indefinitely if you try to carry out any tasks (which makes sense, since there are no workers to do them). My thinking is that it would be handy to throw an error when we can: if we're on a single-node cluster, we already know that any attempted use of the dask_databricks client is going to end badly.
I wanted to implement this a little like this:
But Databricks doesn't appear to have an environment variable like "NUMNODES". It does have a "MASTER" environment variable, which on single-node clusters points at the local driver and its cores (such as `local[4]`) and on distributed clusters gives you the TCP URL. I've used that to implement it like this:
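A minimal sketch of that kind of check, for illustration only. The helper name `raise_if_single_node`, the `env` parameter, and the error message are hypothetical and not necessarily what the diff contains; the only assumption taken from the description is that `MASTER` starts with `local` on a single-node cluster.

```python
import os


def raise_if_single_node(env=None):
    """Raise EnvironmentError if MASTER looks like a single-node value.

    Hypothetical helper: `env` defaults to os.environ and is only a
    parameter so the check can be exercised without a real cluster.
    """
    env = os.environ if env is None else env
    master = env.get("MASTER")

    # Conservative by design: if MASTER is not set, raise nothing.
    if master is None:
        return

    # Assumption from the description: single-node clusters report a
    # local master such as "local[4]"; multi-node clusters report a URL.
    if master.startswith("local"):
        raise EnvironmentError(
            "Single-node cluster detected (MASTER=%r): there are no "
            "workers, so Dask tasks would hang." % master
        )
```

Calling this before the client is created would turn the indefinite hang into an immediate error, while leaving clusters without a `MASTER` variable untouched.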
My thinking is that this is a pretty conservative check (owing to the fact that Databricks could hypothetically change this implementation detail at some point, and I can't find any documentation of it). If there is no `MASTER` variable, nothing will get raised, so the only possible false positive is if `local[4]` is set as `MASTER` in a multi-node cluster, which I don't think is possible?

It's still a little hackier than I'd have liked, but it does boost the user-friendliness for beginners. Any thoughts?