Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure auto_nbands handler doesn't always terminate jobs #342

Merged
merged 4 commits into from
Jul 11, 2024

Conversation

esoteric-ephemera
Copy link
Contributor

Minor update to the logic of the auto_nbands check for VaspErrorHandler. This check sees if the number of bands has been updated by VASP, and currently it only checks to see if that updated number is very large.

However, there are cases where the user specifies an NBANDS that is incompatible with their parallelization settings, as NBANDS must be divisible by $(\mathrm{ranks}) / (\mathrm{KPAR} \times \mathrm{NCORE})$. In these cases, VASP increases the number of bands to ensure the calculation can still proceed. This can happen in MP's band structure workflows with uniform $k$-point densities.

However, since the current auto_nbands handler applies no corrections to the job, these otherwise successful runs are killed.

This PR adds logic to ensure that the calculation is rerun with a higher number of bands appropriate to the parallelization setting. This is kinda redundant, since VASP already does this. But I think it has to occur this way because VaspErrorHandler is monitoring the job and flags it for an auto_nbands error.

Another implementation concern: it's generally safer to decrease the number of bands since this requires a lower energy cutoff to converge each band. It might be safer to decrease NBANDS as a fix

@JaGeo
Copy link
Member

JaGeo commented Jul 10, 2024

Just a side note: for Lobster, we need a certain number of bands as a minimum. I would be worried that it isn't enough with such a fix.

@esoteric-ephemera
Copy link
Contributor Author

Thanks @JaGeo, do you mean NBANDS should be increased beyond the minimum needed to get the calculation to run? Or just avoiding decreasing NBANDS as a fix?

@JaGeo
Copy link
Member

JaGeo commented Jul 10, 2024

I think the latter as I am not sure you can determine this minimum for each run. There might be requirements for GW runs as well.

@esoteric-ephemera
Copy link
Contributor Author

That's a reasonable point. For a normal band structure run, I think it's best to have a fix like this where the number of bands is just increased enough for the calc to run.

For GW or BSE calculations, hopefully the user is more attuned to the need for higher number of bands.

@shyuep shyuep merged commit 01c994b into materialsproject:master Jul 11, 2024
8 checks passed
@Andrew-S-Rosen
Copy link
Member

However, since the current auto_nbands handler applies no corrections to the job, these otherwise successful runs are killed.

@esoteric-ephemera: Would it be better to issue a warning but not fail the job in the case of seemingly large nbands? Actually, that was my original intention but I suppose that may not be the case here...

@esoteric-ephemera
Copy link
Contributor Author

@Andrew-S-Rosen: not sure, for reasons that @JaGeo and I discussed above, having custodian automatically decrease the number of bands isn't always a good idea. But I'm not opposed to drafting this up

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Jul 15, 2024

@esoteric-ephemera: What I meant was that the auto nbands warning I had added in #324 was not actually meant to kill the job. It was only meant to raise a warning to the user and then proceed as usual (with no changes).

@esoteric-ephemera
Copy link
Contributor Author

Ah ok! I can look into changing that

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Jul 15, 2024

To be fair, that was apparently my bad. 😅 So obviously don't feel obligated to do so. But I just wanted to make it known that the issue you ran into was in part because of a mistake from me...

@utf
Copy link
Member

utf commented Oct 14, 2024

Is it possible to get a new release with this fix included?

@shyuep
Copy link
Member

shyuep commented Oct 14, 2024

Done.

@Andrew-S-Rosen
Copy link
Member

@shyuep: Something seems up here with the new release: from custodian import vasp no longer works.

@esoteric-ephemera
Copy link
Contributor Author

esoteric-ephemera commented Oct 15, 2024

@Andrew-S-Rosen @shyuep : looking in the 2024.10.14 custodian dir, I only have these files:
__init__.py
__pycache__
custodian.py
py.typed
utils.py

@esoteric-ephemera
Copy link
Contributor Author

Is it just that the pyproject.toml should have:
include = ["custodian*", "custodian.*"]
in the tool.setuptools.packages.find block? Right now, it just has custodian

@shyuep
Copy link
Member

shyuep commented Oct 15, 2024

The new release has fixed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants