Variable number of atoms / changes in PBC #59

Open
PythonFZ opened this issue Jul 30, 2024 · 2 comments

@PythonFZ

Hey zarrtraj devs.

I'm happy to see that there is some general progress in the community towards the usage of H5MD (and its derivatives).
I've written a similar library for the ASE which I link here https://github.com/zincware/ZnH5MD.

One shortcoming of H5MD is that it is vague about (or cannot express) how to handle grand-canonical data and how to store data whose PBC change between frames. The latter is not realistic for MD trajectories, but it can be useful when storing, e.g., training data for machine-learning potentials. In these cases I set the step/time to a fixed value of 1. Maybe 0 or NaN would be better?

I've made two extensions to H5MD to allow for these (https://github.com/zincware/ZnH5MD?tab=readme-ov-file#extended-h5md-format). I introduced a box/pbc group that uses boolean values to define the PBC in each dimension and is prioritized over the H5MD attribute. In addition, I pad with np.nan in h5py to allow for a varying number of atoms, as in grand-canonical simulations.
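To make the padding idea concrete, here is a minimal NumPy sketch. It is not ZnH5MD's actual implementation; the helper names `pad_frames` and `unpad_frame` are hypothetical, and it assumes each frame is an `(n_i, 3)` array of positions in which a fully-NaN row marks a padded slot.

```python
import numpy as np


def pad_frames(frames):
    """Stack per-frame (n_i, 3) position arrays into one (T, N_max, 3) array.

    Slots beyond a frame's own atom count are filled with NaN.
    """
    n_max = max(len(f) for f in frames)
    padded = np.full((len(frames), n_max, 3), np.nan)
    for i, f in enumerate(frames):
        padded[i, : len(f)] = f
    return padded


def unpad_frame(padded_frame):
    """Drop padding rows, i.e. rows in which every coordinate is NaN."""
    keep = ~np.isnan(padded_frame).all(axis=1)
    return padded_frame[keep]


# Two frames with 2 and 4 atoms respectively (made-up data).
frames = [np.zeros((2, 3)), np.ones((4, 3))]
padded = pad_frames(frames)
first = unpad_frame(padded[0])
```

One consequence of this design is that NaN padding only works for floating-point datasets; integer fields such as species or IDs need a separate sentinel value.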

Both solutions, especially the np.nan padding, are probably not ideal, and maybe we could find a solution for these two shortcomings of H5MD together. I think the community would greatly benefit from broader use of a common H5 standard.

@hmacdope
Collaborator

hmacdope commented Sep 7, 2024

Hey @PythonFZ, sorry it took so long for me to see this, I realised I wasn't watching this repo, whoops!

For grand canonical data, unfortunately this is out of scope for MDAnalysis, as we don't support grand canonical data. This is a big limitation, but not one we can easily address.

For the lack of PBC in certain dimensions, I think this is already handled in the specification here: https://www.nongnu.org/h5md/h5md.html#simulation-box, in particular by the boundary attribute, which for a 3D simulation with PBC in x and y would be, for example, ["periodic", "periodic", "none"].
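As an illustration of how per-dimension boolean PBC flags relate to the `boundary` attribute, here is a small sketch. The function names are hypothetical and belong to neither zarrtraj nor ZnH5MD; the only thing taken from the specification is that each entry is either `periodic` or `none`.

```python
def pbc_to_boundary(pbc):
    """Convert per-dimension booleans to H5MD boundary strings."""
    return ["periodic" if flag else "none" for flag in pbc]


def boundary_to_pbc(boundary):
    """Convert H5MD boundary strings (str or bytes) to booleans."""
    flags = []
    for entry in boundary:
        if isinstance(entry, bytes):
            # HDF5 string attributes are often read back as bytes.
            entry = entry.decode("ascii")
        if entry not in ("periodic", "none"):
            raise ValueError(f"unexpected boundary value: {entry!r}")
        flags.append(entry == "periodic")
    return flags


# PBC in x and y only.
boundary = pbc_to_boundary([True, True, False])
roundtrip = boundary_to_pbc(boundary)
```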

For the grand canonical stuff there is also the following paragraph from the [particles](https://www.nongnu.org/h5md/h5md.html#particles-group) group documentation which I will copy below.

A fill value (see [§ 6.6](http://www.hdfgroup.org/HDF5/doc/UG/11_Datatypes.html#Fvalues) in (“HDF5 User’s guide,” n.d.)) may be defined for id/value upon dataset creation. When the identifier of a particle is equal to this user-defined value, the particle is considered non-existing, the entry serves as a placeholder. This permits the storage of subsystems whose number of particles varies in time. For the case of varying particle number, the dimension denoted by [N] above may be variable.

Which may be what you are looking for? I.e., the spec may already support variable particle numbers.
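A rough sketch of how a reader could interpret such a placeholder follows. It uses plain NumPy arrays as stand-ins for the `id/value` and `position/value` datasets, and the sentinel `FILL_ID = -1` and the helper `existing_particles` are assumptions made for illustration. With h5py, the fill value would be passed via the `fillvalue` argument of `create_dataset`.

```python
import numpy as np

FILL_ID = -1  # assumed fill value; the spec leaves the choice to the user

# Stand-in for id/value with shape (n_frames, N): N slots per frame,
# where a slot holding FILL_ID means "no particle here".
ids = np.array(
    [
        [0, 1, 2, FILL_ID],
        [0, 1, 2, 3],
        [0, FILL_ID, 2, 3],
    ]
)
# Stand-in for position/value with shape (n_frames, N, 3), made-up data.
positions = np.arange(ids.size * 3, dtype=float).reshape(*ids.shape, 3)


def existing_particles(frame):
    """Return the IDs and positions of particles that exist in `frame`."""
    mask = ids[frame] != FILL_ID
    return ids[frame][mask], positions[frame][mask]


# Number of real particles in each frame.
counts = (ids != FILL_ID).sum(axis=1)
frame_ids, frame_pos = existing_particles(2)
```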

But perhaps we can get some insight from the legendary @pdebuyl, who wrote the specification.

@pdebuyl

pdebuyl commented Sep 20, 2024

Hello @hmacdope, indeed, non-periodic BCs are covered by H5MD.

Grand-canonical systems were considered, but I am not aware of any application using them. I have been active in remote sensing for a few years now, so I could of course have missed a few things :-)

One possibility is to use the particle ID field, which is in the specification. It requires extra storage but should accommodate grand-canonical systems. You might need to assign a unique ID to each new particle if you want to track particles. Alternatively, you could use species to store whether a particle exists.

Using the fill value can be tedious and you could simply declare particles with ID=-1 or of a specific species as non-existent.

If you don't need to track individual particles, using species is probably more efficient (as you can probably store species in an 8-bit int).
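The storage argument can be sketched with NumPy. The species code `ABSENT = 0`, the array sizes, and the 64-bit ID dtype are all assumptions chosen for illustration, and the byte counts are for the raw arrays before any HDF5 compression.

```python
import numpy as np

ABSENT = 0  # assumed species code meaning "slot is empty"
n_frames, n_slots = 1000, 512

# Species-based marker: one 8-bit integer per slot.
species = np.full((n_frames, n_slots), ABSENT, dtype=np.int8)
species[:, :400] = 1  # the first 400 slots hold real particles of species 1

# ID-based marker: one 64-bit integer per slot, -1 for empty slots.
ids = np.full((n_frames, n_slots), -1, dtype=np.int64)
ids[:, :400] = np.arange(400)

# Both encodings give the same existence mask.
present_by_species = species != ABSENT
present_by_id = ids != -1

# Raw memory footprint of each marker array, in bytes.
species_bytes = species.nbytes
id_bytes = ids.nbytes
```

The ID array is what lets you follow a given particle across frames; the species array only says whether a slot is occupied, which is why it can be so much smaller.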
