Data Provider #3

Open
ktschuett opened this issue Oct 1, 2017 · 2 comments
@ktschuett

I will look into the data provider interface and write a reference implementation for ASE DB files, including periodic boundary conditions. For that, I would slightly change the current interface. Specifically:

  • make the DataProvider class abstract, e.g., by removing 'read_database'
  • get_properties(property_names, idx=None) will return a dict of the requested properties for the given indices
  • iterate(property_names, idx=None) will do the same, but as a generator
  • implement a reference ASEDataProvider that includes the functionality currently in DataProvider
  • on top of that, we could then have subclasses of DataProvider that do batching, pre-loading, etc. for the low-level DataProviders
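
The proposed split could look roughly like the sketch below. This is only an illustration under assumptions: `_get_one`, `InMemoryDataProvider`, and `BatchProvider` are hypothetical names, the in-memory class merely stands in for the ASEDataProvider so the example runs without ASE, and the batching layer is shown as a wrapper rather than a subclass to keep it short.

```python
from abc import ABC, abstractmethod


class DataProvider(ABC):
    """Abstract low-level provider; subclasses decide where the data lives."""

    @abstractmethod
    def __len__(self):
        """Number of entries available."""

    @abstractmethod
    def _get_one(self, property_names, i):
        """Return a dict {name: value} for the single entry i (hypothetical hook)."""

    def iterate(self, property_names, idx=None):
        """Yield one property dict per requested index (all entries if idx is None)."""
        indices = range(len(self)) if idx is None else idx
        for i in indices:
            yield self._get_one(property_names, i)

    def get_properties(self, property_names, idx=None):
        """Return a dict {name: [values, ...]} for the requested indices."""
        result = {name: [] for name in property_names}
        for entry in self.iterate(property_names, idx):
            for name in property_names:
                result[name].append(entry[name])
        return result


class InMemoryDataProvider(DataProvider):
    """Hypothetical stand-in for an ASEDataProvider, backed by a list of dicts."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def _get_one(self, property_names, i):
        row = self._rows[i]
        return {name: row[name] for name in property_names}


class BatchProvider:
    """Hypothetical higher-level layer that batches any low-level provider."""

    def __init__(self, provider, batch_size):
        self.provider = provider
        self.batch_size = batch_size

    def batches(self, property_names, idx=None):
        indices = list(range(len(self.provider))) if idx is None else list(idx)
        for start in range(0, len(indices), self.batch_size):
            chunk = indices[start:start + self.batch_size]
            yield self.provider.get_properties(property_names, chunk)


# Made-up example data, only to exercise the interface.
rows = [
    {"energy": -1.0, "numbers": [1, 1]},
    {"energy": -2.5, "numbers": [8, 1, 1]},
    {"energy": -0.3, "numbers": [6]},
]
provider = InMemoryDataProvider(rows)
selected = provider.get_properties(["energy"], idx=[0, 2])
```

Because `get_properties` is built on `iterate`, a concrete provider only has to implement `__len__` and the per-entry lookup.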

Any thoughts?

@ktschuett ktschuett self-assigned this Oct 1, 2017
@andersx
Member

andersx commented Oct 2, 2017

Sounds good to me!

@ktschuett
Author

After pondering a bit over the DataProvider (I was on vacation, so only pondering), I have some thoughts and ideas, also for the whole project:

  • When I have periodic boundary conditions, I preprocess the data by collecting neighborhood information and writing it into the ASE database. How should I deal with that here? I could (1) just write that information into the DB, but this is somewhat method-specific. Option (2) would be to copy the whole database, which would be a waste of memory. Perhaps it would be best to require the user to choose in the config.
  • Perhaps we don't even need these DataProvider classes; we could have this package as a pure interface (as in the name). I would put all the code that does the work in my GitHub as a separate, stand-alone package, and the same for GDML, SOAP, etc. Then we would have only template files for the interface here, i.e., we would just use duck typing to avoid dependencies on this package. That way, this package only needs to "point" to the interface classes of the other packages, which take a path to an ASE DB, train_idx, val_idx, and a JSON file with the model config, and return a predefined results object (e.g., a NamedTuple). This way, we can quickly integrate more codes and regularly check for compatibility issues with a CI tool.
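
To make the second idea concrete, here is a minimal sketch of what such a duck-typed interface might look like. Everything named here is an assumption for illustration: the `Results` fields, the `run` method name, and `PlaceholderInterface` are hypothetical, and the placeholder does not actually read the ASE DB or train anything.

```python
import json
import os
import tempfile
from typing import NamedTuple


class Results(NamedTuple):
    """Hypothetical predefined results object; the field names are assumptions."""
    model_name: str
    n_train: int
    n_val: int
    val_error: float


class PlaceholderInterface:
    """Stand-in for an interface class that lives in an external package.

    It does not inherit from anything in this package; it only follows the
    agreed call signature (duck typing).
    """

    def run(self, db_path, train_idx, val_idx, config_path):
        with open(config_path) as f:
            config = json.load(f)
        # A real implementation would read structures from the ASE DB at
        # db_path and train a model here. This placeholder does neither.
        return Results(
            model_name=config.get("name", "unnamed"),
            n_train=len(train_idx),
            n_val=len(val_idx),
            val_error=float("nan"),
        )


def check_interface(obj):
    """Check a CI job could run: does the object expose a callable run()?"""
    return callable(getattr(obj, "run", None))


def has_result_fields(result):
    """Check that a returned object carries the agreed fields, whatever its type."""
    return all(hasattr(result, field) for field in Results._fields)


# Usage with a temporary JSON config; "data.db" is a made-up path that the
# placeholder never opens.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "model.json")
    with open(config_path, "w") as f:
        json.dump({"name": "placeholder"}, f)
    model = PlaceholderInterface()
    results = model.run("data.db", train_idx=[0, 1, 2], val_idx=[3],
                        config_path=config_path)
```

Checking attributes rather than types means an external package can return its own named tuple with the same fields and still pass, without importing anything from here.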
