Data Provider #3

ktschuett · 2017-10-01T17:27:13Z

I will have a look into the data provider interface and do a reference implementation for ASE DB files including periodic boundary conditions. For that I would slightly change the current interface, specifically:

make the DataProvider class abstract, e.g., removing 'read_database'
get_properties(property_names, idx=None) will return a dict of properties according to the given indices
iterate(property_names, idx=None) does the same but as a generator
implement a reference ASEDataProvider which will include the functionality currently in DataProvider
then we could have subclasses of DataProvider on top that do batching, pre-loading etc. for the low-level DataProviders
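
To make the proposal concrete, here is a rough sketch of how the abstract class and the two access methods could fit together. It is only an illustration: `_read`, `InMemoryDataProvider` and `batches` are hypothetical names that do not exist in the package, the in-memory class merely stands in for the planned ASEDataProvider so the example runs without ASE, and batching is shown as a plain generator function rather than as a subclass.

```python
from abc import ABC, abstractmethod


class DataProvider(ABC):
    """Abstract low-level provider; subclasses define how one entry is read."""

    @abstractmethod
    def __len__(self):
        """Number of entries in the underlying data source."""

    @abstractmethod
    def _read(self, i, property_names):
        """Return a dict {name: value} for the entry with index i (hypothetical hook)."""

    def iterate(self, property_names, idx=None):
        """Yield one property dict per requested index (all entries if idx is None)."""
        indices = range(len(self)) if idx is None else idx
        for i in indices:
            yield self._read(i, property_names)

    def get_properties(self, property_names, idx=None):
        """Return a dict mapping each property name to a list of values."""
        result = {name: [] for name in property_names}
        for entry in self.iterate(property_names, idx):
            for name in property_names:
                result[name].append(entry[name])
        return result


class InMemoryDataProvider(DataProvider):
    """Hypothetical stand-in for ASEDataProvider, backed by a list of dicts."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def _read(self, i, property_names):
        return {name: self._rows[i][name] for name in property_names}


def batches(provider, property_names, batch_size):
    """Hypothetical batching layer on top of any low-level provider."""
    n = len(provider)
    for start in range(0, n, batch_size):
        idx = range(start, min(start + batch_size, n))
        yield provider.get_properties(property_names, idx)
```

With this split, a real ASEDataProvider would only need to implement `__len__` and `_read`; whether batching ends up as a subclass or a wrapper is left open here.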

Any thoughts?

andersx · 2017-10-02T08:42:49Z

Sounds good to me!

ktschuett · 2017-10-10T14:03:57Z

After pondering a bit over the DataProvider (I was on vacation, so only pondering), I have some thoughts and ideas, also for the whole project:

When I have periodic boundary conditions, I preprocess the data by collecting neighborhood information and writing it into the ASE database. How should I deal with that here? Option (1) would be to just write that information into the DB, but it is somewhat method-specific. Option (2) would be to copy the whole database, which would be a waste of memory. Perhaps it would be best to require the user to choose in the config.
Perhaps we don't even need these DataProvider classes; we could have this package as a pure interface (as in the name). I would put all the code that does the work in my GitHub as a separate, stand-alone package; same for GDML, SOAP, etc. Then we would have only template files for the interface here, i.e., we would just do duck typing to avoid dependencies on this package. That way, this package only needs to "point" to the interface classes of the other packages, which take a path to an ASE DB, train_idx, val_idx, and a JSON file with the model config, and return a predefined results object (e.g., a NamedTuple). This way, we can quickly integrate more codes and regularly check for compatibility issues with some CI tool.

ktschuett self-assigned this Oct 1, 2017
