LMDB traversal cli #301
Signed-off-by: Kin Long Kelvin Lee <[email protected]>
Looks good! Not entirely sure why we would want to use a window size instead of just setting num_samples to something smaller. Made one comment for potential clean-up, but feel free to merge when ready.
matsciml/datasets/lmdb_cli.py
Outdated
transforms = []
if periodic:
    transforms.append(PeriodicPropertiesTransform(radius, adaptive_cutoff))
if graph_backend:
    transforms.append(PointCloudToGraphTransform(graph_backend))
target_class = (
    BaseLMDBDataset
    if not dataset_type
    else registry.get_dataset_class(dataset_type)
)
Could consolidate this common code block into its own function
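One way the suggested consolidation could look is sketched below. This is only an illustration, not the change made in the PR: the helper name `build_transforms_and_class` is hypothetical, and the classes and registry are minimal stand-ins for the real `matsciml` objects so that the snippet runs on its own.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


# Stand-ins for the real matsciml classes (assumptions, for illustration only).
@dataclass
class PeriodicPropertiesTransform:
    radius: float
    adaptive_cutoff: bool


@dataclass
class PointCloudToGraphTransform:
    backend: str


class BaseLMDBDataset:
    """Placeholder for the generic LMDB dataset class."""


@dataclass
class StandInRegistry:
    """Hypothetical registry exposing the same lookup used in the snippet."""

    classes: Dict[str, type] = field(default_factory=dict)

    def get_dataset_class(self, name: str) -> type:
        return self.classes[name]


def build_transforms_and_class(
    registry: StandInRegistry,
    periodic: bool,
    radius: float,
    adaptive_cutoff: bool,
    graph_backend: Optional[str],
    dataset_type: Optional[str],
) -> Tuple[List[Any], type]:
    """Shared setup: assemble the transform list and pick the dataset class."""
    transforms: List[Any] = []
    if periodic:
        transforms.append(PeriodicPropertiesTransform(radius, adaptive_cutoff))
    if graph_backend:
        transforms.append(PointCloudToGraphTransform(graph_backend))
    # Fall back to the generic LMDB dataset when no specific type is requested.
    target_class = (
        BaseLMDBDataset
        if not dataset_type
        else registry.get_dataset_class(dataset_type)
    )
    return transforms, target_class
```

Each CLI command could then call the helper once instead of repeating the block.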
Addressed with 16da9d8
The window size is used by the running average: as you iterate through the dataset, it computes (by default) a running average of properties over the last 10 samples. That differs from simply capping the number of samples to go through, because you might want to sweep through the data and look for outliers.
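To illustrate the distinction, here is a minimal sketch of a windowed running average. It is not the implementation in this PR; the function name `windowed_running_average` and the sample values are assumptions made for the example.

```python
from collections import deque
from typing import Iterable, Iterator


def windowed_running_average(
    values: Iterable[float], window_size: int = 10
) -> Iterator[float]:
    """Yield the mean of the most recent `window_size` values at each step."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # A bounded deque drops the oldest value automatically once it is full.
    window: deque = deque(maxlen=window_size)
    for value in values:
        window.append(value)
        yield sum(window) / len(window)


# Hypothetical property values: flat, with a single outlier in the middle.
samples = [1.0] * 20 + [100.0] + [1.0] * 20
averages = list(windowed_running_average(samples, window_size=10))

# Every sample is still visited; the outlier shows up as a spike in the average.
spike_index = max(range(len(averages)), key=averages.__getitem__)
```

Capping the number of samples at 10 would stop before the outlier at index 20, whereas the windowed average sweeps the full sequence and registers it.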
This PR adds a large quality-of-life (QoL) oriented CLI that provides high-level functionality for inspecting LMDB datasets.
Adds the matsciml.datasets.lmdb_cli module, which houses a click-based interface with multiple commands that perform various LMDB inspection tasks.
Updates pyproject.toml to install lmdb_cli as a "script", which allows you to access the CLI after installing matsciml simply by running lmdb_cli in the command line.