Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(memh5): fix a bug where shared datasets weren't actually shared #239

Merged
merged 3 commits into from
Jul 30, 2024

Conversation

ljgray
Copy link
Contributor

@ljgray ljgray commented Mar 20, 2023

This PR makes the shared argument in MemDiskGroup actually share the MemDataset object and not just the underlying data array. Based on the idea of shared datasets being maintained through operations such as redistribution, axis downselection is not allowed for shared datasets.

It's important to note that the default behaviour of memh5.deep_group_copy is (and has been since inception) to only make a shallow copy of the MemDataset - i.e., the underlying array is actually shared. It doesn't matter when using this method to copy from disk -> memory or memory-> disk, but it does have implications when copying memory -> memory. This probably shouldn't be the default behaviour, but we would have to be extremely careful if making any changes to avoid breaking anything.

I've also updated the docstring for deep_group_copy to make note of the copy behaviour.

This lets us remove the .copy method in draco.ContainerBase as it will properly replicate that behaviour (see radiocosmology/draco#231)

Also, add an option to skip specific datasets when copying.

@ljgray ljgray force-pushed the ljg/convert-distributed branch 4 times, most recently from f6021e3 to 897e85b Compare March 21, 2023 00:45
@ljgray ljgray marked this pull request as ready for review March 21, 2023 00:54
@ljgray ljgray requested a review from jrs65 March 21, 2023 00:56
@ljgray ljgray changed the title fix(memh5): fix a bug where shared dataset names weren't correct fix(memh5): fix a bug where shared datasets weren't actually shared Mar 21, 2023
@ljgray ljgray force-pushed the ljg/convert-distributed branch from 897e85b to c3dc714 Compare April 13, 2023 23:05
@ljgray ljgray force-pushed the ljg/convert-distributed branch 3 times, most recently from 714b167 to 19d5c0c Compare April 25, 2023 19:01
@ljgray ljgray force-pushed the ljg/convert-distributed branch from 19d5c0c to 196849f Compare May 12, 2023 22:11
@ljgray ljgray force-pushed the ljg/convert-distributed branch 5 times, most recently from fe5dfdb to fcd3cc1 Compare July 21, 2023 21:02
@jrs65
Copy link
Contributor

jrs65 commented Jul 26, 2023

I have just been looking through this one again and reflecting on what the behaviour should be. It's not clear to me at what level a shared dataset should be shared. Really the only purpose of it is to save on memory, so anything that does that is fine. I'm not sure about shared datasets maintaining that through redistribution though, it also might be good to allow shared selections but that has its own implications.

@ljgray
Copy link
Contributor Author

ljgray commented Jul 26, 2023

I think that it's valid for a user to expect that a shared dataset would remain shared through a redistribution, since that's a common operation and it defeats the purpose of memory savings if we lose those savings the moment we redistribute.

Selection is tricky since it would effectively have to replace the memdataset within the group with a new one with the selections applied, so I'm not sure how that would work without modifying the way memdatasets are referenced by the group (i.e., rather than the shared dataset just pointing to the same object, it would somehow have to point to a reference in the group tree which in turn references a dataset object, so if the dataset object changed the tree reference would still be the same thing). Unless you just mean applying selections during the copy, which has the potential to be tricky as well and probably wouldn't be used enough to be worth the effort (right now someone could just make the selection then make a shared copy, which would work fine)

@ljgray ljgray force-pushed the ljg/convert-distributed branch from fcd3cc1 to b84ce8a Compare August 1, 2023 19:48
@ljgray ljgray force-pushed the ljg/convert-distributed branch from b84ce8a to d923852 Compare November 17, 2023 23:52
@ljgray ljgray force-pushed the ljg/convert-distributed branch from d923852 to 8b8a462 Compare November 25, 2023 01:26
@ljgray ljgray force-pushed the ljg/convert-distributed branch from 8b8a462 to 0603c2e Compare March 28, 2024 17:55
@ljgray ljgray force-pushed the ljg/convert-distributed branch from 0603c2e to 098847e Compare May 10, 2024 16:35
@ljgray ljgray force-pushed the ljg/convert-distributed branch from 098847e to 8c4ff16 Compare May 21, 2024 18:22
@ljgray ljgray requested a review from ketiltrout May 21, 2024 18:24
@ljgray ljgray force-pushed the ljg/convert-distributed branch from 8c4ff16 to b3d7415 Compare May 21, 2024 19:43
caput/memh5.py Outdated Show resolved Hide resolved
@ljgray ljgray force-pushed the ljg/convert-distributed branch from b3d7415 to 45ea205 Compare June 17, 2024 22:58
@ljgray ljgray force-pushed the ljg/convert-distributed branch from 45ea205 to 6a4ec97 Compare July 3, 2024 19:51
@ljgray ljgray force-pushed the ljg/convert-distributed branch from 6a4ec97 to 074fdeb Compare July 26, 2024 20:17
@ljgray
Copy link
Contributor Author

ljgray commented Jul 26, 2024

@ketiltrout I've updated this to simplify the copy pathways and address the to_file issues you mentioned. Now, there should only be two paths:

  • If shallow is False (default), all datasets are deep-copied except those in shared. In the case where to_file is True, dataset names provided in shared are ignored
  • If shallow is True, all datasets are fully shared - no processing, data type changes, etc... are applied. We end up with a new group pointing to the same MemDataset objects as the original group.

@ljgray
Copy link
Contributor Author

ljgray commented Jul 26, 2024

@ketiltrout I also removed an extra byte order check, which was required for a now-fixed mpi4py bug.

@ljgray ljgray requested a review from ketiltrout July 26, 2024 20:31
@ljgray ljgray force-pushed the ljg/convert-distributed branch from 48b5022 to fddebde Compare July 26, 2024 22:00
Copy link
Member

@ketiltrout ketiltrout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this looks a lot better! I've made a few suggestions for your docstring.

One potential thing I've though of:

In the case when g2 is a file, instead of ignoring shared, another option would be to raise a ValueError on a non-empty value. I don't know if that's better or worse than what you've implemented.

ValueError could help users find catch where they've maybe misconstrued their call to deep_group_copy, but the downside is it forces any call to deep_group_copy to be type-aware (i.e. of the type g2), which may be a worse problem.

caput/memh5.py Outdated Show resolved Hide resolved
caput/memh5.py Outdated Show resolved Hide resolved
caput/memh5.py Outdated Show resolved Hide resolved
caput/memh5.py Outdated Show resolved Hide resolved
caput/memh5.py Outdated Show resolved Hide resolved
@ljgray ljgray force-pushed the ljg/convert-distributed branch from fddebde to 144b655 Compare July 29, 2024 22:27
@ljgray
Copy link
Contributor Author

ljgray commented Jul 29, 2024

Yeah, this looks a lot better! I've made a few suggestions for your docstring.

One potential thing I've though of:

In the case when g2 is a file, instead of ignoring shared, another option would be to raise a ValueError on a non-empty value. I don't know if that's better or worse than what you've implemented.

ValueError could help users find catch where they've maybe misconstrued their call to deep_group_copy, but the downside is it forces any call to deep_group_copy to be type-aware (i.e. of the type g2), which may be a worse problem.

I've decided to meet in the middle and add a warning. I don't really think we would want this to fail entirely, but it's worth letting the user know that something is happening that they didn't expect

@ketiltrout
Copy link
Member

Yeah, that's a good compromise.

ljgray and others added 3 commits July 29, 2024 15:29
Previously, 'deep_group_copy' would, by default, copy the 'MemDataset'
object of each dataset, but would continue to point to the same underlying
data array. This led to confusing behaviour where data would end up only
partially shared between two groups.

These changes restrict copying to one of two pathways: full deep copy or
full sharing of the 'MemDataset' object and all its attributes. Sharing
is explicitly disabled when copying from memory to disk.
This was originally required to fix a bug in mpi4py/mpi4py#177
This bug has now been fixed in mpi4py/mpi4py#179
@ljgray ljgray force-pushed the ljg/convert-distributed branch from 144b655 to b7bc6e3 Compare July 29, 2024 22:29
@ljgray ljgray merged commit f188a7e into master Jul 30, 2024
5 checks passed
@ljgray ljgray deleted the ljg/convert-distributed branch July 30, 2024 00:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants