fix(memh5): fix a bug where shared datasets weren't actually shared #239

ljgray · 2023-03-20T23:35:08Z

This PR makes the shared argument in MemDiskGroup actually share the MemDataset object and not just the underlying data array. Based on the idea of shared datasets being maintained through operations such as redistribution, axis downselection is not allowed for shared datasets.

It's important to note that the default behaviour of memh5.deep_group_copy is (and has been since inception) to only make a shallow copy of the MemDataset - i.e., the underlying array is actually shared. It doesn't matter when using this method to copy from disk -> memory or memory-> disk, but it does have implications when copying memory -> memory. This probably shouldn't be the default behaviour, but we would have to be extremely careful if making any changes to avoid breaking anything.

I've also updated the docstring for deep_group_copy to make note of the copy behaviour.

This lets us remove the .copy method in draco.ContainerBase as it will properly replicate that behaviour (see radiocosmology/draco#231)

~~Also, add an option to skip specific datasets when copying.~~

jrs65 · 2023-07-26T11:37:02Z

I have just been looking through this one again and reflecting on what the behaviour should be. It's not clear to me at what level a shared dataset should be shared. Really the only purpose of it is to save on memory, so anything that does that is fine. I'm not sure about shared datasets maintaining that through redistribution though, it also might be good to allow shared selections but that has its own implications.

ljgray · 2023-07-26T14:55:37Z

I think that it's valid for a user to expect that a shared dataset would remain shared through a redistribution, since that's a common operation and it defeats the purpose of memory savings if we lose those savings the moment we redistribute.

Selection is tricky since it would effectively have to replace the memdataset within the group with a new one with the selections applied, so I'm not sure how that would work without modifying the way memdatasets are referenced by the group (i.e., rather than the shared dataset just pointing to the same object, it would somehow have to point to a reference in the group tree which in turn references a dataset object, so if the dataset object changed the tree reference would still be the same thing). Unless you just mean applying selections during the copy, which has the potential to be tricky as well and probably wouldn't be used enough to be worth the effort (right now someone could just make the selection then make a shared copy, which would work fine)

caput/memh5.py

ljgray · 2024-07-26T20:28:20Z

@ketiltrout I've updated this to simplify the copy pathways and address the to_file issues you mentioned. Now, there should only be two paths:

If shallow is False (default), all datasets are deep-copied except those in shared. In the case where to_file is True, dataset names provided in shared are ignored
If shallow is True, all datasets are fully shared - no processing, data type changes, etc... are applied. We end up with a new group pointing to the same MemDataset objects as the original group.

ljgray · 2024-07-26T20:31:53Z

@ketiltrout I also removed an extra byte order check, which was required for a now-fixed mpi4py bug.

ketiltrout

Yeah, this looks a lot better! I've made a few suggestions for your docstring.

One potential thing I've though of:

In the case when g2 is a file, instead of ignoring shared, another option would be to raise a ValueError on a non-empty value. I don't know if that's better or worse than what you've implemented.

ValueError could help users find catch where they've maybe misconstrued their call to deep_group_copy, but the downside is it forces any call to deep_group_copy to be type-aware (i.e. of the type g2), which may be a worse problem.

caput/memh5.py

ljgray · 2024-07-29T22:29:05Z

Yeah, this looks a lot better! I've made a few suggestions for your docstring.

One potential thing I've though of:

In the case when g2 is a file, instead of ignoring shared, another option would be to raise a ValueError on a non-empty value. I don't know if that's better or worse than what you've implemented.

ValueError could help users find catch where they've maybe misconstrued their call to deep_group_copy, but the downside is it forces any call to deep_group_copy to be type-aware (i.e. of the type g2), which may be a worse problem.

I've decided to meet in the middle and add a warning. I don't really think we would want this to fail entirely, but it's worth letting the user know that something is happening that they didn't expect

ketiltrout · 2024-07-29T22:29:35Z

Yeah, that's a good compromise.

Previously, 'deep_group_copy' would, by default, copy the 'MemDataset' object of each dataset, but would continue to point to the same underlying data array. This led to confusing behaviour where data would end up only partially shared between two groups. These changes restrict copying to one of two pathways: full deep copy or full sharing of the 'MemDataset' object and all its attributes. Sharing is explicitly disabled when copying from memory to disk.

This was originally required to fix a bug in mpi4py/mpi4py#177 This bug has now been fixed in mpi4py/mpi4py#179

ljgray mentioned this pull request Mar 20, 2023

feat(containers): remove '.copy' method from ContainerBase. radiocosmology/draco#231

Merged

ljgray force-pushed the ljg/convert-distributed branch 4 times, most recently from f6021e3 to 897e85b Compare March 21, 2023 00:45

ljgray marked this pull request as ready for review March 21, 2023 00:54

ljgray requested a review from jrs65 March 21, 2023 00:56

ljgray changed the title ~~fix(memh5): fix a bug where shared dataset names weren't correct~~ fix(memh5): fix a bug where shared datasets weren't actually shared Mar 21, 2023

ljgray force-pushed the ljg/convert-distributed branch from 897e85b to c3dc714 Compare April 13, 2023 23:05

ljgray force-pushed the ljg/convert-distributed branch 3 times, most recently from 714b167 to 19d5c0c Compare April 25, 2023 19:01

ljgray force-pushed the ljg/convert-distributed branch from 19d5c0c to 196849f Compare May 12, 2023 22:11

ljgray force-pushed the ljg/convert-distributed branch 5 times, most recently from fe5dfdb to fcd3cc1 Compare July 21, 2023 21:02

ljgray force-pushed the ljg/convert-distributed branch from fcd3cc1 to b84ce8a Compare August 1, 2023 19:48

ljgray force-pushed the ljg/convert-distributed branch from b84ce8a to d923852 Compare November 17, 2023 23:52

ljgray force-pushed the ljg/convert-distributed branch from d923852 to 8b8a462 Compare November 25, 2023 01:26

ljgray force-pushed the ljg/convert-distributed branch from 8b8a462 to 0603c2e Compare March 28, 2024 17:55

ljgray force-pushed the ljg/convert-distributed branch from 0603c2e to 098847e Compare May 10, 2024 16:35

ljgray force-pushed the ljg/convert-distributed branch from 098847e to 8c4ff16 Compare May 21, 2024 18:22

ljgray mentioned this pull request May 21, 2024

Baseline-dependent rebinner radiocosmology/draco#269

Merged

ljgray requested a review from ketiltrout May 21, 2024 18:24

ljgray force-pushed the ljg/convert-distributed branch from 8c4ff16 to b3d7415 Compare May 21, 2024 19:43

ketiltrout reviewed May 21, 2024

View reviewed changes

caput/memh5.py Outdated Show resolved Hide resolved

ljgray force-pushed the ljg/convert-distributed branch from b3d7415 to 45ea205 Compare June 17, 2024 22:58

ljgray force-pushed the ljg/convert-distributed branch from 45ea205 to 6a4ec97 Compare July 3, 2024 19:51

ljgray force-pushed the ljg/convert-distributed branch from 6a4ec97 to 074fdeb Compare July 26, 2024 20:17

ljgray requested a review from ketiltrout July 26, 2024 20:31

ljgray force-pushed the ljg/convert-distributed branch from 48b5022 to fddebde Compare July 26, 2024 22:00

ketiltrout approved these changes Jul 28, 2024

View reviewed changes

caput/memh5.py Outdated Show resolved Hide resolved

caput/memh5.py Outdated Show resolved Hide resolved

caput/memh5.py Outdated Show resolved Hide resolved

caput/memh5.py Outdated Show resolved Hide resolved

caput/memh5.py Outdated Show resolved Hide resolved

ljgray force-pushed the ljg/convert-distributed branch from fddebde to 144b655 Compare July 29, 2024 22:27

ljgray and others added 3 commits July 29, 2024 15:29

feat(memh5): remove byte order check in 'deep_group_copy'

d10fdce

This was originally required to fix a bug in mpi4py/mpi4py#177 This bug has now been fixed in mpi4py/mpi4py#179

feat(memh5): add warning when trying to make shallow/shared copy to file

b7bc6e3

ljgray force-pushed the ljg/convert-distributed branch from 144b655 to b7bc6e3 Compare July 29, 2024 22:29

ljgray merged commit f188a7e into master Jul 30, 2024
5 checks passed

ljgray deleted the ljg/convert-distributed branch July 30, 2024 00:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(memh5): fix a bug where shared datasets weren't actually shared #239

fix(memh5): fix a bug where shared datasets weren't actually shared #239

ljgray commented Mar 20, 2023 •

edited

Loading

jrs65 commented Jul 26, 2023

ljgray commented Jul 26, 2023

ljgray commented Jul 26, 2024

ljgray commented Jul 26, 2024

ketiltrout left a comment

ljgray commented Jul 29, 2024

ketiltrout commented Jul 29, 2024

fix(memh5): fix a bug where shared datasets weren't actually shared #239

fix(memh5): fix a bug where shared datasets weren't actually shared #239

Conversation

ljgray commented Mar 20, 2023 • edited Loading

jrs65 commented Jul 26, 2023

ljgray commented Jul 26, 2023

ljgray commented Jul 26, 2024

ljgray commented Jul 26, 2024

ketiltrout left a comment

Choose a reason for hiding this comment

ljgray commented Jul 29, 2024

ketiltrout commented Jul 29, 2024

ljgray commented Mar 20, 2023 •

edited

Loading