Refactor the inner data abstraction #552
Replies: 2 comments 1 reply
-
personally, I'm totally down to get rid of the images = [img.science for img in work_unit.image_stack.get_images()] I was specifically thinking about this in regards to issues with re projection and data provenance (which we should discuss at this week's meeting), as we'll probably need to have a couple more elements of |
Beta Was this translation helpful? Give feedback.
-
Ok, so I agree that having an access to an abstraction that peers into the arrays on WU is good. To me that sounds like a: @dataclass
class LayeredImage:
science : np.array
mask : np.array
variance : np.array
psf : np.array
obstime : float or a struct on C++ side:
Any masking, convolution etc. functions then just operate on an array with overrides for lists of arrays and layered image. All this other stuff goes:
and then all of this can go: and then none of these indexing trick and abstractions we use C++ side ever have a reason to float up to Python (which they now do so all those extensive bindings can also go away). From the transition to Eigen we can implement convolution, masking, growing etc. in a way that they become functions usable directly in Python with numpy arrays without copies. So whatever masking and convolution code we have in C++ is now available Python side too - sweet. As it is now, we have to punt it all to C++ object this and that, to do an operation, then unpack to the attribute we want to get the result of the operation so we can then further work with them as numpy arrays because we don't have an effective tool to use these classes in the way C++ compiler enables us to. The impracticality of this is my primary annoyance. I guess I could see that it's easier to implement WU in terms of a list of layered images than three independent collections of arrays, So I could take that bit back, but we could also just implement a getter for LayeredImage from WU and an iterator, and make iteration over WU, return very cheap LayeredImage dataclasses. I don't think element-access actually requires any of the bulk we have just to support this class. |
Beta Was this translation helpful? Give feedback.
-
right now to write what is perceived as a simple thing sometimes we have to jump through inordinate amount of hoops.
For example:
because internally (in the core search, in C++) we wrap every array in RawImage, we stack those into LayeredImages, we then collect those into ImageStacks which go to StackSearch to get run. Anything we add in Python, like for example the work unit, is then one wrapper above all of this.
Then we have to peel the onion - from work unit to image stack, to layered image, to raw image, to underlying array. Much of the C++ functionality can be simplified, since the transition to Eigen, and we only really need one core data abstraction. A namespace one - like the StackSearch - that doesn't need to be availible from Python but just collects some functionality that helps manage and moderate GPU communication and execution, and one that holds the data.
My original preference was to just have one large ImageStack that has lists of arrays for science, mask, variance images, PSFs, WCSs and timestamps. The rest of the functionality can be implemented in terms of collections of Eigen/numpy arrays and lists/arrays of supporting data (f.e. timestamps), which are preferentially eigen row vectors in C++ so they map to numpy row vectors in Python and let us do things like list and boolean indexing and vectorized operations.
Then Steven brought up a good point about how he wanted to just directly set Psi/Phi, so now I'm thinking perhaps the derivation of those values should all be in Python and the core search is just those three lists - obstime, phi psi. Then the
WorkUnit
can jsut build the two arrays python-side.Beta Was this translation helpful? Give feedback.
All reactions