-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle ops between FDGrid and Python functions #458
base: develop
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #458 +/- ##
========================================
Coverage 80.48% 80.49%
========================================
Files 95 95
Lines 7442 7445 +3
========================================
+ Hits 5990 5993 +3
Misses 1452 1452
Continue to review full report at Codecov.
|
skfda/representation/grid.py
Outdated
@@ -741,11 +741,14 @@ def _get_op_matrix( | |||
self._check_same_dimensions(other) | |||
return other.data_matrix | |||
|
|||
elif isinstance(other, Callable): | |||
return np.array([[other(x) for x in _gp] for _gp in self.grid_points]).T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of things:
- I think we should limit support only to vectorized functions. Otherwise we have to call repeatedly the function for every point, which is too slow.
- I think this code won't work for functions of several variables (e.g. surfaces). For supporting that case we should probably create a matrix with every coordinate were functions are going to be evaluated, call the function with that matrix and then reshape the result.
- Another option would be to check first if the function supports vectorized input, and fall back to the slow point by point evaluation otherwise. This would be (much) less performant, but probably useful for newcomers and small datasets.
FData
are callables too. What should happen if we do ops betweenFData
with different domains? And between twoFDataGrid
with the same domain and differentgrid_points
? (we could lose commutativity in this case!) And between aFDataGrid
and aFDataBasis
with the same domain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- My understanding is that it doesn't make a real difference in terms of performance except for very small arrays, see [1].
2/3. This is true, it's now fixed. - This is also true, and a good point, which is why the testing for
Callable
is after testing forFDataGrid
. That way all your concerns are tackled by theFDataGrid
handler.
[1] https://stackoverflow.com/questions/35215161/most-efficient-way-to-map-function-over-numpy-array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't think this will ever be a hot path, so I wouldn't necessarily spend too much time thinking about micro-optimizing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay.
- My understanding is that it doesn't make a real difference in terms of performance except for very small arrays, see [1].
I didn't mean to use np.vectorize
, but to assume that the other
function accepts an array as input and return the array as output. This way, it would be very efficient for ufuncs like np.sqrt
or np.sin
, or functions created composing these, as they perform the iteration in C code, probably using SIMD instructions for further performance.
2/3. This is true, it's now fixed.
I would first try passing the array of coordinates, and if that fails pass the coordinates as different parameters for the efficiency reasons explained above.
2. This is also true, and a good point, which is why the testing for `Callable` is after testing for `FDataGrid`. That way all your concerns are tackled by the `FDataGrid` handler.
Ok, so for now FDataGrid
is special, and two FDataGrid
s with different domains cannot be operated together. If a user wants that they can always use lambda x: my_fdatagrid(x)
instead. We can also change this behaviour in the future if we found a reasonable way to support it.
I have added a commit to address most of your comments, let me know what you think! |
Hi there 👋
The goal of this PR is to be able to make it easier to do operations between FDGrids and normal python functions, like this
The feature seems to work and tests are still green but the PR probably needs a little bit more work to be mergeable (I don't think I manage well all possible grids since it's not clear to me what is possible to have as a grid), so I would be happy to get guidance on this.
Cheers!