Performance Considerations #294
-
See cuNumeric (blog post) for a possible way to speed up standard NumPy code on NVIDIA GPUs.
-
Here's a series of great blog posts on improving Python speed. They're all by the same person and they build on each other, so it may be best to read them in this order.
-
One other pattern, which I cleaned up everywhere it appeared in one of my PRs, is the unnecessary re-creation of arrays: new_arr = np.array(arr > 0) versus new_arr = arr > 0. The first form is about 20% slower. At 2.84 ms for a 180x25x100x3x3 array that isn't terrible on its own, but while testing many code snippets inside our calculations I found that this kind of saving adds up unexpectedly, because dropping the extra copy also reduces the program's memory usage (which I did not measure). My working theory is that the memory reduction is the more important part, since memory usage balloons in many computations as our arrays grow.
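A minimal sketch of that comparison, assuming random float data in the 180x25x100x3x3 shape mentioned above (the data and the function names wrapped and direct are hypothetical). The comparison arr > 0 already returns a new boolean array, and np.array copies its input by default, so wrapping the comparison allocates a second array. np.asarray, by contrast, hands back an existing ndarray unchanged. The printed timings depend on the machine and are not the figures quoted above.

```python
import timeit

import numpy as np

# Placeholder data in the shape quoted above (random values, not the project's data).
rng = np.random.default_rng(0)
arr = rng.standard_normal((180, 25, 100, 3, 3))


def wrapped():
    # arr > 0 already allocates a new boolean array; np.array() then copies it again.
    return np.array(arr > 0)


def direct():
    # Only the comparison's own result array is allocated.
    return arr > 0


assert np.array_equal(wrapped(), direct())

mask = direct()
# np.array copies by default, so its result does not share memory with the input.
assert not np.shares_memory(np.array(mask), mask)
# np.asarray returns the same object when the input is already a suitable ndarray.
assert np.asarray(mask) is mask

for fn in (wrapped, direct):
    per_call = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {per_call * 1e3:.2f} ms per call")
```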
-
Ideas for further performance improvements.
See my python_performance repo for demos of these notes.
Numpy performance considerations
Vectorization and broadcasting are fundamental techniques for using NumPy in a performant manner. This Paperspace Blog post taught me the foundations of both.
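As an illustration of vectorization and broadcasting (a hypothetical example, not taken from the post), the sketch below scales and shifts a set of 3-D points first with a Python loop and then with one broadcast expression, and builds a pairwise distance table by inserting new axes with None.

```python
import numpy as np

# Hypothetical data: 500 points in 3-D plus a per-axis scale and offset.
rng = np.random.default_rng(0)
points = rng.random((500, 3))        # shape (500, 3)
scale = np.array([2.0, 0.5, 1.0])    # shape (3,)
offset = np.array([1.0, 1.0, 0.0])   # shape (3,)


def loop_version(points, scale, offset):
    # Pure-Python loop: one interpreter round trip per element.
    out = np.empty_like(points)
    for i in range(points.shape[0]):
        for j in range(points.shape[1]):
            out[i, j] = points[i, j] * scale[j] + offset[j]
    return out


def vectorized_version(points, scale, offset):
    # Broadcasting treats the (3,) arrays as if repeated across all 500 rows,
    # without materializing the repeated copies.
    return points * scale + offset


assert np.allclose(loop_version(points, scale, offset),
                   vectorized_version(points, scale, offset))

# Pairwise differences: (500, 1, 3) - (1, 500, 3) broadcasts to (500, 500, 3).
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
print(dist.shape)  # (500, 500)
```

The broadcast version does the same arithmetic in compiled code, which is where the speedup over the loop comes from.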
A major consideration when working with NumPy arrays is whether an array in an expression is, or needs to be, copied. Copying is an extra step that can be time-consuming, and NumPy has infrastructure to avoid it when possible.
Array copies
Forcing a copy is often done accidentally. For example, the first expression below forces a copy of the array by including [:], while the second doesn't. The second is about 20% faster than the first.
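The original pair of expressions isn't shown here, so this is a separate sketch of a related way to avoid allocations: writing results into an existing buffer instead of creating a new array. Array sizes and variable names are arbitrary.

```python
import numpy as np

n = 1_000_000

# Out of place: a * 2 allocates a new array, then rebinds the name `a` to it.
a = np.arange(n, dtype=np.float64)
a_before = a
a = a * 2
assert not np.shares_memory(a, a_before)

# In place: augmented assignment writes into the existing buffer.
b = np.arange(n, dtype=np.float64)
b_before = b
b *= 2
assert b is b_before

# Ufuncs accept `out=`, so a preallocated scratch array can be reused
# across steps (or loop iterations) instead of allocating each time.
c = np.arange(n, dtype=np.float64)
scratch = np.empty_like(c)
np.multiply(c, 2, out=scratch)
np.add(scratch, 1, out=scratch)
assert np.array_equal(scratch, c * 2 + 1)
```

Two caveats: an in-place operation changes the data for every view that shares the buffer, and NumPy refuses in-place operations that would need an unsafe cast (for example, multiplying an integer array by 2.5 in place).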
This SO post is relevant: https://stackoverflow.com/questions/15424211/transfer-ownership-of-numpy-data
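NumPy arrays are sometimes views of another array. A view sounds and acts like a pointer; at the low level it is a separate array object with its own shape and strides that shares the original array's data buffer, so writing through a view changes the original.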
https://numpy.org/doc/stable/reference/generated/numpy.ndarray.view.html
https://numpy.org/doc/stable/reference/generated/numpy.ndarray.transpose.html#numpy.ndarray.transpose
https://www.w3schools.com/python/numpy/numpy_copy_vs_view.asp
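A small sketch of the view/copy distinction, using np.shares_memory to check whether two arrays overlap in memory. Basic slicing, .T and .view() give views; fancy (integer-array) and boolean indexing give copies. The array and names are arbitrary.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Basic slicing, .T and .view() return views: new shape/strides, same data buffer.
row_view = arr[1]
every_other = arr[:, ::2]
transposed = arr.T
reinterpreted = arr.view()
for v in (row_view, every_other, transposed, reinterpreted):
    assert np.shares_memory(v, arr)
    assert not v.flags.owndata

# Writing through a view changes the original.
row_view[0] = 100
print(arr[1, 0])  # 100

# Fancy (integer-array) and boolean indexing return new arrays.
picked = arr[[0, 2]]
masked = arr[arr > 5]
assert not np.shares_memory(picked, arr)
assert not np.shares_memory(masked, arr)

# An explicit .copy() breaks the link.
independent = arr[1].copy()
independent[0] = -1
print(arr[1, 0])  # still 100
```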
Array access
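Array access should be contiguous: NumPy's default layout is row-major (C order), so iterate along the last axis where possible.
Avoid element-wise access; let NumPy do the multiplication on whole arrays.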
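A sketch of both points with arbitrary sizes: summing row by row touches contiguous memory while summing column by column strides across it, and a Python double loop over elements is replaced by one whole-array multiplication. Timings are machine dependent, so none are claimed here; the function names are hypothetical.

```python
import timeit

import numpy as np

# Default NumPy layout is row-major (C order): arr[i, j] and arr[i, j + 1] are adjacent.
arr = np.random.default_rng(0).random((2000, 2000))
print(arr.flags["C_CONTIGUOUS"])    # True
print(arr.T.flags["C_CONTIGUOUS"])  # False: the transpose is a strided view


def sum_by_rows():
    # Each row is one contiguous run of 2000 float64 values.
    return sum(arr[i, :].sum() for i in range(arr.shape[0]))


def sum_by_cols():
    # Consecutive elements of a column are 2000 * 8 = 16,000 bytes apart.
    return sum(arr[:, j].sum() for j in range(arr.shape[1]))


for fn in (sum_by_rows, sum_by_cols):
    print(fn.__name__, timeit.timeit(fn, number=3) / 3, "s per call")

# If column access is repeated many times, one contiguous copy can pay off.
cols_first = np.ascontiguousarray(arr.T)  # row i of cols_first is column i of arr


def multiply_elementwise(x, k):
    # Element-by-element Python loop: slow, kept only for comparison.
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = x[i, j] * k
    return out


small = arr[:200, :200]
# One vectorized expression replaces the double loop.
assert np.allclose(multiply_elementwise(small, 3.0), small * 3.0)
```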
Data management with attrs slots
https://threeofwands.com/attrs-ii-slots/
https://stackoverflow.com/questions/472000/usage-of-slots
https://www.youtube.com/watch?v=Fot3_9eDmOs
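The links above cover slotted classes; attrs' attrs.define generates them by default. To keep this sketch dependency-free it uses plain Python __slots__ instead of attrs, with hypothetical PointDict and PointSlots classes, to show what slots change: no per-instance __dict__, and undeclared attributes are rejected.

```python
import sys


class PointDict:
    # Regular class: each instance carries its own __dict__ for attributes.
    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    # __slots__ reserves fixed attribute storage and removes the per-instance __dict__.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = PointDict(1.0, 2.0)
s = PointSlots(1.0, 2.0)

assert hasattr(p, "__dict__")
assert not hasattr(s, "__dict__")

# Rough per-instance footprint; sys.getsizeof does not follow references,
# so the instance dict is added explicitly for PointDict.
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__), sys.getsizeof(s))

# Slotted instances reject attributes that were not declared.
try:
    s.z = 3.0
except AttributeError as exc:
    print("AttributeError:", exc)
```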