
Improve runtime and peak memory usage of PVEngine.run_full_mode #140

Status: Draft · wants to merge 8 commits into base: master

Conversation

@kandersolar (Contributor) commented Feb 25, 2022

This PR does two things:

  1. As discussed in Runtime scales poorly with n_pvrows #134, the majority of the runtime of PVEngine.run_full_mode is spent in the matrix inversion. Especially for large simulations, a sparse linear solve turns out to be significantly faster than explicitly inverting the full dense matrix, so this PR replaces the linear algebra operations from numpy.linalg with sparse equivalents from scipy.sparse.linalg (a minimal sketch of the idea is shown after this list).
  2. PVEngine.run_full_mode creates many large matrices, most of which are needed in only part of the function, which results in significant (multiple GB) memory usage for large simulations. Adding del statements once these large matrices are no longer needed reduces peak memory usage substantially, especially for large simulations.
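For context, here is a minimal sketch of the sparse-solve idea. It is not the exact implementation in this PR; the shapes (with time as the last axis) are assumptions inferred from how the matrices are used in run_full_mode.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import spsolve

def sparse_solve_3d(a_mat, b_mat):
    """Solve a_mat[:, :, t] @ x[:, t] = b_mat[:, t] for each timestep t.

    Assumes a_mat has shape (n, n, n_timestamps) and b_mat has shape
    (n, n_timestamps); each per-timestep system is solved with a sparse
    factorization instead of inverting the dense matrix.
    """
    x = np.zeros_like(b_mat)
    for t in range(b_mat.shape[-1]):
        a_sparse = csc_matrix(a_mat[:, :, t])  # CSC is the format spsolve prefers
        x[:, t] = spsolve(a_sparse, b_mat[:, t])
    return x
```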

Here are some runtime and peak memory comparisons of a typical call to the pvlib wrapper function (pvlib.bifacial.pvfactors_timeseries in v0.9.0). All values are the ratio of this PR to v1.5.2, so e.g. a value of 0.67 means this PR runs in 2/3 the time, or uses 2/3 the memory, relative to v1.5.2.

Runtime ratio:

n_timestamps  100    1000   5000   10000
n_pvrows                                
3              0.91   0.77   0.81   0.80
5              0.61   0.41   0.31   0.39
7              0.60   0.40   0.30   0.34

Peak RAM usage ratio:

n_timestamps  100    1000   5000   10000
n_pvrows                                
3              0.99   0.84   0.67   0.63
5              0.97   0.67   0.60   0.59
7              0.75   0.61   0.58   0.58

The above ratios, especially the peak memory values, are relatively stable on my machine. Timing is rather less stable; reducing external CPU load as much as possible and setting the power mode to "best battery" (in an attempt to disable CPU turbo boost and such) seems to help. I would be very interested to see if others can recreate the above results, especially on the timing side -- @spaneja kindly ran some for me earlier and the results were not wholly consistent with what I've shown above.
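For anyone trying to replicate: here is one simple way to measure runtime and peak memory of a single run. This is not necessarily how the numbers above were produced; `simulate` is a hypothetical zero-argument callable wrapping the pvfactors_timeseries call.

```python
import time
import tracemalloc

def measure(simulate):
    """Return (elapsed seconds, peak MB) for one call of `simulate`."""
    tracemalloc.start()  # numpy array allocations are tracked too
    start = time.perf_counter()
    simulate()
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak_bytes / 1e6
```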

Miscellaneous to-do:

  • take a closer look at where memory is allocated and where it is used; the current del situation is probably still suboptimal
  • what's new
  • tests?
  • make scipy an explicit dependency (right now it's implicit via pvlib)
  • document _sparse_solve
  • ...and maybe put it somewhere that makes more sense?
  • consider making the sparse approach opt-in via a boolean kwarg or similar?

@kandersolar (Contributor Author) commented:

I suppose this is ready for review! Open questions:

  • The private sparse solve function seems a bit out of place in engine.py; should it live somewhere else?
  • Should the new sparse approach be made opt-in somehow?
  • An alternative to using del is to break up run_full_mode into smaller functions, so that the intermediate arrays created in each sub-function go out of scope when that function returns (see the toy illustration after this list). I think sub-functions are probably the more pythonic approach, but it would be a bigger change than just throwing in a few dels. Any preference?
  • Most importantly: can other people replicate the runtime and memory improvements I'm seeing locally?
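To make the third bullet concrete, here is a toy illustration of the two options; the array names and sizes are made up and neither snippet is taken from run_full_mode.

```python
import numpy as np

def with_del(n=2000):
    big = np.random.rand(n, n)    # large intermediate (~32 MB for n=2000)
    col_sums = big.sum(axis=0)
    del big                       # released before the next large allocation
    other = np.random.rand(n, n)  # peak memory never holds `big` and `other` together
    return col_sums @ other

def _first_step(n):
    big = np.random.rand(n, n)
    return big.sum(axis=0)        # `big` goes out of scope when this returns

def with_helper(n=2000):
    col_sums = _first_step(n)     # same peak-memory effect, no explicit del
    other = np.random.rand(n, n)
    return col_sums @ other
```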

@kandersolar kandersolar marked this pull request as ready for review February 26, 2022 14:28
@mikofski commented:

You probably already know this, but the complexity/organization/shape of the matrix also affects computation speed. Diagonal, banded, & symmetrical matrices are easier to solve. If you can permute the matrix so that the non-zero elements are close to the diagonal axis you can further improve the computation simplicity. Although I believe most sparse solvers will try to permute the matrix automatically. See: https://en.m.wikipedia.org/wiki/Band_matrix
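For illustration only (not something this PR does): scipy exposes a reverse Cuthill-McKee ordering that clusters nonzeros near the diagonal. Note that spsolve already applies a fill-reducing column permutation by default (permc_spec='COLAMD'), so an explicit reordering may not buy much here.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

# small symmetric example with an off-diagonal entry far from the diagonal
A = csr_matrix(np.array([
    [1., 0., 0., 5.],
    [0., 2., 0., 0.],
    [0., 0., 3., 0.],
    [5., 0., 0., 4.],
]))
perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_banded = A[perm, :][:, perm]  # reorder rows and columns consistently
```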

@anomam (Contributor) left a comment:

Thanks a lot for this @kanderso-nrel ! LGTM
I ran it quickly locally on my very old mac and I definitely saw an improvement 🚀 I only have a few minor comments below

from pvfactors.viewfactors import VFCalculator
from pvfactors.irradiance import HybridPerezOrdered
from pvfactors.config import DEFAULT_RHO_FRONT, DEFAULT_RHO_BACK


def _sparse_solve_3D(A, b):
Contributor:

Any chance you could add the argument and return types as type hints please (instead of in the docstrings)? Most of the code doesn't have them, but maybe we could start adding them for new implementations.
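For example, the hinted signature might look something like this (a sketch, assuming the helper takes and returns plain numpy arrays):

```python
import numpy as np

def _sparse_solve_3D(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    ...
```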

@kandersolar (Contributor Author):

Done, but I've not used type hints before so please let me know if I've messed it up :)

# iteration across the time dimension is required
# - scipy 1.8.0 added new sparse arrays (as opposed to sparse matrices);
# they are 2-D only at time of writing, but in the future if they
# become N-D then it may make sense to use them here.
Contributor:

thanks a lot for adding these comments, it's super helpful 🚀 maybe we could have them in the docstrings as well?

@@ -250,6 +293,8 @@ def run_full_mode(self, fn_build_report=None):
# Calculate absorbed irradiance
qabs = (np.einsum('ijk,jk->ik', vf_aoi_matrix, q0)[:-1, :]
+ irradiance_abs_mat)
del irradiance_abs_mat
del vf_aoi_matrix

# --- Update surfaces with values: the lists are ordered by index
for idx_surf, ts_surface in enumerate(pvarray.all_ts_surfaces):
Contributor:

awesome 👍 I don't mind having the dels in this function for now, although it's true that they're not that common in Python. We could maybe add a small note in the docstring to explain why we're doing that (a possible wording is sketched below).
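A hypothetical wording for such a note (not taken from the PR):

```python
def run_full_mode(self, fn_build_report=None):
    """Run the simulation in full mode.

    Notes
    -----
    Several large intermediate matrices are explicitly ``del``-ed as soon
    as they are no longer needed, which keeps peak memory usage down for
    simulations with many timestamps (see GH #140).
    """
```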

x_dense = np.linalg.solve(A, b)
x_sparse = _sparse_solve_3D(A, b)
assert x_sparse.shape == b.shape
np.testing.assert_allclose(x_dense, x_sparse)
Contributor:

awesome 👍

@kandersolar (Contributor Author) commented:

> If you can permute the matrix so that the non-zero elements are close to the diagonal axis you can further improve the computation simplicity. Although I believe most sparse solvers will try to permute the matrix automatically.

scipy.sparse.linalg.spsolve seems to include automatic column permuting but I didn't think I was qualified to try to do better than the default :) For reference, here's what a typical slice of a_mat looks like (color means nonzero):

[image: sparsity pattern of a typical a_mat slice (colored cells are nonzero)]

@kandersolar (Contributor Author) commented:

Passing empty timeseries inputs currently works on master but didn't work here; 480b350 fixes that and adds a relevant test. Not sure how valuable it is to support that use case, but might as well if it's easy...

@anomam (Contributor) left a comment:

So sorry for the delay, and thanks so much for the updates! LGTM 🚀

@kandersolar (Contributor Author) commented:

Ping @campanelli-sunpower -- these changes would be nice to have in the main branch for a project @spaneja and I are doing, any chance this could get merged soonish?

@kandersolar kandersolar marked this pull request as draft May 19, 2022 16:46
@kandersolar (Contributor Author) commented:

Putting this back to draft -- after further experimentation I'm no longer confident that this PR is the best approach. I will continue the discussion back in #134.
