Runtime scales poorly with `n_pvrows` #134
Thanks a lot for this study @kanderso-nrel, that's really awesome data that I never collected before 🙏 Back in the day I focused my time on increasing the calculation speed with respect to the number of time data points using vectorization, but that's only one dimension of the 3-dimensional view factor matrix that is inverted in the PV engine. I would be curious to see where it spends the most time (if I have time, maybe I'll try to run some quick profiling with pyinstrument or something like that); I would expect it to be in the inversion process, but I could be wrong. If that's the case, maybe doing the inversion on a GPU could help? (I think it would be quite fast to transform the numpy array into a PyTorch tensor.) The only other way to speed things up that I can think of right now is to make some approximations: about the vf matrix, or maybe about irradiance, reflections, etc. Out of curiosity, did you try running the "fast" mode of the engine? I'm wondering how far off it is compared to the full mode (IIRC it only calculates back-side irradiance for a single PV row in the whole array).
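A minimal sketch of that kind of profiling with pyinstrument; `run_simulation()` is a hypothetical placeholder for building the PV array and running the full-mode engine, not actual pvfactors API:

```python
# Sketch: profile a full simulation with pyinstrument to see where time goes.
# `run_simulation` is a hypothetical stand-in for the real pvfactors run.
from pyinstrument import Profiler

def run_simulation():
    ...  # placeholder: build geometry, fit the engine, run full mode

profiler = Profiler()
profiler.start()
run_simulation()
profiler.stop()

# Print a call tree with per-function timings; the view factor construction
# and the matrix inversion should stand out here if they dominate.
print(profiler.output_text(unicode=True, color=True))
```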
Thanks for the pointer to pyinstrument. I continue to find it difficult to get consistent timings, but indeed the inversion of the view factor matrix accounts for a large share of the full-mode runtime. I'm far from a computational linear algebra expert, but it seems like it should be possible to somehow take advantage of the structure of that matrix. Much of the rest of the runtime in full mode is spent building the two vf matrices. The fast mode is substantially faster: tenths of a second for the simulations above.
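As a quick way to gauge whether there is structure worth exploiting, one could measure the fraction of (near-)zero entries in the assembled matrix. This is just an illustrative numpy snippet; `vf_matrix` is a synthetic stand-in for whatever array the engine actually builds:

```python
import numpy as np

# Synthetic stand-in for the assembled system matrix; zero out most entries
# purely to illustrate the sparsity calculation.
rng = np.random.default_rng(0)
vf_matrix = rng.random((321, 321))
vf_matrix[vf_matrix < 0.8] = 0.0

n_zero = np.count_nonzero(np.isclose(vf_matrix, 0.0))
print(f"fraction of zero entries: {n_zero / vf_matrix.size:.1%}")
```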
That's a good idea! I'll try to investigate on my end as well.
Thanks for trying it :) I was expecting worse results!! But yeah, I guess it really depends on the application (and so on the user) in the end.
Hmm, another idea is to never explicitly invert the matrix at all? For large simulations I see 10-15% faster calls to the pvlib wrapper function by replacing the explicit inversion (lines 223 to 226 at commit e0ea9d5) with `np.linalg.solve`.
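A minimal sketch of that swap on a synthetic batched system (shapes and values are arbitrary here, not taken from pvfactors):

```python
import numpy as np
import time

rng = np.random.default_rng(0)
# One well-conditioned square system per timestamp.
a = rng.random((500, 100, 100)) + 100 * np.eye(100)
b = rng.random((500, 100, 1))

# Original approach: explicitly invert, then multiply.
st = time.perf_counter()
x_inv = np.linalg.inv(a) @ b
print("inv + matmul:", time.perf_counter() - st)

# Alternative: solve the linear systems directly, never forming the inverse.
st = time.perf_counter()
x_solve = np.linalg.solve(a, b)
print("solve:       ", time.perf_counter() - st)

assert np.allclose(x_inv, x_solve)
```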
Wow, that's awesome @kanderso-nrel! It's very true that explicitly computing the inverse isn't actually needed here.
Some more progress, this time by combining the two previous ideas: use a sparse linear solver instead of explicitly inverting the matrix. I want to do some more testing to make sure it behaves nicely outside of my little benchmarks, but here's a runtime comparison in the meantime -- calls to the pvlib wrapper function are 30% to 60% faster in this case:
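A sketch of the sparse-solver idea with scipy, for a single timestamp's system (the real code is batched over timestamps and builds the matrix from the array geometry; the matrix here is synthetic):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(0)
n = 321  # matches the 321x321 systems timed later in this thread

# Synthetic sparse, well-conditioned matrix standing in for one timestamp's system.
a_dense = np.eye(n) + 0.01 * (rng.random((n, n)) < 0.05) * rng.random((n, n))
b = rng.random(n)
a_sparse = sparse.csr_matrix(a_dense)

# Solve the linear system directly; no explicit inverse is ever formed.
x = spsolve(a_sparse, b)
assert np.allclose(a_dense @ x, b)
```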
After continued efforts to speed up pvfactors (as always, collaborating with @spaneja), I think the timing interpretation I posted above is not wholly correct. tl;dr: I think just using a dense solver like `np.linalg.solve` performs about as well as the sparse solver, once BLAS threading is accounted for. The runtime of `np.linalg.solve` turns out to depend heavily on BLAS threading: for this kind of batched problem (many smallish systems), limiting BLAS to a single thread is roughly 10x faster:

```
In [1]: from threadpoolctl import ThreadpoolController
   ...: import numpy as np  # v1.22.3, installed from PyPI
   ...: import time
   ...:
   ...: controller = ThreadpoolController()
   ...: a = np.random.random((500, 321, 321))
   ...: b = np.random.random((500, 321))

In [2]: st = time.perf_counter()
   ...: _ = np.linalg.solve(a, b)
   ...: ed = time.perf_counter()
   ...: print(ed - st)
5.766736300000002

In [3]: st = time.perf_counter()
   ...: with controller.limit(limits=1, user_api='blas'):
   ...:     _ = np.linalg.solve(a, b)
   ...: ed = time.perf_counter()
   ...: print(ed - st)
0.5034510000000019
```

In any case, I suspect the dramatic speedup I saw earlier in this thread and in #140 was more due to using the PyPI/OpenBLAS numpy and the original explicit-inversion code as the baseline than to the sparse solver itself.
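Relatedly, one way to check which BLAS a given numpy build is actually linked against (relevant to the PyPI/OpenBLAS point above) is threadpoolctl's inspection helper; a small sketch:

```python
import numpy as np  # importing numpy loads its BLAS so it shows up below
from threadpoolctl import threadpool_info

# List the BLAS/OpenMP libraries currently loaded, with their thread counts.
# A PyPI-installed numpy typically reports openblas; conda builds often report mkl.
for lib in threadpool_info():
    print(lib.get("internal_api"), lib.get("version"), lib.get("num_threads"))
```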
Here are the timings of calls to the pvlib wrapper function, and here are my takeaways:
Thanks a lot for this study @kanderso-nrel, I remember testing your PR locally back then before approving, and for me there was no slowdown but no crazy improvement either, so I assumed it was at least as good as the current implementation. I think your study can be quite useful for people using pvfactors.
Oftentimes it is desirable to neglect edge effects in order to model a row in the interior of a large array. Unfortunately, it seems that the runtime scaling with `n_pvrows` is quite bad -- I'm having trouble pinning down the degree of the asymptotic polynomial complexity (if it even is polynomial; might be factorial?), but it's certainly not linear or even quadratic:

The good news is that going all the way to `n_pvrows > 30` is overkill for making edge effects negligible -- `n_pvrows=11` seems pretty good, at least for this array geometry:

The bad news is that `n_pvrows=11` is still an order of magnitude slower than `n_pvrows=3`, the current default in the pvlib wrapper function. The code for the above plots is available here: https://gist.github.com/kanderso-nrel/e88e3f7389b9d144a546dbe5651dfe1e

I've not looked into how we might go about improving this situation. If I had to guess, it would require a pretty substantial refactor, if it's possible at all. It would be a pleasant surprise if that guess is incorrect :) But even if it can't be fixed, I think it's useful to have an issue documenting this effect.
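A stripped-down version of that kind of benchmark (the linked gist has the real one); `run_simulation(n_pvrows)` is a hypothetical placeholder for a call into the pvlib wrapper or the pvfactors engine with everything else held fixed:

```python
import time

def run_simulation(n_pvrows):
    # Hypothetical stand-in: run a fixed bifacial simulation with `n_pvrows` rows.
    ...

# Time the same simulation at increasing array sizes to see how runtime scales
# (row counts chosen only for illustration).
for n_pvrows in (3, 5, 7, 11, 15, 21, 31):
    st = time.perf_counter()
    run_simulation(n_pvrows)
    print(f"n_pvrows={n_pvrows:>2}: {time.perf_counter() - st:.2f} s")
```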