
Long runtimes on Windows for code using parameter estimate #293

Closed
lbianchi-lbl opened this issue Apr 15, 2021 · 11 comments
Labels: bug (Something isn't working), Priority:High (High Priority Issue or PR)

Comments

@lbianchi-lbl
Contributor

This was originally reported on #249, where we noticed that example notebooks in Tutorials/Advanced/ParamEst take approximately 10 times longer to complete on Windows than on Ubuntu. This might or might not be related to recent changes in Pyomo and/or PySP.

I've started to look into this in more detail using one of the ParamEst notebooks, parameter_estimation_NRTL_using_state_block_solution_testing. Here's a summary of what I found out:

  • I extracted the Python code in the notebook to a standalone file (parmest_state_block.py, attached inside parmest.zip)
  • Running the file as a script with python parmest_state_block.py on Ubuntu and Windows seems to confirm that the runtime on Windows (~400 s) is much longer than on Ubuntu (~40 s)
    • My testing on Windows was on a VM
    • Comparing the outputs (ubuntu.log and win.log, attached), my untrained eye didn't spot any glaring differences between the two
    • The output of pip list for each system is attached
  • Running the file through the standard cProfile profiler (python -m cProfile -o parmest.prof parmest_state_block.py) on the two OSes resulted in the two .prof files attached inside parmest.zip
    • These can be visualized interactively with the snakeviz tool (pip install snakeviz && snakeviz parmest.prof), or inspected with the standard-library pstats module (see the sketch after this list)
    • The information on Windows looks partial or incomplete (possibly because of differences in how the two OSes collect metrics from subprocesses?)
  • Finally, I ran the notebook manually on Windows, with similarly long runtimes. A PDF export of the finished output is attached (HTML version inside parmest.zip)
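
For reference, a minimal sketch of how the attached .prof files can be inspected without snakeviz, using only the standard-library pstats module (the filename matches the cProfile command above):

import pstats

# Load the profile produced by `python -m cProfile -o parmest.prof ...`
stats = pstats.Stats("parmest.prof")
# Show the 20 entries with the largest cumulative time
stats.sort_stats("cumulative").print_stats(20)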

parameter_estimation_NRTL_using_state_block_solution_testing.pdf
parmest.zip
pip-list-ubuntu.txt
pip-list-win.txt
ubuntu.log
win.log

@lbianchi-lbl added the bug label on Apr 15, 2021
@jsiirola
Contributor

@lbianchi-lbl: Looking at the profile, can you try setting tee=False in the parmest.Estimator() call in cell 8 and see if that resolves the problem?
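
For context, a rough sketch of what that change could look like; the model function, data, parameter names, and objective below are placeholders for whatever the notebook already passes, not the actual cell contents:

import pyomo.contrib.parmest.parmest as parmest

# Placeholders: NRTL_model, data, theta_names, and SSE stand in for the
# notebook's existing model function, DataFrame, parameter list, and objective.
pest = parmest.Estimator(
    NRTL_model,
    data,
    theta_names,
    SSE,
    tee=False,  # suppress per-solve solver output
)
obj, theta = pest.theta_est()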

@jghouse88
Member

One other thought: I should probably remove the bootstrapping cell at the end. It introduces two problems: randomness and the time required to converge. A simple fix would be to point to the parmest docs on bootstrapping.
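
To illustrate why that cell is costly (continuing the hedged Estimator sketch above): each bootstrap sample re-runs the full estimation, so the number of samples multiplies the number of solver calls.

# Each bootstrap sample triggers a full parameter estimation, so 100 samples
# means roughly 100 additional solves on top of the base theta_est() call.
bootstrap_theta = pest.theta_est_bootstrap(100)
print(bootstrap_theta.head())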

@jsiirola
Contributor

Actually, I am not sure that just changing tee will fix things. Can you try installing pywin32 on windows? That should enable Windows to use the same infrastructure that Linux uses. (win32pipe and win32file are available in conda, but not by default in vanilla python)
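
A quick sanity check (just a sketch) to confirm those modules are importable in the environment being tested:

# Report whether the pywin32 pieces Pyomo can take advantage of are importable.
for name in ("win32pipe", "win32file"):
    try:
        __import__(name)
        print(f"{name}: available")
    except ImportError:
        print(f"{name}: missing - install or repair pywin32")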

@lbianchi-lbl
Contributor Author

lbianchi-lbl commented Apr 15, 2021

@jsiirola I've tried to set tee=False where you suggested, but it doesn't seem to have an effect. From pyomo/opt/solver/shellcmd.py I see that the subprocess call is still wrapped in TeeStream even if self._tee == False, but I don't know if joining the threads in TeeStream.close() is affected by whether ostreams includes sys.stdout or not.

pywin32 was already installed when I ran the tests. I had to downgrade it to version 225 following #258; I don't know whether using this version instead of a more recent one could be related. I can try updating pywin32 (using a Python version other than 3.8) and see if it makes a difference.

@jsiirola
Contributor

@jghouse88: I am pretty sure that the problem is not the bootstrap per se; it is that the infrastructure around subprocess management on Windows is less than efficient. Bootstrapping just highlights it because of the number of solves.

@lbianchi-lbl, I feel that the problem is really the polling interval that we have to use on Windows (because Windows lacks a select() that can be used on file descriptors). Can you try just shortening the polling interval with:

import pyomo.common.tee
pyomo.common.tee._poll_interval = 0.01
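
Since _poll_interval is a private attribute, a slightly more defensive variant of the same idea (a sketch; the attribute name could change in future Pyomo releases), placed at the top of parmest_state_block.py before any solves, would be:

import pyomo.common.tee as tee

# _poll_interval is private, so guard against it being renamed or removed.
if hasattr(tee, "_poll_interval"):
    tee._poll_interval = 0.01
else:
    print("pyomo.common.tee._poll_interval not found; Pyomo internals may have changed")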

@lbianchi-lbl
Contributor Author

lbianchi-lbl commented Apr 15, 2021

@jsiirola this makes sense, I think I understand where that comes into play now. I tried the fix you suggested, which helped a bit (runtime is ~150 s). Lowering it to 0.001 doesn't seem to reduce it further, though, so I believe the issue might lie in the Windows-specific parts of the code running inside the TeeStream threads. The default profiler only looks at code running in the main thread, so it can't show us what happens there. I'll try again with a third-party profiler that doesn't have this limitation and see if that helps us understand this a bit better.
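
One option for a thread-aware profiler (my assumption here, not necessarily the tool that was eventually used) is yappi, which, unlike cProfile, records every thread:

# Sketch: profile the whole script with yappi (pip install yappi) in wall-clock
# mode, so time spent waiting inside the TeeStream polling threads is captured.
import runpy
import yappi

yappi.set_clock_type("wall")
yappi.start()
runpy.run_path("parmest_state_block.py", run_name="__main__")
yappi.stop()

yappi.get_func_stats().sort("ttot").print_all()  # aggregated over all threads
for thread in yappi.get_thread_stats():
    print(thread.name, thread.ttot)              # per-thread totals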

@jsiirola
Contributor

@lbianchi-lbl Great! This is good progress: we are down to "only" 2x what I would consider the minimum (looking at the profiles, the solver was eating up ~65 s, and there will always be Pyomo overhead). I will be interested in hearing what you discover. I was poking around Python's implementation of threading.py, and I think there are a couple of changes we can make to TeeStream to help it poll and clean up faster.

...but changing the default timeout to 0.01 upstream seems like something we should pursue. (It is used in two places: for polling while processing the output from the solver, and when cleaning up. It would be interesting to see the relative impact of those two uses: do we need to shorten both, or just one?)
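
One crude way to separate the two effects (a sketch only; it assumes a Pyomo model m is already built and ipopt is on the PATH) is to time a single solve at several interval values:

import time
import pyomo.environ as pyo
import pyomo.common.tee as tee

# Assumes `m` is an already-constructed Pyomo model and ipopt is on the PATH.
solver = pyo.SolverFactory("ipopt")
for interval in (0.1, 0.01, 0.001):
    tee._poll_interval = interval
    start = time.perf_counter()
    solver.solve(m, tee=False)
    print(f"_poll_interval={interval}: {time.perf_counter() - start:.2f} s")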

@ksbeattie added the Priority:High label on Apr 22, 2021
@lbianchi-lbl
Contributor Author

I'll test this again but @jsiirola thinks this should be fixed on the Pyomo side.

@lbianchi-lbl
Contributor Author

lbianchi-lbl commented Sep 2, 2021

@jsiirola I've managed to get some notebook runtime duration numbers from IDAES/examples-pse#64.


My very crude analysis consists of:

  • Compare runtimes of the same notebook (separately for each Python version) on Linux and Windows
  • To account for differences between the environments in which the notebooks run on the two OSes, normalize each duration by the shortest duration in its series; under the simplifying assumption that, e.g., the Windows VMs are systematically slower than the Linux VMs by a constant factor, the normalized durations should then be directly comparable
  • Do a scatter plot of normalized duration (x for Linux, y for Windows), together with the y = x line (corresponding to a duration ratio of 1); a rough sketch of this procedure follows the list
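
A minimal sketch of that procedure (the durations.csv file and its column names are my assumptions about the data layout, not the actual format used in IDAES/examples-pse#64):

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical layout: one row per (notebook, os) with the measured duration.
df = pd.read_csv("durations.csv")  # columns: notebook, os, duration_s

# Normalize each OS series by its shortest duration, so a constant
# "slower VM" factor cancels out.
df["normalized"] = df.groupby("os")["duration_s"].transform(lambda s: s / s.min())

wide = df.pivot(index="notebook", columns="os", values="normalized")
plt.scatter(wide["linux"], wide["windows"])
lim = float(wide.max().max())
plt.plot([1, lim], [1, lim], "k--", label="y = x (duration ratio 1)")
plt.xlabel("normalized duration (Linux)")
plt.ylabel("normalized duration (Windows)")
plt.legend()
plt.show()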

[Scatter plot (examples-durations): normalized notebook durations, Linux (x) vs. Windows (y)]

From this analysis, it looks like the only significant difference (factor ~3) between Windows and Linux is in the degeneracy hunter notebook (https://github.com/IDAES/examples-pse/blob/main/src/Examples/Tools/degeneracy_hunter.ipynb); however, the absolute duration is small enough (<10 s) that it shouldn't negatively impact either interactive use or CI runs.

See also the interactive plot (a standalone HTML file, zipped because of GitHub restrictions on attachments): examples-durations.zip

TL;DR

  • From this analysis, the original problem that prompted opening this issue seems to be fixed, so, if you agree, we could close this
  • More generally, we could consider running similar analyses more systematically to detect any drastic changes in performance

@jsiirola
Contributor

jsiirola commented Sep 2, 2021

This looks good. Thanks for digging into it!

@jsiirola closed this as completed on Sep 2, 2021
@adowling2
Contributor

This makes sense. The Degeneracy Hunter notebook has an example of a simple but poorly formulated optimization problem for which Ipopt performs many line search evaluations (for good mathematical/numerical reasons). I would not be surprised if Ipopt behaved differently on Windows and Linux for this problem. I've observed the exact same versions of NLP solvers give different answers across platforms on the same problem. Anyway, just posting here to archive my thoughts.
