
Long runtimes on Windows for code using parameter estimate #293

Closed
lbianchi-lbl opened this issue Apr 15, 2021 · 11 comments
Labels: bug (Something isn't working), Priority:High (High Priority Issue or PR)

Comments

@lbianchi-lbl
Contributor

This was originally reported on #249, where we noticed that example notebooks in Tutorials/Advanced/ParamEst take approximately 10 times longer to complete on Windows than on Ubuntu. This might or might not be related to recent changes in Pyomo and/or PySP.

I've started to look into this in more detail using one of the ParamEst notebooks, parameter_estimation_NRTL_using_state_block_solution_testing. Here's a summary of what I found out:

  • I extracted the Python code in the notebook to a standalone file (parmest_state_block.py, attached inside parmest.zip)
  • Running the file as a script with python parmest_state_block.py on Ubuntu and Windows seems to confirm that the runtime on Windows (~400 s) is much longer than on Ubuntu (~40 s)
    • My testing on Windows was on a VM
    • Comparing the outputs (ubuntu.log and win.log, attached), my untrained eye didn't spot any glaring differences between the two
    • The output of pip list for each system is attached
  • Running the file through the standard cProfile profiler (python -m cProfile -o parmest.prof parmest_state_block.py) on the two OSes resulted in the two .prof files attached inside parmest.zip
    • These can be visualized interactively with the snakeviz tool (pip install snakeviz && snakeviz parmest.prof), or inspected with the standard-library pstats module (see the sketch after this list)
    • The information on Windows looks partial or incomplete (possibly because of differences in how the two OSes collect metrics from subprocesses?)
  • Finally, I ran the notebook manually on Windows, with similarly long runtimes. A PDF export of the finished output is attached (HTML version inside parmest.zip)
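
For reference, a minimal sketch of how the attached .prof files can be inspected without snakeviz, using only the standard-library pstats module (the filename matches the cProfile command above):

import pstats

# Load the profile produced by `python -m cProfile -o parmest.prof ...`
stats = pstats.Stats("parmest.prof")
# Show the 20 entries with the largest cumulative time
stats.sort_stats("cumulative").print_stats(20)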

parameter_estimation_NRTL_using_state_block_solution_testing.pdf
parmest.zip
pip-list-ubuntu.txt
pip-list-win.txt
ubuntu.log
win.log

@lbianchi-lbl added the bug label on Apr 15, 2021
@jsiirola
Contributor

@lbianchi-lbl: Looking at the profile, can you try setting tee=False in the parmest.Estimator() call in cell 8 and see if that resolves the problem?
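
For context, a rough sketch of what that change could look like; the model function, data, parameter names, and objective below are placeholders for whatever the notebook already passes, not the actual cell contents:

import pyomo.contrib.parmest.parmest as parmest

# Placeholders: NRTL_model, data, theta_names, and SSE stand in for the
# notebook's existing model function, DataFrame, parameter list, and objective.
pest = parmest.Estimator(
    NRTL_model,
    data,
    theta_names,
    SSE,
    tee=False,  # suppress per-solve solver output
)
obj, theta = pest.theta_est()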

@jghouse88
Member

One other thought: I should probably remove the bootstrapping cell at the end. It introduces two problems: randomness and the time required to converge. A simple fix would be to point to the parmest docs on bootstrapping.
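
To illustrate why that cell is costly (continuing the hedged Estimator sketch above): each bootstrap sample re-runs the full estimation, so the number of samples multiplies the number of solver calls.

# Each bootstrap sample triggers a full parameter estimation, so 100 samples
# means roughly 100 additional solves on top of the base theta_est() call.
bootstrap_theta = pest.theta_est_bootstrap(100)
print(bootstrap_theta.head())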

@jsiirola
Contributor

Actually, I am not sure that just changing tee will fix things. Can you try installing pywin32 on windows? That should enable Windows to use the same infrastructure that Linux uses. (win32pipe and win32file are available in conda, but not by default in vanilla python)
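
A quick sanity check (just a sketch) to confirm those modules are importable in the environment being tested:

# Report whether the pywin32 pieces Pyomo can take advantage of are importable.
for name in ("win32pipe", "win32file"):
    try:
        __import__(name)
        print(f"{name}: available")
    except ImportError:
        print(f"{name}: missing - install or repair pywin32")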

@lbianchi-lbl
Contributor Author

lbianchi-lbl commented Apr 15, 2021

@jsiirola I've tried to set tee=False where you suggested, but it doesn't seem to have an effect. From pyomo/opt/solver/shellcmd.py I see that the subprocess call is still wrapped in TeeStream even if self._tee == False, but I don't know if joining the threads in TeeStream.close() is affected by whether ostreams includes sys.stdout or not.

pywin32 was already installed when I ran the tests. I had to downgrade it to version 225 following #258; I don't know whether using this version instead of a more recent one could be related. I can try updating pywin32 (using a Python version other than 3.8) and see if it makes a difference.

@jsiirola
Contributor

@jghouse88: I am pretty sure that the problem is not the bootstrap per se; it is that the infrastructure around subprocess management on Windows is less than efficient. Bootstrapping just highlights it because of the number of solves.

@lbianchi-lbl, I feel that the problem is really the polling interval that we have to use on Windows (because Windows lacks a select() that can be used on file descriptors). Can you try just shortening the polling interval with:

import pyomo.common.tee
pyomo.common.tee._poll_interval = 0.01
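
Since _poll_interval is a private attribute, a slightly more defensive variant of the same idea (a sketch; the attribute name could change in future Pyomo releases), placed at the top of parmest_state_block.py before any solves, would be:

import pyomo.common.tee as tee

# _poll_interval is private, so guard against it being renamed or removed.
if hasattr(tee, "_poll_interval"):
    tee._poll_interval = 0.01
else:
    print("pyomo.common.tee._poll_interval not found; Pyomo internals may have changed")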

@lbianchi-lbl
Contributor Author

lbianchi-lbl commented Apr 15, 2021

@jsiirola this makes sense, I think I understand where that comes into play now. I tried the fix you suggested, which helped a bit (runtime is ~150 s). Lowering it to 0.001 doesn't seem to reduce it further, though, so I believe the issue might lie in the Windows-specific parts of the code running inside the TeeStream threads. The default profiler only looks at code running in the main thread, so it can't show us what happens there. I'll try again with a third-party profiler that doesn't have this limitation and see if that helps us understand this a bit better.
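
One option for a thread-aware profiler (my assumption here, not necessarily the tool that was eventually used) is yappi, which, unlike cProfile, records every thread:

# Sketch: profile the whole script with yappi (pip install yappi) in wall-clock
# mode, so time spent waiting inside the TeeStream polling threads is captured.
import runpy
import yappi

yappi.set_clock_type("wall")
yappi.start()
runpy.run_path("parmest_state_block.py", run_name="__main__")
yappi.stop()

yappi.get_func_stats().sort("ttot").print_all()  # aggregated over all threads
for thread in yappi.get_thread_stats():
    print(thread.name, thread.ttot)              # per-thread totals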

@jsiirola
Contributor

@lbianchi-lbl Great! This is good progress: we are down to "only" 2x what I would consider the minimum (looking at the profiles, the solver was eating up ~65 s, and there will always be Pyomo overhead). I will be interested in hearing what you discover. I was poking around Python's implementation of threading.py, and I think there are a couple of changes we can make to TeeStream to help it poll and clean up faster.

...but changing the default timeout to 0.01 upstream seems like something we should pursue. (It is used in two places: for polling while processing the output from the solver, and when cleaning up. It would be interesting to see the relative impact of those two uses: do we need to shorten both, or just one?)
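
One crude way to separate the two effects (a sketch only; it assumes a Pyomo model m is already built and ipopt is on the PATH) is to time a single solve at several interval values:

import time
import pyomo.environ as pyo
import pyomo.common.tee as tee

# Assumes `m` is an already-constructed Pyomo model and ipopt is on the PATH.
solver = pyo.SolverFactory("ipopt")
for interval in (0.1, 0.01, 0.001):
    tee._poll_interval = interval
    start = time.perf_counter()
    solver.solve(m, tee=False)
    print(f"_poll_interval={interval}: {time.perf_counter() - start:.2f} s")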

@ksbeattie added the Priority:High label on Apr 22, 2021
@lbianchi-lbl
Contributor Author

I'll test this again but @jsiirola thinks this should be fixed on the Pyomo side.

@lbianchi-lbl
Contributor Author

lbianchi-lbl commented Sep 2, 2021

@jsiirola I've managed to get some notebook runtime duration numbers from IDAES/examples-pse#64.


My very crude analysis consists of:

  • Compare runtimes of the same notebook (separately for each Python version) on Linux and Windows
  • To account for differences between the environments in which the notebooks run on the two OSes, normalize each duration by the shortest duration in its series; under the simplifying assumption that, e.g., the Windows VMs are systematically slower than the Linux VMs by a constant factor, the normalized durations should then be directly comparable
  • Do a scatter plot of normalized duration (x for Linux, y for Windows), together with the y = x line (corresponding to a duration ratio of 1); a rough sketch of this procedure follows the list
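
A minimal sketch of that procedure (the durations.csv file and its column names are my assumptions about the data layout, not the actual format used in IDAES/examples-pse#64):

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical layout: one row per (notebook, os) with the measured duration.
df = pd.read_csv("durations.csv")  # columns: notebook, os, duration_s

# Normalize each OS series by its shortest duration, so a constant
# "slower VM" factor cancels out.
df["normalized"] = df.groupby("os")["duration_s"].transform(lambda s: s / s.min())

wide = df.pivot(index="notebook", columns="os", values="normalized")
plt.scatter(wide["linux"], wide["windows"])
lim = float(wide.max().max())
plt.plot([1, lim], [1, lim], "k--", label="y = x (duration ratio 1)")
plt.xlabel("normalized duration (Linux)")
plt.ylabel("normalized duration (Windows)")
plt.legend()
plt.show()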

[Scatter plot (examples-durations): normalized notebook durations, Linux (x) vs. Windows (y)]

From this analysis, it looks like the only significant difference (factor ~3) between Windows and Linux is in the degeneracy hunter notebook (https://github.com/IDAES/examples-pse/blob/main/src/Examples/Tools/degeneracy_hunter.ipynb); however, the absolute duration is small enough (<10 s) that it shouldn't negatively impact either interactive use or CI runs.

See also the interactive plot (a standalone HTML file, zipped because of GitHub restrictions on attachments): examples-durations.zip

TL;DR

  • From this analysis, the original problem that prompted opening this issue seems to be fixed, so, if you agree, we could close this
  • More generally, we could consider running similar analyses more systematically to detect any drastic changes in performance

@jsiirola
Contributor

jsiirola commented Sep 2, 2021

This looks good. Thanks for digging into it!

@jsiirola closed this as completed on Sep 2, 2021
@adowling2
Contributor

This makes sense. The Degeneracy Hunter notebook has an example of a simple but poorly formulated optimization problem for which Ipopt performs many line search evaluations (for good mathematical/numerical reasons). I would not be surprised if Ipopt behaved differently on Windows and Linux for this problem. I've observed the exact same versions of NLP solvers give different answers across platforms on the same problem. Anyway, just posting here to archive my thoughts.
