Phase Change addition #179
Conversation
@JRChreim please resolve merge conflicts.
@sbryngelson @JRChreim These conflicts are the result of me fixing a toolchain issue and sharing it with @JRChreim before it was merged. I would resolve them now, but I don't see any phase-change tests. Is this to be expected? I see some minor changes in test/case.py and test/cases.py, but no new tests have been added. I also don't see any new example cases.
@henryleberre @JRChreim Henry is right, we need phase-change tests in the CI. Perhaps Jose you can chat with Jean about how to create a test? Of course, we can also help out more if there's something unique about the updated toolchain causing problems.
@sbryngelson @henryleberre Yes, I agree they should be included, and I will work on having those added. I will make sure to (i) talk to Jean about including those and (ii) include meaningful test cases for the purpose of CI.
@JRChreim sounds good - I am converting this to a "draft" PR for now, and you can convert it back or message me when it is ready to merge!
@sbryngelson @henryleberre you should see a new commit with the changes. Thank you!
@JRChreim You need to resolve your merge conflicts. Please discuss this with @js-spratt (@henryleberre is out on internship).
@sbryngelson, my understanding was that the merge conflicts were the result of a toolchain fix done by @henryleberre (please see the comments above), and that he would resolve them. It seems what I was missing were the tests for the phase-change merge, which are now included in cases.py.
Yes, the toolchain was changed, but that is part of the PR merge process, and much of it shouldn't conflict with your code. @henryleberre, correct me if I'm wrong. In fact, much of the code has changed since you opened this PR (!).
Makes sense. I am working on resolving the conflicts, then. Thank you for the help!
The merge conflicts have been resolved, but because there have been changes to pre_process/p_main.f90, pre_process/m_start_up.fpp, simulation/p_main.f90, and simulation/m_start_up.fpp, a few changes may still need to be made for m_phase_change.f90 to work.
In src/common/m_variables_conversion.fpp, s_convert_primitive_to_conservative_variables: the variable Rtmp has been declared twice, on lines 1002 and 1005.
@sbryngelson @JRChreim mentioned this PR was ready for review!
@JRChreim Can you squash the temporary/[no ci] commits into one non-[no ci] commit? Then we will be able to see the results from the GPU tests.
You could consider running your phase-change module through MFC's code formatter to keep the code consistent. We provide the ./mfc.sh format command for this. You should probably only stage/commit your own module, and make some manual style changes if you'd like.
The requests have been addressed.
Please add back the CI so the tests run |
@JRChreim you have more merge conflicts to resolve now, and I still haven't seen the CI run and pass tests yet. Once those two things are done, we should be able to merge. Thanks for your patience.
@sbryngelson, for some reason I cannot click on the "resolve conflicts" button to continue the merge. This is the message appearing to me: "Only those with write access to this repository can merge pull requests." Could you help with that? With respect to the CI run and failing tests, I was expecting those to happen when "model_eqns==3" (as you are probably observing). I have already discussed this with @henryleberre; the reason was a typo on the internal energy equations for
@JRChreim, I don't understand. If the tests fail because of a code problem, why not fix the bug and create new golden files ("correct" files)? Otherwise, you just add extra code and maintain a bug. To merge, please try Google or click the link that says
@sbryngelson what you describe is what I thought we had agreed on with @JRChreim: to regenerate the incorrect golden files once he ensured that the discrepancy was only due to that one change (by commenting out his fix and making sure the tests would pass). @JRChreim To resolve the merge conflicts on your system (instead of through GitHub), you can:
$ git remote add upstream git@github.com:MFlowCode/MFC.git
$ git pull upstream master
... resolve conflicts
... push
@JRChreim @henryleberre I don't recall this, but it sounds like something I would say. Sounds good to me.
Ok, thank you for the advice. I will report this to Tim so he is aware that I am devoting time to finishing this pull request, including the tests. At the moment, I am not able to compile MFC (neither my fork nor master) on Bridges2, and I have temporarily lost access to Delta (due to issues with Duo authentication), so compiling on Bridges2 is my only option. I have requested help on Slack, but unfortunately it did not help. I am waiting to see if I can get any feedback while I keep trying to find a solution.
I am not sure it is meaningless. Part of the point of regression testing is the acceptance that we are not 100% sure what is meaningless, and when someone will introduce an unintended feature or bug that makes the test meaningful. For those reasons, we do regression testing as part of the test suite (all of it at the moment, actually). So I prefer 2, indeed.
I replied to you with a fix within a few hours on Christmas day -- I hope this is sufficient for your purposes and it is the best I can do. I also believe it is unwise to accept untested code into this MFC instance. Let me know if you or Tim disagree, and we can discuss further as needed. Thanks!
Great, I will continue working on this until I get the chance to talk to him. Thank you so much!
Great, thank you for the update. I will proceed as requested.
@sbryngelson, I found the issue with the GPU version of the code: a few variables were not being transferred to the GPUs, causing calculation errors. So this is completed and tested, and the GPU runs work. In terms of the tests, an alternative would be to make the relaxation procedure less strict (i.e., increasing the under-relaxation factor for the Newton solvers used) so that the tests run faster. For this approach, however, I would have to regenerate the golden files, since changing the under-relaxation would alter the solution obtained (to decimal places). Let me know if that approach is fine so I can proceed with it. The phase-change tests should be much faster this way, and the solver capabilities are still tested. Another approach would be to keep the code as is and decrease the number of time steps for all tests; this does not seem to be as effective in terms of computation time. FYI, the phase-change module is still under development while this pull request is merged, meaning it is possible that we will have to regenerate the golden files as changes continue to be incorporated into the code. At the moment, there is no validation case that we can use as a ground truth for this module.
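For context, the under-relaxation trade-off can be sketched with a toy Newton solver. This is a hypothetical Python illustration, not MFC's actual Fortran phase-change solver: a factor omega below 1 damps each update (smaller, more robust steps, more iterations), while omega closer to 1 takes fuller steps and converges in fewer iterations, which is why relaxing the damping speeds up the tests but shifts the converged decimals.

```python
def newton_under_relaxed(f, df, x0, omega=1.0, tol=1e-10, max_iter=100):
    """Newton iteration with an under-relaxation factor omega.

    omega < 1 damps each update; omega = 1 recovers plain Newton.
    Returns the approximate root and the number of iterations used.
    """
    x = x0
    for i in range(1, max_iter + 1):
        step = f(x) / df(x)        # full Newton correction
        x -= omega * step          # apply only a fraction omega of it
        if abs(step) < tol:        # correction small enough: converged
            return x, i
    return x, max_iter

# Toy example: solve f(x) = x^2 - 2 (root sqrt(2)) with two damping levels.
f, df = (lambda x: x * x - 2), (lambda x: 2 * x)
root_full, n_full = newton_under_relaxed(f, df, 1.5, omega=1.0)
root_damped, n_damped = newton_under_relaxed(f, df, 1.5, omega=0.5)
# The fuller step (omega = 1.0) reaches the same root in far fewer iterations.
```

Both runs land on the same root to many decimals, but the iteration counts differ sharply, which mirrors why the golden files would need regeneration after changing the factor.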
@sbryngelson how would you like me to report the performance tests? Would just reporting the computation time be sufficient, or are other metrics needed? Also, how many grid points per CPU and grid points per GPU are needed to perform these tests?
@JRChreim I see. I think it is OK to limit the number of iterations for phase change during testing. This seems like a perfectly reasonable way to limit computational cost while still properly testing that the code is doing what is expected. Is the max number of iterations an input variable in
I recommend making the 2D phase-change tests no more expensive than any of the 3D tests (or so).
@JRChreim you can just post the simulation time results here for a 3D test case. Perhaps you can run a case on 1 GPU (I recommend 1 million grid points (or a 100^3 domain) per 16 GB of GPU RAM/memory) that has no phase change, then the same case with phase change, again using 1 GPU on the same machine. I think I said something similar above. Repeat the same process with 1 CPU core. The other performance test that I think should be fine, but would like to see tested regardless, is scaling. Do a "weak scaling" test for a 3D case, which means doubling the problem size (total number of grid points), doubling the number of GPUs (or CPU cores), and making sure that the simulation time stays the same. You can do this from 1 GPU and continue doubling until 16 or 32 GPUs or so. Same with CPU cores.
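The weak-scaling recipe above can be sketched as a small schedule generator. This is a hypothetical helper, not an MFC utility; the 100^3-points-per-GPU figure is simply the recommendation from the comment above. The point is that the per-GPU load stays constant while GPUs and total grid points double together, so the wall time per step should stay roughly flat.

```python
def weak_scaling_schedule(base_points=100 ** 3, max_gpus=32):
    """Build a weak-scaling run list: double GPUs and total grid points
    together so the load per GPU stays constant (here, ~1M points/GPU)."""
    schedule = []
    gpus = 1
    while gpus <= max_gpus:
        schedule.append({
            "gpus": gpus,
            "total_points": base_points * gpus,  # problem grows with resources
            "points_per_gpu": base_points,       # per-device load is constant
        })
        gpus *= 2
    return schedule

for row in weak_scaling_schedule():
    print(row)
```

A run would then execute each row's case and check that the measured time per step is roughly constant across rows; growth with GPU count indicates a communication or scaling problem.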
- Added documentation to case.md
- Added doxygen-style comments to m_phase_change.fpp
- Altered the calculation of FT and dFdT so it is GPU-enabled
- Altered pre_process/m_checker.f90, simulation/m_checker.fpp, pre_process/m_global_parameters.fpp, and simulation/m_global_parameters.fpp so that palpha_eps and ptgalpha_eps are now dflt_real by default
- Regenerated golden files so that phase-change tests are faster
@sbryngelson, please see the outcome of the tests. I believe this should cover your needs.

Details: this is an expansion-tube (3D) simulation with and without phase change. The initial condition consists of a velocity discontinuity at the center of the domain (-vel in the -x direction, +vel in the +x direction), and as a consequence of the pressure decrease, phase change is activated. Note that because this is a simpler problem, its cost is not extravagant. But of course this is case dependent, as we solve a nonlinear system of equations for every cell at which phase change is activated, so it is virtually impossible to give an exact cost for phase change.

I ran ~200 time steps per simulation reported and collected the 'Final time' for each of them, which, per previous discussions, means the time per step. The results indicate that the increase in computation time is not as intense for this problem, as phase change is confined to the interface generated by the velocity discontinuity. For both cases (with and without PC), there seems to be a slight increase in time per step as the number of elements increases, but keep in mind that --case-optimization was not on. Regardless, the increase is subtle. Also note that, with phase change on, an increase in simulation time IS EXPECTED with resolution, because more elements are present in the expansion-wave region, at which phase change happens for this problem. So, a higher number of cells are activated for phase change with increasing resolution.

Also note that, although not directly comparable, the time per step decreases significantly with the use of GPUs in comparison to CPUs: the case above is about 15 times faster using a single GPU vs 8 CPUs (same problem, similar resolution).
Thanks @JRChreim! I can finish reviewing this now, I think. One final performance test we need is a benchmark to make sure the non-phase-change case is still as fast as we expect and that you didn't accidentally introduce a slowdown to the non-phase-change code somehow. It's an easy one, fortunately. For this, you need to use
@sbryngelson, everything looks good to me. I could not find any obvious issues! The documentation also renders fine.
@sbryngelson, I repeated the same test as the one for weak scaling, removing one fluid and toggling off phase change (so now a 3D two-fluid test case without phase change). This is how I performed the test: to enter the node (the GPU partition is not shared): "interact -p GPU --gres=gpu:v100-16:8 -t 01:00:00"; modules loaded through "source ./mfc.sh load", options b and g. Then, I finally ran: ./mfc.sh run ../PerfTests/case-optimization/ETNRM6.py -t pre_process simulation --case-optimization -N 1 -n 1 -b mpirun, so: Simulating a 100x100x100 case on 1 rank(s)
According to this document, I would expect ~100K time steps/hr (with RK3), = 0.0036 seconds/step. It looks like you are getting 9500 steps/hr with 3.8 s/step (about a factor of 10 too slow). I'm not quite sure what's going on here. Can you pull the current master branch to a separate directory, test it the same way you did above, and see what you get? If that is also the same speed, then it might have to do with how the case is created? Perhaps you can share the
Hi @sbryngelson, sorry for the confusion, but let me know if this helps. It was mentioned that ~100K time steps/hr was expected, and at the end of the file, "Final Time 3.79.....E-002" is of the expected order, as shown above, which is approximately what I am obtaining. Thus, I believe the time per step is appropriate. Let me know if that makes sense.
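The disagreement above is just unit arithmetic: 100K steps/hr corresponds to 0.036 s/step (not 0.0036), so the measured ~3.79e-2 s/step is actually within a few percent of the expected rate. A quick check:

```python
# Sanity check of the steps-per-hour arithmetic discussed above.
expected_steps_per_hour = 100_000            # ~100K steps/hr with RK3, per the cited document
seconds_per_hour = 3600
expected_s_per_step = seconds_per_hour / expected_steps_per_hour
print(expected_s_per_step)                   # 0.036 s/step, not 0.0036

measured_s_per_step = 3.79e-2                # "Final Time" reported in the run above
ratio = measured_s_per_step / expected_s_per_step
print(round(ratio, 2))                       # ~1.05: measurement matches expectation
```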
Whoops, I did the calculation wrong. Ok, thanks. Can you add some 1D and possibly 2D/3D cases to
- Added 2 example phase-change simulations: a 1D expansion tube with three fluids, and a 2D unidirectional shock tube with a column of water in front of it, phase change activated
- Changed tol = 5e-7 to tol = 1e-10, as requested
@JRChreim will merge once tests finish running
I added two examples: 1D expansion tube, with three fluids: 1D_exp_tube_phasechange
That's a win @sbryngelson and @henryleberre! Thank you for the help in this journey XD
Changes to the MFC master branch to include the phase change module