Phase Change addition #179

Merged
merged 66 commits on Jan 4, 2024

Conversation

JRChreim
Contributor

Changes to the MFC master branch to include the phase change module

@sbryngelson
Member

@JRChreim please resolve merge conflicts.

@henryleberre
Member

@sbryngelson @JRChreim These conflicts are the result of me fixing a toolchain issue and sharing it with @JRChreim before it was merged. I would resolve them now but I don't see any Phase Change tests. Is this to be expected? I see some minor changes in test/case.py and test/cases.py but no new tests have been added. I also don't see any new example cases.

@sbryngelson
Member

sbryngelson commented Jun 29, 2023

@henryleberre @JRChreim Henry is right; we need phase change tests in the CI. Perhaps, Jose, you can chat with Jean about how to create a test? Of course, we can also help out more if there’s something unique about the updated toolchain causing problems.

@JRChreim
Contributor Author

@sbryngelson @henryleberre Yes, I agree they should be included, and I will work on having them added. I will make sure to (i) talk to Jean about how to include them and (ii) add meaningful test cases for CI.

@sbryngelson sbryngelson marked this pull request as draft June 30, 2023 00:20
@sbryngelson
Member

@JRChreim sounds good - I am converting this to a "draft" PR for now; you can convert it back or message me when it seems ready to merge!

@JRChreim JRChreim marked this pull request as ready for review July 18, 2023 06:26
@JRChreim
Contributor Author

@sbryngelson @henryleberre you should see a new commit that alters m_phase_change.f90 and cases.py, the latter now including the tests for phase change. Please let me know if there is anything else you need (or if I should provide the corresponding golden files for the new tests).

Thank you!

@sbryngelson
Member

@JRChreim You need to resolve your merge conflicts. Please discuss this with @js-spratt (@henryleberre is away on an internship).

@JRChreim
Contributor Author

@sbryngelson, my understanding was that the merge conflicts came from a toolchain fix by @henryleberre (please see the comments above) and that he would resolve them. What I was missing were the tests for the phase change merge, which are now included in cases.py.

@sbryngelson
Member

Yes, the toolchain was changed, but handling that is part of the PR merge process, and much of it shouldn't conflict with your code (@henryleberre, correct me if I'm wrong). In fact, much of the code has changed since you opened this PR (!).

@JRChreim
Contributor Author

Makes sense. I am working on resolving the conflicts, then. Thank you for the help!

@JRChreim
Contributor Author

The merge conflicts have been resolved, but because there have been changes to pre_process/p_main.f90, pre_process/m_start_up.fpp, simulation/p_main.f90, and simulation/m_start_up.fpp, a few changes may still be needed for m_phase_change.f90 to work.

Contributor Author

@JRChreim JRChreim left a comment

In src/common/m_variables_conversion.fpp, in s_convert_primitive_to_conservative_variables, the variable Rtmp is declared twice (lines 1002 and 1005).

@sbryngelson sbryngelson added the "enhancement" (New feature or request) label on Jul 19, 2023
@henryleberre
Member

@sbryngelson @JRChreim mentioned this PR was ready for review!

@henryleberre
Member

@JRChreim Can you squash the temporary/[no ci] commits into one non-[no ci] commit? That way we will be able to see the results from the GPU tests.
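
If it helps, here is a minimal sketch of that squash, assuming the temporary/[no ci] commits are the most recent ones on the PR branch; `<n>` is a placeholder for how many commits to fold together:

```shell
git rebase -i HEAD~<n>        # mark the temporary/[no ci] commits as "squash" or "fixup"
git push --force-with-lease   # rewrite the PR branch with the squashed history
```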

Member

@henryleberre henryleberre left a comment

You could maybe consider running your phase change module through MFC's code formatter to keep the code consistent. We provide the ./mfc.sh format command for this. You should probably only stage/commit your own module and make some manual style changes if you'd like.
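
A minimal sketch of that workflow, assuming the new module is the only file whose formatting changes you want to commit and that it lives at src/common/m_phase_change.f90 (adjust the path if not):

```shell
./mfc.sh format                          # reformat the source tree in place
git add src/common/m_phase_change.f90    # stage only the phase-change module
git commit -m "Apply ./mfc.sh format to m_phase_change"
git checkout -- .                        # drop formatter edits to files you don't own (careful: discards other unstaged changes too)
```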

@JRChreim
Contributor Author

The requested changes have been addressed.

@sbryngelson
Member

Please add back the CI so the tests run

@sbryngelson
Member

@JRChreim you have more merge conflicts to resolve now, and I still haven't seen the CI run and pass tests yet. Once those two things are done we should be able to merge. Thanks for your patience.

@JRChreim
Contributor Author

@sbryngelson, for some reason I cannot click on the "resolve conflicts" button to continue the merge. This is the message I see:

"Only those with write access to this repository can merge pull requests."

could you help with that?

With respect to the CI run and failing tests, I was expecting those failures to happen when "model_eqns==3" (as you are probably observing). I have already discussed this with @henryleberre; the reason was a typo in the internal energy equations in m_variables_conversion.fpp. This is the comment left on lines 1049 to 1056 of that module:


! uncomment this DO LOOP to satisfy the failing tests. Note, however, that
! this expression for the internal energy is incorrect. The expression below
! this one should be the correct one
! do i = internalEnergies_idx%beg, internalEnergies_idx%end
!     q_cons_vf(i)%sf(j, k, l) = q_cons_vf(i - adv_idx%end)%sf(j, k, l)* &
!                                fluid_pp(i - adv_idx%end)%gamma* &
!                                q_prim_vf(E_idx)%sf(j, k, l) + &
!                                fluid_pp(i - adv_idx%end)%pi_inf

@sbryngelson
Member

sbryngelson commented Jul 25, 2023

@JRChreim, I don't understand. If the tests fail because of a code problem, why not fix the bug and create new golden files ("correct" files)? Otherwise, you just add extra code and maintain a bug.

To merge, please try Google or click the link that says

This branch has conflicts that must be resolved.
Use the command line to resolve conflicts before continuing.

@henryleberre
Member

@sbryngelson what you describe is what I thought we had agreed on with @JRChreim: to regenerate the incorrect golden files once he ensured that the discrepancy was only due to that one change (by commenting out his fix and making sure the tests would pass).

@JRChreim To resolve the merge conflicts on your system (instead of through GitHub), you can:

$ git remote add upstream git@github.com:MFlowCode/MFC.git
$ git pull upstream master
... resolve conflicts
... push
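
Spelled out a little further, the last two placeholder steps might look like this once git reports the conflicted files; the branch name phase-change is only a placeholder for whatever branch this PR was opened from:

```shell
git pull upstream master                 # stops and prints "CONFLICT" for each affected file
# edit each conflicted file and remove the <<<<<<< / ======= / >>>>>>> markers, then:
git add src/common/m_phase_change.f90    # ...plus any other files you resolved
git commit                               # completes the merge
git push origin phase-change             # placeholder for your PR branch name
```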

@sbryngelson
Member

sbryngelson commented Jul 25, 2023

@JRChreim @henryleberre I don't recall this but it sounds like something I would say. Sounds good to me.

@JRChreim
Contributor Author

I left some comments @JRChreim. I have some other questions/requests.

* Add documentation in the `docs/documentation` in the appropriate place for your new variables and features

* What is the performance of this subroutine on GPU? CPU?
  
  * Is it what you expect?
    
    * Can you run a case with and without phase change on GPUs and see how it does? You can make the max iterations 1 at first so that the phase change should be "free," computational-cost-wise.
  * What about parallel scaling: does this scale to many GPUs or many CPUs?

The documentation will be added now.
For the tests, I can run a case with and without phase change to test the module, with a maximum of a single time step. Could you please provide further information about how you want this test to be done? The same question applies to the parallel-scaling test.
Regardless, I understand that the performance tests do not prevent the pull request from being approved, so I do not see any reason they cannot be done concurrently. Blocking the merge on the performance results seems to be a drawback, since the code changes frequently. I am afraid that if I spend time on the tests and the merge stays blocked, the code will change again and I will have to rebase it another time. Every time I try to rebase/merge the code, new requests are made (creating tests, documenting with Doxygen, adding documentation in docs/documentation, GPU/CPU performance tests, to mention a few). I understand these are important for code development, and I appreciate the effort in doing this, but most of them will most likely not add anything to my Ph.D. thesis (at least for the moment), and this is also something I believe Professor Tim does not want me to spend much time on, as we have other priorities too. So, I insist that it is better to make sure this PR gets merged WHILE I perform the tests and report back, so I do not have to redo work one more time.

Please test your code on Bridges2 or Delta on GPUs (V100 or A100) and record the performance as compared to the same case without phase change. Also run the case on both 1 GPU and several (perhaps 32 or so). When you do this, does the cost of the simulation get cut by a factor of 32? Do the same with CPU simulations (here 1 core and 32 cores across 2 or 4 nodes). I recommend doing this for both a 1D and 3D case.

Regarding "blocking merging." Merging new code into your should be very quick and straightforward if you tend to your code regularly (once every few weeks). The reason you have had challenges is that this PR goes several months without attention. If you follow this standard (which I recommend strongly), then you will never have to worry about such stress 😃

Ok, thank you for the advice. I will report this to Tim so he is aware that I am devoting time to finishing this pull request, including the tests. At the moment, I am not able to compile MFC (neither my fork nor master) on Bridges2 and have temporarily lost access to Delta (due to issues with Duo authentication), so Bridges2 is my only option. I have requested help on Slack, but unfortunately it did not help. I am waiting for feedback while I keep trying to find a solution.

@sbryngelson
Member

Your CI tests on Phoenix CPU and GPU are failing, do you know why?
Also, your test cases take very long to run, can you make the 3D ones smaller/shorter somehow?

No, I don't know the reason. I checked the golden files on my local copy, and I did find the value printed in the output, which could indicate that the golden files should be the same. I believe it is something particular to Phoenix, as all the other tests were successful, and they were successful on my local computer as well. I have not introduced changes to the code other than the requested ones about commenting out the code (and deleting the #ifndef MFC_OpenACC statements; those did not do any calculations, they would just break the code in case errors were found). So, phase change alone should not have altered the results.
No, I cannot make the 3D simulations smaller/shorter; I have carefully chosen these tests because they thoroughly test the code in all its aspects. They are, however, NOT needed: the module is agnostic to the dimensionality of the problem, so I do not see any problem in keeping just the four 1D phase-change tests (A83CADB5, 9EF19F0A, 842C6FFC, 786DE444). That should be sufficient to test the module.
Can you not just use fewer time steps perhaps?

Yes please use fewer time steps. When you run your tests they should not be much longer than the other tests we have.

How about keeping the 1D cases only? As mentioned, I don't believe this will be a drawback, and it could significantly decrease the run time.

Since this is just a source term relaxed at the end of each time step this makes sense. Please keep the 2D cases, though.

The 3D cases only have pT-relaxation activated, which are not as time-consuming. The most time-consuming cases are the ones with pTg-relaxation equilibrium, and there are just 1D and 2D cases. @henryleberre and I have already discussed this, and already reduced the number of tests. There are two options available:

  1. Keep only the 1D cases, which I believe it is the best approach.
  2. Keep the 2D cases and use fewer time-steps, as suggested. This will potentially not test the 2D case thoroughly, which makes it meaningless to keep, in my opinion. So I would not recommend this.

Please let me know if 2. is still the preference, so I can move further.

I am not sure it is meaningless. Part of the point of regression testing is accepting that we are not 100% sure what is meaningless, or when someone will introduce an unintended feature or bug that makes a test meaningful. For those reasons, we do regression testing as part of the test suite (all of it at the moment, actually). So I prefer 2, indeed.

@sbryngelson
Member

I replied to you with a fix within a few hours on Christmas day -- I hope this is sufficient for your purposes and it is the best I can do. I also believe it is unwise to accept untested code into this MFC instance. Let me know if you or Tim disagree, and we can discuss further as needed. Thanks!

@JRChreim
Contributor Author

Great, I will continue working on this until I get the chance to talk to him. Thank you so much!

@JRChreim
Contributor Author

Great, thank you for the update. I will proceed as requested.

@JRChreim
Contributor Author

JRChreim commented Dec 26, 2023

@JRChreim the NVHPC CPU runner is failing because it runs out of time on the Slurm job. You can (for now) increase this by changing the requested time for the slurm job, which is here https://github.com/JRChreim/MFC-JRChreim/blob/a4b11e4529609c2d2becdf791a73bafccd3be121/misc/run-phoenix-release-cpu.sh#L6

to 4 hours or so. (ultimately we want this process to not take 4 hours!)
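
In practice that is a one-line change to the #SBATCH time directive near the top of the batch script; the exact option spelling used in run-phoenix-release-cpu.sh may differ, so treat this as a sketch:

```shell
#!/bin/bash
#SBATCH -t 04:00:00   # requested wall time, raised to 4 hours (equivalently: #SBATCH --time=04:00:00)
```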

The GPU runner is failing in a more legitimate way, suggesting that the code is incorrect. Does it pass tests on GPUs when you run it yourself? If so, on what cluster?

@sbryngelson, I found the issue with the GPU version of the code: a few variables were not being transferred to the GPUs, causing calculation errors. This is now fixed and tested, and the GPU runs work. In terms of the tests, an alternative would be to make the relaxation procedure less strict (i.e., increase the under-relaxation factor for the Newton solvers used) so that the tests run faster. For this approach, however, I would have to regenerate the golden files, since changing the under-relaxation would alter the solution obtained (at the level of decimal places). Let me know if that approach is fine so I can proceed with it. The phase-change tests should be much faster that way, and the solver capabilities are still tested.

Another approach would be to keep the code as is and decrease the number of time steps for all tests. This does not seem to be as effective in terms of computation time.

FYI, the phase change module is still under development, so even after this pull request is merged it is possible that we will have to regenerate the golden files as changes continue to be incorporated into the code. At the moment, there is no validation case that we can use as a ground truth for this module.

@JRChreim
Contributor Author

JRChreim commented Dec 26, 2023

@sbryngelson how would you like me to report the performance tests? Would just reporting the computation time be sufficient, or are other metrics needed? Also, how many grid points per CPU and per GPU should be used for these tests?

@sbryngelson
Member

@JRChreim I see. I think it is OK to limit the number of iterations for phase change during testing. This seems like a perfectly reasonable way to limit computational cost while indeed properly testing that the code is doing what is expected. Is the max number of iterations an input variable in case.py? If not, I recommend making it one.

I recommend making the 2D phase-change tests no more expensive than any of the 3D tests (or so).

@sbryngelson
Member

@JRChreim you can just post the simulation time results here for a 3D test case. Perhaps you can run a case on 1 GPU (I recommend 1 million grid points (or 100^3 domain) per 16GB of GPU RAM/memory) that has no phase change, then that same case with phase change again using 1 GPU on the same machine. I think I said something similar above. Repeat the same process with 1 CPU core.

The other performance test, which I think should be fine but would like to see regardless, is scaling. Do a "weak scaling" test for a 3D case, which means doubling the problem size (total number of grid points), doubling the number of GPUs (or CPU cores), and making sure that the simulation time stays the same. You can do this from 1 GPU and continue doubling until 16 or 32 GPUs or so. Same with CPU cores.
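
As a rough sketch of what a few rungs of that weak-scaling ladder could look like on the command line (the case file name is a placeholder, the -N/-n/-b flags mirror the ./mfc.sh run invocation used later in this thread, and the doubling of the grid inside the case file is assumed rather than shown):

```shell
# Weak scaling: double the grid and the GPU count together; time per step should stay roughly flat.
./mfc.sh run weak_scaling.py -t pre_process simulation --case-optimization -N 1 -n 1 -b mpirun   # 1 GPU,  ~1M cells
./mfc.sh run weak_scaling.py -t pre_process simulation --case-optimization -N 1 -n 2 -b mpirun   # 2 GPUs, ~2M cells (grid doubled in the case file)
./mfc.sh run weak_scaling.py -t pre_process simulation --case-optimization -N 1 -n 4 -b mpirun   # 4 GPUs, ~4M cells
```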

- Added documentation on case.md
- Added doxygen-style comments on m_phase_change.fpp
- Altered the calculation of FT and dFdT so it is GPU-enabled
- Altered pre_process/m_checker.f90, simulation/m_checker.fpp, pre_process/m_global_parameters.fpp, simulation/m_global_parameters.fpp such that palpha_eps and ptgalpha_eps are now dflt_real by default
- regenerated golden files so that phase change tests are faster.
@JRChreim
Contributor Author

JRChreim commented Jan 2, 2024

@sbryngelson, please see the outcome of the tests. I believe this should cover your needs:

[image: time-per-step results with and without phase change]

Details: this is an expansion tube (3D) simulation with and without phase change. The initial condition consists of a velocity discontinuity at the center of the domain (-vel in the -x direction, +vel in the +x direction), and as a consequence of the pressure decrease, phase change is activated. Note that because this is a simpler problem, its cost is not extravagant. But of course this is case dependent, as we solve a nonlinear system of equations for every cell at which phase change is activated, so it is virtually impossible to give an exact cost for phase change.

I ran ~200 time steps per reported simulation and collected the 'Final Time' for each of them, which, per previous discussions, is the time per step. The results indicate that the increase in computation time is not as intense for this problem, as phase change is confined to the interface generated by the velocity discontinuity. For both cases (with and without PC), there seems to be a slight increase in time per step as the number of elements increases, but keep in mind that --case-optimization was not on. Regardless, the increase is subtle. Also note that, with phase change on, an increase in simulation time IS EXPECTED with resolution, because more elements are present in the expansion-wave region, where phase change happens for this problem. So, a higher number of cells undergo phase change with increasing resolution. Also note that, although not directly comparable, the time per step decreases significantly with the use of GPUs in comparison to CPUs. Take, for example:

[image: GPU vs. CPU time-per-step comparison]

which is about 15 times faster using a single GPU vs. 8 CPUs (same problem, similar resolution).

@sbryngelson
Member

Thanks @JRChreim! I can finish reviewing this now, I think.

One final performance test we need is a benchmark to make sure the non-phase-change code path is still as fast as we expect and that you didn't accidentally introduce a slowdown somehow. It's an easy one, fortunately. For this, you need to use --case-optimization for a 3D 2-fluid test case without phase change and with 1M grid points on 1 GPU (V100 preferred). We know how fast such a case should run, so we can make sure it is as expected. In the future, this test will be automated by the CI.

Member

@henryleberre henryleberre left a comment

@sbryngelson, everything looks good to me. I could not find any obvious issues! The documentation also renders fine.

@JRChreim
Contributor Author

JRChreim commented Jan 3, 2024

@sbryngelson, I repeated the same test as the one for weak scaling, removing one fluid and toggling off phase change (so it is now a 3D 2-fluid test case without phase change):

This is how I performed the test:

1. To enter the node (the GPU partition is not shared): "interact -p GPU --gres=gpu:v100-16:8 -t 01:00:00"
2. Modules were loaded through "source ./mfc.sh load" (options b and g).
3. Then, I finally ran: ./mfc.sh run ../PerfTests/case-optimization/ETNRM6.py -t pre_process simulation --case-optimization -N 1 -n 1 -b mpirun

which gave:

Simulating a 100x100x100 case on 1 rank(s)
[ 0%] Time step 1 of 201 @ t_step = 0
...
[100%] Time step 200 of 201 @ t_step = 199
Final Time 3.7949906319988028E-002

Done (in 0:00:09.319101)

@sbryngelson
Member

sbryngelson commented Jan 3, 2024

According to this document I would expect ~100K time steps/hr (with RK3), = 0.0036 seconds/step. It looks like you are getting 9500 steps/hr with 3.8s/step (about a factor of 10 too slow).

I'm not quite sure what's going on here. Can you pull the current master branch to a separate directory, test it the same way you did above, and see what you get? If that is also the same speed, then it might have to do with how the case is created? Perhaps you can share the ETNRM6.py file.

@JRChreim
Contributor Author

JRChreim commented Jan 3, 2024

Hi @sbryngelson, sorry for the confusion, but let me know if this helps:

It was mentioned that: "According to this document I would expect ~100K time steps/hr (with RK3), = 0.0036 seconds/step. It looks like you are getting 9500 steps/hr with 3.8s/step (about a factor of 10 too slow)."

But we have: 100K time steps/hr = 100,000 steps / 3600 seconds ≈ 27.78 steps/second, i.e., ≈ 0.036 seconds/step = 3.6E-02 seconds/step.

Which is approximately what I am obtaining: at the end of the output, "Final Time 3.79.....E-002" is of the expected order, as shown above. Thus I believe the time per step is appropriate. Let me know if that makes sense.
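
For clarity, the conversion written out is:

$$
\frac{3600~\text{s/hr}}{10^{5}~\text{steps/hr}} = 0.036~\text{s/step}
\qquad\Longleftrightarrow\qquad
\frac{10^{5}~\text{steps/hr}}{3600~\text{s/hr}} \approx 27.8~\text{steps/s}
$$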

@sbryngelson
Member

Whoops, I did the calculation wrong. Ok, thanks.

Can you add some 1D (and possibly 2D/3D) cases that use the phase-change features to examples/ in this PR? I'm sure you already have some.

sbryngelson previously approved these changes Jan 3, 2024
- added 2 examples for phase-change simulations:
  - 1D expansion tube with three fluids
  - 2D unidirectional shock-tube with a column of water in front of it, phase change activated
- changed tol = 5e-7 to tol = 1e-10 as per request
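(A sketch of checking this locally before the CI does; the test entry point and its -j flag are assumed to match the current toolchain and may differ between MFC versions:)

# Run the MFC test suite, which now includes the phase-change cases
./mfc.sh test -j 4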
@sbryngelson
Copy link
Member

@JRChreim will merge once tests finish running

@JRChreim
Copy link
Contributor Author

JRChreim commented Jan 4, 2024

Can you add some 1D (and possibly 2D/3D) cases that use the phase-change features to examples/ in this PR? I'm sure you already have some.


I added two examples:

- 1D expansion tube with three fluids: 1D_exp_tube_phasechange
- 2D unidirectional shock-tube with a column of water in front of it, phase change activated: 2D_shocktube_phasechange
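(For anyone who wants to try them, something along these lines should work; the directory names are from this PR, while the case.py file name inside each directory is assumed from the usual examples/ layout and is not confirmed in this thread:)

# Hypothetical invocations of the two new example cases
./mfc.sh run examples/1D_exp_tube_phasechange/case.py -t pre_process simulation
./mfc.sh run examples/2D_shocktube_phasechange/case.py -t pre_process simulation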

@sbryngelson sbryngelson merged commit 90e7343 into MFlowCode:master Jan 4, 2024
15 checks passed
@JRChreim
Copy link
Contributor Author

JRChreim commented Jan 4, 2024

@JRChreim will merge once tests finish running

That's a win, @sbryngelson and @henryleberre! Thank you for the help in this journey XD
