Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not update MPAS masks during advection #115

Merged

Conversation

trhille
Copy link

@trhille trhille commented Apr 12, 2024

Remove updates of MPAS masks during advection and within Runge-Kutta loop for more accurate time integration.

@trhille trhille added the in progress Still being worked on, don't merge yet! label Apr 12, 2024
@trhille
Copy link
Author

trhille commented Apr 12, 2024

@mperego, I believe this addresses your concerns about mask updates occurring within the RK loop. The code changes are straightforward, but we will have to carefully evaluate how it changes model behavior. It will very likely shift our regression testing baselines, and there may be issues with budgets closing.

@@ -627,6 +643,9 @@ subroutine li_advection_thickness_tracers(&
enddo
endif

! We need an updated set of masks to calculate fluxAcrossGroundingLine,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the flux at the grounding line is calculated at each stage? Also, if I understand correctly at this point the thickness is not consistent with the velocity. Wouldn't be better to move the calculation of the GL flux outside of this function?

@trhille trhille added clean up and removed in progress Still being worked on, don't merge yet! labels Aug 20, 2024
Remove calls to li_calculate_mask where possible within the RK loop.
Where we need masks for budget calculations, reset masks to their
pre-advection states once the necessary calculation is complete.
Remove cellMask from vertical_remap, as it is not used in that routine.
Update masks before RK loop, but not at start of each advection stage.
@trhille trhille force-pushed the mali/remove_mask_updates_in_RK_loop branch from be26a55 to f6a6653 Compare August 20, 2024 15:20
@trhille
Copy link
Author

trhille commented Aug 20, 2024

Force-push from be26a55 to f6a6653 was a rebase onto MALI-Dev/develop to incorporate changes from #110.

@trhille
Copy link
Author

trhille commented Aug 20, 2024

When using forward Euler time integration, all full_integration tests pass validation. All tests pass baseline comparison except for landice_greenland_sia_restart_test, landice_greenland_sia_decomposition_test, landice_greenland_fo_decomposition_test, and landice_greenland_fo_restart_test. The diffs for these are extremely small, with thickness L2 and Linf norms O(10^-3), normalVelocity L2 and Linf norms O(10^-15) for SIA and O(10^-18 – 10^-20) for FO.

@trhille
Copy link
Author

trhille commented Aug 20, 2024

I ran full_integration with the following settings in the default namelist file for both baseline and comparison:

    config_thickness_advection = 'fct'
    config_tracer_advection = 'none'
    config_horiz_tracer_adv_order = 3
    config_advection_coef_3rd_order = 0.25

    config_time_integration = 'runge_kutta'
    config_rk_order = 2

All tests passed validation, except for landice_hydro_radial_restart_test, which has diffs that I believe to be negligibly small:

waterThickness       Time index: 0, 1, 2
2:  l1: 1.47243328640911e-14  l2: 1.19614938685988e-15  linf: 2.22044604925031e-16
  FAIL /pscratch/sd/t/trhille/test_MALI-Dev_PR115/rk2_comparison/landice/hydro_radial/restart_test/full_run/output.nc
       /pscratch/sd/t/trhille/test_MALI-Dev_PR115/rk2_comparison/landice/hydro_radial/restart_test/restart_run/output.nc
waterPressure        Time index: 0, 1, 2
2:  l1: 1.38534232974052e-08  l2: 1.18911276023052e-09  linf: 4.65661287307739e-10
  FAIL /pscratch/sd/t/trhille/test_MALI-Dev_PR115/rk2_comparison/landice/hydro_radial/restart_test/full_run/output.nc
       /pscratch/sd/t/trhille/test_MALI-Dev_PR115/rk2_comparison/landice/hydro_radial/restart_test/restart_run/output.nc

All tests pass baseline comparison except for the four greenland tests, as with forward Euler, which have diffs in the same very small range as reported for forward Euler above.

@trhille
Copy link
Author

trhille commented Aug 20, 2024

I tested this on a copy of an Amery run directory with calving turned off (restore-calving turned on), 3rd order fct with coeff = 0.25, advective CFL fraction 0.3333, and RK2 time.
Blue: this PR; orange: head of develop
image

So, this change certainly has an effect, but it's very small.

Copy link

@matthewhoffman matthewhoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trhille , the implementation appears correct to me, so from here it comes down to how to test this, like the previous PR. I have a few ideas, but I'm not sure if they are each worth the effort and/or if they are sufficient.

  1. Halfar dome mesh convergence test - much like previous PR, it would be nice to confirm that this PR does not degrade the mesh convergence of the Halfar dome. This would again be a do-no-evil test as we can't expect to do better than 1st order with Halfar and we're already there. But Halfar at least will be affected by the mask calculations that indicate where ice exists.
  2. MISMIP+ mesh convergence test - this domain would be useful for considering floating/grounded mask changes. Without an analytical solution this will be challenging, but we could evaluate error relative to a high resolution reference solution. The other challenge is that different resolutions yield slightly different spun up states, but again, we can take the high resolution spin up as the reference initial condition. What I propose is:
  • do a 500m spinup (no ice-shelf basal melting) to steady state. If it is truly at steady state, it should be independent of order of accuracy of advection and time stepping, but we could use 3rd order with this PR to be safe (our belief of our most accurate model)
  • interpolate the SS ice thickness to 1, 2, 4, 8 km meshes
  • run Ice1r experiment (draft-dependent melt rate applied for 100 years) for each resolution, repeated for ensembles with 1st, 2nd, and 3rd order methods, with and without this PR (a total of 6 ensembles x 5 resolutions = 30 runs)
  • Evaluate convergence order of centerline GL position at year 100 (the primary metric from MISMIP+) relative to 500m reference solution for each ensemble. If this PR increases convergence or maintains similar convergence for each order of accuracy method, we approve the PR.
  • Possibly repeat with a modification to the MISMIP+ protocol to use a uniform ice-shelf basal melt rate. The standard MISMIP+ melt parameterization has melt rate fall to zero near the GL, which would not stress test the grounded/floating masking very much. Imposing a strong uniform melt rate would ensure resolving the floating/grounded mask properly matters.

@trhille
Copy link
Author

trhille commented Aug 20, 2024

Thanks, @matthewhoffman. Here are the results of the Halfar test using RK2, 2nd order FCT:
image

This is the some convergence order that we saw in #110 .

@trhille
Copy link
Author

trhille commented Aug 21, 2024

All tests passed validation, except for landice_hydro_radial_restart_test, which has diffs that I believe to be negligibly small:

Just a note to remind us that we don't actually expect this test to pass for RK2 or RK3. Here's the relevant information from the original Runge-Kutta PR: #86 (comment)

"I also tested landice/humboldt/mesh-3km_restart_test/velo-none_calving-none_subglacialhydro using RK2 with config_SGH = .false.. This also failed validation, with the same thickness errors as before: 90: l1: 2.75890560397229e-11 l2: 2.08855832908240e-12 linf: 4.54747350886464e-13. Plotting up the differences between the first and last output times shows roundoff-level errors in both full_run and restart_run, but they are not the same. An equivalent run using forward Euler time stepping showed all zeros. So apparently there are two issues when using Runge-Kutta time stepping with advection turned off: round-off error in thickness and a non-BFB restart.
Upon further testing, it turns out that I can get the hydro restart tests to pass validation by adding a call to li_calculate_layerThickness again after layerThickness is summed into thickness, here: https://github.com/trhille/E3SM/blob/implement_rk_time_integration/components/mpas-albany-landice/src/mode_forward/mpas_li_time_integration_fe_rk.F#L387
However, that does not get rid of the roundoff error issue. It simply makes the roundoff errors consistent between the full run and the restart tun. Given that the case of no advection with higher-order time integration is basically useless, we have decided not to address this issue in this PR."

@matthewhoffman matthewhoffman self-assigned this Aug 21, 2024
@matthewhoffman
Copy link

@trhille , thanks for the additional testing and sleuthing. Based on what you've reported, I feel comfortable merging this. @mperego , are you comfortable with proceeding as is or do you have any other concerns? Are you ok with leaving the workaround for the grounding line flux calculation?

@mperego
Copy link

mperego commented Aug 22, 2024

Thanks for working on this @trhille. I'm OK with merging this PR.
Regarding the GLF calculation, what's the plan? My understanding is that at the moment the velocity used to calculate the GLF is not consistent with the thickness, which can make it hard to debug because it won't be consistent with the thickness and velocity that gets stored in the netcdf file. Also, here the GLF is calculated at each stage, but when does the GLF get stored?

@trhille
Copy link
Author

trhille commented Aug 22, 2024

@mperego, calculating the GL flux in this way is necessary to close the mass budgets, but I think you're correct that it won't be consistent with the thickness and velocity that get stored. I can look further into that, but it's outside the realm of this PR.

Regarding where the flux is stored, all the mass budget terms get updated each RK stage here: https://github.com/trhille/E3SM/blob/mali/remove_mask_updates_in_RK_loop/components/mpas-albany-landice/src/mode_forward/mpas_li_time_integration_fe_rk.F#L473-L480

And then they are finalized here after the RK loop: https://github.com/trhille/E3SM/blob/mali/remove_mask_updates_in_RK_loop/components/mpas-albany-landice/src/mode_forward/mpas_li_time_integration_fe_rk.F#L533-L539

@mperego
Copy link

mperego commented Aug 23, 2024

Thanks. Why the mass budget needs to be consistent at each stage? Anyway, we can talk about this in a different venue.

@matthewhoffman
Copy link

Given there is an ongoing PR to deal with closing the mass budget (#38), I feel ok leaving the sort of incomplete handling of the grounding line flux calculation as it is in thiS PR. I will proceed with merging this.

@matthewhoffman
Copy link

I reran the default full_integration suite on perlmutter with a merge of this branch onto develop. I got the same results as @trhille above - the 4 Greenland cases showed very small differences from baseline, but otherwise everything passes. As discussed above, we consider this acceptable.

Test Runtimes:
00:35 PASS landice_dome_2000m_sia_restart_test
00:10 PASS landice_dome_2000m_sia_decomposition_test
00:09 PASS landice_dome_variable_resolution_sia_restart_test
00:20 PASS landice_dome_variable_resolution_sia_decomposition_test
00:29 PASS landice_enthalpy_benchmark_A
00:28 PASS landice_eismint2_decomposition_test
00:27 PASS landice_eismint2_enthalpy_decomposition_test
00:50 PASS landice_eismint2_restart_test
00:27 PASS landice_eismint2_enthalpy_restart_test
00:22 FAIL landice_greenland_sia_restart_test
00:14 FAIL landice_greenland_sia_decomposition_test
00:23 PASS landice_hydro_radial_restart_test
00:16 PASS landice_hydro_radial_decomposition_test
00:39 PASS landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-none_subglacialhydro
00:36 PASS landice_humboldt_mesh-3km_restart_test_velo-none_calving-none_subglacialhydro
00:23 PASS landice_dome_2000m_fo_decomposition_test
00:19 PASS landice_dome_2000m_fo_restart_test
00:15 PASS landice_dome_variable_resolution_fo_decomposition_test
00:15 PASS landice_dome_variable_resolution_fo_restart_test
00:18 PASS landice_circular_shelf_decomposition_test
00:45 FAIL landice_greenland_fo_decomposition_test
00:45 FAIL landice_greenland_fo_restart_test
00:23 PASS landice_thwaites_fo_decomposition_test
00:29 PASS landice_thwaites_fo_restart_test
00:12 PASS landice_thwaites_fo-depthInt_decomposition_test
00:20 PASS landice_thwaites_fo-depthInt_restart_test
00:35 PASS landice_humboldt_mesh-3km_restart_test_velo-fo_calving-von_mises_stress_damage-threshold_faceMelting
00:18 PASS landice_humboldt_mesh-3km_restart_test_velo-fo-depthInt_calving-von_mises_stress_damage-threshold_faceMelting
Total runtime 12:09
FAIL: 4 tests failed, see above.

@matthewhoffman matthewhoffman merged commit 82a7afe into MALI-Dev:develop Aug 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants