Enable traditional threading as an option #3149

Merged


@aerorahul aerorahul commented Dec 10, 2024

Description

In a previous PR, stubs were added to enable traditional threading in the UFS.
This PR takes the next step and executes the model with traditional threading.
Specifically:

  1. Do not use the _esmf ufs_configure file for traditional threading. This sets globalResourceControl to false (traditional threading). This was done in the previous PR.
  2. When calculating the components' PET bounds in ufs_configure, do not multiply by the number of threads. This is done in this PR.
  3. export OMP_NUM_THREADS=$UFS_THREADS. This was done in the previous PR.
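
As a rough illustration of step 2, the sketch below contrasts the two ways of computing PET bounds. It is not the workflow's actual code: the function name `pet_bounds`, the back-to-back component layout, and the task and thread counts are all hypothetical, chosen only to show the effect of dropping the thread multiplier.

```python
def pet_bounds(components, use_esmf_threading):
    """Return {name: (first_pet, last_pet)} for components laid out back to back.

    components: list of (name, mpi_tasks, threads) tuples (hypothetical inputs).
    """
    bounds = {}
    start = 0
    for name, tasks, threads in components:
        # ESMF-managed threading: a component spans tasks * threads PETs.
        # Traditional threading: a component spans one PET per MPI task,
        # and the threads come from OMP_NUM_THREADS instead.
        npets = tasks * threads if use_esmf_threading else tasks
        bounds[name] = (start, start + npets - 1)
        start += npets
    return bounds


# Hypothetical layout, not taken from any real configuration.
layout = [("ATM", 6, 4), ("OCN", 2, 1), ("ICE", 2, 1)]
esmf = pet_bounds(layout, use_esmf_threading=True)
trad = pet_bounds(layout, use_esmf_threading=False)
print("ESMF-managed:", esmf)
print("Traditional: ", trad)
```

With traditional threading the bounds cover MPI tasks only, so the ATM range shrinks from 24 PETs to 6 in this made-up layout.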

The default behaviour of using ESMF-managed threading is retained. Traditional threading can be enabled by toggling the USE_ESMF_THREADING flag in config.fcst.

Closes #3122, in that it allows the model to be configured to run with traditional threading. Testing on C768 S2SW was successful on Hercules (this case was previously reported to be failing).

Traditional threading will be inefficient for model components that are not thread-enabled and supported, e.g. MOM6 and CICE6, but since they require fewer tasks than ATM or WAV, the CPU 'waste' is not huge.
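
A back-of-the-envelope sketch of that 'waste' follows. It assumes every MPI task is allotted OMP_NUM_THREADS cores and that a component without threading support uses only one of them. The function name `idle_cores`, the task counts, and the per-component threaded flags are hypothetical and only illustrate the arithmetic.

```python
def idle_cores(components, omp_num_threads):
    """Return (idle, total) core counts under traditional threading.

    components: list of (name, mpi_tasks, is_threaded) tuples (hypothetical).
    """
    total = 0
    idle = 0
    for name, tasks, is_threaded in components:
        # Every task reserves omp_num_threads cores, threaded or not.
        total += tasks * omp_num_threads
        if not is_threaded:
            # An unthreaded task uses one core; the rest sit idle.
            idle += tasks * (omp_num_threads - 1)
    return idle, total


# Made-up task counts; flags follow the description above (assumption).
example = [("ATM", 1000, True), ("WAV", 500, True),
           ("OCN", 100, False), ("ICE", 50, False)]
idle, total = idle_cores(example, omp_num_threads=4)
print(f"{idle} of {total} cores idle ({idle / total:.1%})")
```

In this made-up case fewer than 7% of the reserved cores are idle, which is the sense in which the waste stays small when the unthreaded components use few tasks.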

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? NO (If YES, please add a link to any PRs that are pending.)
    • EMC verif-global
    • GDAS
    • GFS-utils
    • GSI
    • GSI-monitor
    • GSI-utils
    • UFS-utils
    • UFS-weather-model
    • wxflow

How has this been tested?

A C768 S2SW case was run on Hercules with traditional threading (4 threads).
Some particulars of the run are as follows:
Run directory: /work/noaa/stmp/rmahajan/HERCULES/RUNDIRS/c768s2swtt/gfs.2019120300/gfsfcst.2019120300/fcst.2077795/
Run log: /work2/noaa/stmp/rmahajan/RUNTESTS//COMROOT/c768s2swtt/logs/2019120300/gfs_fcst_seg0.log

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@aerorahul aerorahul marked this pull request as ready for review December 10, 2024 22:24
@aerorahul

After some waiting for WW3 points, it seems the model progressed in the test on Hercules. The run will time out at 30 mins, because I was testing it. Still worth taking a look at the log.

@aerorahul

A run of the test C768_S2SW succeeded on Hercules with traditional threading.
This test was previously reported to be failing on Hercules and Orion.
I have not tested this for C1152 or Orion.

@WalterKolczynski-NOAA WalterKolczynski-NOAA added the CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera label Dec 11, 2024
@emcbot emcbot added CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress and removed CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera labels Dec 11, 2024
@aerorahul

@JessicaMeixner-NOAA is there an expectation that the model reproduces results regardless of who is managing threading?

@JessicaMeixner-NOAA

I would think that we should get the same answers regardless of whether threading is ESMF-managed or not, and regardless of how many threads are used. I don't know if ufs-weather-model currently tests this, but I assume it was probably an early test if not a current one.

@emcbot emcbot added CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully and removed CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress labels Dec 11, 2024
@emcbot

emcbot commented Dec 11, 2024

CI Passed on Hera in Build# 1
Built and ran in directory /scratch1/NCEPDEV/global/CI/3149


Experiment C48_ATM_26a0c420 Completed 2 Cycles: *SUCCESS* at Wed Dec 11 19:51:36 UTC 2024
Experiment C48mx500_3DVarAOWCDA_26a0c420 Completed 2 Cycles: *SUCCESS* at Wed Dec 11 20:09:53 UTC 2024
Experiment C48mx500_hybAOWCDA_26a0c420 Completed 2 Cycles: *SUCCESS* at Wed Dec 11 20:22:02 UTC 2024
Experiment C96_S2SWA_gefs_replay_ics_26a0c420 Completed 1 Cycles: *SUCCESS* at Wed Dec 11 20:28:51 UTC 2024
Experiment C96_atm3DVar_26a0c420 Completed 3 Cycles: *SUCCESS* at Wed Dec 11 21:35:31 UTC 2024
Experiment C96C48_hybatmDA_26a0c420 Completed 3 Cycles: *SUCCESS* at Wed Dec 11 21:35:31 UTC 2024
Experiment C96C48_hybatmaerosnowDA_26a0c420 Completed 3 Cycles: *SUCCESS* at Wed Dec 11 21:35:31 UTC 2024
Experiment C96C48_ufs_hybatmDA_26a0c420 Completed 3 Cycles: *SUCCESS* at Wed Dec 11 22:12:05 UTC 2024
Experiment C48_S2SW_26a0c420 Completed 2 Cycles: *SUCCESS* at Wed Dec 11 22:12:30 UTC 2024
Experiment C48_S2SWA_gefs_26a0c420 Completed 1 Cycles: *SUCCESS* at Wed Dec 11 23:03:30 UTC 2024

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS labels Dec 12, 2024
@emcbot emcbot added CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules and removed CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules labels Dec 12, 2024
@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS labels Dec 12, 2024
@WalterKolczynski-NOAA

CI Tests set up to run in /lfs/h2/emc/ptmp/walter.kolczynski/PR/PR_3149/RUNTESTS on WCOSS

@emcbot emcbot added CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress and removed CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules labels Dec 12, 2024
@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully and removed CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress labels Dec 12, 2024
@emcbot emcbot added CI-Hercules-Passed **Bot use only** CI testing on Hercules for this PR has completed successfully and removed CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress labels Dec 13, 2024
@emcbot

emcbot commented Dec 13, 2024

CI Passed on Hercules in Build# 2
Built and ran in directory /work2/noaa/global/CI/HERCULES/3149


Experiment C48mx500_3DVarAOWCDA_26a0c420 Completed 2 Cycles: *SUCCESS* at Thu Dec 12 12:22:48 CST 2024
Experiment C48mx500_hybAOWCDA_26a0c420 Completed 2 Cycles: *SUCCESS* at Thu Dec 12 12:46:56 CST 2024
Experiment C48_ATM_26a0c420 Completed 2 Cycles: *SUCCESS* at Thu Dec 12 18:37:47 CST 2024
Experiment C96_atm3DVar_26a0c420 Completed 3 Cycles: *SUCCESS* at Thu Dec 12 19:26:16 CST 2024
Experiment C96C48_hybatmDA_26a0c420 Completed 3 Cycles: *SUCCESS* at Thu Dec 12 19:32:44 CST 2024
Experiment C96_S2SWA_gefs_replay_ics_26a0c420 Completed 1 Cycles: *SUCCESS* at Thu Dec 12 20:38:05 CST 2024
Experiment C48_S2SW_26a0c420 Completed 2 Cycles: *SUCCESS* at Thu Dec 12 22:20:15 CST 2024
Experiment C48_S2SWA_gefs_26a0c420 Completed 1 Cycles: *SUCCESS* at Fri Dec 13 01:35:27 CST 2024

@WalterKolczynski-NOAA WalterKolczynski-NOAA merged commit 0db859e into NOAA-EMC:develop Dec 13, 2024
10 of 11 checks passed
sbanihash pushed a commit to sbanihash/global-workflow that referenced this pull request Dec 14, 2024
@aerorahul aerorahul deleted the feature/traditional_threading branch December 16, 2024 14:13
danholdaway added a commit to danholdaway/global-workflow that referenced this pull request Jan 27, 2025
* develop:
  Remove WAFS files and references from `develop` (NOAA-EMC#3263)
  fix intel stack version number on c5 (NOAA-EMC#3258)
  Update gsi_monitor and ufs_utils hashes to recent hashes for C5/C6 build and run (NOAA-EMC#3252)
  Enable DA cycling on gaea C5/C6 (NOAA-EMC#3255)
  Copy post-processed sea ice increment for diagnostics (NOAA-EMC#3235)
  Only run METplus in the 3Dvar tests (NOAA-EMC#3245)
  Clone, build, and run C48_ATM and C48_S2SW on Gaea C5 and C6 (NOAA-EMC#3106)
  Add echgres as a dependency only for RUN=enkfgdas, not enkfgfs (NOAA-EMC#3246)
  Add domain level to wave gridded COM path (NOAA-EMC#3137)
  CI JJOB Tests using CMake (NOAA-EMC#3214)
  Make assorted updates to waves (NOAA-EMC#3190)
  Move WCOSS2 LD_LIBRARY_PATH patches to load_ufsda_modules.sh (NOAA-EMC#3236)
  Adding a gefs_arch task to GEFS workflow (NOAA-EMC#3211)
  Add additional GEFS variables needed for AI/ML applications  (NOAA-EMC#3221)
  Add bmat task dependency to marine LETKF task (NOAA-EMC#3224)
  Resolve bug with LMOD_TMOD_FIND_FIRST setting affecting build on WCOSS2 (NOAA-EMC#3229)
  Reinstate product groups (NOAA-EMC#3208)
  Additional fixes for downstream jobs (NOAA-EMC#3187)
  Turn IAU off during staging job for cold start experiments (NOAA-EMC#3215)
  Update the gdas.cd hash and enable GDASApp to run on WCOSS2 (NOAA-EMC#3220)
  Update upload-artifact to v4 (NOAA-EMC#3216)
  Prevent duplicate case generation in generate_workflows.sh (NOAA-EMC#3217)
  Update g-w to cycle with C1152 ATM (NOAA-EMC#3206)
  Separate use of initial increment/perturbation file from REPLAY/+03 ICs  (NOAA-EMC#3119)
  Update gsi_enkf hash and gsi_ver (NOAA-EMC#3207)
  Remove cpus-per-task from APRUN_OCNANALECEN on WCOSS2 (NOAA-EMC#3212)
  Remove 5WAVH from AWIPS GRIB2 parm files (NOAA-EMC#3146)
  Remove multi-grid wave support (NOAA-EMC#3188)
  Add echgres as a dependency for earc (NOAA-EMC#3202)
  Ensure OCNRES and ICERES have 3 digits in the archive script (NOAA-EMC#3199)
  Set runtime shell requirements within Jenkins Pipeline (NOAA-EMC#3171)
  Add efcs and epos to ufs_hybatm xml (NOAA-EMC#3192) (NOAA-EMC#3193)
  Fix GEFS and SFS compile flags in build_all.sh (NOAA-EMC#3197)
  Remove early-cycle EnKF forecast (NOAA-EMC#3185)
  Fix mod_icec bug in atmos_prod (NOAA-EMC#3167)
  Create compute build option (NOAA-EMC#3186)
  Support global-workflow using Rocky 8 on CSPs (NOAA-EMC#2998)
  Change orog gravity wave drag scheme for grid sizes less than 10km (NOAA-EMC#3175)
  Switch snow DA to use 2DVar for deterministic and ensemble mean (NOAA-EMC#3163)
  Update compression options for GEFS history files (NOAA-EMC#3184)
  Update compression options for high res history files (NOAA-EMC#3178)
  Turn DO_TEST_MODE off (NOAA-EMC#3177)
  Hotfix for gdas_arch div/0 (NOAA-EMC#3169)
  Allow building of the ufs-weather-model, WW3 pre/post execs for GFS, GEFS, SFS in the same clone of global-workflow (NOAA-EMC#3098)
  Switch Aerosol DA to use JCB and Jedi class (NOAA-EMC#3125)
  Update ufs-weather-model to 2024-12-06 commit  (NOAA-EMC#3145)
  Enable traditional threading as an option (NOAA-EMC#3149)
  Update HPC_ACCOUNT on Hercules to fv3-cpu (NOAA-EMC#3164)
  Turn C96C48_ufs_hybatmDA and C48mx500_3DVarAOWCDA into a regression test (NOAA-EMC#3120)
  Update GSI analysis jobs to use COMIN/COMOUT (NOAA-EMC#3092)
  Update HPC Tier Definitions (NOAA-EMC#3138)
  Add marine hybrid envar (NOAA-EMC#3041)
  Archive the experiment directory along with git status/diff output (NOAA-EMC#3105)
  Use stochastic restart patterns on rerun (NOAA-EMC#3077)
  Point Jenkinsfile back to CI/ (NOAA-EMC#3139)
  Fix wave restart for cold start and add ic version file (NOAA-EMC#3112)
  Allow users to override the default account at setup time (NOAA-EMC#3127)
  Refactor gridded wave post (NOAA-EMC#3014)
  Update docs related to NOAA CSPs (NOAA-EMC#3043)
  Allow APP to differ between RUNs (NOAA-EMC#2943)
  Run one executable for soca2cice (instead of two) (NOAA-EMC#3118)
  Speed up GSI analysis jobs in CI testing (NOAA-EMC#3115)
  Make aerosol output frequency variable (NOAA-EMC#2982)
  Add new stations to GFS BUFR sounding products (NOAA-EMC#3107)
  JCB-based obs+bias staging, Jedi class updates, and marine B-matrix refactoring (NOAA-EMC#2992)
  Enable tapering of atm ens perts at the model top (NOAA-EMC#3097)
  Update JGDAS ENKF POST  job  (NOAA-EMC#3090)
  SFS Runs at C96mx100  (NOAA-EMC#2960)
  Move machine-based options from config.base to host files (NOAA-EMC#3053)
  Remove RUNDIRS before running CI cases to cover re-run events (NOAA-EMC#3076)
  CI GitHub pipeline (hotfix) update for fetching repo name (NOAA-EMC#3084)
  Update JGDAS ENKF ECEN job  (NOAA-EMC#3050)
  Update snow obs processing job (NOAA-EMC#3055)
  Update to action workflow pipeline in default repo for development  (NOAA-EMC#3062)
  Update to action workflow pipeline in default repo for development (NOAA-EMC#3061)
  Update workflow pipeline (NOAA-EMC#3060)
  PW CI pipeline update5 ready for review so it can be merged and tested (NOAA-EMC#3059)
  Revert "GitHub CI Pipeline update for debugging forked PR support" (NOAA-EMC#3057)
  GitHub CI Pipeline update for debugging forked PR support (NOAA-EMC#3056)
  Add more ocean variables for post-processing in GEFS (NOAA-EMC#2995)
  Auto provisioning of PW clusters from GitHub CI added (NOAA-EMC#3051)
  Fix the name of the TC tracker filenames in archive.py (NOAA-EMC#3030)
  Make wxflow links static instead of from link_workflow (NOAA-EMC#3008)
  Update global jdas enkf diag job with COMIN/COMOUT for COM prefix (NOAA-EMC#2959)
  Add run and finalize methods to marine LETKF task (NOAA-EMC#2944)
  Fix wave restarts and GEFS FHOUT/FHMAX (NOAA-EMC#3009)
  Disabling hyper-threading (NOAA-EMC#2965)
  GitHub Actions Pipeline Updates for Self-Hosted Runners on PW (NOAA-EMC#3018)
  CI jekninsfile update hotfix (NOAA-EMC#3038)
  Update gdas.cd (NOAA-EMC#2978)
  Add ability to add tag to pslots with generate_workflows (NOAA-EMC#3036)
  CI update to shell environment with HOMEgfs to HOME_GFS for systems that need the path (NOAA-EMC#3013)
  Quick updated to Jenkins (health check) launch script (NOAA-EMC#3033)
  Document the generate_workflows.sh script (NOAA-EMC#3028)
  Replace gfs_cyc with an interval (NOAA-EMC#2928)
  Hotfix: Fix generate_workflows.sh optional build flags (NOAA-EMC#3024)
  Add a tool to run multiple YAML cases locally (NOAA-EMC#3004)
  Hotfix: Correctly set overwrite option when specified (NOAA-EMC#3021)
Labels
CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully CI-Hercules-Passed **Bot use only** CI testing on Hercules for this PR has completed successfully CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully
Development

Successfully merging this pull request may close these issues.

Traditional threading as option for forecast model