TensorFlow and Horovod test #122

Closed
wants to merge 9 commits into from
casparvl
Collaborator

@casparvl casparvl commented Jul 5, 2021

I recreated the TensorFlow tests from #106 in the library-of-tests format, reusing the same standard hooks that I've set up for e.g. GROMACS.

To run this test case:

  • Make sure that tests/reframe is in your PYTHONPATH (so that eessi_utils is found)
  • Adjust the attached settings.py for your system (adapt partition names, number of CPU cores, number of sockets, number of GPUs)
  • Make sure you have a new enough ReFrame installation (3.6.2 or newer should work)
  • Make sure you have a flat module naming scheme (the find_modules logic assumes one). Note that it does not have to be the EESSI software stack per se: locally installed modules work too, provided that module av Horovod and module av TensorFlow return the respective Horovod and TensorFlow modules.
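To illustrate why the flat naming scheme matters, here is a minimal sketch of name-based module lookup. It is not the eessi_utils find_modules implementation: find_flat_modules is a hypothetical helper, and the module names and versions are made up for illustration. It assumes that, with a hierarchical scheme, only the top-level modules are visible until a compiler or MPI module has been loaded.

```python
def find_flat_modules(available, name):
    """Return the entries of `available` that look like '<name>/<version>'.

    Hypothetical helper for illustration only; it is NOT the eessi_utils
    find_modules implementation.
    """
    prefix = name + "/"
    return [m for m in available if m.startswith(prefix)]


# Made-up module names, as a flat naming scheme might list them.
flat = [
    "TensorFlow/2.3.1-foss-2020a-Python-3.8.2",
    "Horovod/0.21.3-foss-2020a-TensorFlow-2.3.1-Python-3.8.2",
    "GROMACS/2020.1-foss-2020a-Python-3.8.2",
]

# Assumption: with a hierarchical scheme only top-level modules are visible
# before a toolchain is loaded, so the search comes back empty.
hierarchical_top_level = ["GCC/9.3.0"]

tf_flat = find_flat_modules(flat, "TensorFlow")
hvd_flat = find_flat_modules(flat, "Horovod")
tf_hier = find_flat_modules(hierarchical_top_level, "TensorFlow")

print(tf_flat)
print(hvd_flat)
print(tf_hier)
```

Matching on the "<name>/" prefix rather than on a substring keeps the Horovod module, whose version suffix mentions TensorFlow, out of the TensorFlow results.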

After that, simply run e.g.

PYTHONPATH=$PYTHONPATH:$(pwd) reframe --config-file=config/settings_cartesius.py --checkpath eessi-checks/applications/tensorflow2.py -r --performance-report

in the software-layer/tests/reframe directory.
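If ReFrame fails to load the check, one thing worth ruling out is the PYTHONPATH setting. The sketch below checks whether a package can be found on the current Python path using only the standard library. is_importable is a hypothetical helper name, and the script is assumed to be run from the same shell, with the same PYTHONPATH, as the reframe command above.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the top-level `module_name` can be located on sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # 'eessi_utils' is the package named in the list above; the result depends
    # on the PYTHONPATH of the shell this script is started from.
    print("eessi_utils importable:", is_importable("eessi_utils"))
```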

Caspar van Leeuwen added 2 commits December 17, 2021 15:41
…le. Set binding if mpirun is used. Update config file for magic castle to contain the relevant items
@boegel
Contributor

boegel commented Jun 5, 2023

@casparvl Is this still relevant, especially with EESSI/test-suite#38?

I've just introduced 2021.12 and 2023.04 branches, and would like to get rid of the main branch.

@casparvl
Collaborator Author

casparvl commented Apr 2, 2024

Closing this PR. We now have a native TensorFlow distributed test through EESSI/test-suite#38. This PR (#122) has a TensorFlow test that uses Horovod for distribution. That's nice if you want to test Horovod, but we can always re-implement that later in the EESSI test suite.

@casparvl casparvl closed this Apr 2, 2024
2 participants