Handle duplicate modules #132

Open
laraPPr opened this issue Apr 4, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@laraPPr
Collaborator

laraPPr commented Apr 4, 2024

When running the test suite on the local software stack and the EESSI software stack at the same time, it is possible to come across duplicate modules.

For example, on the Doduo cluster at UGent we have the module TensorFlow/2.13.0-foss-2023a, which is also available in the EESSI software stack.

When running the test suite, ReFrame generates two tests with the same rfm_hash:

[ RUN      ] EESSI_TensorFlow %scale=4_cores %module_name=TensorFlow/2.13.0-foss-2023a %device_type=cpu /179d37ae @doduo:doduo+default

[ RUN      ] EESSI_TensorFlow %scale=4_cores %module_name=TensorFlow/2.13.0-foss-2023a %device_type=cpu /179d37ae @doduo:doduo+default

Because of this, the test case is run twice and the results of both runs are written to the same output and error files.
This causes the sanity check to fail with sanity error: 2 != 1

Both runs also use the same module (the one that comes first in the MODULEPATH), so the tests do not cover first the local module and then the EESSI one, or the other way around.

A possible solution is to deduplicate identical module names in find_modules, warn the user that this was done, and report which of the duplicate modules on the MODULEPATH will actually be tested.
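The deduplicate-and-warn idea could look roughly like the sketch below. This is not the test suite's actual find_modules; `dedupe_modules` and its input format (pairs of module name and the MODULEPATH entry it was found in, listed in MODULEPATH order) are assumptions made for illustration only.

```python
import warnings


def dedupe_modules(found):
    """Keep the first occurrence of each module name and warn about the rest.

    `found` is a hypothetical input: an iterable of
    (module_name, modulepath_entry) pairs in MODULEPATH order.
    """
    kept = {}      # module name -> MODULEPATH entry that will be tested
    shadowed = {}  # module name -> later entries that are ignored
    for name, path in found:
        if name in kept:
            shadowed.setdefault(name, []).append(path)
        else:
            kept[name] = path

    for name, paths in shadowed.items():
        warnings.warn(
            f"Module {name} was found more than once; testing the one in "
            f"{kept[name]} and ignoring the one(s) in {', '.join(paths)}"
        )
    # dicts preserve insertion order, so MODULEPATH order is retained
    return list(kept.items())
```

With two entries for TensorFlow/2.13.0-foss-2023a, only the first one is kept and a single warning names both locations, so only one test would be generated for that module name.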

laraPPr added the bug (Something isn't working) label Apr 4, 2024
@satishskamath
Collaborator

This problem occurs when the module system can access two different installations of the same software that share the same module name.

@smoors
Collaborator

smoors commented Aug 6, 2024

this issue is hard to fix in a way that works everywhere.
i think it's better to error out in this case and tell the user to remove duplicate modules by updating their MODULEPATH.
it's up to the user to decide which one they want to test.

@laraPPr
Collaborator Author

laraPPr commented Aug 6, 2024

And could we give a meaningful error to the user if multiple tests are run with the same ID?

@smoors
Collaborator

smoors commented Aug 6, 2024

> And could we give a meaningful error to the user if multiple tests are run with the same ID?

no clue if/how we can do that. but this shouldn't happen if there are no duplicate modules, right?

@laraPPr
Collaborator Author

laraPPr commented Aug 6, 2024

Nope, it does not happen then. But how and where do we tell users not to have duplicate modules?

@smoors
Collaborator

smoors commented Aug 6, 2024

in the find_modules function we can create a list of the matched modules and check if there are any duplicates in the list,
which should not happen.
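A minimal sketch of such a check, assuming the matched modules are available as a plain list of module-name strings. `check_no_duplicates` and `DuplicateModuleError` are hypothetical names, not part of the test suite or of ReFrame.

```python
from collections import Counter


class DuplicateModuleError(Exception):
    """Hypothetical error for module names that are matched more than once."""


def check_no_duplicates(matched):
    """Return `matched` unchanged, or raise if any module name repeats."""
    counts = Counter(matched)
    dupes = sorted(name for name, n in counts.items() if n > 1)
    if dupes:
        raise DuplicateModuleError(
            "Duplicate modules found: " + ", ".join(dupes)
            + ". Update your MODULEPATH so that each module name is "
            "available only once."
        )
    return matched
```

Erroring out here leaves the choice of which installation to test with the user, and the message can point them at the MODULEPATH as the place to fix it.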

@laraPPr
Collaborator Author

laraPPr commented Aug 7, 2024


3 participants