Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Example CI + Fixes to some broken example! #525

Closed
wants to merge 72 commits into from

Conversation

okBrian
Copy link
Contributor

@okBrian okBrian commented Jul 17, 2024

Description

Added New Example CI, - Average runtime - 30 minutes

Why MacOS runner?
From my initial testing its much faster than the ubuntu images. Some examples ran over two times faster on the MacOS runner

Why 15625, (125, 125), or (25, 25, 25) for cell boundaries?
Some 3D examples break if m < 25, and this seems like a good spot to put examples so they don't take too long

Fixes #474

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Scope

  • This PR comprises a set of related changes with a common goal

How Has This Been Tested?

  • Github Actions Run all example with CI
  • Run all examples locally

Test Configuration:
gcc 14, Ubuntu 22,04.4 LTS
Github Action Macos Runner

Checklist

  • I have added comments for the new code
  • I ran ./mfc.sh format before committing my code
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

@okBrian
Copy link
Contributor Author

okBrian commented Jul 17, 2024

I'm not exactly sure whats wrong with the test suite, but I think if I add the --generate flag they should be okay. Do we want the test suite to even test for examples because that would add another 30 minutes or more to them. Also I need to remove the -r flag from the example suites but it looks like it worked fine.

@sbryngelson
Copy link
Member

@okBrian we can add this as a separate GitHub workflow (Examples-test.yml, Example Smoke Test).

@sbryngelson
Copy link
Member

the -r --generate things I don't understand at the moment

@sbryngelson
Copy link
Member

@okBrian It's important that these cases run and produce output that isn't nonsense. So you can run the "example suite" with ./mfc.sh test —a (+ your flags, I guess?), where -a already exists. ' This runs the files through post_process and then checks for NaNs/Infinity.

If parallel_IO is not true then I'm not sure if -a works or not. You may need to turn it on via your toolchain tricks.

@henryleberre
Copy link
Member

I'm not exactly sure whats wrong with the test suite, but I think if I add the --generate flag they should be okay. Do we want the test suite to even test for examples because that would add another 30 minutes or more to them. Also I need to remove the -r flag from the example suites but it looks like it worked fine.

If you specify --generate it will compare the pre-process and simulation results against itself (not with the reference values from this PR). So you shouldn't pass --generate.

Whether we should or should not use the -r (--relentless) flag is debatable but we have currently opted not to use it so we should probably keep it that way. The main advantage is that we don't spend CI resources running more tests if one already failed and the workflow is marked as failed right away when the first test case fails.

@@ -180,7 +181,7 @@ def _handle_case(case: TestCase, devices: typing.Set[int]):
raise MFCException(f"Test {case}: {msg}")

if ARG("test_all"):
case.delete_output()
# case.delete_output()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might want to put this back?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It breaks the 1D bubble cases. Not sure why maybe its deleting the D folder I'm creating.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps associated with current bug in CI that if it has to 'retry' a simulation, it will try to create a directory that already exists

@okBrian
Copy link
Contributor Author

okBrian commented Jul 27, 2024

I am expecting the code to not work with --no-mpi so thats another thing to debug... I think everything with mpi should pass hopefully.

@okBrian okBrian marked this pull request as draft July 28, 2024 01:48
@sbryngelson
Copy link
Member

@okBrian you should be able to login to Frontier to see why your CI is failing. It is failing at build (link) time. It might be due to a change you made in the src/, or it could be due to the flags you added to CMake. It might be easier to debug Frontier by changing the flags and rerunning CI or just doing it on your own after login into Frontier and building there.

@sbryngelson
Copy link
Member

@okBrian, I also noticed that it failed in some cases with GNU and Intel compilers. This suggests that the flags might not be doing what they should (or you didn't add enough). One way to be sure you "have enough" is just using -O0 for all --strict builds and ensuring that passes. If that doesn't work, then there's something else awry.

CMakeLists.txt Outdated
@@ -186,13 +207,15 @@ elseif ((CMAKE_Fortran_COMPILER_ID STREQUAL "NVHPC") OR (CMAKE_Fortran_COMPILER_
if (CMAKE_BUILD_TYPE STREQUAL "Debug")
add_compile_options(
$<$<COMPILE_LANGUAGE:Fortran>:-O0>
-C
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure this syntax works. you could double check by looking at the cmake output of a specific run. i think you need to add
$<$<COMPILE_LANGUAGE:Fortran>:-<flag>> on separate lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix this in my next commit. Thanks

@okBrian
Copy link
Contributor Author

okBrian commented Sep 25, 2024

With the most recent commits these are the problematic examples
MacOS, MPI, Debug, False
2D -> Example Viscous Error: 2.51E-02
1D -> Example -> sodHypo Error: 1.01E-04

Ubuntu MPI. no-Debug, True
1D -> Example -> hypo_2materials Error: 1.74E-04
2D -> Example -> ibm_cfl_dt Error: 1.51E-04
1D -> Example -> sodHypo: Error: 1.01E-04
2D -> Example -> viscous Error: 2.51E-02
2D -> 1 Fluid(s) -> IBM Error: 1.33E-10
2D -> 1 Fluid(s) -> Viscous -> IBM Error: 2.21E-10
2D -> 1 Fluid(s) -> Viscous -> IBM -> model_eqns=3 Error: 2.21E-10
2D -> 2 Fluid(s) -> IBM Error: 5.95E-10

Ubuntu Mpi, Debug, False
2D -> Example -> ibm_cfl_dt Error: 1.65E-04

GT CPU Pheonix
2D -> Example -> laplace_pressure_jump -> NaN's detected in Case

@sbryngelson
Copy link
Member

@okBrian can you visualize the output of a few of these cases (compared to the cases that are 'working')? I suspect that, given how diverse they are, some of them are just numerically unstable and eventually produce very large numbers that different compilers handle differently.

@okBrian
Copy link
Contributor Author

okBrian commented Oct 28, 2024

Remaking PR

@okBrian okBrian closed this Oct 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Building with strict floating point operations for testing purposes Add examples/ into CI
3 participants