WIP Quda work clover force #549
Conversation
I think that the call to …
I asked the QUDA developers for help: lattice/quda#1330 (comment)
It seems to me that instead where …
Sorry, I made a typo in the comment. The function that gets an error is …
Yes, I agree.
…quda_work_clover_force
@kostrzewa can I merge the other way around, i.e. quda_work into this quda_work_clover_force, to get the automatic integration test working again?
Yes, that would be the way to go anyway, to synchronise before merging everything together.
@Marcogarofalo using 020dfa3 (tmLQCD) and c310d9c53431e2cf75a86e2cf0956ebcc2b9e12e (QUDA), I get the following behaviour on Juwels Booster:
In a test run which works fine with the current quda_work head commit:
using this input file: /p/scratch/isdlqcd/kostrzewa2/benchmarks/tmLQCD_QUDA_HMC/quda-tm_force_64c128/64c128_n8mpi4nt24.input
@Marcogarofalo, which was the last QUDA commit that you tested with? It might be that some of the recent changes have messed with our interfacing with the code or, more likely, I did something wrong.
I've started some more test runs using the functionality that you added in aeffd7a / 5d3caec and #577 on 48c96 lattices at two quark masses. One of the two runs has started and is not showing any differences > 1e-9 so far. The problem is that it always took a while for the instability to be triggered, so we need to wait and see.
move compare_derivative into its own compilation unit and make the strictness variable, specify a 'name' argument
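For readers unfamiliar with this check: the idea is to compare the force (derivative) field computed on the host against the one returned by QUDA, element by element, and flag deviations above a configurable threshold. Below is a minimal sketch of what such a routine could look like; the flat `double` layout, the argument names and the printing convention are assumptions for illustration, not the actual tmLQCD code.

```c
#include <math.h>
#include <stdio.h>

/* Hedged sketch: compare a host-side and a device-side derivative field,
 * with a configurable strictness threshold and a 'name' tag for logging.
 * Layout and identifiers are illustrative only. */
int compare_derivative_sketch(const double *host, const double *device,
                              const int n, const double strictness,
                              const char *name)
{
  double max_dev = 0.0;
  for (int i = 0; i < n; i++) {
    const double dev = fabs(host[i] - device[i]);
    if (dev > max_dev) max_dev = dev;
  }
  if (max_dev > strictness) {
    printf("compare_derivative [%s]: max deviation %.3e exceeds %.3e\n",
           name, max_dev, strictness);
    return 1; /* deviation larger than the requested strictness */
  }
  return 0;
}
```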
@Marcogarofalo I think I might have at least figured out why one set of my test jobs was misbehaving. To be more specific, the ones that I referred to in:
…ke sure to not miss anything by doing a MPI_MAX reduction on rank 0.
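As far as I can tell, the point of the commit above is that each MPI rank only sees its local part of the derivative field, so the per-rank maximum deviation has to be combined with an MPI_MAX reduction onto rank 0 before it is reported; otherwise a deviation occurring on a non-reporting rank would be missed. A rough sketch, with illustrative names only:

```c
#include <mpi.h>
#include <stdio.h>

/* Sketch only: reduce the per-rank maximum deviation onto rank 0 with
 * MPI_MAX so that the reported value covers all ranks. */
void report_global_max_deviation(double local_max_dev, const char *name)
{
  double global_max_dev = 0.0;
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Reduce(&local_max_dev, &global_max_dev, 1, MPI_DOUBLE, MPI_MAX,
             0, MPI_COMM_WORLD);
  if (rank == 0) {
    printf("[%s] global max deviation: %.3e\n", name, global_max_dev);
  }
}
```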
FYI I have a 64c128 test job running on Booster in which there appear to be no problems. /p/scratch/isdlqcd/kostrzewa2/benchmarks/tmLQCD_QUDA_HMC/quda-tm_force_64c128/jobscript/logs/log_64c128_tm_force_quda_full_n8_9214292.out I will perform another longer run (this one has artificially short trajectories), but so far it seems that all is well.
Hi, thanks for the good news. I am also running a test of the 64x128 in …
Are you running the same number of trajectories per job? Otherwise the random numbers won't be identical and you are bound to see differences quite early on. After some number of trajectories there's of course nothing that you can do: the runs will diverge somewhere around 20-30 trajectories or even slightly earlier.
In my test using a full 2+1+1 run on a 64c128 lattice things are looking good so far (the top seven trajectories are with the force offloading enabled):
I will update the above when the second set of trajectories with force offloading disabled has completed.
No, I'll redo the test then.
The test above #549 (comment) looks fine to me. Could you add something to the documentation about …
After doing the test with the same random numbers in the host and device runs, I got good agreement after 30 trajectories.
A comment on 44449ad: the DET and DETRATIO monomials do support …
We never tested: …
I don't see an application of this setup; at the current stage it is likely to crash.
True ... I guess the check for "UseExternalLibrary" should include a simultaneous check for "UseExternalInverter" and spit out an error if the two are not set together...
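To make the suggestion concrete, a consistency check of this kind could run right after the monomial input has been parsed and abort with a clear message if UseExternalLibrary is requested without UseExternalInverter. The struct and field names below are hypothetical stand-ins, not the actual tmLQCD data structures:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical monomial settings as they might look after input parsing. */
typedef struct {
  int use_external_library;   /* set by UseExternalLibrary in the input file  */
  int use_external_inverter;  /* set by UseExternalInverter in the input file */
  const char *name;
} monomial_settings;

/* Abort if UseExternalLibrary is set without UseExternalInverter. */
void check_external_library_consistency(const monomial_settings *m)
{
  if (m->use_external_library && !m->use_external_inverter) {
    fprintf(stderr,
            "Error in monomial %s: UseExternalLibrary requires "
            "UseExternalInverter to be set as well.\n",
            m->name);
    exit(1);
  }
}
```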
Note that I've merged with the current quda_work as there was a conflict in the documentation here. |
I'm a bit stuck testing this on LUMI-G because of lattice/quda#1432. I can test using an unofficial stack based on ROCm 5.6.1, but I've already found that there are issues when both GDR and P2P are enabled with my current production code (relatively recent tmLQCD quda_work and quda-develop-02391b12): I see solvers diverging. So I first need to find a combination of ROCm 5.6.1 (x) GDR=[0,1] (x) P2P=[0,3] which works correctly with that before I can move on to testing the offloaded fermionic force.
I was able to test this on LUMI-G now and it works perfectly fine: 10% speedup overall in a large run on 128 nodes, which should translate to more than that at a smaller scale.
@Marcogarofalo this is really awesome, thanks! |
Thank you for the help! |