WIP Quda work clover force #549

Merged: 38 commits from quda_work_clover_force into quda_work on Jan 27, 2024
Conversation

kostrzewa
Member

No description provided.

@kostrzewa changed the title from "Quda work clover force" to "WIP Quda work clover force" on Oct 6, 2022
@Marcogarofalo
Contributor

I think that the call to reorder_spinor_eo_toQuda is necessary. However, the quda function reorder_gauge_toQuda returns an error.

@Marcogarofalo
Contributor

I asked the quda developers for help: lattice/quda#1330 (comment)

@kostrzewa
Member Author

I think that the call to reorder_spinor_eo_toQuda is necessary. However, the quda function reorder_gauge_toQuda returns an error.

reorder_gauge_toQuda is our own function: what kind of error can it return?

It seems to me that where computeCloverForceQuda's third argument should be an array of arrays, you are simply passing spinorIn instead. As a quick workaround, I guess passing &spinorIn might work when only a single field is required for x, right?

@Marcogarofalo
Contributor

I think that the call to reorder_spinor_eo_toQuda is necessary. However, the quda function reorder_gauge_toQuda returns an error.

reorder_gauge_toQuda is our own function: what kind of error can it return?

Sorry, I made a typo in the comment: the function that returns an error is computeCloverForceQuda, see lattice/quda#1330.

It seems to me that where computeCloverForceQuda's third argument should be an array of arrays, you are simply passing spinorIn instead. As a quick workaround, I guess passing &spinorIn might work when only a single field is required for x, right?

yes I agree
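
For reference, a minimal sketch of the workaround agreed on above. The names and the commented-out call are illustrative only; the actual signature of computeCloverForceQuda is deliberately not reproduced here.

  /* Sketch only: where QUDA expects an array of spinor-field pointers for x,
   * a single field can be passed as a one-element array, i.e. &spinorIn. */
  static void call_clover_force_single_field(void *spinorIn) {
    void **x = &spinorIn;  /* one-element "array of arrays" */
    (void)x;               /* placeholder: silences the unused-variable warning */
    /* computeCloverForceQuda(..., x, ...);  -- signature abbreviated on purpose */
  }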

@Marcogarofalo
Contributor

@kostrzewa can I merge the other way around, i.e. quda_work into this quda_work_clover_force to get the automatic integration test working again?

@kostrzewa
Member Author

@kostrzewa can I merge the other way around, i.e. quda_work into this quda_work_clover_force to get the automatic integration test working again?

yes, that would be the way to go anyway to synchronise before merging everything together

@kostrzewa
Member Author

@Marcogarofalo using 020dfa3 (tmLQCD) and c310d9c53431e2cf75a86e2cf0956ebcc2b9e12e (QUDA), I get the following behaviour on Juwels Booster:

00001033 0.540737337562 3039327.268655220047 0.000000e+00 101 1861 1115 9364 0 3.845400e+02 3.080185e-01
00001034 0.540737337562 3033882.783575160429 0.000000e+00 100 1850 1098 9277 0 1.325076e+02 3.080185e-01
00001035 0.540737337562 3023880.860261276364 0.000000e+00 101 1868 1111 9355 0 1.325901e+02 3.080185e-01

In a test run which works fine with the current quda_work head commit:

00001033 0.539109865241 -0.139158694074 1.149306e+00 101 1864 1026 8308 1 4.022610e+02 3.051663e-01
00001034 0.537638696155 -0.114508185536 1.121322e+00 100 1860 1023 8298 1 2.138116e+02 3.026756e-01
00001035 0.536272387032 -0.084111206234 1.087750e+00 102 1870 1047 8383 1 2.137449e+02 3.004461e-01

using this input file: /p/scratch/isdlqcd/kostrzewa2/benchmarks/tmLQCD_QUDA_HMC/quda-tm_force_64c128/64c128_n8mpi4nt24.input

@kostrzewa
Member Author

@Marcogarofalo using 020dfa3 (tmLQCD) and c310d9c53431e2cf75a86e2cf0956ebcc2b9e12e (QUDA) ...

@Marcogarofalo, which was the last QUDA commit that you tested with? It might be that some of the recent changes have messed with our interfacing with the code or, more likely, I did something wrong.

@kostrzewa
Member Author

I've started some more test runs using the functionality that you added in aeffd7a / 5d3caec and #577 on 48c96 lattices at two quark masses. One of the two runs has started and is not showing any differences > 1e-9 so far. The problem is that it always took a while for the instability to be triggered so we need to wait and see.

move compare_derivative into its own compilation unit and make the strictness variable, specify a 'name' argument
@kostrzewa
Member Author

@Marcogarofalo I think I might have at least figured out why one set of my test jobs was misbehaving. To be more specific, the ones that I referred to in:

I have two nf=2+1+1 test runs on 24c48 and 48c96 lattices in which, after a few integration steps, solvers begin to diverge (in one case the MG fails to converge, in another run a simple CG with a large rho fails to converge, and finally, in yet another, a multi-shift solve diverges and produces a NaN residual).

I was using fc31c7d before you had added the 07fc55e fix.

…ke sure to not miss anything by doing a MPI_MAX reduction on rank 0.
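
For reference, a hedged sketch of the pattern described by the two commit messages above; the function below is an illustrative stand-in, not the actual compare_derivative in tmLQCD. Each rank computes its local maximum deviation between host and device derivatives, and an MPI_MAX reduction collects the global maximum on rank 0 so that no deviation is missed.

  #include <math.h>
  #include <mpi.h>
  #include <stdio.h>

  /* Sketch only: compare two derivative (force) fields and report the global
   * maximum deviation on rank 0. 'strictness' and 'name' mirror the arguments
   * mentioned in the commit messages; the real implementation may differ. */
  static int compare_derivative_sketch(const double *host, const double *device,
                                       size_t n, double strictness, const char *name) {
    double local_max = 0.0, global_max = 0.0;
    for (size_t i = 0; i < n; i++) {
      double diff = fabs(host[i] - device[i]);
      if (diff > local_max) local_max = diff;
    }
    /* MPI_MAX reduction so that rank 0 sees the largest deviation on any rank */
    MPI_Reduce(&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0 && global_max > strictness) {
      printf("compare_derivative (%s): max deviation %.3e exceeds %.3e\n",
             name, global_max, strictness);
      return 1;
    }
    return 0;
  }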
@kostrzewa
Member Author

FYI I have a 64c128 test job running on Booster in which there appear to be no problems.

/p/scratch/isdlqcd/kostrzewa2/benchmarks/tmLQCD_QUDA_HMC/quda-tm_force_64c128/jobscript/logs/log_64c128_tm_force_quda_full_n8_9214292.out

I will perform another longer run (this one has artificially short trajectories), but so far it seems that all is well.

@Marcogarofalo
Contributor

Hi, thanks for the good news. I am also running a test of the 64x128 lattice in
/qbigwork2/garofalo/builds/tmLQCD_fermionic_forces/phys_test/
So far the plaquette is OK, while $\Delta H$ has started to show differences:

::::::::::::::
device/output.data
::::::::::::::
00001033 0.539124377721 -0.181064112112 1.198492e+00 99 1850 1008 8217 1 7.052586e+02 3.051752e-01
00001034 0.537658649417 -0.136323347688 1.146052e+00 102 1869 1036 8334 1 6.922392e+02 3.026848e-01
00001035 0.536295420353 -0.072135765105 1.074801e+00 101 1888 1037 8464 1 6.917731e+02 3.004625e-01
00001036 0.535011552829 -0.135401181877 1.144996e+00 101 1875 1025 8362 1 7.069782e+02 2.983943e-01
00001037 0.533843994664 0.000084528700 9.999155e-01 101 1875 1033 8372 1 6.941697e+02 2.965856e-01
00001038 0.532752973206 -0.088634211570 1.092681e+00 100 1872 1018 8313 1 7.003413e+02 2.948908e-01
00001039 0.531704500459 -0.006030147895 1.006048e+00 101 1866 1025 8286 1 6.922550e+02 2.933619e-01
00001040 0.530735472711 -0.022664899006 1.022924e+00 101 1883 1028 8354 1 6.910189e+02 2.919589e-01
::::::::::::::
host/output.data
::::::::::::::
00001033 0.539124377721 -0.181256163865 1.198722e+00 99 1850 1010 8218 1 1.087587e+03 3.051752e-01
00001034 0.537658649417 -0.136414341629 1.146157e+00 102 1866 1037 8334 1 1.079608e+03 3.026848e-01
00001035 0.536289797778 -0.074235649779 1.077061e+00 101 1885 1037 8444 1 1.114638e+03 3.004565e-01
00001036 0.535016437923 -0.062977313995 1.065003e+00 100 1877 1019 8385 1 1.103632e+03 2.984491e-01
00001037 0.533841333976 -0.071600986645 1.074227e+00 100 1848 1009 8207 1 1.101621e+03 2.966407e-01

@kostrzewa
Member Author

Are you running the same number of trajectories per job? Otherwise the random numbers won't be identical and you are bound to see differences quite early on. After some number of trajectories there's of course nothing that you can do: the runs will diverge somewhere around 20-30 trajectories or even slightly earlier.

@kostrzewa
Member Author

kostrzewa commented Jan 18, 2024

In my test using a full 2+1+1 run on a 64c128 lattice things are looking good so far (the top seven trajectories are with the force offloading enabled):

# n8 with quda force full | quda_version quda-develop-25d85b | slurm_job_id 9214636
00001033 0.540752271381 0.043441347778 9.574887e-01 100 14226 175 12811 34394 0 1010 61291 1214 43192 932 3870 109 6159 4976 77081 1 4.504216e+03 3.080557e-01
00001034 0.540752271381 0.565656803548 5.679870e-01 102 14220 179 12795 35007 0 1040 61200 1267 43109 932 3848 109 5938 5168 78220 0 4.421921e+03 3.080557e-01
00001035 0.540713299781 0.471504589543 6.240626e-01 99 14225 175 12849 34024 0 1005 61374 1210 43367 933 3822 107 5853 4949 77301 1 4.452976e+03 3.080064e-01
00001036 0.540689619806 -0.210317837074 1.234070e+00 199 28422 278 25633 48012 0 1920 122603 1901 86464 975 7664 169 11686 7556 153829 1 8.718310e+03 3.079745e-01
00001037 0.540717490165 -0.093398010358 1.097899e+00 99 14214 174 12805 34093 0 1010 61197 1213 43164 934 3835 106 5917 4964 77340 1 4.464691e+03 3.080230e-01
00001038 0.540701838357 -0.041311362758 1.042177e+00 100 14174 176 12763 34198 0 1012 61049 1224 42988 932 3843 109 5865 4968 76331 1 4.421069e+03 3.080025e-01
00001039 0.540722992628 -0.098816601560 1.103864e+00 98 14308 174 12939 33863 0 998 61671 1214 43740 935 3829 108 5950 4894 76960 1 4.427378e+03 3.080321e-01
# n8 with quda no_force full | quda_version quda-develop-25d85b | slurm_job_id 9217535
00001033 0.540752271382 0.043261365965 9.576611e-01 100 14224 175 12813 34388 0 1010 61298 1214 43175 932 3874 109 6158 4974 77093 1 5.152877e+03 3.080557e-01
00001034 0.540752271382 0.565140699968 5.682802e-01 101 14221 180 12794 35034 0 1038 61215 1250 43091 933 3846 109 5942 5161 78216 0 5.070436e+03 3.080557e-01
00001035 0.540713299781 0.471584610641 6.240127e-01 99 14226 175 12850 34019 0 1006 61371 1209 43347 933 3822 107 5851 4957 77339 1 5.078761e+03 3.080064e-01
00001036 0.540689619817 -0.210258966312 1.233998e+00 199 28417 278 25631 48004 0 1922 122607 1908 86476 975 7666 169 11701 7541 153884 1 9.954171e+03 3.079745e-01
00001037 0.540717490182 -0.093846568838 1.098391e+00 99 14217 174 12807 34072 0 1007 61186 1211 43166 934 3838 107 5913 4956 77344 1 5.083100e+03 3.080230e-01
00001038 0.540701838370 -0.041504077613 1.042377e+00 100 14175 177 12764 34232 0 1013 61043 1225 43011 931 3841 108 5864 4971 76314 1 5.042457e+03 3.080025e-01
00001039 0.540722992685 -0.097678773105 1.102609e+00 98 14306 175 12936 33853 0 998 61648 1214 43753 935 3830 108 5942 4923 76966 1 5.047067e+03 3.080321e-01

I will update the above when the second set of trajectories with force offloading disabled has completed.

@kostrzewa
Member Author

I think whatever problems I seemed to have were related to the one oversight of employing fc31c7d before you had added the 07fc55e fix, plus, later on, perhaps some mismatch between the tmLQCD and QUDA versions that I was using.

@Marcogarofalo
Contributor

Are you running the same number of trajectories per job?

no, I'll redo the test then.

@kostrzewa
Member Author

The test above #549 (comment) looks fine to me. Could you add something to the documentation about UseExternalLibrary = quda having to be set to use the offloaded fermionic force? I think we can then merge this and start using it in production. Thanks a lot!
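
For the record, a hedged sketch of what such a documentation snippet could look like; the monomial type is illustrative and the usual monomial parameters are omitted, since only the two external-library lines are the point here:

  BeginMonomial CLOVERDETRATIO
    UseExternalInverter = quda
    UseExternalLibrary = quda
  EndMonomial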

@Marcogarofalo
Contributor

After doing the test with the same random numbers in the host and device runs, I got good agreement after 30 trajectories:

==> device/output.data <==
00001033 0.539124377721 -0.181064112112 1.198492e+00 99 1850 1008 8217 1 6.779430e+02 3.051752e-01
00001034 0.537658649417 -0.136323347688 1.146052e+00 102 1869 1036 8334 1 6.657363e+02 3.026848e-01
...
00001063 0.516763973986 -0.057475058362 1.059159e+00 101 1924 867 7139 1 6.612186e+02 2.728876e-01

==> host/output.data <==
00001033 0.539124377721 -0.181048505008 1.198473e+00 99 1850 1008 8220 1 1.075596e+03 3.051752e-01
00001034 0.537658649417 -0.136431051418 1.146176e+00 102 1869 1034 8337 1 1.071386e+03 3.026848e-01
...
00001063 0.516763973986 -0.057487478480 1.059172e+00 101 1928 867 7138 1 1.080475e+03 2.728876e-01

@kostrzewa
Member Author

A comment to 44449ad: the DET and DETRATIO monomials do support EnableExternalInverter (or at least they should).

@Marcogarofalo
Contributor

we never tested:

  useexternalinverter = no
  UseExternalLibrary = quda

I don't see an application for this setup; at the current stage it is likely to crash.

@kostrzewa
Member Author

we never tested:

true ... I guess the check for "UseExternalLibrary" should include a simultaneous check for "UseExternalInverter" and spit out an error if the two are not set together...
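
A minimal sketch of such a combined check; the struct, field names and error handling below are illustrative assumptions, not the actual read_input / monomial code in tmLQCD:

  #include <stdio.h>
  #include <stdlib.h>

  /* Sketch only: reject the untested combination at input-parsing time. */
  typedef struct {
    int external_library;   /* set if UseExternalLibrary = quda  */
    int external_inverter;  /* set if UseExternalInverter = quda */
  } monomial_sketch;

  static void check_external_library(const monomial_sketch *mnl) {
    if (mnl->external_library && !mnl->external_inverter) {
      fprintf(stderr, "UseExternalLibrary = quda requires UseExternalInverter = quda\n");
      exit(1);
    }
  }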

@kostrzewa
Member Author

Note that I've merged with the current quda_work as there was a conflict in the documentation here.

@kostrzewa
Member Author

I'm a bit stuck testing this on LUMI-G because of lattice/quda#1432

I can test using an unofficial stack based on ROCm 5.6.1, but I've already found that there are issues when both GDR and P2P are enabled with my current production code (relatively recent tmLQCD quda_work and quda-develop-02391b12). I see solvers diverging, so I first need to find a combination of ROCm 5.6.1 × GDR=[0,1] × P2P=[0,3] which works correctly before I can move on to testing the offloaded fermionic force.

@kostrzewa
Member Author

I was able to test this on LUMI-G now and it works perfectly fine. 10% speedup overall in a large run on 128 nodes which should translate to more than that at a smaller scale.

@kostrzewa merged commit 28295ac into quda_work on Jan 27, 2024
3 checks passed
@kostrzewa
Member Author

@Marcogarofalo this is really awesome, thanks!

@Marcogarofalo
Contributor

Thank you for the help!
