automatic tuning of (QUDA)-MG parameters [WIP, DO NOT MERGE] #537

Open. kostrzewa wants to merge 42 commits into master.

kostrzewa
Member

Started work on a simple algorithm to automatically tune those (QUDA)-MG parameters which can be changed without rebuilding the setup.

@kostrzewa kostrzewa added the WIP DO NOT MERGE Label for pull-requests which exist to track progress during development. label Mar 24, 2022
@kostrzewa
Member Author

kostrzewa commented Mar 24, 2022

The preliminary idea for the input is as follows, but this will have to be fine-tuned depending on how the algorithm turns out in the end:

BeginExternalInverter QUDA
  Pipeline = 24
  gcrNkrylov = 24
  MGNumberOfLevels = 3
  MGNumberOfVectors = 24, 32
  MGSetupSolver = cg
  MGSetup2KappaMu = 0.000224102400
  MGVerbosity = summarize, silent, silent
  MGSetupSolverTolerance = 5e-7, 5e-7
  MGSetupMaxSolverIterations = 1500, 1500
  MGCoarseSolverType = gcr, gcr, cagcr
  MGSmootherType = cagcr, cagcr, cagcr
  MGBlockSizesX = 4,3
  MGBlockSizesY = 4,3
  MGBlockSizesZ = 3,2
  MGBlockSizesT = 4,2
  
  MGCoarseMuFactor = 1.0, 1.0, 20.0
  MGCoarseMaxSolverIterations = 50, 50, 50
  MGCoarseSolverTolerance = 0.1, 0.1, 0.1
  MGSmootherPostIterations = 2, 2, 2
  MGSmootherPreIterations = 0, 0, 0
  MGSmootherTolerance = 0.1, 0.1, 0.1
  MGOverUnderRelaxationFactor = 0.85, 0.85, 0.85
  
EndExternalInverter

BeginTuneMGParams QUDA
  MGCoarseMuFactorSteps = 10, 10, 10
  MGCoarseMuFactorDelta = 0.1, 0.2, 5

  MGCoarseMaxSolverIterationsSteps = 10, 10, 10
  MGCoarseMaxSolverIterationsDelta = -5, -5, -5

  MGCoarseSolverToleranceSteps = 10, 10, 10
  MGCoarseSolverToleranceDelta = 0.05, 0.05, 0.05

  MGSmootherPreIterationsSteps = 4, 4, 4
  MGSmootherPreIterationsDelta = 1, 1, 1

  MGSmootherPostIterationsSteps = 4, 4, 4
  MGSmootherPostIterationsDelta = 1, 1, 1

  MGSmootherToleranceSteps = 4, 4, 4
  MGSmootherToleranceDelta = 0.1, 0.1, 0.1

  MGOverUnderRelaxationFactorSteps = 4, 4, 4
  MGOverUnderRelaxationFactorDelta = 0.05, 0.05, 0.05

  MGTuningIterations = 1000

  # when in a particular tuning step the improvement is less than 1%, we
  # move on to the next parameter to be tuned
  MGTuningTolerance = 0.99
EndTuneMGParams

An adaptive process may be added to dynamically reduce the search space if certain parameter changes don't affect the time to solution (tts).
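
For illustration, here is a minimal Python sketch of the kind of per-parameter, per-level search that the input above describes. It is not the code in this PR: the function `tune`, the callback `time_solve` and the dictionary layout are assumptions made up for this example, and the real tuner may order or repeat the scans differently. The sketch applies each parameter's delta up to the given number of steps and moves on as soon as the time to solution improves by less than 1% (tolerance 0.99).

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, List[float]]


def tune(params: Params,
         deltas: Params,
         steps: Dict[str, List[int]],
         time_solve: Callable[[Params], float],
         tolerance: float = 0.99,
         max_iters: int = 1000) -> Tuple[Params, float]:
    """Scan one parameter and one MG level at a time (hypothetical sketch)."""
    best = {k: list(v) for k, v in params.items()}
    best_time = time_solve(best)
    iters = 0
    for name in best:
        for level in range(len(best[name])):
            for _ in range(steps[name][level]):
                if iters >= max_iters:
                    return best, best_time
                # Try shifting this parameter on this level by one delta.
                trial = {k: list(v) for k, v in best.items()}
                trial[name][level] += deltas[name][level]
                t = time_solve(trial)
                iters += 1
                if t < tolerance * best_time:
                    best, best_time = trial, t
                else:
                    # Improvement below 1%: move on to the next parameter/level.
                    break
    return best, best_time


# Toy stand-in for a real solve: fastest when the coarsest-level factor is 40.
def toy_time(p: Params) -> float:
    return 1.0 + abs(p["mu"][2] - 40.0) / 10.0


best, best_time = tune({"mu": [1.0, 1.0, 20.0]},
                       {"mu": [0.1, 0.2, 5.0]},
                       {"mu": [10, 10, 10]},
                       toy_time)
print(best, best_time)
```

In a real setting `time_solve` would run an inversion with the trial parameters and return the measured wall-clock time; a non-converging solve could be reported as `float("inf")` so that it never counts as an improvement.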

@kostrzewa
Member Author

I will probably change the input format such that, instead of a min/max and a number of steps, one specifies a "delta" for each parameter and level together with the number of steps for which this delta should be applied.
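
To make the delta/steps idea concrete, here is a tiny Python sketch of the values that a (start, delta, steps) triple can reach. The helper `candidate_values` is hypothetical, and it assumes the delta is simply added once per step in a single direction, which may not be exactly what the tuner does.

```python
from typing import List


def candidate_values(start: float, delta: float, steps: int) -> List[float]:
    """Values reached by applying `delta` to `start` up to `steps` times."""
    return [start + i * delta for i in range(1, steps + 1)]


# Coarsest-level MGCoarseMuFactor from the input above: start 20.0, delta 5, 10 steps.
mu_candidates = candidate_values(20.0, 5, 10)
print(mu_candidates)
```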

The current "algorithm" (I use the word very cautiously) can start from a completely useless setup which doesn't converge and find one which does. Unfortunately, it doesn't yet find a better minimum than I can find by hand. However, I've tested this only on small lattices (16c32 and 24c48, albeit at the physical point) and I suspect that it will work better on larger lattices.

@kostrzewa
Member Author

Funnily enough, this actually works and seems to find parameter sets that I would never have considered. For example, on cA211.12.48, this is a parameter set that it evolves to:

QUDA-MG param tuner: BEST SET OF PARAMETERS
-------------------------------------------
             mg_mu_factor: (1.000000, 3.000000, 27.000000)
 mg_coarse_solver_maxiter: (20, 10, 50)
     mg_coarse_solver_tol: (0.200000, 0.400000, 0.200000)
               mg_nu_post: (6, 6, 8)
                mg_nu_pre: (0, 4, 2)
          mg_smoother_tol: (0.200000, 0.200000, 0.100000)
                 mg_omega: (0.950000, 1.050000, 0.850000)
Timing: 1.989135, Iters: 51
-------------------------------------------
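
For post-processing such log output, a small parser along the following lines might be handy. This is only a sketch based on the layout printed above; the function `parse_best_params` is not part of the PR and the regular expressions assume that the format stays exactly as shown.

```python
import re
from typing import Dict, List, Optional, Tuple

SAMPLE = """\
QUDA-MG param tuner: BEST SET OF PARAMETERS
-------------------------------------------
             mg_mu_factor: (1.000000, 3.000000, 27.000000)
 mg_coarse_solver_maxiter: (20, 10, 50)
     mg_coarse_solver_tol: (0.200000, 0.400000, 0.200000)
               mg_nu_post: (6, 6, 8)
                mg_nu_pre: (0, 4, 2)
          mg_smoother_tol: (0.200000, 0.200000, 0.100000)
                 mg_omega: (0.950000, 1.050000, 0.850000)
Timing: 1.989135, Iters: 51
-------------------------------------------
"""


def parse_best_params(text: str) -> Tuple[Dict[str, List[float]],
                                          Optional[float], Optional[int]]:
    """Extract the per-level values, the timing and the iteration count."""
    params: Dict[str, List[float]] = {}
    timing: Optional[float] = None
    iters: Optional[int] = None
    for line in text.splitlines():
        # Lines of the form "name: (v0, v1, v2)".
        m = re.match(r"\s*(\w+):\s*\(([^)]*)\)\s*$", line)
        if m:
            params[m.group(1)] = [float(x) for x in m.group(2).split(",")]
            continue
        # The summary line "Timing: <seconds>, Iters: <count>".
        m = re.match(r"\s*Timing:\s*([0-9.eE+-]+),\s*Iters:\s*(\d+)", line)
        if m:
            timing, iters = float(m.group(1)), int(m.group(2))
    return params, timing, iters


params, timing, iters = parse_best_params(SAMPLE)
print(params["mg_mu_factor"], timing, iters)
```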

@kostrzewa kostrzewa changed the title skeleton for automatic tuning of (QUDA)-MG parameters [WIP, DO NOT MERGE] automatic tuning of (QUDA)-MG parameters [WIP, DO NOT MERGE] Apr 7, 2022
@kostrzewa
Member Author

First experience on a large volume (64c128) at the physical point suggests that this tuner, surprisingly, really seems to work.

Setting

BeginTuneMGParams QUDA
  MGCoarseMuFactorSteps = 10, 10, 11
  MGCoarseMuFactorDelta = 0.25, 0.5, 5

  MGCoarseMaxSolverIterationsSteps = 10, 10, 10
  MGCoarseMaxSolverIterationsDelta = 5, 5, 5

  MGCoarseSolverToleranceSteps = 10, 10, 10
  MGCoarseSolverToleranceDelta = 0.05, 0.05, 0.05

  MGSmootherPreIterationsSteps = 2, 2, 2
  MGSmootherPreIterationsDelta = 1, 1, 1

  MGSmootherPostIterationsSteps = 2, 2, 2
  MGSmootherPostIterationsDelta = 2, 2, 2

  MGSmootherToleranceSteps = 4, 4, 4
  MGSmootherToleranceDelta = 0.1, 0.1, 0.1

  MGOverUnderRelaxationFactorSteps = 3, 3, 3
  MGOverUnderRelaxationFactorDelta = 0.05, 0.05, 0.05

  MGTuningIterations = 1000

  # when in a particular tuning step the improvement is less than 1%, we
  # move on to the next parameter to be tuned
  MGTuningTolerance = 0.99
EndTuneMGParams

and starting from

BeginExternalInverter QUDA
  Pipeline = 24
  gcrNkrylov = 24
  MGNumberOfLevels = 3
  MGNumberOfVectors = 24, 32
  MGSetupSolver = cg
  MGSetup2KappaMu = 0.000215613244
  MGVerbosity = silent, silent, silent
  MGSetupSolverTolerance = 5e-7, 5e-7
  MGSetupMaxSolverIterations = 1500, 1500
  MGCoarseSolverType = gcr, gcr, cagcr
  MGSmootherType = cagcr, cagcr, cagcr
  MGBlockSizesX = 4,2
  MGBlockSizesY = 4,2
  MGBlockSizesZ = 4,2
  MGBlockSizesT = 4,2
  MGResetSetupMDUThreshold = 1.0
  MGRefreshSetupMDUThreshold = 0.0263
  MGRefreshSetupMaxSolverIterations = 30, 30
 
  MGCoarseMuFactor = 1.0, 1.0, 20.0
  MGCoarseMaxSolverIterations = 15, 15, 15
  MGCoarseSolverTolerance = 0.1, 0.1, 0.1
  MGSmootherPostIterations = 2, 2, 2
  MGSmootherPreIterations = 0, 0, 0
  MGSmootherTolerance = 0.1, 0.1, 0.1
  MGOverUnderRelaxationFactor = 0.90, 0.90, 0.90  
EndExternalInverter

the tuner takes the solver from non-convergence through a successful solve in around 9 seconds (on Meluxina)

QUDA-MG param tuner: BEST SET OF PARAMETERS
-------------------------------------------
             mg_mu_factor: (1.000000, 1.000000, 65.000000)
 mg_coarse_solver_maxiter: (15, 15, 15)
     mg_coarse_solver_tol: (0.100000, 0.100000, 0.100000)
               mg_nu_post: (2, 2, 2)
                mg_nu_pre: (0, 0, 0)
          mg_smoother_tol: (0.100000, 0.100000, 0.100000)
                 mg_omega: (0.900000, 0.900000, 0.900000)
Timing: 8.628203, Iters: 112
-------------------------------------------

down to a solve in 2.5 seconds with parameters that I would not have thought to choose by hand:

QUDA-MG param tuner: BEST SET OF PARAMETERS
-------------------------------------------
             mg_mu_factor: (1.000000, 4.000000, 120.000000)
 mg_coarse_solver_maxiter: (15, 25, 30)
     mg_coarse_solver_tol: (0.100000, 0.200000, 0.150000)
               mg_nu_post: (2, 6, 10)
                mg_nu_pre: (0, 0, 6)
          mg_smoother_tol: (0.200000, 0.200000, 0.200000)
                 mg_omega: (0.900000, 0.900000, 0.950000)
Timing: 2.501800, Iters: 64
-------------------------------------------

@kostrzewa
Member Author

Using these parameters in practice, here is a comparison between the "hand-tuned" setup (left) and the auto-tuned setup (right):

MGCoarseMuFactor = 1.0, 1.0, 80.0              ->  MGCoarseMuFactor = 1.0, 4.0, 120.0                                                                  
MGCoarseMaxSolverIterations = 30, 30, 30       ->  MGCoarseMaxSolverIterations = 15, 25, 30
MGCoarseSolverTolerance = 0.3, 0.2, 0.15       ->  MGCoarseSolverTolerance = 0.1, 0.2, 0.15
MGSmootherPostIterations = 4, 4, 6             ->  MGSmootherPostIterations = 2, 6, 10
MGSmootherPreIterations = 0, 0, 1              ->  MGSmootherPreIterations = 0, 0, 6
MGSmootherTolerance = 0.2, 0.2, 0.2            ->  MGSmootherTolerance = 0.2, 0.2, 0.2 
MGOverUnderRelaxationFactor = 1.00, 0.90, 0.90 ->  MGOverUnderRelaxationFactor = 0.90, 0.90, 0.95  
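
The right-hand column corresponds line by line to the tuner's "BEST SET OF PARAMETERS" output, so the translation back into input-file lines can be scripted. The sketch below is an illustration only: the key mapping is read off from the two listings in this thread, and the helper `to_input_lines` is a made-up name, not something provided by the code.

```python
from typing import Dict, List

# Tuner output name -> input-file key, as inferred from the listings in this thread.
KEY_MAP = {
    "mg_mu_factor": "MGCoarseMuFactor",
    "mg_coarse_solver_maxiter": "MGCoarseMaxSolverIterations",
    "mg_coarse_solver_tol": "MGCoarseSolverTolerance",
    "mg_nu_post": "MGSmootherPostIterations",
    "mg_nu_pre": "MGSmootherPreIterations",
    "mg_smoother_tol": "MGSmootherTolerance",
    "mg_omega": "MGOverUnderRelaxationFactor",
}
# Parameters that are printed as integers in the input file.
INTEGER_KEYS = {"mg_coarse_solver_maxiter", "mg_nu_post", "mg_nu_pre"}


def to_input_lines(params: Dict[str, List[float]]) -> List[str]:
    """Format tuned per-level values as 'Key = v0, v1, v2' input lines."""
    lines = []
    for tuner_key, input_key in KEY_MAP.items():
        values = params[tuner_key]
        if tuner_key in INTEGER_KEYS:
            formatted = ", ".join(str(int(v)) for v in values)
        else:
            formatted = ", ".join(str(float(v)) for v in values)
        lines.append(f"  {input_key} = {formatted}")
    return lines


tuned = {
    "mg_mu_factor": [1.0, 4.0, 120.0],
    "mg_coarse_solver_maxiter": [15, 25, 30],
    "mg_coarse_solver_tol": [0.1, 0.2, 0.15],
    "mg_nu_post": [2, 6, 10],
    "mg_nu_pre": [0, 0, 6],
    "mg_smoother_tol": [0.2, 0.2, 0.2],
    "mg_omega": [0.9, 0.9, 0.95],
}
lines = to_input_lines(tuned)
print("\n".join(lines))
```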

I seem to obtain very stable timings so far (red is the auto-tuned MG setup):

image

@kostrzewa kostrzewa changed the base branch from quda_work_add_actions to quda_work March 17, 2023 08:09
@kostrzewa
Member Author

After some more runtime, extracting the time to solution of the two MG setups, I get the following histograms after resampling to get the same number of solver calls in both cases (logarithmic count axis):

image
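
A comparison of this kind could be put together roughly as in the following sketch. How the resampling was actually done is not stated here, so random subsampling without replacement down to the smaller sample size is an assumption, and the helper names are hypothetical. The resulting counts would then be plotted on a logarithmic axis.

```python
import random
from typing import List, Sequence, Tuple


def resample_to_equal_size(a: Sequence[float], b: Sequence[float],
                           seed: int = 0) -> Tuple[List[float], List[float]]:
    """Randomly subsample both timing sets to the size of the smaller one."""
    rng = random.Random(seed)
    n = min(len(a), len(b))
    return rng.sample(list(a), n), rng.sample(list(b), n)


def histogram(values: Sequence[float], lo: float, hi: float,
              nbins: int) -> List[int]:
    """Count values in `nbins` equal-width bins covering [lo, hi)."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v < hi:
            # Clamp to guard against floating-point rounding at the upper edge.
            counts[min(int((v - lo) / width), nbins - 1)] += 1
    return counts


# Made-up timings (in seconds) purely to exercise the helpers.
hand_tuned = [2.9, 3.1, 3.4, 3.0, 5.2, 3.3]
auto_tuned = [2.4, 2.5, 2.6, 2.5]
hand_sub, auto_sub = resample_to_equal_size(hand_tuned, auto_tuned)
print(histogram(hand_sub, 2.0, 6.0, 8))
print(histogram(auto_sub, 2.0, 6.0, 8))
```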

@kostrzewa
Member Author

kostrzewa commented Mar 21, 2023

Doing the same on an L=48 simulation at the physical point similarly leads to a very nice improvement. Below, untuned refers to a hand-selected MG setup, mk1tuned refers to the auto-tuning result after about 100 tuning iterations, and mk2tuned refers to the setup which was reached at the end of the tuning procedure.

The two "peaks" correspond to inversions related to cloverdetratio2light (below and around 1 second in the tuned setups) and cloverdetratio3light (from 1.5 seconds and up). The histograms include the timings from the HB/ACC steps as well as those from the derivative.

image

The final setup is:

  MGCoarseMuFactor = 1.0, 2.5, 105.0
  MGCoarseMaxSolverIterations = 15, 15, 15
  MGCoarseSolverTolerance = 0.1, 0.35, 0.25
  MGSmootherPostIterations = 2, 2, 4
  MGSmootherPreIterations = 0, 0, 1
  MGSmootherTolerance = 0.2, 0.1, 0.2
  MGOverUnderRelaxationFactor = 0.90, 0.90, 1.00  

@kostrzewa
Member Author

note to self from meeting just now: it should be possible to integrate this directly in the HMC

  • define a time to solution threshold deemed unacceptable
  • when the solve time of monomials using the MG goes above this threshold more than N times -> enter MG tuning loop for k iterations in an attempt to stabilize the MG
    • this would allow the MG setup to adapt automatically to changes in its behaviour as the simulation progresses
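
The trigger logic in the note above could look something like the following Python sketch. Nothing of the sort exists in the code yet: the class `MGTuneTrigger` and its parameters are invented for illustration, and resetting the counter after a trigger is an assumption.

```python
class MGTuneTrigger:
    """Decide when to enter the MG tuning loop (hypothetical sketch)."""

    def __init__(self, threshold_seconds: float, max_violations: int):
        self.threshold = threshold_seconds    # solve time deemed unacceptable
        self.max_violations = max_violations  # the "N" from the note above
        self.violations = 0

    def record(self, solve_time: float) -> bool:
        """Register one MG solve; return True if tuning should start now."""
        if solve_time > self.threshold:
            self.violations += 1
        if self.violations > self.max_violations:
            # Start counting afresh once a tuning run has been requested.
            self.violations = 0
            return True
        return False


trigger = MGTuneTrigger(threshold_seconds=3.0, max_violations=2)
decisions = [trigger.record(t) for t in [1.0, 3.5, 4.0, 2.0, 5.0]]
print(decisions)
```

When `record` returns True, the HMC would run the tuning loop for k iterations and then continue with the updated MG parameters.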

@kostrzewa kostrzewa changed the base branch from quda_work to master March 13, 2024 12:53
@kostrzewa
Member Author

@Marcogarofalo @aniketsen I think I would like to merge this into the master branch, as I don't think that I will get around to integrating the deriv_mg_tune functionality into the HMC itself in the near term. At the same time, I think we've shown that the functionality is genuinely useful and that it's not a loss to have it in the code. The fact that it's fully documented helps with this assessment.

What do you think?

@Marcogarofalo
Contributor

I agree with the merging. This is already a valuable functionality which has been used multiple times.
