I have a question about your calculation of the gradient noise scale (GNS). You use different classes for the calculation (GradientNoiseScale, AdamGradientNoiseScale). As far as I understand, the preconditioner differs between the two: GradientNoiseScale always uses a preconditioner of 1 (the identity), while AdamGradientNoiseScale uses an adjusted preconditioner.
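To make sure we mean the same thing by "adjusted", here is my guess at the Adam preconditioner; this is an assumption on my part (the usual bias-corrected second-moment scaling), not taken from your AdamGradientNoiseScale implementation:

import torch

def adam_preconditioner(exp_avg_sq, step, beta2=0.999, eps=1e-8):
    # My assumed diagonal preconditioner 1 / (sqrt(v_hat) + eps), built from
    # Adam's second-moment state (exp_avg_sq). My reading is that
    # GradientNoiseScale effectively replaces this with all-ones.
    v_hat = exp_avg_sq / (1.0 - beta2 ** step)  # bias correction
    return 1.0 / (v_hat.sqrt() + eps)

# Example: a parameter with a uniform second-moment estimate after 100 steps.
precond = adam_preconditioner(torch.full((10,), 0.01), step=100)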
I have three questions regarding this:
1. For which optimizers does the code work, i.e. deliver correct results, and what would need to be done to calculate the GNS correctly when using other optimizers? Is it correct that it only works for SGD, Adam, and Adagrad?
2. To what extent does the scheduler or scaling rule influence the calculation of the GNS? I ask because the scaling rule seems to be the criterion that decides how the GNS is calculated.
3. If I use an optimizer other than Adam or AdamW, the "normal" GradientNoiseScale class is used to calculate the GNS (with a preconditioner of 1). Does this work for all other optimizers, such as SGD, LAMB, and others, or is it only valid for SGD? (A preconditioner of 1 should be the vanilla-SGD case from the original GNS paper, "An Empirical Model of Large-Batch Training"; see the short sketch right after this list.)
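For reference, this is what I mean by the unpreconditioned ("simple") estimator; it is my own sketch of the formulas from Appendix A of the paper, not code from your repository:

import torch

def simple_gns(grad_small, grad_big, b_small, b_big):
    # grad_small / grad_big: flattened gradients averaged over b_small and
    # b_big examples respectively, with b_small < b_big.
    sq_small = grad_small.pow(2).sum()
    sq_big = grad_big.pow(2).sum()
    # Unbiased estimate of |G|^2 (squared norm of the true gradient).
    g2 = (b_big * sq_big - b_small * sq_small) / (b_big - b_small)
    # Unbiased estimate of tr(Sigma) (sum of per-example gradient variances).
    trace_sigma = (sq_small - sq_big) / (1.0 / b_small - 1.0 / b_big)
    return trace_sigma / g2  # B_simple, the gradient noise scale

Nothing in this estimator depends on the optimizer, which is why I read the preconditioner-1 path as the vanilla-SGD case; please correct me if your GradientNoiseScale computes something different.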
Here is the relevant code:
if not scaling_rule and (isinstance(optimizer, torch.optim.Adam) or
                         isinstance(optimizer, torch.optim.AdamW)):
    self.scaling_rule = AdamScale()
else:
    self.scaling_rule = scaling_rule or AdaScale()
if isinstance(scaling_rule, AdamScale):
    self.gns = AdamGradientNoiseScale(self, optimizer,
                                      mp_scaler=mp_scaler)
else:
    self.gns = GradientNoiseScale(self, optimizer, mp_scaler=mp_scaler)
self.scaling_rule.initialize(self, optimizer, patch_optimizer=True)
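To state question 2 more precisely, here is a minimal, self-contained replica of the branch above. AdaScale and AdamScale are stand-ins, not your implementations, and I mirror the pasted code literally, including that the second isinstance check looks at the scaling_rule argument rather than self.scaling_rule:

import torch

class AdaScale:
    pass

class AdamScale(AdaScale):
    pass

def pick_gns_class(optimizer, scaling_rule=None):
    # Mirrors the pasted branch: the scaling rule is chosen from the optimizer
    # type, but the GNS class is chosen from the *argument* scaling_rule.
    if not scaling_rule and isinstance(optimizer, (torch.optim.Adam, torch.optim.AdamW)):
        rule = AdamScale()
    else:
        rule = scaling_rule or AdaScale()
    gns = "AdamGradientNoiseScale" if isinstance(scaling_rule, AdamScale) else "GradientNoiseScale"
    return type(rule).__name__, gns

model = torch.nn.Linear(4, 2)
sgd = torch.optim.SGD(model.parameters(), lr=0.1)
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3)

print(pick_gns_class(sgd))                              # ('AdaScale', 'GradientNoiseScale')
print(pick_gns_class(adamw))                            # ('AdamScale', 'GradientNoiseScale')
print(pick_gns_class(adamw, scaling_rule=AdamScale()))  # ('AdamScale', 'AdamGradientNoiseScale')

If the second case is intended (Adam/AdamW without an explicit scaling_rule still using the unpreconditioned GradientNoiseScale), that would already partly answer question 2; if the check is actually meant to look at self.scaling_rule, that would change my reading.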
I would appreciate any kind of help.