Square LR-Schedule #70
Labels:
- `core`: Improves the core model while keeping the core idea intact
- `engineering`: Software-engineering problems that don't require ML expertise
Our learning rate scheduler currently uses a linear ramp-up followed by an exponential decay, so the learning rate curve looks like the following:
where the durations of the initial ramp-up and of the decay are tunable hyperparameters.
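For reference, the current schedule can be sketched roughly as follows; the function name, parameter names, and default values here are illustrative assumptions, not our actual implementation:

```python
import math

def current_lr(step, peak_lr=1e-3, warmup_steps=1000, decay_rate=1e-4):
    # Sketch of the current schedule (names/values are assumptions):
    # linear ramp-up from 0 to peak_lr over warmup_steps,
    # then exponential decay from peak_lr.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * math.exp(-decay_rate * (step - warmup_steps))
```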
However, others have pointed out that a square ramp-up and a square decay can perform significantly better, so we may want to adopt them. The modified curve (orange) would look like the following:
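A minimal sketch of the proposed variant, assuming "square" means a quadratic ramp-up and a quadratic decay to zero over a fixed number of steps (the function name, parameters, and defaults below are hypothetical):

```python
def square_lr(step, peak_lr=1e-3, warmup_steps=1000, decay_steps=10000):
    # Sketch of the proposed schedule (names/values are assumptions):
    # quadratic ramp-up to peak_lr, then quadratic decay to 0
    # over decay_steps, clamped at 0 afterwards.
    if step < warmup_steps:
        return peak_lr * (step / warmup_steps) ** 2
    t = min(step - warmup_steps, decay_steps)
    return peak_lr * (1 - t / decay_steps) ** 2
```

Compared with the exponential drop-off, this variant starts slower during warm-up and reaches exactly zero at the end of the decay window, which is one plausible reading of the square-decay curve.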