signSGD: compressed optimisation for non-convex problems

Here I house the mxnet code for the original signSGD paper (ICML 2018). I've put the code here to facilitate reproducing the paper's results; it isn't intended for development purposes. In particular, this implementation does not gain any speedups from compression. Some links:
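For context on what "compression" refers to, here is a minimal NumPy sketch of sign compression. It is not part of this codebase, which, as noted, gains no speedup from compression. Each gradient coordinate is reduced to a single bit, so a float32 gradient shrinks by a factor of 32. The helper names `compress_signs` and `decompress_signs` are hypothetical, and unlike this codebase the sketch maps sign(0) to +1.

```python
import numpy as np

def compress_signs(grad):
    """Pack the sign of each gradient coordinate into one bit (1 means non-negative).

    Hypothetical helper: zeros are treated as positive, i.e. sign(0) -> +1.
    """
    bits = (grad >= 0).astype(np.uint8)
    return np.packbits(bits), grad.size

def decompress_signs(packed, size):
    """Unpack the bits back into a vector of +1.0 / -1.0 values."""
    bits = np.unpackbits(packed)[:size]  # drop the padding bits of the last byte
    return bits.astype(np.float32) * 2.0 - 1.0

grad = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
packed, n = compress_signs(grad)
print(grad.nbytes, packed.nbytes)  # 4000 125
```

The receiver only learns the direction of each coordinate, not its magnitude; that loss of information is what signSGD is designed to tolerate.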

[Update Jan 2021] As noted in this issue, this codebase uses an implementation of the sign function that maps sign(0) --> 0. A test in this notebook suggests there may be little difference from an implementation that maps sign(0) --> ±1 at random. In the codebase for the ICLR 2019 paper, we used an implementation that maps sign(0) --> +1 deterministically.
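The three conventions for sign(0) mentioned above can be sketched in NumPy as follows. This is purely illustrative and is not the mxnet code used in either codebase; the function names are hypothetical.

```python
import numpy as np

def sign_zero(x):
    # sign(0) -> 0: NumPy's np.sign already behaves this way
    return np.sign(x)

def sign_random(x, rng):
    # sign(0) -> +1 or -1, chosen uniformly at random for each zero entry
    s = np.sign(x)
    zeros = s == 0
    s[zeros] = rng.choice([-1.0, 1.0], size=int(zeros.sum()))
    return s

def sign_plus_one(x):
    # sign(0) -> +1 deterministically
    return np.where(x >= 0, 1.0, -1.0)

x = np.array([-2.0, 0.0, 3.0])
print(sign_zero(x).tolist())      # [-1.0, 0.0, 1.0]
print(sign_plus_one(x).tolist())  # [-1.0, 1.0, 1.0]
```

With real-valued gradients exact zeros are rare, which is one plausible reason the choice of convention makes little difference in practice.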


General instructions:

  • Signum is implemented as an official optimiser in mxnet, so to use Signum in this codebase, pass the string 'signum' as a command-line argument.
  • If you do not use our suggested hyperparameters, be careful to tune them yourself.
  • Signum hyperparameters are typically similar to Adam hyperparameters, not to SGD hyperparameters!
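To illustrate the update Signum performs, here is a minimal NumPy sketch of the rule from the signSGD paper: keep an exponential moving average of the gradient, then step by the learning rate times its sign. It omits weight decay and the other options of mxnet's Signum optimiser. The function `signum_step` and the toy quadratic objective are assumptions for illustration only.

```python
import numpy as np

def signum_step(x, m, grad, lr=0.01, beta=0.9):
    """One Signum update (hypothetical helper, no weight decay)."""
    m = beta * m + (1.0 - beta) * grad  # momentum as an exponential moving average
    x = x - lr * np.sign(m)             # each coordinate moves by exactly lr (or 0)
    return x, m

# Toy objective f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x = np.array([1.0, -2.0])
m = np.zeros_like(x)
for _ in range(500):
    x, m = signum_step(x, m, grad=x)
print(0.5 * np.dot(x, x))  # small: x ends up oscillating close to the minimum at 0
```

Because every coordinate moves by a fixed amount regardless of the gradient's magnitude, the learning rate directly sets the per-step change in each weight. Adam's normalised steps behave similarly, which is one way to see why Signum learning rates resemble Adam's rather than SGD's.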

There are four folders:

  1. cifar/ -- code to train ResNet-20 on CIFAR-10.
  2. gradient_expts/ -- code to compute the gradient statistics shown in Figures 1 and 2. Includes Welford's algorithm.
  3. imagenet/ -- code to train ResNet-50 on ImageNet. Implementation inspired by that of Wei Wu.
  4. toy_problem/ -- a simple example where signSGD is more robust than SGD.
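Welford's algorithm computes a running mean and variance in a single pass, which suits gradient statistics gathered one minibatch at a time without storing every sample. The generic sketch below is not the code in gradient_expts/; the class name `Welford` and the use of NumPy arrays as samples are assumptions.

```python
import numpy as np

class Welford:
    """Running mean and unbiased variance of a stream of equally shaped arrays.

    Hypothetical class: a generic sketch of Welford's online algorithm.
    """

    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # second factor uses the *updated* mean

    def variance(self):
        # Unbiased sample variance; needs at least two samples.
        return self.m2 / (self.n - 1)

rng = np.random.default_rng(0)
samples = rng.standard_normal((1000, 3))  # stand-in for 1000 gradient samples
w = Welford(shape=3)
for s in samples:
    w.update(s)
print(np.allclose(w.mean, samples.mean(axis=0)))                 # True
print(np.allclose(w.variance(), samples.var(axis=0, ddof=1)))    # True
```

Compared with accumulating sums and sums of squares, this update avoids the catastrophic cancellation that can occur when the variance is small relative to the mean.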

More info to be found within each folder.


Any questions / comments? Don't hesitate to get in touch: [email protected].