diff --git a/RULES.md b/RULES.md
index 68f47964e..873cc1786 100644
--- a/RULES.md
+++ b/RULES.md
@@ -414,7 +414,7 @@ While the training time to the *test set* target is used for scoring, we use the

 #### Benchmarking hardware

-All scored runs have to be performed on the benchmarking hardware to allow for a fair comparison of training times. The benchmarking hardware has to be chosen to be easily accessible via common cloud computing providers. The exact hardware specification will be specified in the call for submissions and will most likely change with each iteration of the benchmark. As a placeholder, we are currently planning with 8xV100 GPUs with 16GB of VRAM per card, e.g. the [p3.16xlarge instance on AWS](https://aws.amazon.com/ec2/instance-types/) or the [NVIDIA V100 8 GPUs instance on GCP](https://cloud.google.com/compute/docs/gpus#other_available_nvidia_gpu_models).
+All scored runs have to be performed on the benchmarking hardware to allow for a fair comparison of training times. The benchmarking hardware has to be chosen to be easily accessible via common cloud computing providers. The exact hardware specification will be specified in the call for submissions and will most likely change with each iteration of the benchmark. As a placeholder, we are currently planning with 8xV100 GPUs with 16GB of VRAM per card, e.g. the [p3.16xlarge instance on AWS](https://aws.amazon.com/ec2/instance-types/) or the [NVIDIA V100 8 GPUs instance on GCP](https://cloud.google.com/compute/docs/gpus#nvidia_v100_gpus).

 For self-reported results, it is acceptable to perform the tuning trials on hardware different from the benchmarking hardware, as long as the same hardware is used for all tuning trials. Once the best trial, i.e. the one that reached the *validation* target the fastest, was determined, this run has to be repeated on the competition hardware. For example, submitters can tune using their locally available hardware but have to use the benchmarking hardware, e.g. via cloud providers, for the $5$ scored runs. This allows for a fair comparison to the reported results of other submitters while allowing some flexibility in the hardware.

diff --git a/getting_started.md b/getting_started.md
new file mode 100644
index 000000000..e5ed30416
--- /dev/null
+++ b/getting_started.md
@@ -0,0 +1,106 @@
+# Getting Started
+
+## Workspace setup and installation
+To get started you will have to make a few decisions and install the repository along with its dependencies. Specifically:
+1. Decide whether you would like to develop your submission in PyTorch or JAX.
+2. Set up your workstation or VM. We recommend using a setup similar to the [benchmarking hardware](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#benchmarking-hardware), which consists of 8 V100 GPUs, 240 GB of RAM, and 2 TB of storage for datasets. For further recommendations on setting up your own cloud VM see [here](https://github.com/mlcommons/algorithmic-efficiency/blob/main/docker/README.md#gcp-integration).
+3. Clone the algorithmic-efficiency repository and make sure you have installed the dependencies. To install the dependencies we recommend either:
+    1. Installing in a Python virtual environment as described in the [Installation section of the README](https://github.com/mlcommons/algorithmic-efficiency/blob/main/README.md) (a minimal sketch follows this list).
+    2. Using a Docker container with the dependencies installed. This option guarantees that your code runs with the same CUDA, cuDNN, and Python dependencies as the competition scoring environment. See the [Docker README](https://github.com/mlcommons/algorithmic-efficiency/blob/main/docker/README.md) for instructions on the Docker workflow.
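+
+If you choose the virtual-environment route, the following is a minimal sketch, assuming a Linux shell; the framework-specific extras mentioned in the comments are assumptions, so check the README for the exact install commands:
+
+```bash
+# Clone the repository and install it into a fresh virtual environment.
+git clone https://github.com/mlcommons/algorithmic-efficiency.git
+cd algorithmic-efficiency
+python3 -m venv env
+source env/bin/activate
+# Install the package; add the framework-specific extras listed in the README
+# (e.g. something like '.[jax_gpu]' or '.[pytorch_gpu]', assumed names only).
+pip3 install -e .
+```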
+
+## Download the data
+The workloads in this benchmark use 6 different datasets across 8 workloads. You may choose to download only some of the datasets while you are developing your submission, but your submission will be scored across all 8 workloads. For instructions on obtaining and setting up the datasets see the [datasets/README](https://github.com/mlcommons/algorithmic-efficiency/blob/main/datasets/README.md#dataset-setup).
+
+## Develop your submission
+To develop a submission you will write a Python module containing your optimizer algorithm. Your optimizer must implement a set of predefined API methods for the initialization and update steps.
+
+### Set up your directory structure (Optional)
+Make a submissions subdirectory to store your submission modules, e.g. `algorithmic-efficiency/submissions/my_submissions`.
+
+### Coding your submission
+You can find examples of submission modules under `algorithmic-efficiency/baselines` and `algorithmic-efficiency/reference_algorithms`. \
+A submission for the external ruleset will consist of a submission module and a tuning search space definition.
+1. Copy the template submission module `submissions/template/submission.py` into your submissions directory, e.g. into `algorithmic-efficiency/submissions/my_submissions`.
+2. Implement at least the methods in the template submission module. Feel free to use helper functions and/or modules as you see fit. Make sure you adhere to the competition rules. Check out the guidelines for [allowed submissions](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#allowed-submissions) and [disallowed submissions](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#disallowed-submissions), and pay special attention to the [software dependencies rule](https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#software-dependencies).
+3. Add a tuning configuration, e.g. a `tuning_search_space.json` file, to your submission directory. For the tuning search space you can either (a combined example is sketched after this list):
+    1. Define the set of feasible points by defining a value for `feasible_points` for the hyperparameters:
+       ```
+       {
+         "learning_rate": {
+           "feasible_points": [0.999]
+         }
+       }
+       ```
+       For a complete example see [tuning_search_space.json](https://github.com/mlcommons/algorithmic-efficiency/blob/main/reference_algorithms/target_setting_algorithms/imagenet_resnet/tuning_search_space.json).
+
+    2. Define a range of values for quasirandom sampling by specifying `min`, `max` and `scaling` keys for the hyperparameter:
+       ```
+       {
+         "weight_decay": {
+           "min": 5e-3,
+           "max": 1.0,
+           "scaling": "log"
+         }
+       }
+       ```
+       For a complete example see [tuning_search_space.json](https://github.com/mlcommons/algorithmic-efficiency/blob/main/baselines/nadamw/tuning_search_space.json).
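+
+As a point of reference, a hypothetical `tuning_search_space.json` that uses the range style for one hyperparameter and feasible points for another might look like the sketch below; the hyperparameter names (`learning_rate`, `warmup_steps`) and all values are illustrative placeholders, not recommendations:
+
+```
+{
+  "learning_rate": {
+    "min": 1e-4,
+    "max": 1e-2,
+    "scaling": "log"
+  },
+  "warmup_steps": {
+    "feasible_points": [500, 1000]
+  }
+}
+```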
+
+## Run your submission
+You can evaluate your submission with the `submission_runner.py` module on one workload at a time.
+
+### JAX submissions
+To score your submission on a workload, run the following from the `algorithmic-efficiency` directory:
+```bash
+python3 submission_runner.py \
+    --framework=jax \
+    --workload=mnist \
+    --experiment_dir= \
+    --experiment_name= \
+    --submission_path=submissions/my_submissions/submission.py \
+    --tuning_search_space=
+```
+
+### PyTorch submissions
+To score your submission on a workload, run the following from the `algorithmic-efficiency` directory:
+```bash
+python3 submission_runner.py \
+    --framework=pytorch \
+    --workload= \
+    --experiment_dir= \
+    --experiment_name= \
+    --submission_path= \
+    --tuning_search_space=
+```
+
+#### PyTorch DDP
+We recommend using PyTorch's [Distributed Data Parallel (DDP)](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html)
+when using multiple GPUs on a single node. You can initialize DDP with `torchrun`.
+For example, on a single host with 8 GPUs, simply replace `python3` in the above command with:
+```bash
+torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=N_GPUS
+```
+So the complete command is:
+```bash
+torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 \
+    --standalone \
+    --nnodes=1 \
+    --nproc_per_node=N_GPUS \
+    submission_runner.py \
+    --framework=pytorch \
+    --workload= \
+    --experiment_dir= \
+    --experiment_name= \
+    --submission_path= \
+    --tuning_search_space=
+```
+
+## Run your submission in a Docker container
+TODO(kasimbeg)
+## Score your submission
+TODO(kasimbeg)
+## Good Luck!
+
+
diff --git a/submissions/template/submission.py b/submissions/template/submission.py
new file mode 100644
index 000000000..83297a7d9
--- /dev/null
+++ b/submissions/template/submission.py
@@ -0,0 +1,77 @@
+"""Template submission module.
+
+See https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#allowed-submissions
+and https://github.com/mlcommons/algorithmic-efficiency/blob/main/RULES.md#disallowed-submissions
+for guidelines.
+"""
+from typing import Dict, Iterator, List, Tuple
+
+from algorithmic_efficiency import spec
+
+
+def init_optimizer_state(workload: spec.Workload,
+                         model_params: spec.ParameterContainer,
+                         model_state: spec.ModelAuxiliaryState,
+                         hyperparameters: spec.Hyperparameters,
+                         rng: spec.RandomState) -> spec.OptimizerState:
+  """Creates the optimizer state, e.g. a Nesterov optimizer and a learning rate schedule.
+
+  Returns:
+    optimizer_state
+    optimizer_update_fn
+  """
+  pass
+
+
+def update_params(workload: spec.Workload,
+                  current_param_container: spec.ParameterContainer,
+                  current_params_types: spec.ParameterTypeTree,
+                  model_state: spec.ModelAuxiliaryState,
+                  hyperparameters: spec.Hyperparameters,
+                  batch: Dict[str, spec.Tensor],
+                  loss_type: spec.LossType,
+                  optimizer_state: spec.OptimizerState,
+                  eval_results: List[Tuple[int, float]],
+                  global_step: int,
+                  rng: spec.RandomState) -> spec.UpdateReturn:
+  """Runs a single optimizer update step.
+
+  Returns:
+    (new_optimizer_state, update_fn)
+    new_params
+    new_model_state
+  """
+  pass
+
+
+def get_batch_size(workload_name):
+  """Returns the batch size to use for each workload.
+
+  Valid workload_name values are in
+  ["wmt",
+   "ogbg",
+   "criteo1tb",
+   "fastmri",
+   "imagenet_resnet",
+   "imagenet_vit",
+   "librispeech_deepspeech",
+   "librispeech_conformer"]
+
+  Returns:
+    batch_size
+  """
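+  # Purely illustrative sketch, not required structure: a submission would
+  # typically return a fixed, hand-chosen batch size per workload, e.g.
+  #   batch_sizes = {"wmt": 128, "ogbg": 512, "criteo1tb": 262144}
+  #   return batch_sizes[workload_name]
+  # The numbers above are placeholders, not recommendations.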
+  pass
+
+
+def data_selection(workload: spec.Workload,
+                   input_queue: Iterator[Dict[str, spec.Tensor]],
+                   optimizer_state: spec.OptimizerState,
+                   current_param_container: spec.ParameterContainer,
+                   model_state: spec.ModelAuxiliaryState,
+                   hyperparameters: spec.Hyperparameters,
+                   global_step: int,
+                   rng: spec.RandomState) -> Dict[str, spec.Tensor]:
+  """Selects data from the infinitely repeating, pre-shuffled input queue.
+
+  Each element of the queue is a batch of training examples and labels.
+
+  Tip:
+    If you would just like the next batch from the input queue, return
+    `next(input_queue)`.
+
+  Returns:
+    batch: next batch of input data
+  """
+  pass