
MLCommons™ AlgoPerf: Getting Started


Set Up and Installation

To get started you will have to make a few decisions and install the repository along with its dependencies. Specifically:

  1. Decide whether you would like to develop your submission in PyTorch or JAX.
  2. Set up your workstation or VM. We recommend using a setup similar to the benchmarking hardware. The specs of the benchmarking machines are:
    • 8x V100 GPUs
    • 240 GB of RAM
    • 2 TB of storage (for datasets).
  3. Install the algorithmic-efficiency package and its dependencies, either in a Python virtual environment or in a Docker (recommended) or Singularity/Apptainer container.

Python Virtual Environment

Prerequisites:

  • Python >= 3.8
  • CUDA 11.8
  • NVIDIA Driver version 535.104.05

To set up a virtual environment and install this repository:

  1. Create a new environment, e.g. via conda or virtualenv

    sudo apt-get install python3-venv
    python3 -m venv env
    source env/bin/activate
  2. Clone this repository

    git clone https://github.com/mlcommons/algorithmic-efficiency.git
    cd algorithmic-efficiency
  3. Run the following pip3 install commands based on your chosen framework to install algorithmic_efficiency and its dependencies.

    For JAX:

    pip3 install -e '.[pytorch_cpu]'
    pip3 install -e '.[jax_gpu]' -f 'https://storage.googleapis.com/jax-releases/jax_cuda_releases.html'
    pip3 install -e '.[full]'

    For PyTorch:

    pip3 install -e '.[jax_cpu]'
    pip3 install -e '.[pytorch_gpu]' -f 'https://download.pytorch.org/whl/torch_stable.html'
    pip3 install -e '.[full]'
Per-workload installations

You can also install the requirements for individual workloads, e.g. via

pip3 install -e '.[librispeech]'

or all workloads at once via

pip3 install -e '.[full]'
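To confirm that the extras you chose actually resolved, a quick sanity check can help (a sketch; it only tests importability of the top-level packages, whose names jax and torch are assumed here):

```python
import importlib.util

def missing_packages(names):
    """Return the packages from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Both frameworks are installed above (one as the CPU extra), so both should resolve.
print(missing_packages(["jax", "torch"]))
```

An empty list means both frameworks are importable; anything listed needs the corresponding pip3 install command re-run.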

Docker

We recommend using a Docker container to ensure a similar environment to our scoring and testing environments. Alternatively, a Singularity/Apptainer container can also be used (see instructions below).

Prerequisites:

  • NVIDIA Driver version 535.104.05
  • NVIDIA Container Toolkit, so that the containers can locate the NVIDIA drivers and GPUs. See the NVIDIA Container Toolkit installation instructions.

Building Docker Image

  1. Clone this repository

    cd ~ && git clone https://github.com/mlcommons/algorithmic-efficiency.git
  2. Build Docker image

    cd algorithmic-efficiency/docker
    docker build -t <docker_image_name> . --build-arg framework=<framework>

    The framework flag can be either pytorch, jax, or both. Specifying the framework will install the framework-specific dependencies. The docker_image_name is arbitrary.

Running Docker Container (Interactive)

To use the Docker container as an interactive virtual environment, you can run a container mounted to your local data and code directories and execute the bash program. This may be useful if you are in the process of developing a submission.

  1. Run a detached Docker container. The container_id will be printed if the container starts successfully.

    docker run -t -d \
      -v $HOME/data/:/data/ \
      -v $HOME/experiment_runs/:/experiment_runs \
      -v $HOME/experiment_runs/logs:/logs \
      -v $HOME/algorithmic-efficiency:/algorithmic-efficiency \
      --gpus all \
      --ipc=host \
      <docker_image_name> \
      --keep_container_alive true

    Note: You may have to use double quotes around the algorithmic-efficiency path in the mounting -v flag. If the above command fails, try replacing the following line:

    -v $HOME/algorithmic-efficiency:/algorithmic-efficiency \

    with

    -v $HOME"/algorithmic-efficiency:/algorithmic-efficiency" \
  2. Open a bash terminal

    docker exec -it <container_id> /bin/bash

Using Singularity/Apptainer instead of Docker

Since many compute clusters don't allow the use of Docker due to security concerns and instead encourage the use of Apptainer (formerly Singularity), we also provide instructions on how to build an Apptainer container based on the provided Dockerfile.

To convert the Dockerfile into an Apptainer definition file, we will use spython:

pip3 install spython
cd algorithmic-efficiency/docker
spython recipe Dockerfile > Singularity.def

Now we can build the Apptainer image by running

singularity build --fakeroot <singularity_image_name>.sif Singularity.def

To start a shell session with GPU support (by using the --nv flag), we can run

singularity shell --nv <singularity_image_name>.sif 

Similarly to Docker, Apptainer allows you to bind specific paths on the host system into the container by specifying the --bind flag, as explained in the Apptainer documentation.
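For example, to mirror the Docker bind mounts used above (paths and image name are placeholders):

```shell
singularity shell --nv \
    --bind $HOME/data:/data \
    --bind $HOME/experiment_runs:/experiment_runs \
    <singularity_image_name>.sif
```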

Download the Data

The workloads in this benchmark use 6 different datasets across 8 workloads. You may choose to download only some of the datasets while developing your submission, but your submission will be scored across all 8 workloads. For instructions on obtaining and setting up the datasets, see datasets/README.

Develop your Submission

To develop a submission you will write a Python module containing your training algorithm. Your training algorithm must implement a set of predefined API methods for the initialization and update steps.
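As a rough, hypothetical sketch of that shape (the authoritative method names and signatures live in submissions/template/submission.py; treat the signatures below as abbreviated placeholders, not the real API):

```python
# Hypothetical skeleton of a submission module; check submissions/template/
# submission.py for the real, required signatures.

def init_optimizer_state(workload, model_params, model_state, hyperparameters, rng):
    """Build the initial optimizer state (e.g. step counters, momentum buffers)."""
    del workload, model_params, model_state, rng  # unused in this stub
    return {"step": 0, "lr": hyperparameters["learning_rate"]}

def update_params(workload, current_params, model_state, hyperparameters,
                  batch, optimizer_state, global_step, rng):
    """Perform one training step; return new optimizer state, params, and model state."""
    del workload, hyperparameters, batch, rng  # a real submission uses these
    optimizer_state["step"] = global_step + 1
    return optimizer_state, current_params, model_state

def data_selection(workload, input_queue, optimizer_state, current_params,
                   hyperparameters, global_step, rng):
    """Choose the next batch to train on; the simplest policy takes the next element."""
    del workload, optimizer_state, current_params, hyperparameters, global_step, rng
    return next(input_queue)
```

The benchmark harness calls these methods for you; your job is only to fill in their bodies.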

Set Up Your Directory Structure (Optional)

Make a submissions subdirectory to store your submission modules, e.g. algorithmic-efficiency/submissions/my_submissions.

Coding your Submission

You can find examples of submission modules under algorithmic-efficiency/baselines and algorithmic-efficiency/reference_algorithms.
A submission for the external ruleset will consist of a submission module and a tuning search space definition.

  1. Copy the template submission module submissions/template/submission.py into your submissions directory e.g. in algorithmic-efficiency/my_submissions.
  2. Implement at least the methods in the template submission module. Feel free to use helper functions and/or modules as you see fit. Make sure you adhere to the competition rules. Check out the guidelines for allowed and disallowed submissions, and pay special attention to the software dependencies rule.
  3. Add a tuning configuration e.g. tuning_search_space.json file to your submission directory. For the tuning search space you can either:
    1. Define the set of feasible points by defining a value for "feasible_points" for the hyperparameters:

      {
          "learning_rate": {
              "feasible_points": [0.999]
          }
      }

      For a complete example see tuning_search_space.json.

    2. Define a range of values for quasirandom sampling by specifying min, max, and scaling keys for the hyperparameter:

      {
          "weight_decay": {
              "min": 5e-3,
              "max": 1.0,
              "scaling": "log"
          }
      }

      For a complete example see tuning_search_space.json.
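Since a malformed search space only fails at run time, it can be worth sanity-checking the file up front. A minimal validator sketch (not part of the benchmark tooling; it only checks the two shapes described above):

```python
import json

def validate_search_space(search_space):
    """Check that every hyperparameter either lists feasible points or
    defines a min/max/scaling sampling range."""
    for name, spec in search_space.items():
        has_points = "feasible_points" in spec
        has_range = {"min", "max", "scaling"} <= spec.keys()
        if not (has_points or has_range):
            raise ValueError(
                f"{name}: expected 'feasible_points' or 'min'/'max'/'scaling' keys")

# Example combining both styles:
validate_search_space(json.loads("""
{
  "learning_rate": {"feasible_points": [0.999]},
  "weight_decay": {"min": 5e-3, "max": 1.0, "scaling": "log"}
}
"""))
```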

Run your Submission

From your virtual environment or interactively running Docker container run your submission with submission_runner.py:

JAX: to score your submission on a workload, run the following from the algorithmic-efficiency directory:

python3 submission_runner.py \
    --framework=jax \
    --workload=mnist \
    --experiment_dir=<path_to_experiment_dir> \
    --experiment_name=<experiment_name> \
    --submission_path=submissions/my_submissions/submission.py \
    --tuning_search_space=<path_to_tuning_search_space>

PyTorch: to score your submission on a workload, run the following from the algorithmic-efficiency directory:

python3 submission_runner.py \
    --framework=pytorch \
    --workload=<workload> \
    --experiment_dir=<path_to_experiment_dir> \
    --experiment_name=<experiment_name> \
    --submission_path=<path_to_submission_module> \
    --tuning_search_space=<path_to_tuning_search_space>

PyTorch DDP

We recommend using PyTorch's Distributed Data Parallel (DDP) when using multiple GPUs on a single node. You can initialize DDP with torchrun. For example, on a single host with 8 GPUs, simply replace python3 in the above command with:

torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=N_GPUS

where N_GPUS is the number of available GPUs on the node.

So the complete command is:

torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 \
    --standalone \
    --nnodes=1 \
    --nproc_per_node=N_GPUS \
    submission_runner.py \
    --framework=pytorch \
    --workload=<workload> \
    --experiment_dir=<path_to_experiment_dir> \
    --experiment_name=<experiment_name> \
    --submission_path=<path_to_submission_module> \
    --tuning_search_space=<path_to_tuning_search_space>
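The --redirects value above simply enumerates ranks 1 through N_GPUS-1, each paired with the value 0, leaving rank 0 untouched. Since it is easy to mistype for other GPU counts, a tiny helper can generate the string (illustrative only, not part of the repo):

```python
def redirects_arg(n_gpus):
    """Build the per-rank --redirects string "1:0,2:0,...,(n_gpus-1):0"."""
    return ",".join(f"{rank}:0" for rank in range(1, n_gpus))

print(redirects_arg(8))  # → 1:0,2:0,3:0,4:0,5:0,6:0,7:0
```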

Run your Submission in a Docker Container

The container entrypoint script provides the following flags:

  • --dataset dataset: can be 'imagenet', 'fastmri', 'librispeech', 'criteo1tb', 'wmt', or 'ogbg'. Setting this flag will download data if ~/data/<dataset> does not exist on the host machine. Required for running a submission.
  • --framework framework: can be either 'pytorch' or 'jax'. If you just want to download data, this flag is required for --dataset imagenet since we have two versions of the ImageNet data, one per framework. This flag is also required for running a submission.
  • --submission_path submission_path: path to the submission file on the container filesystem. If this flag is set, the container will run a submission, so it is required for running a submission.
  • --tuning_search_space tuning_search_space: path to the file containing the tuning search space on the container filesystem. Required for running a submission.
  • --experiment_name experiment_name: name of the experiment. Required for running a submission.
  • --workload workload: can be 'imagenet_resnet', 'imagenet_vit', 'librispeech_deepspeech', 'librispeech_conformer', 'ogbg', 'wmt', 'fastmri', or 'criteo1tb'. Required for running a submission.
  • --max_global_steps max_global_steps: maximum number of steps to run the workload for. Optional.
  • --keep_container_alive: can be true or false. If true, the container will not be killed automatically. This is useful for development and debugging.

To run the Docker container that will run the submission runner:

docker run -t -d \
-v $HOME/data/:/data/ \
-v $HOME/experiment_runs/:/experiment_runs \
-v $HOME/experiment_runs/logs:/logs \
--gpus all \
--ipc=host \
<docker_image_name> \
--dataset <dataset> \
--framework <framework> \
--submission_path <submission_path> \
--tuning_search_space <tuning_search_space> \
--experiment_name <experiment_name> \
--workload <workload> \
--keep_container_alive <keep_container_alive>

This will print the container ID to the terminal.

Docker Tips

To find the container IDs of running containers

docker ps 

To see output of the entrypoint script

docker logs <container_id> 

To enter a bash session in the container

docker exec -it <container_id> /bin/bash

Score your Submission

To produce a performance profile and performance table:

python3 scoring/score_submission.py --experiment_path=<path_to_experiment_dir> --output_dir=<output_dir>

We provide the scores and performance profiles for the baseline algorithms in the "Baseline Results" section in Benchmarking Neural Network Training Algorithms.

Good Luck!