add auto configurator to NeMo (#10270)
* add base configs

Signed-off-by: dimapihtar <[email protected]>

* add auto configurator functionality

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* add runner

Signed-off-by: dimapihtar <[email protected]>

* add end-to-end example for auto configurator

Signed-off-by: dimapihtar <[email protected]>

* add unit tests for auto configurator

Signed-off-by: dimapihtar <[email protected]>

* add GPT configs

Signed-off-by: dimapihtar <[email protected]>

* add GPT configs

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* switch to dataclass

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* switch to dataclass

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* fix dataclasses usage

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* remove unused imports

Signed-off-by: dimapihtar <[email protected]>

* remove extra function

Signed-off-by: dimapihtar <[email protected]>

* fix docstring style

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* take Config object as input for model

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* add nemotron support

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* remove search_config.py

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* move configs creation to Basic class

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* move to common basic class

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* rename main config

Signed-off-by: dimapihtar <[email protected]>

* remove base configs for models

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: artbataev <[email protected]>

* change auto conf functionality

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* fix docstring

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* remove unused imports

Signed-off-by: dimapihtar <[email protected]>

* add changes

Signed-off-by: dimapihtar <[email protected]>

* remove activations_checkpoint_num_layers

Signed-off-by: dimapihtar <[email protected]>

* remove gbs from config

Signed-off-by: dimapihtar <[email protected]>

* fix logs

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* fix performance calculation

Signed-off-by: dimapihtar <[email protected]>

* fix end-to-end example

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* fix model config

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* minor changes

Signed-off-by: dimapihtar <[email protected]>

* minor changes

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* fix unit tests

Signed-off-by: dimapihtar <[email protected]>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <[email protected]>

* add README

Signed-off-by: dimapihtar <[email protected]>

* fix README

Signed-off-by: dimapihtar <[email protected]>

* fix README

Signed-off-by: dimapihtar <[email protected]>

* fix readme

Signed-off-by: dimapihtar <[email protected]>

* fix readme

Signed-off-by: dimapihtar <[email protected]>

* remove extra arg

Signed-off-by: dimapihtar <[email protected]>

* remove unused imports

Signed-off-by: dimapihtar <[email protected]>

* add nemo-run installation

Signed-off-by: dimapihtar <[email protected]>

* fix unit tests

Signed-off-by: dimapihtar <[email protected]>

* fix unit tests

Signed-off-by: dimapihtar <[email protected]>

---------

Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: artbataev <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: artbataev <[email protected]>
3 people authored Sep 7, 2024
1 parent 9e372d3 commit cda2a63
Showing 16 changed files with 3,339 additions and 0 deletions.
4 changes: 4 additions & 0 deletions Dockerfile.ci
@@ -31,6 +31,10 @@ EOF

WORKDIR /workspace

RUN pip install hatchling # needed to install nemo-run
ARG NEMU_RUN_TAG=34259bd3e752fef94045a9a019e4aaf62bd11ce2
RUN pip install nemo_run@git+https://github.com/NVIDIA/NeMo-Run.git@${NEMU_RUN_TAG}

# Install NeMo requirements
ARG TE_TAG=7d576ed25266a17a7b651f2c12e8498f67e0baea
ARG MODELOPT_VERSION=0.15.0
85 changes: 85 additions & 0 deletions examples/llm/auto_configurator/README.md
@@ -0,0 +1,85 @@
> [!IMPORTANT]
> This is an early version of the Auto Configurator, and the code base may change as it is integrated into the CLI.

Use Auto Configurator to Find the Optimal Configuration
-------------------------------------------------------

Auto Configurator searches for the hyperparameters (HPs) that achieve the highest training throughput when training Large Language Models (LLMs) with the NeMo Framework.

> [!NOTE]
> Auto Configurator currently supports only GPT-based models: GPT-3, Llama, Mixtral, Mistral, Gemma, and Nemotron.

Auto Configurator Capabilities
------------------------------

Auto Configurator is designed to iterate quickly over different model configurations and find the best one, that is, the configuration that minimizes both training time and cost. It provides the following features:

- **Model size recommendation**: recommends an optimal model size if one is not specified.
- **Training time estimation**: estimates the model training time from the input parameters.
- **Base configuration generation**: returns a base model configuration.
- **Hyperparameters recommendation**: generates a list of candidate hyperparameter configurations to train.
- **Optimal configuration recommendation**: measures the performance of the candidate configurations after a short training run and selects the optimal one.

Model Size Recommendation
-------------------------

If you have not decided what model size to train, Auto Configurator can recommend one for your use case. Given the number of GPUs, the TFLOPS per GPU, the maximum training time, and the number of tokens to train on, it recommends a model size that can be trained within those hardware and time constraints.

For example, if you had 20 NVIDIA DGX nodes with 80 GB GPUs available and wanted to train a GPT model for at most 5 days, Auto Configurator would recommend a 5B-parameter GPT model.
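A back-of-the-envelope version of this calculation can be sketched with the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per token. This is an assumption for illustration only: Auto Configurator's own heuristic may differ, and `estimate_model_size_b` is a hypothetical helper, not part of NeMo.

```python
SECONDS_PER_DAY = 24 * 3600


def estimate_model_size_b(
    num_gpus: int,
    tflops_per_gpu: float,
    max_training_days: float,
    num_tokens_in_b: float,
    flops_per_param_per_token: float = 6.0,
) -> float:
    """Return the largest model size (in billions of parameters) that fits the compute budget.

    Assumes total training FLOPs ~= flops_per_param_per_token * params * tokens.
    """
    # Total compute available over the whole training window.
    total_flops = num_gpus * tflops_per_gpu * 1e12 * max_training_days * SECONDS_PER_DAY
    tokens = num_tokens_in_b * 1e9
    params = total_flops / (flops_per_param_per_token * tokens)
    return params / 1e9


# Hypothetical inputs: 20 DGX nodes x 8 GPUs, 140 achieved TFLOPS per GPU,
# a 5-day budget, and 300B training tokens.
print(f"{estimate_model_size_b(160, 140, 5, 300):.2f}B parameters")
```

With these hypothetical inputs the estimate lands in the same 5B range as the recommendation above; the token count and achieved TFLOPS are assumptions, since the example does not state them.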

Training Time Estimation
------------------------

Auto Configurator estimates the training time for your model in days, based on the dataset and parameters you provide.
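The same rule of thumb can be inverted to project training time in days. The sketch below assumes about 6 FLOPs per parameter per token and a constant achieved TFLOPS per GPU; `estimate_training_days` is a hypothetical helper for illustration, not Auto Configurator's implementation.

```python
SECONDS_PER_DAY = 24 * 3600


def estimate_training_days(
    model_size_b: float,
    num_tokens_in_b: float,
    num_gpus: int,
    tflops_per_gpu: float,
    flops_per_param_per_token: float = 6.0,
) -> float:
    """Estimate training time in days from the ~6 * params * tokens FLOPs rule."""
    total_flops = flops_per_param_per_token * model_size_b * 1e9 * num_tokens_in_b * 1e9
    flops_per_second = num_gpus * tflops_per_gpu * 1e12  # aggregate achieved throughput
    return total_flops / flops_per_second / SECONDS_PER_DAY


# Hypothetical: a 5B model on 300B tokens, 160 GPUs at 140 achieved TFLOPS each.
print(f"{estimate_training_days(5, 300, 160, 140):.1f} days")
```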

Base Configuration Generation
-----------------------------

When you provide the model size, or Auto Configurator has suggested one, it generates a base configuration for the target model. The base configuration is a valid configuration in NeMo 2.0 format; throughput is optimized in the next step.
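To relate a base configuration to a model size, you can approximate the parameter count from the architecture fields of the `GPTConfig` classes added in this commit (`num_layers`, `hidden_size`, `ffn_hidden_size`). This is an approximation for illustration only: `approx_gpt_params` is a hypothetical helper that counts the attention and MLP weight matrices plus a token embedding assumed to be tied with the output layer, assumes the 51,200-token vocabulary used in the end-to-end example, and ignores biases, layer norms, and position embeddings.

```python
def approx_gpt_params(
    num_layers: int, hidden_size: int, ffn_hidden_size: int, vocab_size: int = 51200
) -> int:
    """Approximate the weight count of a dense GPT decoder (biases and norms ignored)."""
    attention = 4 * hidden_size * hidden_size  # Q, K, V and output projections
    mlp = 2 * hidden_size * ffn_hidden_size  # up- and down-projection
    embedding = vocab_size * hidden_size  # assumed tied with the output layer
    return num_layers * (attention + mlp) + embedding


# Architecture values taken from GPTConfig126M, GPTConfig5B and GPTConfig175B.
for name, dims in {
    "GPTConfig126M": (12, 768, 3072),
    "GPTConfig5B": (24, 4096, 16384),
    "GPTConfig175B": (96, 12288, 49152),
}.items():
    print(f"{name}: ~{approx_gpt_params(*dims) / 1e9:.2f}B parameters")
```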

Hyperparameters Recommendation
------------------------------

After Auto Configurator generates the base configuration, it searches over several critical hyperparameters that strongly affect training throughput but do not affect model convergence: Tensor Parallelism (TP), Pipeline Parallelism (PP), Context Parallelism (CP), Expert Parallelism (EP), Micro Batch Size (MBS), and Activation Checkpointing Layers (ActCkpt). If the Global Batch Size (GBS) is not specified, Auto Configurator also recommends an optimal value.

Auto Configurator first applies heuristics to identify suitable values for these parameters and then generates a grid of candidate configurations. It returns all of the candidate configurations in NeMo 2.0 format.
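The grid step can be pictured with the sketch below. It is not Auto Configurator's heuristic: `candidate_grid` is a hypothetical helper that enumerates TP, PP, CP, and MBS values and keeps only the combinations that satisfy two standard divisibility constraints (the GPU count must split evenly into `tp * pp * cp` model-parallel groups, and the global batch size must be divisible by `mbs` times the data-parallel size). Expert parallelism and activation checkpointing are left out for brevity.

```python
from itertools import product


def candidate_grid(num_gpus, global_batch_size, tp_sizes, pp_sizes, cp_sizes, mbs_sizes):
    """Enumerate parallelism/micro-batch combinations that pass basic divisibility checks."""
    candidates = []
    for tp, pp, cp, mbs in product(tp_sizes, pp_sizes, cp_sizes, mbs_sizes):
        model_parallel = tp * pp * cp
        if num_gpus % model_parallel != 0:
            continue  # GPUs cannot be split evenly into model-parallel groups
        dp = num_gpus // model_parallel  # remaining GPUs form data-parallel replicas
        if global_batch_size % (mbs * dp) != 0:
            continue  # each replica must process a whole number of micro-batches
        candidates.append({"tp": tp, "pp": pp, "cp": cp, "dp": dp, "mbs": mbs})
    return candidates


# Same search space as the end-to-end example: 1 GPU, GBS 16, TP and PP fixed at 1.
for cfg in candidate_grid(1, 16, [1], [1], [1], [1, 2, 4]):
    print(cfg)
```

With the search space from the end-to-end example, all three micro-batch sizes pass, which matches the three configurations that example expects.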

> [!NOTE]
> Some of the candidate configurations may fail due to high memory usage or other issues.

Once the candidate configurations are generated, you can use NeMo Framework to launch the most promising candidates.

When running the candidates on the cluster, you can limit the run time and the number of steps of each job with the ``max_minutes_per_run`` and ``max_steps_per_run`` parameters. During this search, each job runs on the number of nodes set by the ``num_nodes`` parameter in its configuration. Once all of the jobs have finished, call ``get_results`` (in the end-to-end example, rerun the script with the ``--get_results`` flag) to produce a ``.csv`` table with the performance results of each successful job.

Optimal Configuration Recommendation
------------------------------------

After all of the candidate jobs finish, Auto Configurator calculates performance metrics for each candidate.
It generates two ``.csv`` files: one with the performance measurements of the candidates and another listing the candidates that failed with out-of-memory errors.
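The split into a performance table and a failure table can be sketched as follows. This is a hypothetical illustration, not the format written by ``get_results``: `summarize_results`, the column names, and the candidate names are all assumptions. A step time of `None` marks a failed run, and throughput is computed as global batch size times sequence length divided by step time.

```python
import csv
import io


def summarize_results(results, global_batch_size, seq_length):
    """Return (performance_csv, failed_csv) strings, fastest candidate first.

    Each result is a dict with "name" and "step_time_s"; a step time of None
    marks a run that failed (for example, with an out-of-memory error).
    """
    succeeded = sorted(
        (r for r in results if r["step_time_s"] is not None),
        key=lambda r: r["step_time_s"],
    )
    failed = [r for r in results if r["step_time_s"] is None]

    perf_buf = io.StringIO()
    perf_writer = csv.writer(perf_buf)
    perf_writer.writerow(["name", "step_time_s", "tokens_per_s"])
    for r in succeeded:
        tokens_per_s = global_batch_size * seq_length / r["step_time_s"]
        perf_writer.writerow([r["name"], r["step_time_s"], tokens_per_s])

    failed_buf = io.StringIO()
    failed_writer = csv.writer(failed_buf)
    failed_writer.writerow(["name"])
    for r in failed:
        failed_writer.writerow([r["name"]])

    return perf_buf.getvalue(), failed_buf.getvalue()


# Hypothetical step times for three candidates (GBS 16, sequence length 512).
perf_csv, failed_csv = summarize_results(
    [
        {"name": "mbs_1", "step_time_s": 0.5},
        {"name": "mbs_2", "step_time_s": 0.25},
        {"name": "mbs_4", "step_time_s": None},
    ],
    global_batch_size=16,
    seq_length=512,
)
print(perf_csv)
print(failed_csv)
```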

End-To-End Example
------------------

The following list shows the required input parameters for the Auto Configurator runner:

- ``model``: model configuration in NeMo 2.0 format.
- ``num_nodes``: number of nodes to use for training.
- ``seq_length``: sequence length to use for training.
- ``data_paths``: path to the training dataset.
- ``tokenizer_path``: path to the tokenizer model, if a custom tokenizer is used.

The following list shows the optional parameters for the Auto Configurator runner:

- ``global_batch_size``: global batch size to be used.
- ``tensor_parallel_sizes``: a list, such as ``[1, 2, 4]``.
- ``pipeline_parallel_sizes``: a list, such as ``[1, 2, 4]``.
- ``context_parallel_sizes``: a list, such as ``[1, 2, 4]``.
- ``expert_parallel_sizes``: a list, such as ``[1, 2, 4]``.
- ``micro_batch_sizes``: a list, such as ``[1, 2, 4]``.
- ``min_model_parallel_size``: a value for the minimum desired parallelism.
- ``max_model_parallel_size``: a value for the maximum desired parallelism.
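How ``min_model_parallel_size`` and ``max_model_parallel_size`` prune the search space is not spelled out here. As a sketch under the assumption that a candidate's model-parallel size is ``tp * pp``, a bounds filter could look like this; `filter_by_model_parallel` is a hypothetical helper, not part of the runner.

```python
from itertools import product


def filter_by_model_parallel(tp_sizes, pp_sizes, min_model_parallel_size, max_model_parallel_size):
    """Keep (tp, pp) pairs whose product lies within the bounds (inclusive).

    Assumes model-parallel size = tensor-parallel size * pipeline-parallel size.
    """
    return [
        (tp, pp)
        for tp, pp in product(tp_sizes, pp_sizes)
        if min_model_parallel_size <= tp * pp <= max_model_parallel_size
    ]


print(filter_by_model_parallel([1, 2, 4], [1, 2, 4], 2, 4))
```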

For each optional parameter that is not specified, Auto Configurator finds the optimal value. To view the full list of parameters, please visit [this page](https://github.com/NVIDIA/NeMo/blob/dpykhtar/nemo_autoconf/nemo/collections/llm/tools/auto_configurator/runner.py#L51).

To view an end-to-end example of how to generate candidate configs, train them, and calculate the performance using Auto Configurator with NeMo Framework, please visit [this page](https://github.com/NVIDIA/NeMo/blob/dpykhtar/nemo_autoconf/examples/llm/auto_configurator/auto_config.py).

81 changes: 81 additions & 0 deletions examples/llm/auto_configurator/auto_config.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import os

import fiddle as fdl
import nemo_run as run

from nemo.collections.llm import GPTConfig126M
from nemo.collections.llm.tools.auto_configurator import AutoConfigurator, generate_configs, get_results


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--run_number", type=int, help="1-based index of the config to run")
    parser.add_argument("--logs_dir", type=str, help="Path where to save training logs")
    parser.add_argument("--data_path", type=str, help="Path to the dataset")
    parser.add_argument("--get_results", action="store_true")

    return parser.parse_args()


def train_config(args):
    # GPT-3 126M
    # This example will generate 3 configs.
    # It is expected that this script will be run 3 times, changing the --run_number flag for each run from 1 to 3.
    # After all configurations are trained, please trigger the script using the --get_results flag.
    runner = AutoConfigurator(
        model=run.Config(GPTConfig126M),
        num_nodes=1,
        gpus_per_node=1,
        gpu_memory_gb=40,
        global_batch_size=16,
        seq_length=512,
        tensor_parallel_sizes=[1],
        pipeline_parallel_sizes=[1],
        micro_batch_sizes=[1, 2, 4],
        max_training_days=1,
        max_steps_per_run=25,
        num_tokens_in_b=10,
        vocab_size=51200,
        data_paths=args.data_path,
        path_to_logs=args.logs_dir,
    )

    base_cfg, configs = generate_configs(runner)
    if not args.get_results:
        # Get generated configs
        partials = list(configs.values())
        names = list(configs.keys())

        # Run pre-training (run_number is 1-based)
        partial = partials[args.run_number - 1]
        partial.log.dir = os.path.join(args.logs_dir, names[args.run_number - 1])
        pretrain = fdl.build(partial)
        pretrain()
    else:
        # Get Auto Configurator results
        get_results(base_cfg, runner, args.logs_dir)
        print(f"The results were successfully saved to {args.logs_dir}.")


def main():
    args = get_args()
    train_config(args)


if __name__ == '__main__':
    main()
6 changes: 6 additions & 0 deletions nemo/collections/llm/__init__.py
@@ -51,6 +51,12 @@
    GemmaConfig7B,
    GemmaModel,
    GPTConfig,
    GPTConfig5B,
    GPTConfig7B,
    GPTConfig20B,
    GPTConfig40B,
    GPTConfig126M,
    GPTConfig175B,
    GPTModel,
    Llama2Config7B,
    Llama2Config13B,
6 changes: 6 additions & 0 deletions nemo/collections/llm/gpt/model/__init__.py
@@ -15,6 +15,12 @@
from nemo.collections.llm.gpt.model.baichuan import Baichuan2Config, Baichuan2Config7B, Baichuan2Model
from nemo.collections.llm.gpt.model.base import (
    GPTConfig,
    GPTConfig5B,
    GPTConfig7B,
    GPTConfig20B,
    GPTConfig40B,
    GPTConfig126M,
    GPTConfig175B,
    GPTModel,
    MaskedTokenLossReduction,
    gpt_data_step,
54 changes: 54 additions & 0 deletions nemo/collections/llm/gpt/model/base.py
@@ -182,6 +182,60 @@ def configure_model(self, tokenizer) -> "MCoreGPTModel":
)


@dataclass
class GPTConfig126M(GPTConfig):
    seq_length: int = 2048
    num_layers: int = 12
    hidden_size: int = 768
    ffn_hidden_size: int = 3072
    num_attention_heads: int = 12


@dataclass
class GPTConfig5B(GPTConfig):
    seq_length: int = 2048
    num_layers: int = 24
    hidden_size: int = 4096
    ffn_hidden_size: int = 16384
    num_attention_heads: int = 32


@dataclass
class GPTConfig7B(GPTConfig):
    seq_length: int = 2048
    num_layers: int = 32
    hidden_size: int = 4096
    ffn_hidden_size: int = 10880
    num_attention_heads: int = 32


@dataclass
class GPTConfig20B(GPTConfig):
    seq_length: int = 2048
    num_layers: int = 44
    hidden_size: int = 6144
    ffn_hidden_size: int = 24576
    num_attention_heads: int = 48


@dataclass
class GPTConfig40B(GPTConfig):
    seq_length: int = 2048
    num_layers: int = 48
    hidden_size: int = 8192
    ffn_hidden_size: int = 32768
    num_attention_heads: int = 64


@dataclass
class GPTConfig175B(GPTConfig):
    seq_length: int = 2048
    num_layers: int = 96
    hidden_size: int = 12288
    ffn_hidden_size: int = 49152
    num_attention_heads: int = 96


class GPTModel(L.LightningModule, io.IOMixin, io.ConnectorMixin, fn.FNMixin):
    def __init__(
        self,
2 changes: 2 additions & 0 deletions nemo/collections/llm/tools/auto_configurator/__init__.py
@@ -0,0 +1,2 @@
from nemo.collections.llm.tools.auto_configurator.core.calculate_performance import get_results
from nemo.collections.llm.tools.auto_configurator.runner import AutoConfigurator, generate_configs
13 changes: 13 additions & 0 deletions nemo/collections/llm/tools/auto_configurator/core/__init__.py
@@ -0,0 +1,13 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
