Yash/dev llava next #10749

Open · wants to merge 94 commits into base: main

Commits (94)
dcb38f3
locate weights path within MegatronCheckpointIO
ashors1 Oct 15, 2024
6010377
small refactor
ashors1 Oct 15, 2024
2417023
remove another instance of ckpt_to_weights_subdir
ashors1 Oct 15, 2024
eed4bad
move ckpt_to_weights_subdir
ashors1 Oct 16, 2024
52c0ad3
Apply isort and black reformatting
ashors1 Oct 16, 2024
e5dbd61
Apply isort and black reformatting
artbataev Oct 16, 2024
45df47d
add weights path in save_checkpoint
ashors1 Oct 16, 2024
c49e2a6
fix circular import
ashors1 Oct 16, 2024
d3ffd5d
Apply isort and black reformatting
ashors1 Oct 16, 2024
ea49e20
handle saving in ckpt_to_weights_subdir
ashors1 Oct 16, 2024
c4c3fd5
fix minor typo
ashors1 Oct 16, 2024
3ae933e
bug fixes
ashors1 Oct 16, 2024
f1fbec5
fix undefined variable
ashors1 Oct 17, 2024
8161076
move function
ashors1 Oct 17, 2024
994719e
Apply isort and black reformatting
ashors1 Oct 17, 2024
ea51ab2
fix adapter meta file path
cuichenx Oct 17, 2024
871ac85
Apply isort and black reformatting
cuichenx Oct 17, 2024
f5889ca
Merge branch 'refs/heads/main' into ashors/ckpt-subdirs
cuichenx Oct 17, 2024
df2c4b1
Merge remote-tracking branch 'origin/ashors/ckpt-subdirs' into ashors…
cuichenx Oct 17, 2024
5aec05b
fix mixtral test
ashors1 Oct 18, 2024
2df54e3
fix mixtral test
ashors1 Oct 18, 2024
440a244
use function for weights subdir
cuichenx Oct 18, 2024
b2883a1
address comments
ashors1 Oct 18, 2024
26a8d8d
move asserts
ashors1 Oct 18, 2024
ac1779b
fix undefined vars
ashors1 Oct 21, 2024
f380df7
bug fix
ashors1 Oct 21, 2024
e15cafa
fix mixtral test
ashors1 Oct 22, 2024
131d14e
Integrating mcore export (#10238)
shanmugamr1992 Oct 17, 2024
b4bb088
Fix artifact saving (#10914)
hemildesai Oct 17, 2024
ea08767
Lora improvement (#10918)
cuichenx Oct 17, 2024
ff80ad8
Huvu/t5 nemo2.0 peft (#10916)
huvunvidia Oct 17, 2024
c17a554
Add tie_word_embeddings=True (#10710)
suhara Oct 18, 2024
710e6f0
Use a context-manager when opening files (#10895)
akoumpa Oct 18, 2024
aa797d3
long context performance numbers in doc (#10784)
youngeunkwon0405 Oct 18, 2024
52d5ef8
perf recipes and Mcore DistOpt params (#10883)
malay-nagda Oct 18, 2024
2be9dc5
ci: Fix cherry pick team (#10945)
ko3n1g Oct 18, 2024
186b946
Packed sequence bug fixes (#10898)
cuichenx Oct 18, 2024
9e6e117
Fix requirements for MacOS (#10930)
artbataev Oct 18, 2024
481e380
Fix nemo 2.0 recipes (#10915)
BoxiangW Oct 18, 2024
52c89b9
Akoumparouli/nemo ux fix dir or string artifact (#10936)
akoumpa Oct 18, 2024
ca40849
ckpt convert bug fixes (#10878)
dimapihtar Oct 18, 2024
5a3932e
fix typo in docstring (#10955)
ashors1 Oct 19, 2024
3684fb3
remove deprecated ci tests (#10922)
dimapihtar Oct 19, 2024
7a5d96a
[Nemo CICD] Remove deprecated tests (#10960)
pablo-garay Oct 19, 2024
c6813ce
Adithyare/oai chat completion (#10785)
arendu Oct 19, 2024
739a15d
Update megatron_t5_pretraining.py (#10952)
huvunvidia Oct 19, 2024
6ecee6b
Convert perf plugin env vars to strings (#10947)
hemildesai Oct 21, 2024
38ccc9c
disable dynamo for ddp checker (#10961)
akoumpa Oct 21, 2024
f4aebf3
[🤠]: Howdy folks, let's bump `Dockerfile.ci` to db7d37b ! (#10965)
ko3n1g Oct 21, 2024
03434b0
Mistral-NeMo-12B recipe (#10607)
akoumpa Oct 21, 2024
d4b3adf
Make nemo text processing optional in TTS (#10584)
blisc Oct 21, 2024
c457d45
respect warnings' filters (#10953)
akoumpa Oct 21, 2024
9b3f602
Update T5 tokenizer (adding additional tokens to tokenizer config) (#…
huvunvidia Oct 21, 2024
cb88c41
Alit/mamba recipe (#10935)
JRD971000 Oct 21, 2024
b5c84cf
Long context performance doc hot fix (#10946)
youngeunkwon0405 Oct 21, 2024
0ff77b5
Performance mode (#10926)
malay-nagda Oct 21, 2024
1ca44d7
Add flux inference pipeline (#10752)
Victor49152 Oct 22, 2024
df41eac
Add assertion for always save nemo add model parallel size (#10690)
BoxiangW Oct 22, 2024
02cfe4c
[🤠]: Howdy folks, let's bump `Dockerfile.ci` to 563d5d1 ! (#10979)
ko3n1g Oct 22, 2024
7cf1907
Reflect CLI change nemorun -> nemo (#10443)
marcromeyn Oct 22, 2024
b92866e
minor fix (#10990)
JRD971000 Oct 22, 2024
7788041
Fixed sampler override and audio_key in prepare_audio_data (#10980)
anteju Oct 22, 2024
520f3cb
Add more recipes (#10957)
cuichenx Oct 22, 2024
e50cc14
Fix parallel_embedding (#10975)
meatybobby Oct 22, 2024
7ca0bf8
Upgrade transformers (#10854)
cuichenx Oct 22, 2024
3f464b7
Add support and recipes for HF models via AutoModelForCausalLM (#10962)
akoumpa Oct 23, 2024
25133a9
ci: Update tests (#10987)
ko3n1g Oct 23, 2024
c39d620
[🤠]: Howdy folks, let's bump `Dockerfile.ci` to 425cdd4 ! (#11001)
ko3n1g Oct 23, 2024
046b422
gpt3 175b cli (#10985)
malay-nagda Oct 23, 2024
2704487
Fix for crash with LoRA + tp_overlap_comm=false + sequence_parallel=t…
vysarge Oct 23, 2024
05273b4
llm.generate fixes (#10983)
HuiyingLi Oct 23, 2024
f668f94
use __dict__ in check (#11012)
akoumpa Oct 24, 2024
68e8968
LoRA support for HF::AutoModelForCausalLM (#10982)
akoumpa Oct 24, 2024
a3630de
Change default for always_save_context to True (#11014)
athitten Oct 24, 2024
b5686c2
Add a build option to load_context (#10713)
marcromeyn Oct 24, 2024
a07902a
Fix pip install (#11026)
marcromeyn Oct 24, 2024
e127994
[WIP] Add docs for NEST SSL (#10804)
stevehuang52 Oct 24, 2024
8eaf5a9
Change dist ckpt defaults (#10913)
ShriyaPalsamudram Oct 24, 2024
cde2e02
Akoumparouli/mixtral recipe fix r2.0.0 (#10994)
akoumpa Oct 24, 2024
e2db0be
added datamodule for llava-next
yashaswikarnati Sep 26, 2024
5eb00b0
modified state dict transform
yashaswikarnati Sep 26, 2024
d263a60
neva model changes to support llava-next
Oct 3, 2024
97025ee
remove accidentally checked in files
Oct 3, 2024
37c6c55
Apply isort and black reformatting
yashaswikarnati Oct 3, 2024
bac0f64
remove unused imports
Oct 4, 2024
da05cf1
added io_init to not save task_encoder and image_processor
Oct 16, 2024
cfb521c
Apply isort and black reformatting
yashaswikarnati Oct 16, 2024
d3a718f
added scripts for pretrain and finetune
Oct 16, 2024
438c573
Apply isort and black reformatting
yashaswikarnati Oct 16, 2024
29a2ed8
[🤠]: Howdy folks, let's bump `Dockerfile.ci` to 73e7b58 ! (#10779)
ko3n1g Oct 7, 2024
c93cda7
generation example
Oct 20, 2024
b2689fd
Apply isort and black reformatting
yashaswikarnati Oct 20, 2024
302afb7
small change in llava next example
yashaswikarnati Oct 21, 2024
accc256
edited merge conflict
yashaswikarnati Oct 24, 2024
2 changes: 1 addition & 1 deletion .github/workflows/cherry-pick-release-commit.yml
@@ -120,7 +120,7 @@ jobs:
"type": "section",
"text": {
"type": "mrkdwn",
"text": ":alert: Cherrypick bot 🤖: Hey <@'$USERNAME'>: Cherry-pick of <'$URL'|#'$PR_ID'> failed (3-way merge impossible). Please resolve manually and create a PR.\n\ncc: <!subteam^{{ secrets.SLACK_WEBHOOK_ADMIN }}>"
"text": ":alert: Cherrypick bot 🤖: Hey <@'$USERNAME'>: Cherry-pick of <'$URL'|#'$PR_ID'> failed (3-way merge impossible). Please resolve manually and create a PR.\n\ncc: <!subteam^${{ secrets.SLACK_WEBHOOK_ADMIN }}>"
}
}
]
1,379 changes: 117 additions & 1,262 deletions .github/workflows/cicd-main.yml

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion Dockerfile.ci
@@ -53,7 +53,7 @@ RUN pip install nemo_run@git+https://github.com/NVIDIA/NeMo-Run.git@${NEMO_RUN_T
# Install NeMo requirements
ARG TE_TAG=7d576ed25266a17a7b651f2c12e8498f67e0baea
ARG MODELOPT_VERSION=0.17.0
-ARG MCORE_TAG=0d89fc4c0d4394f915fffff11212d6957652337f
+ARG MCORE_TAG=425cdd48d5ef5d360d8033288ff7cb0d378f535f

ARG APEX_TAG=810ffae374a2b9cb4b5c5e28eaeca7d7998fca0c
RUN \
4 changes: 4 additions & 0 deletions docs/source/asr/ssl/api.rst
@@ -4,6 +4,10 @@ NeMo SSL collection API

Model Classes
-------------
.. autoclass:: nemo.collections.asr.models.EncDecDenoiseMaskedTokenPredModel
:show-inheritance:
:members:

.. autoclass:: nemo.collections.asr.models.SpeechEncDecSelfSupervisedModel
:show-inheritance:
:members:
4 changes: 4 additions & 0 deletions docs/source/asr/ssl/intro.rst
@@ -19,6 +19,10 @@ encoder module of neural ASR models. Here too, majority of SSL effort is focused
While it is common that AM is the focus of SSL in ASR, it can also be utilized in improving other parts of
ASR models (e.g., predictor module in transducer based ASR models).

In NeMo, we provide two types of SSL models, `Wav2Vec-BERT <https://arxiv.org/abs/2108.06209>`_ and `NEST <https://arxiv.org/abs/2408.13106>`_.
The training script for them can be found in `https://github.com/NVIDIA/NeMo/tree/main/examples/asr/speech_pretraining`.


The full documentation tree is as follows:

.. toctree::
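For readers following the new SSL documentation above, here is a minimal, illustrative sketch (not part of this PR's diff) of how the `EncDecDenoiseMaskedTokenPredModel` class added to the API docs could be instantiated from the NEST example config. It assumes the usual NeMo pattern of building an ASR model from an OmegaConf config plus a Lightning trainer; all manifest paths are placeholders.

```python
# Hedged sketch: standard NeMo "config + trainer" construction; paths are placeholders.
import pytorch_lightning as pl
from omegaconf import OmegaConf

from nemo.collections.asr.models import EncDecDenoiseMaskedTokenPredModel

cfg = OmegaConf.load("examples/asr/conf/ssl/nest/nest_fast-conformer.yaml")
cfg.model.train_ds.manifest_filepath = "/data/train_manifest.json"      # placeholder
cfg.model.train_ds.noise_manifest = "/data/noise_manifest.json"         # placeholder
cfg.model.validation_ds.manifest_filepath = "/data/val_manifest.json"   # placeholder
cfg.model.validation_ds.noise_manifest = "/data/noise_manifest.json"    # placeholder

trainer = pl.Trainer(devices=1, accelerator="gpu", max_steps=100)
model = EncDecDenoiseMaskedTokenPredModel(cfg=cfg.model, trainer=trainer)
trainer.fit(model)
```

In practice, the example script `examples/asr/speech_pretraining/masked_token_pred_pretrain.py` wraps this kind of setup behind a Hydra config, as shown later in this diff.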
134 changes: 134 additions & 0 deletions docs/source/performance/performance_long_sequence.md
@@ -0,0 +1,134 @@
# Long Sequence Performance

## LLAMA2-7B (FP8)

- The table below shows the pre-training performance of the LLAMA2-7B with CP (context parallelism) and compares it against the results without CP at various input sequence lengths. The detailed model-parallel configurations and the achieved performance are shown in the training results with CP. In non-CP training runs, we use the most performant model- and data-parallel configurations without CP given the memory capacity constraint of the H100 GPU system.

- Container: [NeMo24.03.01.framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags)
- System: DGX-H100


<table>
<thead>
<tr>
<th rowspan="2" class="top-border">SeqLen (K)</th>
<th rowspan="2" class="top-border"># of GPUs</th>
<th rowspan="1" class="top-border">Without CP</th>
<th colspan="5" class="top-border">With CP</th>
<th rowspan="2" class="top-border">Speedup with CP/without CP</th>
</tr>
<tr>
<th>TFLOPS / GPU</th>
<th>TP</th>
<th>PP</th>
<th>DP</th>
<th>CP</th>
<th>TFLOPS / GPU</th>
</tr>
</thead>
<tbody>
<tr>
<td>4</td>
<td>4</td>
<td>768</td>
<td>1</td>
<td>1</td>
<td>4</td>
<td>1</td>
<td>768</td>
<td class="speedup">1.00</td>
</tr>
<tr>
<td>8</td>
<td>8</td>
<td>730</td>
<td>1</td>
<td>2</td>
<td>4</td>
<td>1</td>
<td>730</td>
<td class="speedup">1.00</td>
</tr>
<tr>
<td>16</td>
<td>16</td>
<td>660</td>
<td>2</td>
<td>1</td>
<td>8</td>
<td>1</td>
<td>660</td>
<td class="speedup">1.00</td>
</tr>
<tr>
<td>32</td>
<td>32</td>
<td>595</td>
<td>2</td>
<td>1</td>
<td>8</td>
<td>2</td>
<td>610</td>
<td class="speedup">1.03</td>
</tr>
<tr>
<td>64</td>
<td>64</td>
<td>534</td>
<td>4</td>
<td>1</td>
<td>8</td>
<td>2</td>
<td>574</td>
<td class="speedup">1.07</td>
</tr>
<tr>
<td>128</td>
<td>128</td>
<td>424</td>
<td>4</td>
<td>1</td>
<td>8</td>
<td>4</td>
<td>555</td>
<td class="speedup">1.31</td>
</tr>
<tr>
<td>256</td>
<td>256</td>
<td>392</td>
<td>4</td>
<td>1</td>
<td>8</td>
<td>8</td>
<td>549</td>
<td class="speedup">1.40</td>
</tr>
<tr>
<td>512</td>
<td>512</td>
<td>104</td>
<td>8</td>
<td>1</td>
<td>4</td>
<td>16</td>
<td>549</td>
<td class="speedup">5.28</td>
</tr>
<tr>
<td>1024</td>
<td>1024</td>
<td>26.5</td>
<td>8</td>
<td>1</td>
<td>4</td>
<td>32</td>
<td>536</td>
<td class="speedup">20.23</td>
</tr>
</tbody>
</table>


### Speedup of LLAMA2 7B training with CP over without CP
![cp_speedup_figure](https://github.com/NVIDIA/NeMo/releases/download/r2.0.0rc1/tutorial_cp_speedup_figure.png)
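To connect the table to a runnable configuration, the sketch below shows one hedged way to set these parallelism sizes on a NeMo 2.0 pretraining recipe. The `llama3_8b` recipe is used as a stand-in for the LLAMA2-7B model benchmarked above, and the strategy field names (`tensor_model_parallel_size`, `pipeline_model_parallel_size`, `context_parallel_size`) are assumed to be available as in current NeMo releases.

```python
# Illustrative sketch only: recipe choice and strategy field names are assumptions.
from nemo.collections import llm

# 16 nodes x 8 GPUs = 128 GPUs, matching the 128K-sequence-length row of the table.
recipe = llm.llama3_8b.pretrain_recipe(num_nodes=16, num_gpus_per_node=8, name="llama_7b_cp_128k")

recipe.data.seq_length = 128 * 1024                        # 128K-token sequences
recipe.trainer.strategy.tensor_model_parallel_size = 4     # TP = 4
recipe.trainer.strategy.pipeline_model_parallel_size = 1   # PP = 1
recipe.trainer.strategy.context_parallel_size = 4          # CP = 4
# Data parallelism is what remains: 128 / (TP * PP * CP) = 8, matching DP = 8 in the table.
```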
4 changes: 2 additions & 2 deletions examples/asr/conf/ssl/nest/nest_fast-conformer.yaml
@@ -28,8 +28,8 @@ model:
mask_position: pre_conv # position to apply masking, before or after conv subsampling, choices in ['pre_conv', 'post_conv']

train_ds:
-manifest_filepath: ???
-noise_manifest: null
+manifest_filepath: ??? # path to training manifest, can be a string or list of strings
+noise_manifest: ??? # the manifest for noise data, can be a string or list of strings
sample_rate: ${model.sample_rate}
batch_size: 8 # you may increase batch_size if your memory allows
shuffle: true
8 changes: 6 additions & 2 deletions examples/asr/run_helper.py
@@ -82,6 +82,7 @@ def check_missing_values(cfg):
check_missing_values(result)
return result


def check_config_mount_paths(script_config, cluster_config):
# recursively walk all values of the script_config, checking if its a path-like string and if so, check if the path is a mounted path
# if it is not, raise an error
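As an aside, here is a minimal sketch of the recursive walk that the comment above describes. The `mounts` field (a list of `src:dest` strings on the cluster config) and the helper names are hypothetical, introduced only for illustration; this is not the actual `run_helper.py` implementation.

```python
import os

from omegaconf import DictConfig, ListConfig


def _is_mounted(path: str, cluster_config: dict) -> bool:
    # Hypothetical: treat a path as mounted if it starts with the destination of any "src:dest" mount.
    mounts = cluster_config.get("mounts", [])
    return any(path.startswith(mount.split(":")[-1]) for mount in mounts)


def check_config_mount_paths_sketch(script_config, cluster_config):
    # Recursively walk every value; raise if an absolute path-like string is not under a mount.
    def walk(value):
        if isinstance(value, (dict, DictConfig)):
            for child in value.values():
                walk(child)
        elif isinstance(value, (list, ListConfig)):
            for child in value:
                walk(child)
        elif isinstance(value, str) and value.startswith(os.sep):
            if not _is_mounted(value, cluster_config):
                raise ValueError(f"Path {value} is not mounted in the cluster config")

    walk(script_config)
```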
@@ -154,7 +155,9 @@ def main(cluster_cfg):
if 'exp_manager' in merged_config and 'name' in merged_config['exp_manager']:
exp_name = merged_config['exp_manager']['name']
else:
-raise ValueError("Experiment name not provided in the run config file (`exp_name`)) or the cluster config (inside exp_manager.name)")
+raise ValueError(
+    "Experiment name not provided in the run config file (`exp_name`)) or the cluster config (inside exp_manager.name)"
+)

with run.Experiment(exp_name) as exp:
cmd = get_execution_script(cluster_script_path, "config.yaml")
@@ -166,7 +169,8 @@ def main(cluster_cfg):
num_nodes = cluster_cfg.get('num_nodes', merged_config['trainer'].get('num_nodes', 1))
cluster_cfg = OmegaConf.to_object(cluster_cfg)

-run_utils.add_task(exp,
+run_utils.add_task(
+    exp,
cmd=cmd,
task_name=job_name,
cluster_config=cluster_cfg,
8 changes: 8 additions & 0 deletions examples/asr/speech_pretraining/README.md
@@ -5,3 +5,11 @@ This directory contains example scripts to self-supervised speech models.
There are two main types of supported self-supervised learning methods:
- [Wav2vec-BERT](https://arxiv.org/abs/2108.06209): `speech_pre_training.py`
- [NEST](https://arxiv.org/abs/2408.13106): `masked_token_pred_pretrain.py`
- For downstream tasks that use NEST as multi-layer feature extractor, please refer to `./downstream/speech_classification_mfa_train.py`


For their corresponding usage, please refer to the example yaml config:
- Wav2vec-BERT: `examples/asr/conf/ssl/fastconformer/fast-conformer.yaml`
- NEST: `examples/asr/conf/ssl/nest/nest_fast-conformer.yaml`


2 changes: 2 additions & 0 deletions examples/asr/speech_pretraining/masked_token_pred_pretrain.py
@@ -28,7 +28,9 @@
python pretrain_masked_token_pred.py \
# (Optional: --config-path=<path to dir of configs> --config-name=<name of config without .yaml>) \
model.train_ds.manifest_filepath=<path to train manifest> \
model.train_ds.noise_manifest=<path to noise manifest> \
model.validation_ds.manifest_filepath=<path to val/test manifest> \
model.validation_ds.noise_manifest=<path to noise manifest> \
trainer.devices=-1 \
trainer.accelerator="gpu" \
strategy="ddp" \
4 changes: 2 additions & 2 deletions examples/audio/process_audio.py
@@ -159,8 +159,8 @@ def main(cfg: ProcessConfig) -> ProcessConfig:
audio_to_audio_model.set_trainer(trainer)
audio_to_audio_model = audio_to_audio_model.eval()

-# override sampler
-if cfg.sampler is not None:
+# override sampler if necessary
+if cfg.sampler:
logging.info('Overriding sampler with %s', cfg.sampler)

if hasattr(audio_to_audio_model, 'sampler'):
105 changes: 105 additions & 0 deletions examples/llm/peft/hf.py
@@ -0,0 +1,105 @@
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import fiddle as fdl
from pytorch_lightning.loggers import WandbLogger
from nemo import lightning as nl
from nemo.collections import llm


def mk_hf_dataset(tokenizer):
    EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN

    def formatting_prompts_func(examples):
        alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
        instruction = examples["context"]
        input = examples["question"]
        output = examples["answers"]['text']
        if isinstance(output, list):
            output = output[0]
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        ans = tokenizer(text)
        tokens = ans['input_ids']
        return {
            'tokens': tokens,
            'labels': tokens[1:] + [tokens[-1]],
        }

    from datasets import load_dataset

    dataset = load_dataset("rajpurkar/squad", split="train")
    dataset = dataset.map(formatting_prompts_func, batched=False, batch_size=2)
    return dataset


if __name__ == '__main__':
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('--model', default='meta-llama/Llama-3.2-1B')
    parser.add_argument('--strategy', type=str, default='auto', choices=['auto', 'ddp', 'fsdp'])
    parser.add_argument('--devices', default=1)
    parser.add_argument('--accelerator', default='gpu', choices=['gpu'])
    parser.add_argument('--max-steps', type=int, default=100)
    parser.add_argument('--wandb-project', type=str, default=None)
    args = parser.parse_args()

    wandb = None
    if args.wandb_project is not None:
        model = '_'.join(args.model.split('/')[-2:])
        wandb = WandbLogger(
            project=args.wandb_project,
            name=f'{model}_dev{args.devices}_strat_{args.strategy}',
        )
    grad_clip = 0.5
    if args.strategy == 'fsdp':
        # See: https://github.com/Lightning-AI/pytorch-lightning/blob/8ad3e29816a63d8ce5c00ac104b14729a4176f4f/src/lightning/pytorch/plugins/precision/fsdp.py#L81
        grad_clip = None
    use_dist_samp = False
    tokenizer = llm.HfAutoModelForCausalLM.configure_tokenizer(args.model)

    llm.api.finetune(
        model=llm.HfAutoModelForCausalLM(args.model),
        data=llm.HfDatasetDataModule(
            mk_hf_dataset(tokenizer.tokenizer), pad_token_id=tokenizer.tokenizer.eos_token_id
        ),
        trainer=nl.Trainer(
            devices=args.devices,
            max_steps=args.max_steps,
            accelerator=args.accelerator,
            strategy=args.strategy,
            log_every_n_steps=1,
            limit_val_batches=0.0,
            num_sanity_val_steps=0,
            accumulate_grad_batches=10,
            gradient_clip_val=grad_clip,
            use_distributed_sampler=use_dist_samp,
            logger=wandb,
        ),
        optim=fdl.build(llm.adam.pytorch_adam_with_flat_lr(max_lr=1e-5, clip_grad=0.5)),
        log=None,
        peft=llm.peft.LoRA(
            target_modules=['*_proj'],
            dim=32,
        ),
    )
10 changes: 5 additions & 5 deletions examples/llm/pretrain/README.md
@@ -3,7 +3,7 @@
### Listing the available recipes for pretraining

```bash
-nemorun llm pretrain --help
+nemo llm pretrain --help
```

![recipe-listing](https://github.com/NVIDIA/NeMo/releases/download/v2.0.0rc0/list-recipes.png)
@@ -12,15 +12,15 @@ nemorun llm pretrain --help
### Run pre-training with a default recipe

```bash
-nemorun llm pretrain --factory llama3_8b
+nemo llm pretrain --factory llama3_8b
```

![llama3_70b](https://github.com/NVIDIA/NeMo/releases/download/v2.0.0rc0/llama3_70b.png)

We can also call the factory function with custom parameters:

```bash
nemorun llm pretrain --factory "llama3_70b(num_nodes=128)"
nemo llm pretrain --factory "llama3_70b(num_nodes=128)"
```

![llama3_70b-128-nodes](https://github.com/NVIDIA/NeMo/releases/download/v2.0.0rc0/llama3_70b_128nodes.png)
@@ -29,13 +29,13 @@ nemorun llm pretrain --factory "llama3_70b(num_nodes=128)"
The CLI allows you to overwrite any parameter. For example, to run the recipe with 2000 steps:

```bash
-nemorun llm pretrain --factory llama3_70b trainer.max_steps=2000
+nemo llm pretrain --factory llama3_70b trainer.max_steps=2000
```

The syntax of the CLI is the same as the Python code, which is great, but in some cases you might want to inspect & edit a recipe interactively. An easy way to do this from the CLI is to use the `--repl` flag.

```bash
-nemorun llm pretrain --factory llama3_70b --repl
+nemo llm pretrain --factory llama3_70b --repl
```

![repl](https://github.com/NVIDIA/NeMo/releases/download/v2.0.0rc0/repl.gif)
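Since the CLI mirrors the Python API, the following is a hedged sketch of the equivalent Python calls for the examples above. It assumes `nemo_run` is installed and that a local executor is acceptable; the executor choice and experiment name are illustrative.

```python
# Illustrative sketch of the Python equivalent of the CLI examples above.
import nemo_run as run

from nemo.collections import llm

# Same as: nemo llm pretrain --factory "llama3_70b(num_nodes=128)"
recipe = llm.llama3_70b.pretrain_recipe(num_nodes=128, name="llama3_70b")

# Same as the CLI override: trainer.max_steps=2000
recipe.trainer.max_steps = 2000

run.run(recipe, executor=run.LocalExecutor())
```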