This utility is used to measure throughput and other improvements obtained when using `fms-acceleration` plugins.
- `benchmark.py`: the main benchmark script.
- `scenarios.yaml`: `sft_trainer.py` arguments organized into different *scenarios*.
    - Each `scenario` may apply to one or more `AccelerationFramework` sample configurations. These are the *critical* arguments needed for correct operation.
    - See the section on benchmark scenarios below for more details.
- `defaults.yaml`: `sft_trainer.py` arguments that may be used in addition to `scenarios.yaml`. These are the *non-critical* arguments that will not affect plugin operation.
- `accelerate.yaml`: configurations required by `accelerate launch` for multi-GPU benchmarks.
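As an illustration, `defaults.yaml` entries are plain `sft_trainer.py` arguments shared across scenarios. A minimal sketch follows; the specific keys and values here are assumptions for illustration, not the repository's actual defaults:

```yaml
# hypothetical defaults.yaml fragment: non-critical sft_trainer.py
# arguments applied to every scenario unless overridden in scenarios.yaml
max_steps: 100
per_device_train_batch_size: 4
gradient_accumulation_steps: 1
logging_steps: 10
```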
An example of a scenario for `accelerated-peft-gptq` is given as follows:
```yaml
scenarios:
  # benchmark scenario for accelerated peft using AutoGPTQ triton v2
  - name: accelerated-peft-gptq
    framework_config:
      # one or more framework configurations that fall within the scenario group.
      # - each entry points to a shortname in CONTENTS.yaml
      - accelerated-peft-autogptq

    # sft_trainer.py arguments critical for correct plugin operation
    arguments:
      fp16: True
      learning_rate: 2e-4
      torch_dtype: float16
      peft_method: lora
      r: 16
      lora_alpha: 16
      lora_dropout: 0.0
      target_modules: "q_proj k_proj v_proj o_proj"
      model_name_or_path:
        - 'mistralai/Mistral-7B-v0.1'
        - 'mistralai/Mixtral-8x7B-Instruct-v0.1'
        - 'NousResearch/Llama-2-70b-hf'
```
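Conceptually, the bench expands a scenario into one run per (`framework_config` shortname, model) pair. The following is a minimal sketch of that expansion; the helper name `expand_scenario` and the dict layout are illustrative, not the actual `benchmark.py` implementation:

```python
from itertools import product

def expand_scenario(scenario):
    """Enumerate one bench job per (framework_config shortname, model) pair."""
    configs = scenario["framework_config"]
    models = scenario["arguments"]["model_name_or_path"]
    return [
        {"name": scenario["name"], "framework_config": cfg, "model": model}
        for cfg, model in product(configs, models)
    ]

scenario = {
    "name": "accelerated-peft-gptq",
    "framework_config": ["accelerated-peft-autogptq"],
    "arguments": {
        "model_name_or_path": [
            "mistralai/Mistral-7B-v0.1",
            "mistralai/Mixtral-8x7B-Instruct-v0.1",
            "NousResearch/Llama-2-70b-hf",
        ]
    },
}
jobs = expand_scenario(scenario)
# 1 framework config x 3 models -> 3 bench jobs
```

So the scenario above yields three separate benchmark runs, one per model.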
A `scenario` has the following key components:
- `framework_config`: points to one or more acceleration configurations.
    - A list of sample config `shortname`s; each entry points to a shortname in `CONTENTS.yaml`.
    - Each `shortname` is run as a different bench.
- `arguments`: the critical `sft_trainer.py` arguments that need to be passed in alongside `framework_config` to ensure correct operation.
    - `model_name_or_path` is a list, and the bench will enumerate all of them.
    - NOTE: a `plugin` may not work with arbitrary models. This depends on the plugin's setting of `AccelerationPlugin.restricted_model_archs`.
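The architecture restriction above amounts to a simple membership check. The sketch below illustrates the idea only; the function and the architecture names are simplified stand-ins, not the real `AccelerationPlugin` API:

```python
def model_supported(restricted_model_archs, model_archs):
    """Return True if a plugin accepts a model's architectures.

    A plugin that declares no restriction (None) accepts any model;
    otherwise at least one of the model's architectures must be allowed.
    """
    if restricted_model_archs is None:
        return True
    return any(arch in restricted_model_archs for arch in model_archs)

# e.g. a plugin restricted to Mistral/Mixtral-style architectures
allowed = ["MistralForCausalLM", "MixtralForCausalLM"]
print(model_supported(allowed, ["MistralForCausalLM"]))  # True
print(model_supported(allowed, ["GPT2LMHeadModel"]))     # False
print(model_supported(None, ["GPT2LMHeadModel"]))        # True
```

A model outside the restricted list is simply not a valid pairing for that plugin's benches.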
The best way to run the benchmarks is via `tox`, which manages the dependencies, including installing the correct version of `fms-hf-tuning`:
- run a small representative set of benches:

    ```
    tox -e run-benches
    ```

- run the full set of benches for both the 1 and 2 GPU cases:

    ```
    tox -e run-benches -- "1 2"
    ```
The convenience script `run_benchmarks.sh` configures and runs `benchmark.py`; the command is:

```
bash run_benchmarks.sh NUM_GPUS_MATRIX RESULT_DIR SCENARIOS_CONFIG SCENARIOS_FILTER
```
where:
- `NUM_GPUS_MATRIX`: list of `num_gpu` settings to bench for, e.g. `"1 2"` will bench for 1 and 2 GPUs.
- `RESULT_DIR`: where the benchmark results will be placed.
- `SCENARIOS_CONFIG`: the `scenarios.yaml` file.
- `SCENARIOS_FILTER`: run only a specific `scenario` by providing its name.
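For example, a hypothetical invocation that benches the 1 and 2 GPU cases for only the `accelerated-peft-gptq` scenario might look like the following (the result directory path is illustrative):

```
bash run_benchmarks.sh "1 2" ./benchmark_results scenarios.yaml accelerated-peft-gptq
```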
The recommended way to run `run_benchmarks.sh` is via `tox`, which handles the dependencies:

```
tox -e run-benches -- NUM_GPUS_MATRIX RESULT_DIR SCENARIOS_CONFIG SCENARIOS_FILTER
```

Alternatively, run `benchmark.py` directly. To see the help, do:

```
python benchmark.py --help
```