More aggressively tune the GPU and benchmark (#143)
Summary: Through OSS metrics tracking, we found that day-to-day run variance is larger than expected (~4%), whereas our target is at most 2%. This PR tunes the GPU more aggressively and runs the gemm and softmax benchmarks with CUDA graphs to see whether the metrics become more stable.

Pull Request resolved: #143
Test Plan: CI
Reviewed By: adamomainz
Differential Revision: D68644171
Pulled By: xuzhao9
fbshipit-source-id: ea3b34836da536719176d36d1b301f048d8038cd
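The summary does not say how the ~4% run-to-run variance is measured. The following is a minimal sketch, assuming "variance" means the relative spread (max - min) / mean of one benchmark metric across daily runs, with the values stored one per line. The function names `relative_spread` and `exceeds_target` are hypothetical and not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the relative spread of a metric across runs as a
# percentage, i.e. (max - min) / mean * 100, rounded to two decimals.
# Assumption: input (file arguments or stdin) has one positive number per line.
relative_spread() {
  awk '
    NF == 0 { next }                 # skip blank lines
    {
      v = $1 + 0
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      sum += v
      n++
    }
    END {
      if (n == 0) exit 1             # no data: fail instead of dividing by zero
      printf "%.2f\n", (max - min) / (sum / n) * 100
    }
  ' "$@"
}

# Hypothetical helper: succeed (exit 0) if the spread exceeds the target percent.
# The default target of 2 mirrors the 2% goal stated in the summary.
exceeds_target() {
  local spread="$1" target="${2:-2}"
  awk -v s="$spread" -v t="$target" 'BEGIN { exit !(s + 0 > t + 0) }'
}

# Example: three daily runs of the same benchmark metric.
# printf '100\n102\n98\n' | relative_spread   # prints 4.00
```

With the example values, the 4.00% spread would exceed the 2% target, which is the situation the summary describes.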
1 parent f23ad77, commit 18eaf84
Showing 4 changed files with 39 additions and 2 deletions.
@@ -0,0 +1,13 @@
```bash
#!/usr/bin/env bash
# Script to tune NVIDIA H100 GPU on GCP
# To reset GPU status

# Reset locked graphics and memory clocks
sudo nvidia-smi -rgc
sudo nvidia-smi -rmc

# Restore the default power limit (500W)
sudo nvidia-smi -pl 500

# Disable persistence mode
sudo nvidia-smi -pm 0
```
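To confirm that the reset script actually restored the default state, one option is to query each GPU afterwards. The following is a minimal sketch, assuming the output format of `nvidia-smi --query-gpu=index,persistence_mode,power.limit --format=csv,noheader,nounits` (one line per GPU, e.g. `0, Disabled, 500.00`) and the 500 W default stated in the script comment; `check_reset_state` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical check: read nvidia-smi CSV lines on stdin and report any GPU
# whose persistence mode is not "Disabled" or whose power limit differs from
# the expected value (default 500 W, as in the reset script comment).
check_reset_state() {
  local expected_pl="${1:-500}"
  awk -F', *' -v pl="$expected_pl" '
    NF < 3 { next }                  # ignore malformed or empty lines
    {
      if ($2 != "Disabled" || $3 + 0 != pl + 0) {
        printf "GPU %s not reset: persistence=%s power.limit=%s\n", $1, $2, $3
        bad = 1
      }
    }
    END { exit bad + 0 }             # non-zero if any GPU was not reset
  '
}

# Usage on a machine with the NVIDIA driver installed:
# nvidia-smi --query-gpu=index,persistence_mode,power.limit \
#   --format=csv,noheader,nounits | check_reset_state 500
```

Parsing the CSV instead of calling `nvidia-smi` inside the function keeps the check testable on machines without a GPU.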
@@ -0,0 +1,16 @@
```bash
#!/usr/bin/env bash
# Script to tune NVIDIA H100 GPU on GCP
# To stabilize performance

set -ex

# Enable persistence mode
sudo nvidia-smi -pm 1
# Set the power limit to 650W
sudo nvidia-smi -pl 650

# Default Memory Frequency: 2619 MHz
# Default Graphics Frequency: 1980 MHz
sudo nvidia-smi -lgc 1980,1980  # lock graphics clock (min,max)
sudo nvidia-smi -lmc 2619,2619  # lock memory clock (min,max)
sudo nvidia-smi -ac 2619,1980   # application clocks (memory,graphics)
```
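Because the tuning script changes persistent GPU settings, a benchmark run would typically pair it with the reset script. The following is a minimal sketch of such a wrapper, assuming both scripts are invoked with `bash` and passed in by path; `run_tuned` and the argument order are hypothetical and do not reflect the file names or CI wiring used in this commit.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run_tuned TUNE_SCRIPT RESET_SCRIPT CMD [ARGS...]
# Applies the tuning script, runs the benchmark command, then always runs the
# reset script, even if tuning or the benchmark fails. Returns the exit status
# of the tuning script (if it failed) or of the benchmark command.
run_tuned() {
  local tune="$1" reset="$2" status=0
  shift 2
  if bash "$tune"; then
    "$@" || status=$?
  else
    status=$?
    echo "tuning failed (exit $status); skipping benchmark" >&2
  fi
  # Restore default clocks, power limit, and persistence mode.
  bash "$reset" || echo "warning: reset script failed" >&2
  return "$status"
}

# Usage (placeholder paths and command):
# run_tuned path/to/tune.sh path/to/reset.sh <benchmark command...>
```

This version resets after normal success or failure but not if the shell itself is interrupted (e.g. Ctrl-C); handling that would need a `trap` on the relevant signals.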