
Understanding GPU utilization #870

Open
siretru opened this issue May 7, 2024 · 5 comments

@siretru

siretru commented May 7, 2024

I'm having trouble interpreting some of the results...

After an Automatic Brute Search analysis, when I analyse the result_summary, I look at the Average GPU Utilization.

How is this value determined? Is it in relation to the number of SMs (Streaming Multiprocessors) used? Is it measured with DCGM or nvidia-smi? We know that it's quite complex to get a reliable measure of GPU usage (even with tools like NVIDIA Nsight), so I'd like to check how meaningful this metric is.

What is the objective that is maximised in the Automatic Brute Search? Is it throughput?

My main question is:
I'm trying to understand why, for a given model, when the ideal model configuration is reached, my GPU is only being used at around 30%? What is the limiting factor (i.e. why can't we use more of the GPU to increase throughput)?

Thanks all!

@nv-braf
Contributor

nv-braf commented May 7, 2024

GPU utilization is measured by Perf Analyzer and returned to Model Analyzer (MA) as one of the many metrics we capture and report to the user.

The default objective to maximize is throughput, and there can be a multitude of factors that cause GPU utilization to be less than 100%.

If you are interested in maximizing GPU utilization you can specify this as the objective (see config.md for documentation on how to do this) when profiling your model.
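
For example, a minimal config sketch along these lines should do it (the exact key names are documented in config.md; I'm assuming gpu_utilization is the objective name for this metric, and my_model is a placeholder):

```yaml
# Model Analyzer config sketch: maximize GPU utilization instead of the
# default throughput objective. Paths and the model name are placeholders.
model_repository: /path/to/model/repository
profile_models:
  - my_model
objectives:
  - gpu_utilization   # default objective would be perf_throughput
```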

Have you tried looking at the detailed report generated for the optimal configuration? This might point you in the right direction. It is also possible that you might need to change the maximum instance count, batch size, or concurrency that MA searches.
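
If the search bounds are the limiter, something like this widens what MA will try (option names assumed from the run_config_search_* settings in config.md; the values are only illustrative):

```yaml
# Sketch of widening the brute-search space. Verify these option names
# against config.md; the values below are arbitrary examples.
run_config_search_max_instance_count: 8
run_config_search_max_concurrency: 2048
run_config_search_max_model_batch_size: 256
```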

I hope this helps.

@siretru
Author

siretru commented May 7, 2024

Thank you for your reply.
Could you provide more details on the source of the GPU utilization value? You mention that this metric comes from Perf Analyzer, which is an NVIDIA tool, but I can't find the answer, and this is probably the only place I can ask this question.

Thanks

@nv-braf
Contributor

nv-braf commented May 8, 2024

@matthewkotila can you provide more details?

@matthewkotila
Contributor

@siretru you can find information about the GPU utilization metric that Perf Analyzer offers here:

https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/docs/measurements_metrics.md#server-side-prometheus-metrics
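
If it helps, you can also look directly at what Perf Analyzer is sampling by scraping Triton's metrics endpoint while a load is running. A minimal sketch, assuming Triton is running locally with metrics enabled on the default port 8002 and that nv_gpu_utilization is the gauge described in that doc:

```python
# Sketch: print Triton's GPU utilization gauge from the Prometheus metrics
# endpoint. Assumes a local Triton server with metrics on port 8002.
import urllib.request

METRICS_URL = "http://localhost:8002/metrics"  # default Triton metrics port

with urllib.request.urlopen(METRICS_URL) as resp:
    body = resp.read().decode("utf-8")

for line in body.splitlines():
    # Samples look like: nv_gpu_utilization{gpu_uuid="GPU-..."} 0.31
    if line.startswith("nv_gpu_utilization"):
        print(line)
```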

@siretru
Author

siretru commented May 9, 2024

Hi,
Thanks for this:
"GPU utilization: Averaged from each collection taken during stable passes. We want a number representative of all stable passes."

However, this does not explain how the GPU utilization itself is calculated. Is it utilisation per time, per SMs occupied, ...?
