
Understanding GPU utilization #870

Open
siretru opened this issue May 7, 2024 · 5 comments

@siretru

siretru commented May 7, 2024

I'm having trouble interpreting some of the results...

After an Automatic Brute Search analysis, when I analyse the result_summary, I look at the Average GPU Utilization.

How is this value determined? Is it in relation to the number of SMs (Streaming Multiprocessors) used? Is it measured with DCGM or nvidia-smi? We know that it's quite complex to get a reliable measure of GPU usage (even with tools like NVIDIA Nsight), so I'd like to check how meaningful this metric is.

What is the objective that is maximised in the Automatic Brute Search? Is it throughput?

My main question is:
I'm trying to understand why, for a given model, when the ideal model configuration is reached, my GPU is only being used at around 30%? What is the limiting factor (i.e. why can't we use more of the GPU to increase throughput)?

Thanks all!

@nv-braf
Contributor

nv-braf commented May 7, 2024

GPU utilization is measured by Perf Analyzer and returned to Model Analyzer (MA) as one of the many metrics we capture and report to the user.

The default objective to maximize is throughput, and there can be a multitude of factors that cause GPU utilization to be less than 100%.

If you are interested in maximizing GPU utilization you can specify this as the objective (see config.md for documentation on how to do this) when profiling your model.
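
For example, a minimal config sketch along these lines should do it (the exact key names are documented in config.md; I'm assuming gpu_utilization is the objective name for this metric, and my_model is a placeholder):

```yaml
# Model Analyzer config sketch: maximize GPU utilization instead of the
# default throughput objective. Paths and the model name are placeholders.
model_repository: /path/to/model/repository
profile_models:
  - my_model
objectives:
  - gpu_utilization   # default objective would be perf_throughput
```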

Have you tried looking at the detailed report generated for the optimal configuration? This might point you in the right direction. It is also possible that you might need to change the maximum instance count, batch size, or concurrency that MA searches.
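
If the search bounds are the limiter, something like this widens what MA will try (option names assumed from the run_config_search_* settings in config.md; the values are only illustrative):

```yaml
# Sketch of widening the brute-search space. Verify these option names
# against config.md; the values below are arbitrary examples.
run_config_search_max_instance_count: 8
run_config_search_max_concurrency: 2048
run_config_search_max_model_batch_size: 256
```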

I hope this helps.

@siretru
Author

siretru commented May 7, 2024

Thank you for your reply.
Could you provide more details on the source of the GPU utilization value? You mention that this metric comes from Perf Analyzer, which is an NVIDIA tool, but I can't find the answer, and this is probably the only place I can ask this question.

Thanks

@nv-braf
Contributor

nv-braf commented May 8, 2024

@matthewkotila can you provide more details?

@matthewkotila
Contributor

@siretru you can find information about the GPU utilization metric that Perf Analyzer offers here:

https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/docs/measurements_metrics.md#server-side-prometheus-metrics
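
If it helps, you can also look directly at what Perf Analyzer is sampling by scraping Triton's metrics endpoint while a load is running. A minimal sketch, assuming Triton is running locally with metrics enabled on the default port 8002 and that nv_gpu_utilization is the gauge described in that doc:

```python
# Sketch: print Triton's GPU utilization gauge from the Prometheus metrics
# endpoint. Assumes a local Triton server with metrics on port 8002.
import urllib.request

METRICS_URL = "http://localhost:8002/metrics"  # default Triton metrics port

with urllib.request.urlopen(METRICS_URL) as resp:
    body = resp.read().decode("utf-8")

for line in body.splitlines():
    # Samples look like: nv_gpu_utilization{gpu_uuid="GPU-..."} 0.31
    if line.startswith("nv_gpu_utilization"):
        print(line)
```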

@siretru
Author

siretru commented May 9, 2024

Hi,
Thanks for this:
"GPU utilization: Averaged from each collection taken during stable passes. We want a number representative of all stable passes."

However, this does not explain how the GPU utilization itself is calculated. Is it utilisation per time, per SMs occupied, ...?
