I'm performing inference in C++ using OpenVINO 2023.3. I currently have an f32 model, compiled with f32 inference precision and ExecutionMode::PERFORMANCE. On my GPU I see a good performance boost, roughly a 60% runtime reduction relative to the CPU.
I'd like to reduce runtime further, so I've produced a comparable model in f16 precision. I've made three observations:
1. Using f16 model precision does not yield a runtime boost over f32 model precision, for either inference precision. (expected result)
2. Using f16 inference precision does not yield a runtime boost over f32 inference precision, for either model precision. (unexpected result)
3. Using f16 inference precision with the GPU yields incorrect results, though the same configuration runs accurately on the CPU. (unexpected result)
Am I implementing something wrong?
In my code, I'm adjusting these settings (the device name, execution mode, and inference precision are what I vary across the experiments above):

compiled_model = core_.compile_model(model, "CPU",
    ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY),
    ov::hint::inference_precision(ov::element::f32));
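For reference, a minimal sketch of the GPU/f16/PERFORMANCE variant described above, using the OpenVINO 2.0 C++ API. The model path "model.xml" is a placeholder, not from the original post; querying the compiled model's inference_precision property afterwards is one way to check which precision the plugin actually selected:

```cpp
#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;
    // Placeholder path; substitute the actual IR or ONNX model file.
    auto model = core.read_model("model.xml");

    // Compile for the GPU plugin with f16 inference precision and the
    // throughput-oriented PERFORMANCE execution mode.
    ov::CompiledModel compiled = core.compile_model(model, "GPU",
        ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE),
        ov::hint::inference_precision(ov::element::f16));

    // Query the precision the plugin actually settled on; plugins may
    // override the hint, which is worth confirming when timings don't change.
    auto precision = compiled.get_property(ov::hint::inference_precision);
    std::cout << "effective inference precision: " << precision << std::endl;
    return 0;
}
```

If the reported precision is still f32, the hint is being overridden by the execution mode or plugin defaults, which would explain identical runtimes for both settings.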
Thank you!