I'm performing inference in C++ using OpenVINO 2023.3. I currently have an f32 model, compiled with f32 inference precision and ExecutionMode::PERFORMANCE. On my GPU I see a good performance boost, roughly a 60% runtime reduction relative to the CPU.
I'd like to reduce runtime further, so I've produced a comparable model in f16 precision. I've made three observations:
1. Using f16 model precision does not yield a runtime boost over f32 model precision, for either inference precision. (expected result)
2. Using f16 inference precision does not yield a runtime boost over f32 inference precision, for either model precision. (unexpected result)
3. Using f16 inference precision with the GPU yields incorrect results, though the same configuration runs accurately on the CPU. (unexpected result)
Am I implementing something wrong?
In my code, I'm adjusting these settings (the device name, execution mode, and inference precision are what I vary across the experiments above):

compiled_model = core_.compile_model(model, "CPU",
    ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY),
    ov::hint::inference_precision(ov::element::f32));
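For reference, a minimal sketch of the GPU/f16/PERFORMANCE variant described above, using the OpenVINO 2.0 C++ API. The model path "model.xml" is a placeholder, not from the original post; querying the compiled model's inference_precision property afterwards is one way to check which precision the plugin actually selected:

```cpp
#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;
    // Placeholder path; substitute the actual IR or ONNX model file.
    auto model = core.read_model("model.xml");

    // Compile for the GPU plugin with f16 inference precision and the
    // throughput-oriented PERFORMANCE execution mode.
    ov::CompiledModel compiled = core.compile_model(model, "GPU",
        ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE),
        ov::hint::inference_precision(ov::element::f16));

    // Query the precision the plugin actually settled on; plugins may
    // override the hint, which is worth confirming when timings don't change.
    auto precision = compiled.get_property(ov::hint::inference_precision);
    std::cout << "effective inference precision: " << precision << std::endl;
    return 0;
}
```

If the reported precision is still f32, the hint is being overridden by the execution mode or plugin defaults, which would explain identical runtimes for both settings.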
Thank you!