[GPU] Improvement of event-related primitives code #27337
Signed-off-by: Vladimir Paramuzov <[email protected]>
@@ -36,7 +36,7 @@ struct network_output {
 // TODO: in_order queue doesn't create proper output event in some cases which leads to syncronization issues with user app
 // So call finish for associated stream to enusre that the output data is ready.
 if (do_sync) {
-if (_stream->get_queue_type() == QueueTypes::in_order) {
+if (_stream->get_queue_type() == QueueTypes::in_order || !_event) {
When could an event not be created for an out-of-order queue?
If that is really possible, this change can add overhead in some cases: currently some outputs can be processed asynchronously in the SyncInferRequest::wait() function, but with this change it has to wait for all kernels to finish.
I think it happened when the output primitive has the CPU impl type, which no longer produces events for barrier-based synchronization.
`make_output_event` seems to handle this case as well?
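The trade-off raised in this thread can be illustrated with a minimal sketch. It is not the plugin's code: `MockStream`, `MockEvent`, `QueueType`, and `sync_output` are hypothetical stand-ins, and the only thing taken from the diff is the condition itself (in-order queue, or a null event). The sketch shows why a null event on an out-of-order queue forces a full drain of the stream instead of a wait on a single event.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the plugin's queue, event, and stream types.
enum class QueueType { in_order, out_of_order };

struct MockEvent {
    int wait_calls = 0;
    void wait() { ++wait_calls; }      // waits for one producer only
};

struct MockStream {
    QueueType type;
    int finish_calls = 0;
    void finish() { ++finish_calls; }  // waits for ALL enqueued work
};

// Mirrors the condition from the diff: fall back to a full stream finish
// when the queue is in-order or when no output event is available.
// Returns true if the whole stream had to be drained.
inline bool sync_output(MockStream& stream, const std::shared_ptr<MockEvent>& event) {
    if (stream.type == QueueType::in_order || !event) {
        stream.finish();
        return true;
    }
    event->wait();
    return false;
}
```

With a valid event on an out-of-order queue only `wait()` is called; passing `nullptr` takes the `finish()` path, which is the extra cost the reviewer is asking about.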
…7337)

### Details:
- Removed `_events` map from network class. Now dependency and result events are stored in `kernel_impl_params` for each primitive
- User events are not created for CPU impls with barrier-based synchronization to avoid useless OCL API calls (clCreateUserEvent -> clSetUserEventStatus -> clReleaseEvent). Overall, methods can return nullptr instead of a user event.
- Updated ocl_stream::wait_for_events impl to deal with C event handles (cl_event) instead of the C++ wrapper to avoid a redundant clRetainEvent call
- Introduced ExecutionFlags structure which reflects the execution status of a primitive. Now methods which prepare a dynamic primitive for execution modify/check some flags instead of some primitive_inst attributes or explicit function arguments.

---------

Signed-off-by: Vladimir Paramuzov <[email protected]>
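As an illustration of the ExecutionFlags idea described in the commit message, here is a rough sketch of a per-primitive flag set built on `std::bitset`. The flag names (`shape_changed`, `impl_changed`, `memory_changed`, `skip`) and the types `ExecFlag` and `ExecutionFlagsSketch` are assumptions made for this example; the actual structure in the PR may look different.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical flag names; `count` is only used to size the bitset.
enum class ExecFlag : std::size_t { shape_changed, impl_changed, memory_changed, skip, count };

// Small wrapper so preparation steps set and query flags
// instead of passing booleans through function arguments.
class ExecutionFlagsSketch {
public:
    void set(ExecFlag f) { bits_.set(static_cast<std::size_t>(f)); }
    bool test(ExecFlag f) const { return bits_.test(static_cast<std::size_t>(f)); }
    bool any() const { return bits_.any(); }
    void reset() { bits_.reset(); }  // called once per execution in this sketch

private:
    std::bitset<static_cast<std::size_t>(ExecFlag::count)> bits_;
};

// Example consumer: a later preparation step reads what an earlier one recorded.
inline bool needs_impl_update(const ExecutionFlagsSketch& flags) {
    return flags.test(ExecFlag::shape_changed) && !flags.test(ExecFlag::skip);
}
```

The design point is that each preparation step records what it observed, and later steps query the flags rather than receiving extra parameters.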
Details:
- Removed `_events` map from network class. Now dependency and result events are stored in `kernel_impl_params` for each primitive
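A minimal sketch of what per-primitive event storage could look like, assuming dependency events may be null (for example when a producer did not create one). `EventSketch`, `ImplParamsSketch`, and `collect_wait_list` are hypothetical names, not the plugin's API. The wait list is built from raw pointers, loosely analogous to passing plain handles instead of reference-counted wrappers.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical event type; `id` exists only to make the example observable.
struct EventSketch { int id; };
using EventPtr = std::shared_ptr<EventSketch>;

// Hypothetical per-primitive parameter block that owns its events,
// replacing a network-wide map keyed by primitive id.
struct ImplParamsSketch {
    std::vector<EventPtr> dep_events;  // entries may be nullptr
    EventPtr out_event;                // may be nullptr as well
};

// Builds a wait list of raw handles, skipping null entries.
// No shared_ptr copies are made, so no reference counts are touched.
inline std::vector<const EventSketch*> collect_wait_list(const ImplParamsSketch& params) {
    std::vector<const EventSketch*> handles;
    handles.reserve(params.dep_events.size());
    for (const auto& ev : params.dep_events) {
        if (ev) {
            handles.push_back(ev.get());
        }
    }
    return handles;
}
```

A caller would pass the resulting list to whatever wait primitive the stream provides; an empty list means there is nothing to wait for.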