Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DFT] Allow the descriptor to be modified and recommitted #282

Merged
merged 11 commits into from
Mar 29, 2023

Conversation

FMarno
Copy link
Contributor

@FMarno FMarno commented Feb 20, 2023

Description

Allows the descriptor class for DFTs to be modified and recommitted. Some backends will likely need to regenerate the descriptor completely, but this will allow for those that don't.

This builds on top of #259

Checklist

All Submissions

  • Do all unit tests pass locally? Attach a log.
  • Have you formatted the code using clang-format?

New features

  • Have you provided motivation for adding a new feature?
  • Have you added relevant tests?

@FMarno FMarno force-pushed the finlay/multi_commit branch from aad7522 to 35f0e3b Compare February 20, 2023 13:51
include/oneapi/mkl/dft/detail/commit_impl.hpp Show resolved Hide resolved
include/oneapi/mkl/dft/detail/descriptor_impl.hpp Outdated Show resolved Hide resolved
include/oneapi/mkl/dft/detail/commit_impl.hpp Outdated Show resolved Hide resolved
include/oneapi/mkl/dft/detail/types_impl.hpp Outdated Show resolved Hide resolved
src/dft/backends/mklcpu/commit.cpp Outdated Show resolved Hide resolved
src/dft/descriptor_config_helper.hpp Outdated Show resolved Hide resolved
@FMarno FMarno force-pushed the finlay/multi_commit branch 3 times, most recently from 461f11c to c7cc568 Compare February 21, 2023 12:12
Copy link
Contributor

@hjabird hjabird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the biggest challenge is what to do in situations that the spec doesn't cover - eg. changing queue or backend for the descriptor.

src/dft/descriptor.cxx Outdated Show resolved Hide resolved
src/dft/backends/descriptor.cpp Show resolved Hide resolved
src/dft/backends/mklgpu/descriptor.cpp Show resolved Hide resolved
@FMarno FMarno changed the title Allow the descriptor to be modified and recommited [DFT] Allow the descriptor to be modified and recommited Feb 22, 2023
@FMarno FMarno force-pushed the finlay/multi_commit branch from ece0cc4 to 924a08d Compare March 9, 2023 13:55
@FMarno
Copy link
Contributor Author

FMarno commented Mar 10, 2023

Log of the test runs. My machine only supports single precision and through opencl, but I hope this demonstrates the descriptor tests are passing.
multi_commit_result.txt

@Rbiessy
Copy link
Contributor

Rbiessy commented Mar 13, 2023

Next time this needs to be rebased you could fix the typo recommited -> recommitted in the commit description.

tests/unit_tests/dft/source/descriptor_tests.cpp Outdated Show resolved Hide resolved
}
pimpl_.reset(detail::create_commit(*this, queue));
}
pimpl_->commit(values_);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth adding a recommit_values test which would change the queue between commit calls?

Copy link
Contributor Author

@FMarno FMarno Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a tests to show that there is a wait on the old queue here 088fa08
I was unable to prove that work is submitted to the new queue or not submitted to the old.

  • I don't believe there is any direct way to query if a queue has not completed tasks
  • I don't believe there is a way to query if an event is associated with a task.
  • When an command group function object is submitted to the queue, I don't believe there are any guarantees that is will complete before slow work on other queues.

Because of all this I can't think of how to test that reliably.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for that. I think you're right for every bullet point you mentioned and I'm sorry if my original suggestion generated complications/misunderstanding on the testing scope. Those changes look fine to me.
To be clearer, my original concern was about a use case like

  1. create a descriptor desc;
  2. create a queue q1 and commit desc to q1;
  3. delete q1;
  4. create a queue q2 and commit desc to q2;
  5. submit a compute task to desc.
    I am not 100% sure this works as expected for the closed-source mkl. But that is something we can/should check on our side as that would be an issue for us to fix anyways. We may consider this thread resolved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's interesting. I think this should work because objects like sycl::queue are kept alive by the SYCL runtime with reference counting semantics. I guess I could have a test that follows the steps you've detailed, waits on q2 and then shows that the event returned from the task is complete. It's not perfect but its something. I'm about to finish for the day but I'll do this sometime tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test for that in 8d0a608

@FMarno FMarno force-pushed the finlay/multi_commit branch from 1d17c5c to 088fa08 Compare March 21, 2023 17:45
@FMarno FMarno changed the title [DFT] Allow the descriptor to be modified and recommited [DFT] Allow the descriptor to be modified and recommitted Mar 21, 2023
Copy link

@raphael-egan raphael-egan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work!

@lhuot
Copy link
Contributor

lhuot commented Mar 28, 2023

@FMarno conflicts need to be resolved before I can merge this. Thanks!

@FMarno FMarno force-pushed the finlay/multi_commit branch from d6a5430 to 4259829 Compare March 28, 2023 11:49
@FMarno
Copy link
Contributor Author

FMarno commented Mar 28, 2023

In the last rebase I made some changes to compute_inplace.hpp and compute_out_of_place.hpp which meant the same descriptor was used twice with a recommit in the middle to update the strides.

Here is evidence of the passing tests. I still only have access to single precision and the OpenCL backend -
recommit.txt

@lhuot lhuot merged commit f58b93f into uxlfoundation:develop Mar 29, 2023
@FMarno FMarno deleted the finlay/multi_commit branch March 29, 2023 10:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants