CPU Improvements #4

Merged
pravirkr merged 6 commits into main from fdmt_gpu
May 9, 2024


@pravirkr (Owner) commented May 6, 2024

  • Reduced the buffer memory requirement.
    • Requires more complicated channel indexing.
    • No easy way to visualize intermediate states with numpy.
    • Improved benchmarks.
  • Moved to C++20.
    • Prefer std::span over raw pointers (safety).
  • Merged the for loops in execution.
    • Keeping two separate loops over subband and dt is not good for parallelization.
  • Moved the if/else block outside the for loop in execution.
    • Keep separate lists for adding and copying.

@pravirkr pravirkr changed the title from "FDMT GPU implementation" to "CPU Improvements" May 9, 2024
This was linked to issues May 9, 2024
@pravirkr pravirkr marked this pull request as ready for review May 9, 2024 22:40
@pravirkr pravirkr merged commit 6931117 into main May 9, 2024
6 checks passed
@pravirkr pravirkr deleted the fdmt_gpu branch May 9, 2024 22:45
Successfully merging this pull request may close these issues.

  • Choice of dispersion delay constant
  • clang openmp not compiling