
add atomic element level #515

Merged
merged 24 commits into main from tfc/atomic-element-level on Nov 8, 2024

Conversation

@willow-ahrens (Collaborator) commented Apr 30, 2024

Co-authored-by: teocollin1995 [email protected]. Sort of fixes #531.

@willow-ahrens willow-ahrens added this to the v0.7.0 milestone Jun 17, 2024
@willow-ahrens (Collaborator Author)

We should add a lower_assign function here.

@willow-ahrens (Collaborator Author)

Thus, instantiate would be an "early expand" function that expands levels immediately after unfurling. unfurl and lower_access would be "late expand" functions that don't add dimensions and instead wait until the next call to unfurl to expand themselves.
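
To make the early/late distinction concrete, here is a purely illustrative Julia toy; the types and helpers below are hypothetical stand-ins, not the actual Finch API. An "early expand" wrapper contributes its dimension as soon as it is expanded, while a "late expand" wrapper passes itself through and only contributes on the next unfurl.

# Hypothetical toy, not the Finch API: model a level as just its list of dimensions.
struct ToyLevel
    dims::Vector{Symbol}
end

# "Early expand" (the instantiate-style behavior): the new dimension appears
# immediately after the parent is unfurled.
early_expand(lvl::ToyLevel, dim::Symbol) = ToyLevel([lvl.dims; dim])

# "Late expand" (the unfurl/lower_access-style behavior): no dimension is added
# yet; the wrapper waits for the next call to unfurl to expand itself.
struct ToyDeferred
    lvl::ToyLevel
    dim::Symbol
end
late_expand(lvl::ToyLevel, dim::Symbol) = ToyDeferred(lvl, dim)
toy_unfurl(d::ToyDeferred) = ToyLevel([d.lvl.dims; d.dim])

# early_expand(ToyLevel([:i]), :j).dims             == [:i, :j]
# toy_unfurl(late_expand(ToyLevel([:i]), :j)).dims  == [:i, :j]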

@@ -141,8 +141,8 @@ end
is_level_injective(ctx, lvl::VirtualSparseListLevel) = [is_level_injective(ctx, lvl.lvl)..., false]
function is_level_atomic(ctx, lvl::VirtualSparseListLevel)
    (below, atomic) = is_level_atomic(ctx, lvl.lvl)
    return ([below; [atomic]], atomic)
    (below, Mutex) = is_level_atomic(ctx, lvl.lvl)
Collaborator

I forget how this works, but shouldn't a false be added somewhere here?

Collaborator

and in many of these.

Collaborator Author

This is how it was before we started. Perhaps I need to make an issue for this; I'll look into it. One related question I had for you: this PR delays the lowering of atomics until just before their child is unfurled or accessed. That doesn't change this analysis, right?

Collaborator

That is going to require some thought for me to answer. Hmm.

@wraith1995 (Collaborator) left a comment

Minor comments + one concern.

@willow-ahrens (Collaborator Author)

I'm worried that mutex levels can't handle multiple writers to the same level from the same thread. Not sure how we would modify them to support that. I guess we could check and set the lock to the thread id, and wait for the tid to be the same.
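
As a hedged sketch of the "check and set the lock to the thread id" idea (hypothetical helper code, not part of Finch), a lock that records the owning thread id can be re-acquired by that thread without deadlocking:

# Hypothetical sketch, not part of Finch: a reentrant spinlock keyed on thread id.
using Base.Threads

mutable struct ReentrantSpinLock
    owner::Atomic{Int}  # 0 means unlocked; otherwise the owning thread id
    count::Int          # recursion depth, only touched while holding the lock
end
ReentrantSpinLock() = ReentrantSpinLock(Atomic{Int}(0), 0)

function acquire!(l::ReentrantSpinLock)
    tid = threadid()
    if l.owner[] == tid                       # same thread already holds it: just nest
        l.count += 1
        return
    end
    while atomic_cas!(l.owner, 0, tid) != 0   # spin until we claim the lock
    end
    l.count = 1
end

function release!(l::ReentrantSpinLock)
    l.count -= 1
    l.count == 0 && (l.owner[] = 0)           # outermost release actually unlocks
end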

@wraith1995 (Collaborator)

I'm worried that mutex levels can't handle multiple writers to the same level from the same thread. Not sure how we would modify them to support that. I guess we could check and set the lock to the thread id, and wait for the tid to be the same.

I think we don't want the various atomic levels to have knowledge of the thread id. That strikes me as something we should avoid.

I think the concern here is a deadlock, right? I think the main issue there is just that this is inefficient, because we need to release the lock after each write. The key concern is where the compiler places the acquires/releases. IIRC we wanted atomic to bind as tightly as possible; if that is the case, I think this should be fine.

@willow-ahrens (Collaborator Author) commented Nov 8, 2024

if we write

for i = _
   A[i] += 1
   A[i] += 2
end

and A is Tensor(Mutex(Dense(Element(0.0)))), it seems like this is a case where we would be okay if we knew that it was the same thread doing both modifications.

@willow-ahrens (Collaborator Author)

I guess what I'm saying is that we want that case to be okay if only one thread is running this computation, but it would currently deadlock.

@wraith1995 (Collaborator)

if we write

for i = _
   A[i] += 1
   A[i] += 2
end

and A is Tensor(Mutex(Dense(Element(0.0)))), it seems like this is a case where we would be okay if we knew that it was the same thread doing both modifications.

I have a test meant to stress atomics - let's adapt that to this? Then I think the next thing to do is ask how we can either ensure that releases come before acquires or else ensure both are grouped under one lock. We need one of those, correct?

@willow-ahrens (Collaborator Author) commented Nov 8, 2024

The compiler unwraps the mutex level for both accesses before lowering the loop. This means it lowers to:

acquire
acquire
for i
  ...
end
release
release
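
For contrast, a per-access placement (the "bind as tightly as possible" behavior discussed above) would release each lock before the next acquire, so a single thread would not self-deadlock. Illustrative pseudocode only, not current compiler output:

for i
  acquire
  A[i] += 1
  release
  acquire
  A[i] += 2
  release
end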

willow-ahrens merged commit bbb7961 into main on Nov 8, 2024 (6 checks passed)
willow-ahrens deleted the tfc/atomic-element-level branch on Nov 8, 2024 at 17:35

Successfully merging this pull request may close these issues.

Documentation needed for separated memory level and Atomic Level