Tracking progress in implementation of Dynamic Graph struct #3
Let's think about what a graph would require: concurrent allocators of various sizes.
Interesting to see the runtime split of …
Currently attempting to load into the arena-based graph, but have hit a wall with a double-free error while reserving memory for the edges. The relevant code is given below:

    #pragma omp parallel for schedule(dynamic, 2048)
    for (K u=0; u<S; ++u) {
      if (!x.hasVertex(u)) continue;      // skip vertex ids not present in the source graph
      a.allocateEdges(u, x.degree(u));    // reserve space in the arena graph for u's edges
    }

The arena-based digraph is now working. The double-free issue was due to using a single allocator across all threads. Another issue we faced was a failure to populate the edges, resulting in an abrupt program crash. This was due to using …
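For reference, here is a minimal sketch of the fix, i.e., giving each OpenMP thread its own allocator so that no allocator bookkeeping is shared. The ArenaAllocator type, its capacity, and the allocation size below are illustrative assumptions rather than our actual API.

```cpp
#include <omp.h>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the project's allocator: a plain bump arena,
// not thread-safe on its own.
struct ArenaAllocator {
  std::vector<std::byte> pool;
  size_t used = 0;
  explicit ArenaAllocator(size_t capacity) : pool(capacity) {}
  void* allocate(size_t bytes) {
    if (used + bytes > pool.size()) return nullptr;  // out of space (no growth in this sketch)
    void* p = pool.data() + used;
    used += bytes;
    return p;
  }
};

int main() {
  const int T = omp_get_max_threads();
  // One arena per thread: threads never touch each other's bookkeeping,
  // which avoids the double free seen with a single shared allocator.
  std::vector<ArenaAllocator> arenas;
  arenas.reserve(T);
  for (int t = 0; t < T; ++t) arenas.emplace_back(size_t(1) << 20);

  #pragma omp parallel for schedule(dynamic, 2048)
  for (int u = 0; u < 1000000; ++u) {
    ArenaAllocator& a = arenas[omp_get_thread_num()];
    void* edges = a.allocate(8 * sizeof(int));  // e.g., reserve space for this vertex's edges
    (void)edges;
  }
  return 0;
}
```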
Note: We also try to minimize the memory being held by the arena allocators. Tested on the sk-2005 graph. The best config seems to be using arena allocators for block sizes of up to 8KB, with each allocator having a capacity of 512KB.
I just implemented a concurrent arena allocator which, given a memory pool, returns fixed-size memory blocks. It also maintains a list of freed memory blocks, and new requests are serviced from the freed blocks first.
However, in order to be thread-safe, i.e., to support memory allocation calls from multiple threads concurrently, we use an atomic_flag as a mutex to limit access to the freed list to one thread at a time. To ensure that no thread has to wait too long, we check if the flag is set to busy, and if so, we go ahead and return a block from the pool instead (which uses an atomic add). However, it is also possible that we have no free space left in the pool. In such a case, we repeatedly retry acquiring the mutex (yielding the current thread on failure and retrying), and once obtained, fetch a memory block from the freed list. For freeing a memory block, however, we must go through the freed list, so we just do repeated retries to acquire it, and then, once acquired, append the memory block to the freed list.
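To make that concrete, below is a rough sketch of such a fixed-size concurrent arena allocator, assuming an atomic bump pointer into the pool and an atomic_flag spinlock around the freed list; the class and member names (ConcurrentArena, allocate, free) are invented for illustration and do not match our actual code.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Sketch: fixed-size blocks, an atomic bump pointer into the pool, and a
// freed list guarded by an atomic_flag used as a spinlock. allocate() falls
// back to the pool when the lock is busy, and only spins (yielding) on the
// lock once the pool is exhausted; free() always goes through the lock.
template <size_t BlockSize>
class ConcurrentArena {
  std::vector<std::byte> pool_;
  std::atomic<size_t> next_{0};                // bump pointer, advanced with an atomic add
  std::vector<void*> freed_;                   // list of freed blocks
  std::atomic_flag lock_ = ATOMIC_FLAG_INIT;   // guards freed_

 public:
  explicit ConcurrentArena(size_t capacity) : pool_(capacity) {}

  void* allocate() {
    // Fast path: if the freed list is not busy, reuse a freed block.
    if (!lock_.test_and_set(std::memory_order_acquire)) {
      void* p = nullptr;
      if (!freed_.empty()) { p = freed_.back(); freed_.pop_back(); }
      lock_.clear(std::memory_order_release);
      if (p) return p;
    }
    // Lock was busy (or freed list empty): take a fresh block from the pool.
    size_t off = next_.fetch_add(BlockSize, std::memory_order_relaxed);
    if (off + BlockSize <= pool_.size()) return pool_.data() + off;
    // Pool exhausted: spin on the lock, yielding, until a freed block appears.
    for (;;) {
      while (lock_.test_and_set(std::memory_order_acquire)) std::this_thread::yield();
      void* p = nullptr;
      if (!freed_.empty()) { p = freed_.back(); freed_.pop_back(); }
      lock_.clear(std::memory_order_release);
      if (p) return p;
      std::this_thread::yield();  // nothing freed yet; try again
    }
  }

  void free(void* p) {
    // Freed blocks can only be recycled through the freed list, so wait for the lock.
    while (lock_.test_and_set(std::memory_order_acquire)) std::this_thread::yield();
    freed_.push_back(p);
    lock_.clear(std::memory_order_release);
  }
};
```

Note that even in this sketch every free() and every reuse of a freed block serializes on the single lock, which is consistent with the contention observed below.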
The results look like the following when (several) allocations are performed sequentially:
However, when allocations are done in parallel using 64 threads, the results look quite different:
It appears that there is high contention in our concurrent arena allocator. Sad! We could try using per-thread freed lists to resolve this, but then we would have to implement some stealing mechanism in order to fetch memory blocks that have been freed by other threads. Might as well use libc malloc() instead, particularly for large memory allocations. In fact, we have another recursive arena allocator which utilizes multiple pools to allocate smaller memory blocks to the requesting thread, and it is not thread-safe (for one thread only). A suitable way, then, might be to use malloc() for large allocations, and revert to a per-thread recursive arena allocator for smaller allocations; we may now call it a multi arena allocator instead. LGTM. See #4 for the code.
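As a rough sketch of that hybrid scheme, assuming an 8KB small/large cutoff (borrowed from the note above) and a trivial per-thread bump arena; multiAllocate, Arena, and kSmallLimit are hypothetical names, not the actual multi arena allocator in #4.

```cpp
#include <omp.h>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Trivial per-thread bump arena (not thread-safe; each thread uses its own).
struct Arena {
  std::vector<std::byte> pool;
  size_t used = 0;
  explicit Arena(size_t cap) : pool(cap) {}
  void* allocate(size_t bytes) {
    if (used + bytes > pool.size()) return nullptr;
    void* p = pool.data() + used;
    used += bytes;
    return p;
  }
};

constexpr size_t kSmallLimit = 8 * 1024;  // assumed cutoff, matching the 8KB figure above

void* multiAllocate(std::vector<Arena>& arenas, size_t bytes) {
  if (bytes > kSmallLimit) return std::malloc(bytes);  // large: let libc handle it
  Arena& a = arenas[omp_get_thread_num()];             // small: this thread's own arena
  void* p = a.allocate(bytes);
  return p ? p : std::malloc(bytes);                   // arena full: fall back to malloc
}

int main() {
  std::vector<Arena> arenas;
  for (int t = 0; t < omp_get_max_threads(); ++t) arenas.emplace_back(size_t(1) << 20);
  void* small = multiAllocate(arenas, 64);             // served by the calling thread's arena
  void* large = multiAllocate(arenas, size_t(1) << 20); // served by malloc
  std::free(large);                                     // only malloc'd blocks need free()
  (void)small;
  return 0;
}
```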