The 2D framework derives relaxed concurrent data structures, with k-out-of-order relaxation. Relaxation means that the data structures are allowed to return items in slightly the wrong order. In k-out-of-order, a FIFO queue is allowed to dequeue any of the top k items during a dequeue, instead of just the head, which enables superior scalability over strict designs.
The core idea of the 2D framework is to split a data structure over several disjoint sub-structures. Then a window is superimposed over the sub-structures, which defines an area where operations are allowed at each sub-structure. The idea is that a thread can for the most part work alone on a sub-structure as long as it stays within the window. Then the window can be shifted when full or empty.
This repository is based on the ASCYLIB framework and memory is managed using SSMEM which is a simple object-based memory allocator with epoch-based garbage collection.
- How to Relax Instantly: Elastic Relaxation of Concurrent Data Structures
- Kåre von Geijer, Philippas Tsigas.
- To appear in proceedings of the 30th International European Conference on Parallel and Distributed Computing, Euro-Par 2024.
- Monotonically Relaxing Concurrent Data-Structure Semantics for Increasing Performance: An Efficient 2D Design Framework
- Adones Rukundo, Aras Atalar, Philippas Tsigas.
- In proceedings of the 33rd International Symposium on Distributed Computing, DISC 2019.
- Brief Announcement: 2D-Stack - A Scalable Lock-Free Stack Design that Continuously Relaxes Semantics for Better Performance
- Adones Rukundo, Aras Atalar, Philippas Tsigas.
- In proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, PODC 2018.
The src
folder includes the data structures encompassed by the framework, as well as designs compared to in the publications. Each of these implementations has a README.md file with additional information.
These designs are on a high level described in the DISC paper, and form the foundation of the 2D framework. They have had some optimizations done in conjunction with later publications.
- 2D queue: ./src/2Dd-queue
- Optimized 2D queue: ./src/2Dd-queue_optimized
- 2Dc stack: ./src/2Dc-stack
- Optimized 2Dc stack: ./src/2Dc-stack_optimized
- 2Dd stack: ./src/2Dd-stack
- 2Dd deque: ./src/2Dd-deque
- 2Dc counter: ./src/2Dc-counter
- 2Dd counter: ./src/2Dd-counter
These designs extend the 2D stack and queue to encompass elastic relaxation. This means that their degree of relaxation can be changed (either manually or with a dynamic controller) during runtime. They are described in the coming Euro-Par paper.
- 2D Lateral-as-Window (LaW) queue: ./src/2Dd-queue_elastic-law
- 2D Lateral-plus-Window (LpW) queue: ./src/2Dd-queue_elastic-lpw
- 2D Lateral-plus-Window (LpW) stack: ./src/2Dc-stack_elastic
These are implementations, or copies, of data structure designs used to compare the 2D designs against. These ar the main ones, used in the Euro-Par paper, but there are some additional ones as well in ./src/.
- Michael-Scott lock-free queue: ./src/queue-ms
- Wait-free queue as fast as FAA: ./src/queue-wf
- k-Segment queue: ./src/queue-k-segment
- Treiber stack: ./src/stack-treiber
- Elimination stack: ./src/stack-elimination
- k-Segment stack: ./src/stack-k-segment
Simply clone the repository and run make
to compile all implementations, using their default tests and switches. You can then find and run the respective data structure binary benchmark in bin/
.
The default benchmark is a synthetic test where each thread repeatedly flips a coin to either insert or remove an item. The binary takes several cli flags, which are all described by running it with -h
, such as make 2Dc-queue && ./bin/2Dc-queue -h
. However, the most important arguments to get started might be:
-d
: The duration in ms to run the experiment,-i
: The number of items to insert before starting the benchmark,-n
: The number of threads to use,-l
: The depth of the 2D window, only for the 2D data structures,-w
: The width of the 2D window, mainly for the 2D and k-segment data structures.
The 2D code is designed to be run only on x86-64 machines, such as Intel or AMD. This is in part due to what memory ordering is assumed from the processor, and also due to the use of 128 bit compare and swaps. Even if runnable on other architectures, the relaxation bounds will likely not hold, due to additional possible reorderings.
Furthermore, you need gcc
and make
to compile the tests. To run the helper scripts in scripts/, you need bc
and python 3
. Run the following command to install the required python packages pip3 install numpy==1.26.3 matplotlib==3.8.2 scipy==1.12.0
.
To recreate the plots from the Euro-Par paper, run scripts/recreate-europar.sh, which will output its plots in the results
folder. Parameters such as number of threads can be adjusted at the top of the script.
Either navigate a the data structure directory and run make
, or run make <data structure name>
from top level, which compiles the data structure tests with the default settings. You can further set different environment variables, such as make VERSION=O3 GC=1 INIT=one
to modify the compilation. For all possible compilation switches, see ./common/Makefile.common as well as the individual Makefile for each test. Here are the most common ones:
VERSION
defines the optimisation level e.g.VERSION=O3
. It takes one of the following five values:DEBUG
compile with optimisation zero and debug flagsSYMBOL
compile with optimisation level three and -gO0
compile with optimisation level zeroO1
compile with optimisation level oneO2
compile with optimisation level twoO3
compile with optimisation level three (Default)O4
compile with optimisation level three, and without any asserts
GC
defines if deleted nodes should be recycledGC=1
(Default) or notGC=0
.INIT
defines if data structure initialization should be performed by all active threadsINIT=all
(Default) or one of the threadsINIT=one
RELAXATION_ANALYSIS
can be set to1
to measure relaxation levels for relaxed data structures. Keep in mind that this imposes a sequential order of linearizations, totally destroying the throughput at high thread counts.TEST
can be used to change the benchmark used. This has mainly been used in the elastic data structures for testing dynamic scenarios, and possible switches can for example be seen in src/2Dd-queue_elastic-law/Makefile.
- src/: Contains the data structures' source code.
- scripts/: Contains supporting scripts, such as ones aggregating several benchmark runs into plots.
- results/: Default folder for test output of the scripts in scripts.
- include/: Contains support files e.g. the basic window framework (2Dc-window.c).
- common/: Contains make definitions file.
- external/: Contains external libraries such as the ssmem library.
- bin/: Contains the binary files for the compiled data structure benchmarks.
All tests will pin each software pthread to a hardware thread, using the function pthread_setaffinity_np
. By default, the threads will be pinned sequentially to hardware threads 0,
1, 2, 3...`. However, the numbering of the hardware threads depends on the specific CPU used, and you might often want to use a different order than the default one. For example, on a dual-socket Intel Xeon E5-2695 v4 system, the even hardware threads are on the first socket while the odd numbers are on the second, and you might not want to benchmark intra-socket behavior.
Here is a short step-by-step instruction for how to add a machine-specific pinning order:
- First, see e.g. the output from
lscpu
andlstopo
(here we care about theP#<...>
numbers) to understand the hardware topology. - Then add an entry for your machine in common/Makefile.common. For example, copy the one for the
athena
machine (start withifeq ($(PC_NAME), athena)...
), but changeathena
in the aforementioned line to the name of your computer (see output ofuname -n
), and changeATHENA
in-DATHENA
to a similar identifier for your computer (to be used in include/utils.h). - Finally, add a matching entry to the aforementioned identifier in include/utils.h. This entry primarily defines the order in which to pin the software threads to hardware threads. Here again, you can look at the entry for
ATHENA
for inspiration. There are three memory layouts, but the default one (the bottom-most one) is the most important to add, which should pin a thread to each core in a socket, before continuing with SMT, and finally proceeding to the next socket. - For a simpler example, see e.g.ITHACA
. Now all tests will use this pinning order. You can validate pinning orders by not allocating all hardware threads, and inspecting the output fromhtop
during a test run.