API Reference · Benchmarks · Stdlib Benchmarks
This repository is a collection of parallelism-safe data structures for OCaml 5. They are contained in two packages:
- Saturn that includes all data structures (including the lock-free ones) and should be used by default if you just want parallelism-safe data structures;
- Saturn_lockfree that includes only lock-free data structures.
It aims to provide an industrial-strength, well-tested (and possibly model-checked and verified in the future), well documented, and maintained parallelism-safe data structure library. We want to make it easier for Multicore OCaml users to find the right data structures for their uses.
Saturn is published on opam and is distributed under the ISC license.
- Saturn — Parallelism-Safe Data Structures for Multicore OCaml
- Contents
- Installation
- Introduction
- Usage
- Testing
- Benchmarks
You'll need OCaml 5.0.0 or later. Note that Saturn also works with OCaml 4.14 but only for compatibility reasons, as there is no need for parallelism-safe data structures without OCaml 5.0.
To install OCaml 5.0 yourself, first make sure you have opam 2.1 or later. You can run this command to check:
opam --version
Then use opam to install OCaml 5.0.0:
opam switch create 5.0.0
If you want a later version, you can run the following line to get a list of all available compiler versions:
opam switch list-available
saturn
can be installed from opam
:
opam install saturn
or
opam install saturn_lockfree
if you prefer to use only lock-free data structures.
Name | Module in Saturn (in Saturn_lockfree ) |
Description | Sources |
---|---|---|---|
Treiber Stack | Stack (same) |
A classic multi-producer multi-consumer stack, robust and flexible. Recommended starting point when needing a LIFO structure | |
Michael-Scott Queue | Queue (same) |
A classic multi-producer multi-consumer queue, robust and flexible. Recommended starting point when needing a FIFO structure. | Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms |
Chase-Lev Work-Stealing Dequeue | Work_stealing_deque (same) |
Single-producer, multi-consumer, dynamic-size, double-ended queue (deque). Ideal for throughput-focused scheduling using per-core work distribution. Note, pop and steal follow different ordering (respectively LIFO and FIFO) and have different linearisation contraints. |
Dynamic Circular Work-Stealing Deque and Correct and Efficient Work-Stealing for Weak Memory Models) |
SPSC Queue | Single_prod_single_ cons_queue (same) |
Simple single-producer single-consumer fixed-size queue. Thread-safe as long as at most one thread acts as producer and at most one as consumer at any single point in time. | |
MPMC Bounded Relaxed Queue | Relaxed_queue (same) |
Multi-producer, multi-consumer, fixed-size relaxed queue. Optimised for high number of threads. Not strictly FIFO. Note, it exposes two interfaces: a lockfree and a non-lockfree (albeit more practical) one. See the mli for details. |
|
MPSC Queue | Single_consumer_queue (same) |
A multi-producer, single-consumer, thread-safe queue without support for cancellation. This makes a good data structure for a scheduler's run queue. It is used in Eio. | It is a single consumer version of the queue described in Implementing Lock-Free Queues. |
The following part is a beginner-friendly example to explain why we need data structures specifically designed for multicore programming.
Let's write a classic mutable data structure : a queue. We are going to do a basic implementation the way it may be done for sequential use and show why it is not working well with multiple domains.
type queue = int list ref
let create () : queue = ref []
let push q a = q := a :: !q
What happens if we try to use this queue with multiple domains? First, let's
define the work we want a single domain to do: each domain will push 10 times
its id
in the queue.
let work id q = for i = 0 to 9 do push q id done
Then let's define our test : it spawns 2 domains that each execute work
in
parallel. test
returns the content of the queue as well as its length, so we
can easily see if it contains the 20 elements we expect.
let test () =
let q = create () in
let domainA = Domain.spawn (fun () -> work 1 q) in
let domainB = Domain.spawn (fun () -> work 2 q) in
Domain.join domainA;
Domain.join domainB;
(List.length !q, !q)
Let's try it :
# test ()
- : int * int list =
(20, [2; 2; 2; 2; 2; 2; 2; 2; 2; 2; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1])
Everything seems fine, right? Except, it is not running in parallel, as we can
see from the consecutive 2
(pushed by domainB
) and 1
pushed by domainA
.
This is because spawning a domain takes way more time than executing work
, so
domainA
is long finished before domainB
is even spawned. One way to overpass
this issue is to increase the amount of work done in work
(for example, by
pushing more elements). Another way is to make sure the domains wait for each
other before beginning their workload.
We use a basic barrier implementation to do that. Each domain will now wait for the other to reach the barrier before beginning to push.
let work_par id barrier q =
Barrier.await barrier;
for i = 0 to 9 do
push q id
done
The test
function is now:
let test_par () =
let barrier = Barrier.create 2 in
let q = create () in
let domainA = Domain.spawn (fun () -> work_par 1 barrier q) in
let domainB = Domain.spawn (fun () -> work_par 2 barrier q) in
Domain.join domainA;
Domain.join domainB;
(List.length !q, !q)
Let's run it:
# test_par ();;
- : int * int list =
(18, [2; 1; 2; 1; 2; 1; 2; 1; 2; 1; 2; 2; 1; 1; 1; 1; 2; 1])
Now, the 1
and the 2
are interleaved: domains are running in parallel. The
resulting queue however only contains 18
elements whereas 20
were pushed.
This is because we not only have a
race condition here but also
because push
is a non-atomic operation. It requires first to read the content
of the queue (!q
) then to write in it (q := ...
). So, for example, when two
domains try to push in parallel into an empty queue the following sequence can
happen:
- domain A reads the queue : it is empty
- domain B reads the queue : it is still empty
- domain A pushes
1
on the empty queue it has read before - domain B pushes
2
on the empty queue it has read before.
This sequence results in a queue containing only one element of value 2
. The
element pushed by A is lost because B did not see it.
This is a very common issue in parallel programming. To prevent it, functions need to be atomically consistent (aka linearisable), meaning they must have a linearisation point at which they appear to occur instantly. Such functions can be written with different techniques, including:
However both solutions have their limits. Using mutexes or locks open the way to deadlock, livelock, priority inversion, etc; it also often restrics considerably the performance gained by using multiple cores as the parts of the code effectively running in parallel is limited. On the other hand, atomics are - without a complex algorithm to combine them - only a solution for a single shared variable.
Let's try to replace references by atomics in our code to demonstrate this point:
type queue = int list Atomic.t
let create () : queue = Atomic.make []
let push (q : queue) a =
let curr = Atomic.get q in
let prev = a :: curr in
Atomic.set q prev
We still need to read and write to fulfill the whole push operation.
# test ();;
- : int * int list = (15, [1; 1; 1; 1; 1; 2; 1; 1; 1; 2; 2; 2; 2; 1; 2])
and, as expected it is not working. The interleaving scenario described
previously can still happen, meaning our function is not linearisable (or
atomically consistent). Note that, though it is not observable here, this is
still better than the previous implementation, as we are now race-free (see
here for a quick note about races in OCaml). As
a matter of fact, writting a queue with push
and pop
functions that are both
atomic and parallelism-safe is not as easy as it might sound and often requires
advanced techniques to perform well. This is the type of algorithms
Saturn_lockfree provided.
To continue with our example, here is how it will be written using the queue provided in Saturn.
let work_saturn id barrier q () =
Barrier.await barrier;
for i = 0 to 9 do
Saturn.Queue.push q id
done
let test_saturn () =
let barrier = Barrier.create 2 in
let q = Saturn.Queue.create () in
let d1 = Domain.spawn (work_saturn 1 barrier q) in
let d2 = Domain.spawn (work_saturn 2 barrier q) in
Domain.join d1;
Domain.join d2;
let rec pop_all acc =
match Saturn.Queue.pop q with
| None -> List.rev acc
| Some elt -> pop_all (elt :: acc)
in
let res = pop_all [] in
(List.length res, res)
Running it results in the expected result:
# test_saturn ();;
- : int * int list =
(20, [2; 2; 1; 2; 2; 2; 2; 2; 2; 1; 2; 1; 2; 1; 1; 1; 1; 1; 1; 1])
Because of the great properties of OCaml 5 memory model (see the OCaml Manual for more details), not a lot can go wrong here. At least, data corruption or segmentation fault won't happen like it can in other languages.
Some data structures are available in two versions: a normal version and a more
optimized but unsafe version. The unsafe version utilizes Obj.magic
in
a way that may be unsafe with flambda2
optimizations.
The reason for providing the unsafe version is that certain optimizations require features that are currently not available in OCaml, such as arrays of atomics or atomic fields in records. We recommend using the normal version of a data structure unless its performance is not sufficient for your use case. In that case, you can try the unsafe version.
Currently, there are two data structures with an unsafe version:
Single_cons_single_prod_unsafe
: a single consumer single producer queueQueue_unsafe
: a Michael Scott queue
This part describes how to use the provided data structures, and more exactly, what not to do with them. Two main points are discussed:
- some data structures have restrictions on what operations can be performed in a single domain or a set of domains
- the currently provided data structures are non-composable
There are several provided data structures that are intended to be used in combination with a specific domain configuration. These restrictions make the corresponding implementation optimized but not respected them may break safety properties. Obviously, these restrictions are not only described in the documentation but also on the name of the data structure itself. For example, a single consumer queue can only have a single domain popping at any given time.
Let's take the example of Single_prod_single_cons_queue
. As suggested by the
name, it should be used with only one domain performing push
(a producer) and
one domain performing pop
(a consumer) at the same time. Having two or more
domains simultaneously perform pop
(or push
) will break the safety
properties of the queue
and more likely result in unexpected behaviors.
Let's say we give a bad alias to this queue and misuse it.
module Queue = Saturn.Single_prod_single_cons_queue
Each domain is going to try to push 10 times in parallel.
let work id barrier q =
Barrier.await barrier;
for i = 0 to 9 do
Queue.push q id
done
Our test
function returns the queue after two domains try to simustaneously
push.
let test () =
let q = Queue.create ~size_exponent:5 in
let barrier = Barrier.create 2 in
let d1 = Domain.spawn (fun () -> work 1 barrier q) in
let d2 = Domain.spawn (fun () -> work 2 barrier q) in
Domain.join d1;
Domain.join d2;
q
We can then inspect the content of the queue by popping it into a list.
let get_content q =
let rec loop acc =
match Queue.pop q with
| None -> acc
| Some a -> loop (a :: acc)
in
loop [] |> List.rev
Let's run it :
test () |> get_content;;
- : int list = [2; 1; 1; 1; 1; 1; 1; 1; 1; 1; 2]
This run results in a queue with only 11 elements. 9 elements are lost because of the misuse of the single consumer single producer queue.
Composability is the ability to compose functions while preserving their properties. For Saturn data structures, the properties one could expect to preserve are atomic consistency (or linearizability) and all eventual progress properties, like lock freedom. Unfortunately, Saturn's data structures are not composable.
Let's illustrate that with an example. We want to write a slitting algorithm on
Saturn's queue: several domains simultaneously slit a source queue into two
destination queues in respect to a predicate. We expect our splitting function
to be linearisable, which would manifest here by the source queue order is
preserved in the destination queues. For example, slit [0;1;2;3;4]
, with a
predicate that returns true
for even numbers and false
otherwise, should
returns [0';2;4]
and [1;3]
.
Here is how we can write slit
with the functions provided by Saturn's queue.
let slit source pred true_dest false_dest : bool =
match Queue.pop source with
| None -> false
| Some elt ->
if pred elt then Queue.push true_dest elt else Queue.push false_dest elt;
true
Domains run split
until the source queue is empty:
let work source pred true_dest false_dest =
while split source pred true_dest false_dest do
()
done
Now to test it, we will run:
let test input =
(* Initialisation *)
let true_dest = Queue.create () in
let false_dest = Queue.create () in
let source = Queue.create () in
List.iter (Queue.push source) input;
let barrier = Barrier.create 2 in
(* Predicate : split by parity *)
let pred elt = elt mod 2 = 0 in
let d1 =
Domain.spawn (fun () ->
Barrier.await barrier;
work source pred true_dest false_dest)
in
let d2 =
Domain.spawn (fun () ->
Barrier.await barrier;
work source pred true_dest false_dest)
in
Domain.join d1;
Domain.join d2;
(get_content true_dest, get_content false_dest)
The expected result for test [0; 1; 2; 3; 4]
is ([0; 2; 4], [1; 3])
. And if
you try it, you will most probably get that result. Except it can also return in
unsorted queues.
As the chance of getting an unsorted queue, we write a check
function that
runs test
multiple times and counts the number of times the result is not what
we wanted.
let check inputs max_round =
let expected_even = List.filter (fun elt -> elt mod 2 = 0) inputs in
let expected_odd = List.filter (fun elt -> elt mod 2 = 1) inputs in
let rec loop round bugged =
let even, odd = test inputs in
if round >= max_round then bugged
else if even <> expected_even || odd <> expected_odd then
loop (round + 1) (bugged + 1)
else loop (round + 1) bugged
in
Format.printf "%d/%d rounds are bugged.@." (loop 0 0) max_round
and try it:
# check [0;1;2;3;4;5;6] 1000;;
35/1000 rounds are bugged.
As expected, it is not working, and the reason is simply because our split
function is not linerisable. We could make it atomic by using mutex, but then we
loose the progress properties of the composed functions.
Note that in the case above, we transfer from and to a queue of the same
int Saturn.Queue.t
type. It is most likely possible to write a
val transfer : t -> t -> unit
function with the right properties and add it
directly to Saturn.Queue
module.
If you think of any such functions, that is useful and missing, let's us know by creating an issue!
If you need composable parallelism-safe data structures, you can check kcas_data.
One of the many difficulties of implementating parallelism-safe data structures is that in addition to providing the same safety properties as sequental ones, they may also have to observe some liveness properties as well as additional safety properties specific to concurrent programming, like deadlock-freedom.
In addition to the expected safety properties, the main properties we want to test for are:
- linearisability
- lock-freedom for all the lock-free data structures
- no potentially harmful data races
Here is a list of the tools we use to ensure them:
- safety : unitary tests and
qcheck
tests check semantics and expected behaviors with one and more domains. - safety and liveness :
STM
tests check linearisability for two domains (seemulticoretests
library). - liveness :
dscheck
checks non-blocking property for as many domains as wanted (for two domains most of the time). See dscheck. - safety : no data race with tsan
See test/README.md for more details.
There are a number of benchmarks in bench
directory. You can run them with
make bench
. See bench/README.md for more details.
Contributions are appreciated! If you intend to add a new data structure, please read this before.