-
Notifications
You must be signed in to change notification settings - Fork 12
/
README
60 lines (50 loc) · 3 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
What is Consus?
===============
Consus is a geo-replicated transactional key-value store that upholds strong
consistency and fault tolerance guarantees across multiple data centers. By
geo-replicating data, Consus can can withstand correlated failures up to and
including entire data centers, and reduce latency for clients by directing
them to nearby replicas.
The latency between geographically distinct locations forces storage systems
to navigate the inherent tradeoff between latency and fault tolerance. Systems
may make an operation withstand a complete data center failure by incurring
the latency cost to propagate it to other data centers before reporting that
the operation has finished. On the other side of the tradeoff systems, may
avoid the latency cost by reporting that an operation is complete before it
propagates to other data centers---at the risk that the operation is lost in a
failure and never takes effect in other data centers. In essence, the tradeoff
is entirely a matter of minimizing latency while upholding desirable degree of
fault tolerance. Consus chooses to maintain fault tolerance at all times.
What makes Consus unique?
=========================
For systems that make cross-data center fault tolerance guarantees, wide-area
latency is often the dominating cost. An operation's overall execution time is
heavily dependent upon the number of messages sent via the wide area and the
latency of each message; consequently, reducing the number of messages on the
critical path is an important aspect of optimizing the overall performance of
geo-replicated systems.
Consus can commit a transaction across multiple data centers in three
wide-area message delays during regular execution. Simply sending a message to
a remote data center and receiving acknowledgement of its receipt---the bare
minimum necessary to tolerate a data center failure---requires two message
delays. Protocols such as 2-phase commit or Paxos require two round trips, or
four message delays.
What is the state of Consus?
============================
Consus was the capstone of my Ph.D. thesis. I have since graduated and have
other commitments that take me away from maintaining the project. Anyone
trying to use the code should beware the following:
- There is a deadlock bug that shows up under contending concurrency that I
never got a chance to track down fully. The remedy will likely involve
changing both the key-value store and the transaction manager to make sure
the wound-wait protocol works correctly.
- I probably would not want to run this system in production without
significant further testing. The generalized paxos pieces are the most
compex and, despite being the focus of most of my testing, probably not
something anybody would want to debug at 3am.
More Resources
==============
- http://consus.io The consus homepage
- http://github.com/rescrv/consus-releng Repository for building and testing
Consus on a variety of platforms
- http://github.com/rescrv/consus.io Source for the consus.io homepage