Lock service #315

Open
wants to merge 17 commits into
base: master
98 changes: 98 additions & 0 deletions doc/design/LockService.adoc
@@ -0,0 +1,98 @@
= Lock Service Design
Zhang Yifei <[email protected]>

The lock service maintains the holder and waiters of a given lockId, which serves as the identity of a lock.
A lock can have only one holder, and multiple waiters, at the same time. +
Waiters are queued; when the current holder unlocks, the first waiter becomes the new holder.

== Holder Identity
The identity of a holder or waiter has to be a member of the RAFT cluster, because there are server-initiated
messages. Currently the clients are stateless to the server after the reply, so there is no way to send a
server-initiated message to a client. +
I considered creating a new protocol to maintain sessions for clients, but that would be a lot of work: for
example, session creation and destruction would need to be recorded in the RAFT log, sessions would need to be
available on the new leader if leadership changed, and the client would actually have to keep connections to all members. +
Holders and waiters on the server are represented by the address (UUID) of the channel. The advantage of doing so is
that the server can clear disconnected holders and waiters based on the view of the cluster.

== Holding Status
The holding status is only for connected members. Disconnected members can assume that they have released all locks,
because the leader of the cluster will clear leaving members from the locking status when the view change event
arrives. +
In a partition, members in a minority subgroup will also be cleared by the leader if a majority subgroup is still
present. If all subgroups are minorities, the newly elected leader will force-clear all previous locking status after
the cluster resumes. A newly started cluster will clear all previous locking status as well. +
Since the locking status has the same lifecycle as the cluster, the log storage can be an in-memory implementation.

== Waiting Status
Waiting status is treated the same as holding status in the case of disconnection and partitioning.
The tricky part is how to let a waiter know that it has become the holder; this is the server-initiated message
mentioned earlier. Since lock services are members of the cluster, the leader can send messages to any of them, but in
what way? Those messages must be in order and can't be lost or duplicated. Assume a dedicated message does this: the
leader would send it after logs are applied, and the sending could be async. But if the leader left, the new leader
couldn't ensure that those messages are not lost or duplicated. +
Relying on the log applying process of each member is a reliable choice, although it's not perfect.
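
The idea of relying on the log applying process can be illustrated with a minimal sketch. It assumes that every member
applies the same committed commands in the same order to its own copy of the state, and compares its own status before
and after each command. `LocalLockReplica`, `applyLock`, `applyUnlock` and the `Status` enum are hypothetical names
used only for illustration; they are not taken from the PR.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.UUID;
import java.util.function.BiConsumer;

// Hypothetical per-member replica of a single lock's state (illustration only).
class LocalLockReplica {
    enum Status { NONE, WAITING, HOLDING }

    private final UUID self;                              // address of this member
    private final BiConsumer<Status, Status> listener;    // receives (previous, current)
    private final Deque<UUID> waiters = new ArrayDeque<>();
    private UUID holder;

    LocalLockReplica(UUID self, BiConsumer<Status, Status> listener) {
        this.self = self;
        this.listener = listener;
    }

    // Status of this member, derived purely from the replicated state.
    Status status() {
        if (self.equals(holder)) return Status.HOLDING;
        return waiters.contains(self) ? Status.WAITING : Status.NONE;
    }

    // Invoked for each committed LOCK entry, in log order.
    void applyLock(UUID member) {
        Status before = status();
        if (holder == null) holder = member;
        else if (!holder.equals(member) && !waiters.contains(member)) waiters.addLast(member);
        notifyIfChanged(before);
    }

    // Invoked for each committed UNLOCK entry, in log order.
    void applyUnlock(UUID member) {
        Status before = status();
        if (member.equals(holder)) holder = waiters.pollFirst();   // null if nobody waits
        else waiters.remove(member);
        notifyIfChanged(before);
    }

    // The local transition is detected here, so no dedicated message from the leader is needed.
    private void notifyIfChanged(Status before) {
        Status after = status();
        if (after != before) listener.accept(before, after);
    }
}
```

In this sketch a waiter learns about its promotion when it applies the holder's UNLOCK entry itself, so ordering and
exactly-once delivery come from the log rather than from a separate message.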

== Commands
LOCK::
With the UUID of the member and the lockId. Hold the lock if possible, otherwise join the waiting queue.
TRY_LOCK::
With the UUID of the member and the lockId. Hold the lock if possible.
UNLOCK::
With the UUID of the member and the lockId. If the member is the holder, remove it from the holder status
and make the first waiter the next holder. If the member is a waiter, remove it from the waiting queue.
UNLOCK_ALL::
With the UUID of the member. Remove the member from all holding and waiting status.
RESET::
With the UUIDs of the members that are currently connected. Check whether each holder and waiter is in the list;
if not, remove it from all holding and waiting status. Note that a waiter that is promoted to holder during this
unlocking must be in the list as well. This is an internal command and is not exposed to users.
QUERY::
With the UUID of the member and the lockId. It's a read-only command that returns the current lock status.
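
The commands above can be sketched as operations on an in-memory table. This is a minimal sketch under assumptions,
not the implementation in the PR: `LockTable`, its method names, the `Status` enum, the boolean return values and
the use of `long` for the lockId are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory lock table; names and types are illustrative only.
class LockTable {
    enum Status { HOLDING, WAITING, NONE }

    // One holder plus a FIFO queue of waiters. Entries without a holder are removed.
    private static final class Entry {
        UUID holder;
        final Deque<UUID> waiters = new ArrayDeque<>();
    }

    private final Map<Long, Entry> locks = new HashMap<>();

    // LOCK: hold the lock if it is free, otherwise join the waiting queue.
    boolean lock(UUID member, long lockId) {
        Entry e = locks.computeIfAbsent(lockId, id -> new Entry());
        if (e.holder == null) e.holder = member;
        if (e.holder.equals(member)) return true;
        if (!e.waiters.contains(member)) e.waiters.addLast(member);
        return false;
    }

    // TRY_LOCK: hold the lock if it is free, never wait.
    boolean tryLock(UUID member, long lockId) {
        Entry e = locks.get(lockId);
        if (e != null) return e.holder.equals(member);
        e = new Entry();
        e.holder = member;
        locks.put(lockId, e);
        return true;
    }

    // UNLOCK: a holder hands over to the first waiter; a waiter leaves the queue.
    void unlock(UUID member, long lockId) {
        Entry e = locks.get(lockId);
        if (e == null) return;
        if (member.equals(e.holder)) {
            e.holder = e.waiters.pollFirst();            // null if nobody waits
            if (e.holder == null) locks.remove(lockId);
        } else {
            e.waiters.remove(member);
        }
    }

    // UNLOCK_ALL: remove the member from every holding and waiting status.
    void unlockAll(UUID member) {
        for (Long id : new ArrayList<>(locks.keySet())) unlock(member, id);
    }

    // RESET: keep only connected members. Waiters are pruned first, so a waiter
    // promoted by the unlocking is always one of the connected members.
    void reset(Collection<UUID> connected) {
        for (Long id : new ArrayList<>(locks.keySet())) {
            Entry e = locks.get(id);
            e.waiters.removeIf(w -> !connected.contains(w));
            if (!connected.contains(e.holder)) unlock(e.holder, id);
        }
    }

    // QUERY: read-only view of the member's status for one lock.
    Status query(UUID member, long lockId) {
        Entry e = locks.get(lockId);
        if (e == null) return Status.NONE;
        if (member.equals(e.holder)) return Status.HOLDING;
        return e.waiters.contains(member) ? Status.WAITING : Status.NONE;
    }
}
```

Resetting with an empty collection clears the whole table in this sketch, which matches the "reset with an empty list"
case described in the Reset section.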

=== Reset
A member resigns from holder and waiter status when it is disconnected or ends up in a minority subgroup of a
partition. It notifies listeners that it has unlocked all locks, but in the state machine the unlocking hasn't really
happened yet. The unlocking happens immediately via the reset command if the leader is still present, or eventually
once a new leader is present. +
There are two types of reset: one resets with the list of current members, and the other resets with an empty
list, which means all state will be cleared. +
The first one is used when the leader finds members leaving, or when a new leader is elected because the previous
leader left. +
The second one is used when a new leader is elected for a reason other than the previous leader leaving.

Scenarios for electing a new leader::
. Majority is just reached.
.. New member connected
.. Disconnected member reconnected
.. Merging views (no subgroup has majority members)
. The leader leaves, and the majority is still there.
. There is a leader in the majority subgroup, but view merging causes the coordinator to change, and the new
coordinator starts a new term vote before learning of the existing leader.
.. The new coordinator is elected to be the new leader.
.. The existing leader is re-elected to be the leader of next term.

Scenario 1 above will reset to empty, because potentially all members have resigned. +
Scenario 2 above will reset to current members, because the cluster has had a majority of members the whole time, so
these members won't have resigned. +
Scenario 3.1 won't happen, I think, because the existing leader will always have a longer log because of the reset
command. +
Scenario 3.2 will reset to current members.
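
The mapping from scenario to reset argument can be summarized in a small sketch. How a leader actually detects which
scenario applies is not covered here; the `ElectionCause` enum, the `ResetPolicy` class and the `resetArgument`
method are hypothetical names introduced only for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical helper that picks the member list passed to the RESET command.
class ResetPolicy {
    // Illustrative labels for scenarios 1, 2 and 3.2 above.
    enum ElectionCause { MAJORITY_JUST_REACHED, PREVIOUS_LEADER_LEFT, EXISTING_LEADER_REELECTED }

    static Set<UUID> resetArgument(ElectionCause cause, Set<UUID> currentView) {
        // Scenario 1: potentially every member has resigned, so clear everything.
        if (cause == ElectionCause.MAJORITY_JUST_REACHED) return Collections.emptySet();
        // Scenarios 2 and 3.2: the majority never went away, keep the connected members.
        return Collections.unmodifiableSet(new HashSet<>(currentView));
    }
}
```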

== Listener
Listeners can be registered to listen for status changes of locks. On the leader node, listeners are notified by
the RAFT working thread; on followers, they are notified by the thread that delivers the response message.

== Mutex
Combining the lock service with a ReentrantLock makes it possible to implement an exclusive lock across JVMs.
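
One way the combination might look is sketched below. The assumption is that a local ReentrantLock serializes
threads inside one JVM, since the cluster only knows the member's UUID, and that the cluster-wide lock is taken only
on the first local acquisition. `ClusterLock` and `ClusterMutex` are hypothetical names; the blocking `lock` call and
the unchecked exception handling are simplifications, not the API of the PR.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical blocking view of the lock service for the local member.
interface ClusterLock {
    void lock(long lockId);     // returns once the local member holds the lock
    void unlock(long lockId);
}

// Hypothetical mutex: local exclusion first, cluster-wide exclusion second.
class ClusterMutex {
    private final ReentrantLock local = new ReentrantLock();
    private final ClusterLock service;
    private final long lockId;

    ClusterMutex(ClusterLock service, long lockId) {
        this.service = service;
        this.lockId = lockId;
    }

    void lock() {
        local.lock();
        if (local.getHoldCount() == 1) {           // first acquisition by this thread
            try {
                service.lock(lockId);
            } catch (RuntimeException e) {
                local.unlock();                    // do not keep the local lock on failure
                throw e;
            }
        }
    }

    void unlock() {
        if (!local.isHeldByCurrentThread()) throw new IllegalMonitorStateException();
        try {
            if (local.getHoldCount() == 1) service.unlock(lockId);   // last release
        } finally {
            local.unlock();
        }
    }
}
```

Reentrant acquisitions by the same thread only touch the local lock in this sketch, so the lock service sees one
LOCK and one UNLOCK per outermost lock/unlock pair.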

=== Command executing
The mutex's methods involve executing commands in the lock service; a RaftException is thrown when a command fails
to execute. +
Command execution is uninterruptible to avoid inconsistent state, but a timeout can be set to bound
the waiting time.

=== Unexpected status
Many factors can cause unexpected unlocking or locking status, for example disconnecting the channel, a network
partition, or even calling the lock service directly with the same lockId. Handlers can therefore be set to handle the
unexpected status, so that users know the risks and can decide how to deal with them. The RaftException comes from the
same idea.