-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathreport.tex
318 lines (270 loc) · 15.3 KB
/
report.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
\documentclass{article}
\usepackage{arxiv}
\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc} % use 8-bit T1 fonts
\usepackage{hyperref} % hyperlinks
\usepackage{url} % simple URL typesetting
\usepackage{booktabs} % professional-quality tables
\usepackage{amsfonts} % blackboard math symbols
\usepackage{nicefrac} % compact symbols for 1/2, etc.
\usepackage{microtype} % microtypography
\usepackage{lipsum}
\usepackage{listings}
\usepackage{color}
\definecolor{dkgreen}{rgb}{0,0.6,0}
\definecolor{gray}{rgb}{0.5,0.5,0.5}
\definecolor{mauve}{rgb}{0.58,0,0.82}
\title{Smart Buffer Project Report}
\author{
Samantha (Yuhan) Deng
%% examples of more authors
\And
Kyrylo Chernyshov
%% \AND
%% Coauthor \\
%% Affiliation \\
%% Address \\
%% \texttt{email} \\
%% \And
%% Coauthor \\
%% Affiliation \\
%% Address \\
%% \texttt{email} \\
%% \And
%% Coauthor \\
%% Affiliation \\
%% Address \\
%% \texttt{email} \\
}
\begin{document}
\maketitle
% \begin{abstract}
% \lipsum[1]
% \end{abstract}
% keywords can be removed
% \keywords{First \and Second keyword \and More}
\section{Introduction}
In the Two-Phase Commit protocol, after a worker sends its decision of
abort/commit to the store, the worker needs to wait for the store's reply. In
other words, the worker is waiting for a message whose content it already knows.
The waiting is necessary to account for the asynchrony of the underlying
communication channel. However, assuming that the communication channel is
synchronous most of the time, the waiting is unnecessary most of the time.
If the worker does not block on the reply of the store, after sending out the
commit message for a transaction, the worker can start an new transaction that
depends on the previous transaction immediately. From the worker's point of
view, this is a 1.5-Phase Commit. However, the messages may arrive the store in
an order different from the worker sends them, and the store needs to sort out
the order of message arrival.
In order to enable the store to deal with out of order messages, we designed a
data structure called smart buffer on the store's side. In Section
\ref{sec:design}, we will present our detailed design of the data structure and
how the data structure is embedded in the framework. Section
\ref{sec:implementation} contains implementation of the data structure in a toy
example of worker-store setup and in Fabric. Finally, performance results are
shown in Section \ref{sec:performance}.
\section{Design}
\label{sec:design}
\subsection{Design Goal}
To illustrate the design goal, consider two scenarios.
\paragraph{Scenario 1} Given a worker $w$, a store $s$ and two transactions $t_1$
and $t_2$. Assume that $t_2$ depends on $t_1$, in other words, there exists some
object $o$ such that $t_1$ writes version $v$ of $o$ and $t_2$ reads version $v$
of $o$. If $w$ sends out commit message for $t_1$ and sends out prepare message
for $t_2$ without waiting for $s$'s reply for $t_1$, $s$ may receive the prepare
message of $t_2$ before receiving the commit message of $t_1$. Then, the store
needs to prepare for $t_2$ that reads $v$ of $o$, but it only has $v - 1$ of
$o$.
One option is to make $s$ abort the prepare for $t_2$. However, if $s$ receives
the messages in an order the same as $w$ sends them, the prepare of $t_2$ won't
fail because of the unknown version $v$ of $o$. Thus, instead of directly
aborting the prepare of $t_2$, $s$ should wait until it receives the commit
message of $t_1$ and updates $o$ to version $v$, and then start the prepare of
$t_2$. Thus, the data structure needs to allow the store block transaction
prepare until all unknown versions of objects of the transaction is learnt by
the store.
\paragraph{Scenario 2} Given a store $s$ and a transaction $t$. Assume that the
store needs a newer version of some object $o$ to start the prepare of $t$ and
$t$ also reads version $v$ of some other object $o'$. If $s$ already has version
$v'$ of $o'$, or $s$ commits version $v'$ of $o$ while waiting for the newer
version of $o$ and $v' > v$, the prepare of $t$ will always fail because of
version conflict. Then, the prepare of $t$ should be aborted immediately even if
the store $s$ has not seen the newer version of $o$.
Thus, the data structure needs to allow the store aborts pending transaction
prepare as soon as version conflicts happen.
\subsection{Smart Buffer Design}
The design of Smart Buffer is inspired by the dependency structure in COPS. The
dependencies of each transaction in Smart Buffer are versions of objects that
the transaction reads. Similar to COPS, when all dependencies are resolved, the
transaction is ready to be prepared.
Transactions that need some versions of objects unknown to
the store to be prepared are added to the Smart Buffer. Each transaction is mapped
to a set of objects with corresponding versions that the transaction depends on.
If all objects are known to the store, the transaction is ready to be prepared.
If a version conflict happens before all objects are known, the transaction needs
to be aborted.
The Smart Buffer interface has three methods: $\mathtt{add}$, $\mathtt{remove}$
and $\mathtt{delete}$.
\begin{itemize}
\item $\mathtt{add(tid, deps)}$: $\mathtt{tid}$ is the ID of a transaction.
$\mathtt{deps}$ is a set of objects with corresponding versions that the
transaction depends on (i.e. reads). The method adds the transaction with
$\mathtt{tid}$ and depends on $\mathtt{deps}$ to the buffer.
\item $\mathtt{remove(object)}$: $\mathtt{object}$ is some version of an
object. The method let the buffer know this version of an object is known to
the store.
\item $\mathtt{delete(tid)}$: $\mathtt{tid}$ is the ID of a transaction. The
method deletes the transaction with $tid$ from the buffer.
\end{itemize}
Buffer notifies the store when it is able to decide that a transaction is ready to be
prepared or a transaction needs to be aborted. Transactions are considered to be
pending if the buffer has not notified the store about its status.
\subsection{Smart Buffer's Usage in the Store}
When the store receives a transaction prepare message, it tries to prepare for
the transaction. If the prepare does not contain version conflict and has some
version of an object newer than the version the store has, the store adds the
transaction and the objects it read to the buffer.
When the store receives a transaction commit message, for each object the commit
updates, the store remove the object from the buffer.
When the buffer notify the store to prepare for some transaction, the store
starts the prepare for the transaction again and notifies the worker of the
result of the prepare. When the buffer notify the store to abort some
transaction, the store aborts the transaction and notifies the worker of the
result.
\subsection{Invariants}
\label{sec:inv}
For each object $o$, at most two versions of $o$ are depended on by some pending
transaction in Smart Buffer. Let the version of $o$ that the store holds to be $v$.
If a transaction wants to read version $v'$ of $o$ and $v' < v$, the transaction
has version conflicts and won't be added to the buffer. If a transaction wants to
read version $v'$ of $o$ and $v' \geq v + 2$, there exists a transaction $t$ that
reads $v + 1$ of $o$ and writes $v + 2$ of $o$ and $t$ has been prepared successfully.
Then, the store must have known $v + 1$ of $o$, which is a contradiction. Therefore,
at most two versions $o$ are depended on by some pending transaction in Smart Buffer,
and the two versions are $v$ and $v + 1$.
\subsection{Proof of Serializability}
Given transactions $t_1$ and $t_2$, assume that both $t_1$ and $t_2$ commits. If $t_2$
reads some object that $t_1$ writes, the smart buffer ensures that $t_2$ prepares
successfully only after the store has the version of the object that $t_1$ writes.
Store knows the version of the object that $t_1$ writes only after $t_1$ commits successfully.
Thus, $t_2$ prepares successfully after $t_1$ commits successfully. Since $t_2$'s prepare
must be completed before $t_2$ commits, $t_2$ commits successfully after $t_1$ commits
successfully. Therefore, all updates that $t_2$ perform happen after all updates that $t_1$ perform.
Therefore, for any two transactions $t_1$ and $t_2$ that completes successfully, if $t_2$
depends on $t_1$, $t_2$ happens after $t_1$. Hence, serializability is ensured.
\section{Implementation}
\label{sec:implementation}
\subsection{NumLinkBuffer}
NumLinkBuffer is an implementation of the SmartBuffer interface. It contains
$\mathtt{depsMap}$ which maps some version of an object known to the store
to the IDs of transactions that depend on that version and $\mathtt{unresolveddepsMap}$
which maps some version of an object unknown to the store to the IDs of
transactions that depend on that version. It also contains $\mathtt{numLink}$
which maps a transaction to the number of its dependencies unknown to the store.
Besides the invariant mentioned in Section \ref{sec:inv}, another invariant is
maintained in this implementation that a transaction is pending if and only if
it is in $\mathtt{numLink}$.
NumLinkBuffer also has locks for each object and for each transaction. These
locks are utilized to ensure proper ordering of updates of fields of the buffer.
\subsection{Testing Tool for 2PC}
In order to test the performance of the smart buffer in an environment easy to
tune and reason about, we implemented a toy model of 2PC. The model contains
stores, workers, transactions and locks similar to the actual ones in real world
distributed setting, except for that principals communicate through inter-thread
communication and function calls. Store objects don't contain any value - they
are just representations of objects on an abstract level; to that end, they
contain version numbers.
A smart buffer is associated with each store. The store interacts with the
buffer when processing transaction prepare and commit. The buffer tracks pending
transactions. When the status of a pending transaction is determined, the buffer
finishes the rest of the prepare phase, including grabbing locks on the store
side and communicating the prepare result to workers.
This toy model has several parameters that can be tuned to simulate different
real world settings, including number of stores, number of workers, number of
threads per worker, number of objects/writes/reads per transaction and network
delay between stores and workers. The toy model tracks the number of
transactions committed and the number of transactions aborted because of
different reasons. The toy model can be run using a CLI that allows you to
specify the values for the different parameters, and the time to run the model
for.
We also designed a script that can run the testing using parameters from a grid
specified by the user. This 2PC simulator can be used to test other 2PC protocol
optimization schemas.
The CLI has the following schema:
\begin{lstlisting}[
basicstyle=\small, %or \small or \footnotesize etc.
]
Usage: fbuffer [-h] [-verbose] [-O=<path>] [-objects=<objects>]
[-size=<txnSize>] [-storefile=<storefile>] [-stores=<stores>]
[-threads=<threads>] [-time=<runtime>]
[-workerfile=<workerfile>] [-workers=<workers>]
[-writeratio=<writes>]
-h, --help Print a synopsis of options.
-O=<path> Specify where to place output diagnostic files
(default: .)
-objects=<objects> Number of objects per store (default: 10000)
-size=<txnSize> Proportion of objects queried per transaction (default:
0.001)
-storefile=<storefile> File name for store benchmarks (default: stores.csv)
-stores=<stores> Number of stores (default: 1)
-threads=<threads> Number of threads per worker (default: 8)
-time=<runtime> Time to run the simulation for (default: 10000)
-verbose Print benchmark output to the console
-workerfile=<workerfile>
File name for worker benchmarks (default: workers.csv)
-workers=<workers> Number of workers (default: 1)
-writeratio=<writes> Proportion of queried objects that are writes (default:
0.1)
\end{lstlisting}
\subsection{Fabric}
We have also implemented NumLinkBuffer in Fabric. The buffer is associated with
each database, and the database interacts with the buffer when it performs
prepare, commit and abort.
To maintain the original structure of transaction handling in Fabric, the buffer
is restricted to minimal functionality. The smart buffer only tracks unresolved
dependencies of transactions. The status of pending transaction is returned to
the store by attaching a $\mathtt{Future}$ to
$\mathtt{TransactionPrepareFailedException}$. The store is responsible for
contacting the workers or starting the prepare again depending on the status
returned.
\section{Performance}
\label{sec:performance}
\subsection{Sources of Overhead}
To track dependencies and version conflicts of pending transactions, the current
design of smart buffer needs to store all objects a pending transaction reads.
For each worker, the number of pending transaction is less than or equal to the
number of transactions a worker can start concurrently. Thus, for each store,
the space overhead of the smart buffer is $O($ \# of workers * \# of transactions
each worker can start * \# of objects a transaction reads$)$.
The $\mathtt{add}$ method takes $O($ \# of objects the transaction reads$)$. The
$\mathtt{remove}$ method removes every transaction in the buffer that depends on
an object. The maximum number of transactions depending on an unknown version of
an object is the number of transactions each worker can start. Thus,
$\mathtt{remove}$ takes $O($\#of transactions each worker can start$)$.
A simpler way to handle unknown objects is message passing. The store sends a
message to the worker when it sees some unknown object and notifies the worker
again when it receives the commit message about that unknown object. Since the
smart buffer handles unknown objects locallhy, we expect the smart buffer
outperforms this message passing solution as network latency becomes larger.
\subsection{Testing}
To perform testing, we designed a random transaction generator in our 2PC model.
Each worker is associated with a transaction generator, and the transaction
generator feeds transactions randomly generated based on parameters set by the
user to the worker.
So far, we have a rather limited data set (we are still working on the test
script), but there are some trends in the data. The total number of completed
transactions decreased when any of the current parameters are increased. The
total number of transactions aborted by the stores increased in general as the
number of workers and stores increased. There may be some unnecessary overhead
in our current implementation, so we will try to optimize it. In general, a lot
more transactions were aborted because of a version conflict than a locking
conflict (on the store side).
We found that when the home-store and remote-store delays are both 0, the
original 2PC algorithm performs better than our version. This is most likely
because the overhead of the buffer is bigger than the benefit of having it. Our
testing so far suggests that increasing the store delays will reverse this, as
the buffer's benefit increases due to the increased cost of having synchronous
transactions. We have not had the opportunity to confirm this with extensive
stress testing (in the same way as we tested without any delays), but this will
likely be fairly straightforward.
\end{document}